Google Search is losing its 'cached' web page feature

d3Xt3r@lemmy.nz to Technology@lemmy.world – 664 points
engadget.com

One of Google Search's oldest and best-known features, cache links, is being retired. Best known via the "Cached" button, these links show a snapshot of a web page from the last time Google indexed it. However, according to Google, they're no longer required.

“It was meant for helping people access pages when way back, you often couldn’t depend on a page loading,” Google's Danny Sullivan wrote. “These days, things have greatly improved. So, it was decided to retire it.”


They really have just given up on being a good search engine at this point huh?

They're an ad company, and cached pages don't bring ad money to their clients.

Makes sense. It seems they've been having lots of meetings about how to maximize revenue.

They'll reintroduce the feature with their own ads embedded.

They may not have a choice in the matter. AI-generated pages are set to completely destroy the signal-to-noise ratio on the web.

Google's business has two aspects, collecting user data and serving ads. If Search stops being relevant people will stop using it, which impacts both aspects negatively.

Well, that really sucks, because it was often the only way to actually find the content on the page that the Google results "promised". There are numerous reasons: sometimes the content simply changes, gets deleted, or is made inaccessible by geo-fencing, or the site is straight up broken, and so on.

Yes, there's archive.org but believe it or not, not everything is there.

Or locked behind 100 pages of unnecessarily paginated content. Seriously, one of the best features that a webpage has over a physical printed page is the ability to search it for what you were looking for... smh:-(.

We must archive all the things

That's BS, it's one of the best features Google has and they've been ruining it. The Wayback Machine wishes it could be that comprehensive.

Wayback is definitely more comprehensive than Google. I’ve only seen three cases of links Google had saved that Wayback hadn’t.

i fear for the days when some cruel unfeeling interest comes for archive.org too

of course it is. why have anything good on there, no point reminding me of the old days when the internet was actually fucking useful

When did you last use this feature? Please cite a source.

Literally yesterday. What source is sufficient to tell you first hand that I used the feature yesterday?

You want proof that it's useful? Go look at the Wayback Machine. Literally millions of users using a cached web page feature.

I also literally used it yesterday, mostly because my work has an insanely over-the-top site blocking situation, and rather than having to submit a request to allow the site (and likely get rejected), the cached page usually works well and gets me the info I need.

That is exactly why I use it. I need to access pages for work, our internet security is ridiculously overdone and so many sites don't load... but the cached versions do. Fml

Photo / visual evidence would be fine, I am not picky. I would just like to be sure you are telling the truth, a lot of fraud on the internet nowadays 😒😒


I last used the feature to view deleted reddit posts.

Another time I used something similar (the Wayback Machine) to view long-gone websites about a postcard.

I've used it three times today. Site down, geo-blocked, and a forum post with info I needed deleted.

Like a couple of times a year at least. Faster and easier than going to the Wayback Machine to get a copy.

So ignorant. If you've done any digital research, you know these tools intimately.


Without getting into too much detail, a cached site saved my ass in a court case. Fuck you Google.

It sucks because it's sometimes (but not very often) useful but it's not like they are under any obligation to support it or are getting any money from doing it.

Isn't caching how anti-paywall sites like 12ft.io work?

At least some of these tools change their "user agent" to whatever Google's crawler uses.

When you browse in, say, Firefox, one of the headers that Firefox sends to the website is "I am using Firefox", which might affect how the website displays for you or let the admin know they need Firefox compatibility (or be used to fingerprint you...).

You can just lie on that, though. Some privacy tools will change it to Chrome, since that's the most common.

Or you say "I am the Google web crawler", which sites let past the paywall so the page can be indexed by Google.
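A minimal Python sketch of the trick, standard library only. It builds the request without sending it; the Googlebot string is the commonly published desktop form and is an assumption here, and the example URL is hypothetical. Nothing stops a client from putting any string in this header, which is exactly why it can't be trusted on its own.

```python
import urllib.request

# Assumed Googlebot desktop user-agent string; check Google's crawler docs for the current form.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but don't send) a GET request that claims to be `user_agent`."""
    # The User-Agent header is just a string the client picks; nothing verifies it.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("https://www.example.com/article", GOOGLEBOT_UA)
# urllib normalizes stored header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```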


If I'm not wrong, Google has a set range of IP addresses for their crawlers, so not all sites will let you through just because your UA claims to be Googlebot
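That matches how Google has described verifying its crawler: reverse DNS lookup on the connecting IP, check that the hostname is under `googlebot.com` or `google.com`, then a forward lookup to confirm the name maps back to the same IP. A rough Python sketch, assuming those two domains are the whole allow-list (Google's current docs may list more) and IPv4 only, since `socket.gethostbyname_ex` doesn't return IPv6 addresses:

```python
import socket

# Assumption: genuine Google crawler hostnames end in one of these domains.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")

def is_google_hostname(hostname: str) -> bool:
    """True if `hostname` is, or is a subdomain of, an allowed Google crawler domain."""
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS)

def verify_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check; the User-Agent header alone proves nothing."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)              # reverse lookup: IP -> name
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward lookup: name -> IPv4s
    except OSError:  # covers socket.herror / socket.gaierror for unknown hosts
        return False
    # The name must be Google's AND resolve back to the same IP; otherwise anyone
    # controlling the reverse DNS for their own IP range could fake the hostname.
    return is_google_hostname(hostname) and ip in forward_ips
```

A site that does this only has to run the check once per IP and cache the answer, so a spoofed UA from a home connection simply fails.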

I dunno, but I suspect that they aren't using Google's cache if that's the case.

My guess is that the site uses its own scraper that acts like a search engine, and because websites want to be visible to search engines, they let it see everything. This is just my guess, so it might very well be completely wrong.

Was that not something the Wayback Machine could have solved?

Depends. Not every site, or every page on it, will be crawled by the Internet Archive. Many pages are available only because someone submitted them to be archived, whereas Google Search typically caches a page once it's indexed.
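If you want to check quickly whether the Internet Archive has a given page, its availability endpoint (`https://archive.org/wayback/available?url=...`) returns JSON with the closest snapshot, or an empty `archived_snapshots` object when there's nothing. A small Python sketch that builds the query and reads the answer; the two sample responses are made-up illustrations of the shape, not real API output, and the function names are hypothetical:

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build an availability query; `timestamp` (a YYYYMMDDhhmmss prefix) is optional."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the snapshot closest to this date
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)

def closest_snapshot(response_body: str):
    """Return the closest archived snapshot URL, or None if the page was never captured."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None  # nobody submitted it and the crawler never reached it

# Illustrative response shapes (made-up values).
hit = ('{"archived_snapshots": {"closest": {"available": true, '
       '"url": "http://web.archive.org/web/20200101000000/http://example.com/", '
       '"timestamp": "20200101000000", "status": "200"}}}')
miss = '{"archived_snapshots": {}}'
```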

Google is the king of giving bullshit reasons to hide their true intent.

My guess is ads don't work in cached pages.

This is the real reason. Google is an ad company, not a search engine.

Just like that SafetyNet thing. They'll write long pages about it, but won't admit they want to make custom Android ROMs unusable for the average user.

Well, that's some shit. I often use it to get info off pages I wouldn't normally click on.

there are half a dozen still very good reasons to keep this feature and one not to: lost ad revenue

assholes

I can't imagine there was even that much lost revenue. Cached pages are good for seeing basic content in that page but you can't click through links or interact with the page in any way. Were so many people using it to avoid ads?

Were so many people using it to avoid ads?

I doubt that as well. There are much better ways to deal with ads. I always only used it when the content on the page didn't exist anymore or couldn't be accessed for whatever reason.

But I suspected this was coming, they've been hiding this feature deeper and deeper in the last few years.

but you can't click through links or interact with the page in any way

Most of the time that's exactly what I want. I hate hunting through 473 pages of stupid bullshit in some janky forum to try to find the needle in that haystack.

I feel like 99% of its usage was to avoid ads/paywalls/geo/account restrictions on news and social media sites


These days, things have greatly improved.

Websites will never change their URLs today.

i maintain redirects for old URLs for which the content still exists at another address. i've been doing that since i started working on web sites 20-some years ago. not many take the time to do that, but i do. so there's at least a few web sites out there that if you have a 20 year old bookmark to, chances are it still works.
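For anyone wanting to do the same, the core of it is a lookup table from retired paths to current ones, answered with a permanent (301) redirect. A minimal sketch using Python's built-in `http.server`; the paths in `REDIRECTS` are hypothetical placeholders, and a real site would more likely do this in its web server config:

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit

# Hypothetical mapping from retired paths to where the same content lives now.
REDIRECTS = {
    "/old/about.html": "/about/",
    "/2004/notes.php": "/archive/notes/",
}

def resolve(path: str):
    """Return (status, location) for a request path (query string already removed)."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target  # permanent: tells clients and search engines it moved for good
    return 200, None

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(urlsplit(self.path).path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"current content\n")
```

To try it locally, pass `RedirectHandler` to `http.server.HTTPServer(("127.0.0.1", 8000), RedirectHandler)`, call `serve_forever()`, and request one of the old paths.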

The enshittification will continue until quarterly reports improve.

Just kidding, it will continue regardless.

If anything, it will accelerate the worse quarterly results get, as they try to solve their way out of problems they made while still keeping the problems.

By the way, I just found out that they removed the button, but typing cache:www.example.com into Google still redirects you to the cached version (if it exists). But who knows for how long. And there's the question of whether they'll continue to cache new pages.
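If you want to keep the trick handy (say, as a bookmarklet or script), here's a minimal Python sketch that only builds the search URL for a `cache:` query. It doesn't fetch anything, and since Google is retiring the feature, assume the operator can stop working at any time; the function name is made up:

```python
from urllib.parse import urlencode

def google_cache_query_url(site: str) -> str:
    """Build a Google search URL for the `cache:` operator."""
    # urlencode percent-encodes the colon (%3A); Google decodes it back to "cache:".
    return "https://www.google.com/search?" + urlencode({"q": "cache:" + site})

print(google_cache_query_url("www.example.com"))
```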

they've broken / ignored every modifier besides site: in the last few years, god knows how long that'll work

Quotes are fucking awful now. You have to change the search terms to verbatim now which takes way fucking longer. Google has enshittified almost everything. I'm just waiting for them to ruin Maps.

Remember when Google Now was intelligently selected data and not an endless scroll of paywalled news articles?

I hope they only kill the announced feature but keep the cache part.
Just today I had to use it because some random RSS aggregator website had the search result I wanted but redirected me somewhere completely different...

My guess is that a cached page is just a byproduct of the crawler indexing the page. They need a local copy to parse the text, links, etc. and to compare it with the previous version of the page.
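A rough sketch of that guess in Python, standard library only: pull the visible text and links out of the stored copy and a fresh crawl, then diff the text. Whether Google's indexer works anything like this is pure speculation, and the class and function names are made up for illustration:

```python
import difflib
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collect visible text chunks and href targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text, self.links = [], []
        self._skip = 0  # depth inside <script>/<style>, whose contents aren't page text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())

def parse(html: str):
    """Return (text_chunks, links) for one copy of a page."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links

def changed_lines(previous_html: str, current_html: str):
    """Diff the stored copy's text against a fresh crawl: '- ' removed, '+ ' added."""
    old_text, _ = parse(previous_html)
    new_text, _ = parse(current_html)
    return [line for line in difflib.ndiff(old_text, new_text)
            if line.startswith(("- ", "+ "))]  # drop unchanged ("  ") and hint ("? ") lines

prev = "<p>Reply #42: use flag -x</p><a href='/t/1'>next</a><script>var a=1;</script>"
cur = "<p>[deleted]</p><a href='/t/1'>next</a>"
```

For a forum reply deleted after the last crawl, the `- ` lines are exactly what the old cache used to preserve.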


“It was meant for helping people access pages when way back, you often couldn’t depend on a page loading,” Google's Danny Sullivan wrote. “These days, things have greatly improved. So, it was decided to retire it.”

They still go down, Danny. And fairly frequently at that. Y'all are fuckin' stupid.

I'd say things are much worse than they used to be. Sure, in the past sites would disappear or completely fail more often. But because most sites were static, those were the only ways they could fail. These days the cache feature is useful for websites that have JavaScript bugs preventing them from displaying properly, or where the content management system still pretends the link works but silently loads different content.

How has no one built a new search engine in the last decade or so, while Google's flagship product has been in clear decline?

I know of the likes of DDG, and Bing has worked hard to catch up, but I'm genuinely surprised that a startup hasn't risen to find a novel way of attacking reliable web search. Some will say it's a "solved problem", but I'd argue that it was, but no longer.

A web search engine that crawls and searches historic versions of a web page could be an incredibly useful resource. If someone can also find a novel way to rank and crawl web applications or to find ways to "open" the closed web, it could pair with web search to be a genuine Google killer.

  • Google invents, invests, or previously invested in some groundbreaking technology
  • They buy out the competition and throw tons of effort into making a superior product
  • Eventually Google becomes the de facto standard
  • Like a few years pass
  • Google hands the project off to fresh interns to cut the crap out of cloud usage and decrease costs
  • Any viable alternatives are immediately bought out by Google
  • Anything left over is either struggling FOSS or another crappy corporate attempt (cough cough Microsoft)
  • Repeat

My favorite case in point being Google Maps.

There's a lot of startups trying to make better search engines. Brave for example is one of them. There's even one Lemmy user, but I forget what the name of theirs is.

But it's borderline impossible. In the old days, Google used web scrapers and keyword search. When people started uploading the whole dictionary in white text on their pages, Google added some anti-spam and context logic. When that got beat, they handled web credibility by the number of "inlinks" from other websites. Then SEO came out to beat link farmers, and you know the rest from there.

An indexable version of Archive.org is feasible, borderline trivial with Elasticsearch, but the problem is who wants that? Sure, you might want it, and so might I, but no one else cares. Also, let's say you want to search for something specific: each page could be indexed, with slight differences, thousands of times. Which one will you pick? Maybe you'll want to set your "search date" to a specific year? Well guess what, Google has that feature as well.
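The "which of the thousands of versions?" problem can be sketched without Elasticsearch: for each URL, keep only the newest capture that existed on the chosen search date, then match the query against that one. This is a brute-force in-memory scan, not an inverted index and not how Archive.org or Elasticsearch actually work; the class name and sample URLs are hypothetical:

```python
from collections import defaultdict
from datetime import date

class SnapshotIndex:
    """Toy store of page snapshots that can be searched 'as of' a date."""

    def __init__(self):
        self.snapshots = defaultdict(list)  # url -> [(capture date, set of words)]

    def add(self, url: str, captured: date, text: str) -> None:
        self.snapshots[url].append((captured, set(text.lower().split())))

    def search(self, query: str, as_of: date) -> list:
        """Return (url, capture date) pairs whose as-of version contains every query word."""
        wanted = set(query.lower().split())
        hits = []
        for url, versions in self.snapshots.items():
            # Of many near-identical captures, keep only the newest one
            # that already existed on the chosen "search date".
            eligible = [v for v in versions if v[0] <= as_of]
            if not eligible:
                continue
            captured, words = max(eligible, key=lambda v: v[0])
            if wanted <= words:
                hits.append((url, captured))
        return sorted(hits)

idx = SnapshotIndex()
idx.add("https://forum.example/t/1", date(2015, 3, 1), "fix the driver crash")
idx.add("https://forum.example/t/1", date(2019, 6, 1), "thread deleted")
```

Searching "driver crash" as of 2016 finds the 2015 capture; as of 2020 it finds nothing, because the newest version by then is the deleted one.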

Brave is not a business that should be supported. Also, I'm pretty sure they just use Bing for a back end.

There are also a few paid search engines that people say are good.

What are the issues with Brave?

They've had a history of controversy over their life, ranging from replacing ads with their own affiliate links to bundling an opt-out crypto miner. Every time something like this happened, the CEO went on a marketing campaign across social media, effectively drowning out the controversial story with an influx of new users. The CEO meanwhile has got in trouble for his comments on same-sex marriage and covid-19.

In general, it's always seemed like it would take a very small sack of money for Brave to sell out its users. Also, their browser is Chromium based, so it's still contributing to Google's market dominance and dictatorial position over web technologies.

I recommend Kagi. Bought a family plan and it feels like I've gone back to 2016 when the search engines weren't a dumpster fire.

The next revolutionary search engine will be an AI that understands you, like a librarian does. Not just a vehicle for serving ads.

i don’t need a search engine that understand me i need a search engine that finds sites and pages based on a string of text i provide it

we should be calling the future piss the way it’s going down the toilet

Well, at the least, you need something to filter out the shit trying to game SEO. To me it seems that AI is the easiest approach.

Bing's Copilot is genuinely pretty good. The AI answer is often pretty accurate, and the way it's able to weave links into its answer is handy. I find it way more useful than Google Search these days, and I'm pretty much just using it on principle, as Google is pissing me off by killing their services, a few of which I've used.

I don't think Microsoft is some saint, but Copilot is just a good product.

Yes, that would be a Google killer. If you somehow find the money to provide it for free.

Finding a novel way of searching is one thing. Finding a novel way of financing the whole endeavor (and not going the exact route Google is) is another.

I find this very useful to read paywalled articles that Google has managed to index!

OK, I see why they might want to get rid of it.

Ironically just yesterday I needed Google Cache because a page I needed to read was down and I couldn't find the option anymore.

Are we going to need to go back to personal web crawlers to back-up information we need? I hate today's internet.

https://github.com/dessant/web-archives

It's a browser extension that links to a dozen online caching services.

Hmm, tried it on Firefox Android but not sure it is working.

It's called "Web Archives", you can install it from the Firefox official extensions.

To use it you open the menu while on a page, go to Addons > Web Archives and select a search engine.

Ran across the same problem recently. Ended up using Bing, of all things lol

In a shocking turn of events, Google decided once again to make their namesake service worse for everyone.

Legitimately baffling, keeping this feature doesn’t really seem like it would impact anyone except those that use it, while removing it not only impacts those people that already use it, but those who would potentially have reason to in the future.

Cannot think of a single benefit to removing a feature like this.

It is only baffling if you still think that Google's aim is to help people. At one point they were trying to gain market share and so that was true. It is not anymore.

ostensibly it takes a lot of space to cache that much data, but seeing as they own YouTube this should be nothing in comparison

i would guess they have it cached still anyways.

Finally, an excuse to use the Wayback Machine for all of my searches!

Ironically, the link to this article is offline for me. "Cached" surely would solve my problem.

I stopped using Google late last year, and it's pretty eye-opening how much freer I feel now. Previously, any searches I made would follow me around: if I made a one-time search for something, I'd see it advertised later on. As a result I started searching more using private browsing. I'd often forget, though, and end up being tracked.

Since switching to Firefox and DuckDuckGo, I no longer have to do private searches. No more being followed around the internet.

Also, I'm not convinced private browsing works. For example, I still use it for YouTube, but I noticed that despite YouTube not knowing who I am, the home page includes some videos that are very related to my usual ones. I guess they're using IPs to still deliver related videos.

Private browsing keeps your computer from remembering things about what you did. It cannot keep other people’s computers from remembering everything about interacting with you.

YT doesn't know who you are, but it knows damn well who last logged in from that PC/IP.

didn't that happen like years ago? or maybe because I am using Firefox, but I haven't seen the button for the cached website for a while now

It has barely existed for years anyway. Anyone can remove the Google caching from their website and most major websites and many small ones do.
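The opt-out being referred to is, as far as I know, the `noarchive` robots directive, set either in a `<meta name="robots" content="noarchive">` tag or an `X-Robots-Tag: noarchive` HTTP header. A small Python sketch that checks a page for it; it ignores the user-agent-prefixed header form (e.g. `googlebot: noarchive`) and other edge cases, so treat it as illustrative only, and the names are made up:

```python
from html.parser import HTMLParser

class RobotsMeta(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k: (v or "") for k, v in attrs}
        if a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.update(d.strip().lower() for d in a.get("content", "").split(","))

def opts_out_of_cache(html: str, x_robots_tag: str = "") -> bool:
    """True if the page (via meta tag or X-Robots-Tag header) asks for no cached copy."""
    parser = RobotsMeta()
    parser.feed(html)
    parser.close()
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    return "noarchive" in (parser.directives | header)
```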

Now I just have an archive.org extension to do basically the same thing.

Ya I'm just surprised to hear the feature still exists. I remember the option to view cached page disappearing from every search result I would try to use it on years ago.

Fuck. I sometimes use the text-only version to access sites with too many moving elements, or when the site is geoblocked or doesn't respect cookie choices and denies access. So far, it has been the most reliable option for me.

Has Elon secretly bought Google too?

Nah, they've been pulling crap like this for at least a decade now, nothing new here

Yup, removing useful features is kind of Google's thing.

I still mourn the death of the Menu button in Android.

That is BS, a site can be down at any time, did we fix downtimes for good? Those down detector sites might just shut down as well then ಠ_ಠ

Google is well on its way in its uber-dick speedrun.

This is the search engine equivalent of aiming a carbine at your feet and shooting yourself with a .50 cal round.

Cached pages were something I found myself using quite a bit and them going may be the push needed for me to use an alternative search engine.

Enshittification strikes again. Cached pages don't make money and maybe reduce ad clicks, so they're gone. This benefits Google but not users in any way whatsoever.

I kind of wonder if they're just training machine learning models with it all so they don't have to store the content. That would give us a pretty good reason why their search results became inadequate over the period of a month or two.

Internet Archive is essential now. I used to use Google Cached for when IA failed. All researchers are now losing that resiliency.

Was it even still around? I can think of a few times in the past few months where I've tried to find the cached link to a google result and failed. Most recently just two days ago, when a site I wanted to use was down for maintenance.

Cached pages haven't worked on many sites for several years already.

And for specific types of sites, it 100% still is needed and a great tool.

Google is spelled Kagi now. :)

No fucking way I'm paying a subscription to search something on the Internet. $5 for 300 searches, lol.

Beyond that, the money is still going to Google, Yandex, Brave, Bing etc via API payments. If they actually created their own search engine that was any good I’d be more inclined to pay for access.

https://help.kagi.com/kagi/search-details/search-sources.html

Edit: They do claim to have their own small indexes (Teclis and TinyGem) that they sell API access to, but I’m doubtful it adds significant value.

I have been looking at kagi but their pricing is definitely made to force people to buy the professional $10 package.

100 or even 300 searches/day would be unusable for me, you quickly spend 10 searches refining a query for something special, and when developing you do like 5-10 searches/hour.

A fair pricing model would be

  • $2/month for 1000 searches/day
  • $5/month for 5000 searches/day
  • $10/month for unlimited everything
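To put the proposed tiers next to the $5 for 300 searches mentioned above, here's the cost per 1000 searches. Two assumptions, both mine: the 300 searches are a monthly allowance, and a month is 30 days when converting per-day quotas.

```python
# Assumption: a 30-day month when converting per-day quotas to monthly totals.
DAYS_PER_MONTH = 30

def cost_per_1000(price_per_month: float, searches_per_month: float) -> float:
    """Dollars per 1000 searches for a flat monthly price and quota."""
    return price_per_month / searches_per_month * 1000

plans = {
    "$5 for 300 searches/month": cost_per_1000(5, 300),
    "$2 for 1000 searches/day": cost_per_1000(2, 1000 * DAYS_PER_MONTH),
    "$5 for 5000 searches/day": cost_per_1000(5, 5000 * DAYS_PER_MONTH),
}
for name, cost in plans.items():
    print(f"{name}: ${cost:.2f} per 1000 searches")
```

Under those assumptions the proposed tiers are a few hundred times cheaper per search than the entry plan.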

Paying for the Reddit API would be cheaper. That's an impressively overpriced search engine.

I split the duo plan with a friend and do annual and it's $6.30/month for unlimited searches.

Oh shit, it's 5 dollars? That's like.... A cup of coffee. You are right, way too much, so much money.

Ad based search engines make almost $300 a year off their users

What disingenuous phrasing.

I'd be up for using a product like this, but their popcorn pricing and snark is really off-putting, so I'll never be using this service.

I haven't seen that available for literally years. I thought they killed it long ago.

Google sucks.

they hid it under a little 'more' menu a while back. i kinda saw this coming

Maybe I was one of the test subjects then because it wasn't there at all, menu or otherwise. 🤷

No, there are still use cases for it. I usually use it to retrieve web pages from sites that get incorrectly blocked by the firewall at work.

JFC...at this point I may as well stand up a self hosted search engine.

Is this really such an essential feature when archive.today exists?

Not really, but I'm disgusted with the continual downgrading of Google Search and its hyper-focus on increasing profitability at the cost of user experience and data privacy.

I was already toying with SearXNG anyway, so it's not a big leap.

A few months back Ruud stood up a copy: https://searxng.world/

I've been using it, and it tends to be as good as or better than google's search. There's only been a handful of instances where I've explicitly used google's.

Thanks, I'll give it a try. I've been using https://searx.work/ to play with the tech and I'm almost satisfied enough to stand up my own instance.

Edit; I removed my dumb-assery around default search engines.

Sounds like someone's after storage savings.

All those racks of hard drives are taking up the space they need for racks of Nvidia GPUs.

They use their own TPUs instead of NVIDIA AFAIK but yeah.

Seeing many comments here shitting on this decision by Google: is this really that big of a deal? I've personally never used Google's cached feature, and if I ever needed to see a page that's currently down, I'd use the Wayback Machine. If nobody used the feature, why have it waste tons of storage space? Feel free to prove me wrong, though.

It was also useful when the page had changed in between Google indexing it and now, so if you loaded the page and couldn't find the text you were searching for because it was deleted, you could find it on the cached page.