Google is getting a lot worse because of the Reddit blackouts

Nahlej@lemmy.world to Technology@beehaw.org – 330 points –
theverge.com

honestly we should have collectively realized way earlier that putting all the useful, readable, un-touched-by-SEO help content for basically every niche hobby fandom and ideology in the hands of one for-profit entity was not very wisdom-pilled of us

"we should have collectively realized way earlier"

some people have, but whenever you'd mention it, you'd be met with "lol take the tinfoil hat off", "but we're already using [for-profit platform] why would we move when everyone's here" and "but it's haaaaaaard".

Source: https://xkcd.com/743/

The fact that the alt-text directly mentions Diaspora is more than amusing in this context

Hey! I'm not probably autistic! I'm definitely autistic, there's a difference!

Had to zoom in to find out why it is suddenly year 200. There is a tiny 1 in there.

Yes. When everyone enters info on corporate sites, sooner or later they'll decide to monetize it.

Reddit going evil on charges and showing their colours in the AMA has been a wake-up call.

I agree, but I also have serious concerns about this being the replacement strategy. It could be because of my ignorance of how this all works though. Like many of you, I am new and here because of the reddexodus.

These servers are going to cost money, and for many of them the money will run out. Is there a function to preserve the collective content of an entire server once it goes dark? I know that you can migrate your own account to another server, but what happens to everything Google has indexed at Lemmy.world if the worst happens? Is it all just dead links? What if many of the users do not migrate? Is it just gone?

I am concerned that in the current state we are setting up to burn everything that loses a couple admins or becomes too old to economically host.

I was on a mastodon server and the owner decided it was not worth his money to keep running. He did not inform anyone on the server or allow any account backups and all was lost.

With federated services, I feel like it's somewhat important to get to know the admins of the server you use. You don't have to be best friends, but at least know their name, motivation for running the server, and how it's funded.

In practice the content is distributed to all the other servers, so people who have been reading it before will still be able to on their own instance, but you're right the indexed domain is gone and so are the results in Google.

But there is one difference: one instance of lemmy only stores a very small fraction of the content. And it's much easier to fuck up one reddit than to fuck up thousands of lemmy instances simultaneously. So if one instance goes down, the rest of the fediverse is still up and running.

These are certainly possibilities! It's happened elsewhere in the Fediverse... but already we can export most of our data and migrate to a different instance. Getting these base features right is important before enhancing their functionality. Planning for the future is important too. So far I've been impressed by Lemmy, though it's not nearly as portable as Mastodon or Calckey or Pleroma etc. Part of that is that in Lemmy/kbin we don't follow other users... we subscribe to groups (subs/communities/magazines).

Still, with the nature of ActivityPub, it's inevitable that migration tools for Reddit-like federated apps will get built quick-like

I think it's a fair concern. We've seen other parts of the fediverse successfully implement crowdsourced funding via Patreon and similar to keep Mastodon servers running, and I suspect if Lemmy remains "the place to be" admins will have reasonable success with a similar model. Lemmy is super efficient and can support 100s of users on a single box, so I think if 1% of users paid like $5 a month you could probably still support 99% of users "for free".
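To put rough numbers on the "1% pay $5" idea, here is a minimal sketch. The user count and the $40/month hosting bill are made-up assumptions for illustration, not figures from any real instance.

```python
def monthly_donations_cents(total_users: int, paying_percent: int, pledge_cents: int) -> int:
    """Total pledged per month, in cents, if a given percentage of users donate."""
    paying_users = total_users * paying_percent // 100
    return paying_users * pledge_cents


def covers_hosting(total_users: int, paying_percent: int, pledge_cents: int,
                   hosting_cents: int) -> bool:
    """True if the monthly donations meet or exceed the monthly hosting bill."""
    return monthly_donations_cents(total_users, paying_percent, pledge_cents) >= hosting_cents


# Hypothetical instance: 1,000 users, 1% donating $5/month, $40/month server bill.
donations = monthly_donations_cents(1000, 1, 500)
print(donations / 100)                     # dollars pledged per month
print(covers_hosting(1000, 1, 500, 4000))  # does that cover the assumed bill?
```

Whether this works out in practice depends entirely on the real hosting cost per user, which the sketch does not try to estimate.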

I'm sorry, but clearly you have not looked for niche information on Google for a while now. Lots of links end in dead ones, particularly when I am looking for vehicle information on older models.

I'm not sure what you are trying to say. That we shouldn't be concerned because this problem already happened?

A lot of niche older vehicle information, if it wasn't hosted on Reddit, was often on forums funded by enthusiasts, which eventually ran out of money and no longer exist. This is exactly the problem that I'm concerned about. Particularly so if a certain community balloons in popularity and an admin nukes it to keep the server costs under control for the other members.

Completely what I'm saying, but to add on, it is not just forums. With the new web, I've hit a dead end on many OEM websites as well, and parts websites, and others. I'm sure cell phone and computer information is similar; in fact, after trying to research a power supply for my old prebuilt, I know it's a fight.

I've said it numerous times over the years, the Internet has been centralizing rapidly and it benefits none of us.

In 2005 you'd wander around, going from people's personal pages to forums to whatever else people linked. In 2015 half of those websites were dead because everyone got their content on reddit anyway.

I just can't agree more with you. Like wow, this reddit blackout has truly opened my eyes to the massive, incredible amount of useful information that is currently resting on reddit servers.

we can still easily fall into this trap if there isn't a good way to migrate communities between instances. And even if we could just take /c/technology@beehaw.org and move the whole thing to /c/technology@feddit.de or something, that would still break all the indexers' links

What we really need is some sort of torrent-like system for this content with something equivalent to magnet links.

Sounds like you're describing ipfs :D

https://ipfs.tech/#how
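For anyone wondering what "magnet-link-like" means in practice, the core idea is content addressing: the link is derived from the content itself, so any host holding the same bytes can serve it. Below is a toy sketch of that idea only. The `sha256:` address format and the `ContentStore` class are invented for illustration and are not how IPFS actually encodes its identifiers.

```python
import hashlib
from typing import Dict, Iterable, Optional


def content_address(data: bytes) -> str:
    """Derive an address from the bytes themselves (toy format, not a real IPFS CID)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class ContentStore:
    """A hypothetical host that stores blobs keyed by their content address."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> Optional[bytes]:
        return self._blobs.get(address)


def fetch(address: str, hosts: Iterable[ContentStore]) -> Optional[bytes]:
    """Ask each host in turn; accept only data whose hash matches the address."""
    for host in hosts:
        data = host.get(address)
        if data is not None and content_address(data) == address:
            return data
    return None


# The link keeps working as long as at least one host still has the post.
dead_host, mirror = ContentStore(), ContentStore()
link = mirror.put(b"How to replace the PCV valve")
print(fetch(link, [dead_host, mirror]))
```

The point is that the link never names a domain, so losing one server does not turn it into a dead link.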

I love the idea of IPFS, but every time I've tried to use it, it has always been very slow.

Amusingly, another chicken-and-egg problem. The more chickens, the faster the eggs. Wait, that metaphor works!

IPFS is for static content. For dynamic content, like reddit/lemmy, you'd want to build it on something like Locutus/Freenet 2023.

One thing the FOSS world really needs to get on right now is some form of search engine accessible distributed content archival. We need a way to store useful content from the past in a way that no one individual or group of individuals is capable of deleting it.

That is the main reason why I've been blogging on my own website since 2004 https://paradies.jeena.net/weblog/2004/apr/ersteintrag (and switched to English in 2010 https://jeena.net/posts )

Yep. I blog infrequently but I’ve said a few times in my posts, I am writing this article because I need to remember the steps to do this weird niche thing in case something breaks in the future. If it happens to help someone else out, great.

Need some bots to start porting all those posts over to Lemmy lol.

Reddit's actions are tragic for the web. I can't even tell you how many times I searched something and typed Reddit at the end of the query. Not just because Reddit search SUCKS, but mostly because it's a gold mine of information. Especially for technical stuff.

Your game crashed? Reddit. Weird bug on your laptop? Reddit. Looking for a cool app? Reddit. Have a weird question? Reddit.

Reddit saved me countless hours and headaches. I felt that yesterday: a search for something, without even putting Reddit in it, kept bringing up Reddit links. I'd click on one without reading and end up on a locked sub because of the blackout.

It sucks but I hope it's going to continue. But at the same time, I don't see Reddit backing down. And even if they do? I'm not going back. Because how dare you? Like... screw you for even trying to pull that crap on your users.

Reddit is the web we built. And fuck u/spez decided to give it away for money.

I miss Aaron Swartz and the open web. Let's rebuild it, on better foundations!

Try using ChatGPT if you haven't. I've used Reddit in the past for a lot of troubleshooting, but with ChatGPT it's easier to get the answers I'm looking for, unless I asked the question on Reddit myself. And there's no judgement from ChatGPT lol

Used chatgpt to rework my resume recently, holy shit that site is a godsend.

Though, take care to factcheck what you get from it; all it really is is just a word predictor, and it can be pretty good at confidently telling you absolute nonsense that sounds right

Definitely true, however my usage of it has been to troubleshoot code. I wouldn't suggest using it for research purposes

Am I the only one that's noticed how reddit has been fucking with web crawlers? They insert newer comments into older posts so the crawlers pick up false results.

A few years back they started injecting a "related posts" box into pages. What that does is multiply the number of results a crawler will pick up. But all those are false results. There's only one true search result, which is the original comment/post. Sometimes I find myself sifting through the search engine results to find the actual original post. The rest are completely worthless, off-topic reddit posts littering the search index.

I know all this blackout stuff hurts now. I see it as necessary for the platform to lose its status as the "front page of the internet". Reddit turned evil a long time ago. It's long past time it was deposed.

That explains why the search page quotes a comment that doesn't exist on the post. That always confused me. It's insane how dependent on searching with "reddit" appended on the end of the search term I am. I have qualms as to how this'll bode for search engines if reddit loses interest or goes under.

I couldn’t understand how those changes back then crippling the user experience were “better” in any way, this explains a lot!

Had this happen today. Was searching for some programming related stuff and top pages are all inaccessible Reddit posts.

Hopefully it will help people realise that a profit motive being attached to everything is actually counterproductive societally.

Same. Had some things I needed to look up for my 3D printer and much of the results were inaccessible.

Was a pain.

Ditto, actually. The 3D printing communities I've seen here are just so much smaller.

About 4 people at work Monday discovered the blackouts and learned the reason from following Google results. I'd say that shows the effectiveness of the protest. That's 4 individuals that I work with personally who wouldn't have known otherwise about the api problem that now do. I can only imagine how many people are in that same boat.

Same, but it’s just growing pains.

We should start rewriting posts in lemmy with the correct information.

Sad thing is most search engines suck and haven't really indexed much of anything in the fediverse. Wonder why

The fediverse is really not good for big companies. It cannot be monetized or controlled.

It’s obvious you know this, but we just need a search engine that’s tuned to search the fediverse.

People rely far too heavily on reddit for public resources. Here's hoping that changes now.

This also highlights the problem with a lot of communities moving to Discord, which inevitably ends up as repositories for critical information, but can't be indexed by Google. Reddit is still valuable as a problem solving resource, and I hope they fix this API fiasco.

I'm willing to bet the lack of api access going forward will make all reddit posts disappear from crawler results anyways. I'm no expert, but I imagine the crawler is picking up on all of the interconnected references to reddit that are all due to free api access. As soon as those connections disappear, so dies the value to the entire community. It will be just like the garbage results we get from every single source now. This is the path of neo digital feudalism.

API calls are almost always private between the caller and the endpoint (think Telegram bots or mobile apps). There isn't really a technically feasible way for a crawler to somehow "infer" any kind of knowledge of how API calls are being used unless the result has some kind of publicly visible side effect (e.g. the program using the API is generating a web page and uploading it somewhere crawlable). Google et al. go by how many links from other pages to the page of interest exist (inbound links) and multiply by a smattering of other things like quality of keywords, length of content etc.

That said, if you're implying that the api changes mean that:

  • people are less likely to use reddit because they can't access it via RIF/Apollo
  • less useful content is added to the site to be indexed,
  • fewer inbound links will be generated that point to existing posts
  • pages stagnate and drop in ranking

That is a plausible concern.
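A toy illustration of the inbound-link point, assuming a made-up link graph with `.example` URLs. Real search ranking weighs far more than a raw count, so treat this purely as a sketch of why fewer linking pages means a weaker signal.

```python
from collections import Counter
from typing import Dict, List


def inbound_link_counts(links: Dict[str, List[str]]) -> Counter:
    """Count how many distinct pages link to each target (self-links ignored)."""
    counts: Counter = Counter()
    for source, targets in links.items():
        for target in set(targets):  # each linking page counts once per target
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical web: page -> pages it links to.
web = {
    "blog.example/fix": ["reddit.example/post1"],
    "forum.example/thread": ["reddit.example/post1", "docs.example/guide"],
    "reddit.example/post1": ["docs.example/guide"],
}
before = inbound_link_counts(web)

# Simulate one of the linking pages disappearing.
del web["blog.example/fix"]
after = inbound_link_counts(web)

print(before["reddit.example/post1"], after["reddit.example/post1"])
```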

"fewer inbound links will be generated that point to existing posts"

"pages stagnate and drop in ranking"

This is what I mean, the external references people had in the periphery will dry up. Like if I'm not using Infinity to generate better refined search results, now I don't post the link to Stack Exchange, and this reference fails to cascade across various copy paste blog resources. Now the original reddit post is a dead end source with no external weighted reference value. It's all of these advanced features implemented in the periphery using the free API that create the usefulness in the first place.

Searching reddit will be just like YouTube searches now. No matter what technical wording you use, you'll never find technical references again. I can type the title of a video on YT verbatim and still won't get the correct results, but I can log into an old account and find the content in my hundreds of playlists I kept as references. It is still there, it is still public.

Yeah that makes sense! I totally agree! Search is becoming pretty difficult these days!

The other thing is that Discord search is god awful. There's absolutely no way to modify your search for better results, whether that's to require something to appear exactly as typed, or to exclude certain results, it's just you put in the words and hope you get the right thing. Sometimes that works out, but sometimes it will make the dumbest connections and render your search useless unless you want to trawl through pages of crap you don't want. Like I've found out that Discord considers the words universal, universe, and university to be the same...

Tacking "Reddit" onto search queries almost became a prerequisite. Never imagined I'd have to replace that with "-Reddit".

It's made researching a media centre setup very difficult this week...

Give it some time, people will get comfortable here, the revolution dust will settle and we will be adding ‘-Reddit “Lemmy”’ to search queries (fingers crossed!)

But how would this work with broader federation? Searching other instances like beehaw or kbin? We'll need a new search optimization to search the fediverse more efficiently.

I guess google will just have to suck less if they want us to keep using it.

Before reddit removed them most of this compiled knowledge was in the subreddit wikis. I honestly believe a return to communities with wikis is the long term replacement.

Honestly, not a bad opinion, when the wikis were done well, they did have some extremely useful information. I wonder if we could do something like that in Lemmy...

That was my first thought - if reddit doesn't want that feature, we'll take it!

It would be interesting if Fediverse platforms made an external wiki for discoverability. A big shared community resource all in one place.

Yeah, the wikis came in clutch a lot of times for me. Really well done with how organized they were for the ones that had them.

Google Search has been sucking for quite a long time.

"site:old.reddit.com" was just a temporary fix

This has been deeply frustrating, but since that's the whole point, I support this collective inconvenience.

All in all it's also a testament to how bad the internet is now. All the information is concentrated in a few sites and, if they go, it gets lost.

Also, I find that basically every search result that isn't reddit is sponsored content.

Search something real specific like "Best aftermarket injector coils for a 2009 Toyota Corolla" and you're going to get 100% advertisements and listicles for search results, likely written by somebody who doesn't know shit about cars.

Append "reddit" to that search, and you'll be led to a post from a car mechanic giving their opinion on the matter. And, well, I do trust a random stranger on the internet more than I do an advertisement.

Definitely saw this coming… can’t imagine what will happen if Stack Overflow pulls something similar. All WebDev/DevOps work will halt overnight.

I’ve been trying to put my issues/solutions in a personal blog or wiki, but there’s so much old info out there in sites like Reddit/SO/medium/etc, it’d be a huge loss when it goes away.

Maybe it really is time to get open sourced AI and bots to archive useful information so they don't get monopolized.

We're going to have to actually read official documentation instead of relying on some greybeard's wisdom on SO 🥲

At least with SO, they have historically put up dumps of all user data on archive.org (that stopped recently but it's allegedly coming back). If something were to happen, at least the information would still be decently accessible, just not indexed as well.

I think it's more appropriate to say that internet searches in general had been getting worse over the last several years, but it just so happened to be the case that your answer could likely be found in a reddit thread.

For many people google (or whatever engine) was just a gateway to get information on reddit. With all those subreddits down at the moment, it is really hard to get information from a lot of searches, because like it or not, reddit is a big part of getting information or opinions etc.

Ah yes, working as intended. It's probably affecting people more than reddit themselves. Hope the content drought continues though.

I've actively found this as well but honestly, I think it's for the best because most of the time Reddit posts with actual answers aren't well-cited. So if anyone asks how you know something, "uhh Reddit told me" is pretty weak. So Google is getting better because Reddit has gotten worse. It means that you have to go to the actual articles and find the actual sources instead of this daisy chain of information. We have a huge issue with misinformation and this actually helps resolve it.

Wait, you use reddit posts to inform yourself on things where misinformation is possible? I was also mildly inconvenienced by the blackouts, but it was mostly related to programming stuff, where it is very obvious if an answer is wrong. I don't think I would even consider using reddit as a source for anything factual

I work as a game developer and a programmer. There is a lot of room for people to be wrong, especially when it comes to design or usage. A lot of misinformation in programming is like: yeah, this answer technically works in this specific case, but when you scale it, it breaks entirely. https://forums.unrealengine.com/t/stealth-based-mechanics/6992/6 is a great example: yeah, a trace will work, but your data will be a bit inaccurate, you won't be able to scale it, and it won't work with a lot of edge-case lighting. The better solution is to use a grey colored mesh and a scene capture to get information consistently about both the baked and dynamic lighting. You might even have a better way though, like getting the data from Lumen or shadow maps.

So even with things you think won't have misinformation, you get misinformation and people guessing while presenting as if they are right.

All the stuff i would use reddit as an actual source for is things where it's either obvious that the person is wrong or easy to check or think through. Same for lemmy

Yeah I mostly use it for like product reviews/recommendations or like personal help topics. Not stuff where factual information is required

"We have a huge issue with misinformation and this actually helps resolve it."

I'm not really sure about that. Bad SEO is something that still exists, and with huge sites like Reddit gone, the bad SEO sites become more prominent, and those are not necessarily the sites with actual articles and sources.

Of course the solution to this is not bringing reddit back but stopping SEO and having better curation of sites in search engines somehow.

Makes me want to go back and edit my posts to f*** /u/spez because I don't want them getting traffic off of my content. But also don't want that entire collection of human data gone if everyone did the same.

Too bad we can't all export and reconstruct our conversations here somehow.

My posts are 99% shitposts anyway, so it doesn't really matter, nothing constructive to mankind.

Do it. Use "Power Delete Suite", it has an option to edit comments before deleting everything.

Use a tool to edit all your comments to a Lorem ipsum, the more useless data they have filling their database the better, I prefer this to simply deleting them all and freeing up their database storage.

Btw, I don't know any tool for that, but I guess there should be some because I saw some users editing all their comments.

That’s why I used shreddit to delete all my posts and comments on Reddit. It’s not much, but if everyone does it Reddit will feel the repercussions. They won’t benefit from my content anymore.

I don't want to take away resources from people who will look into Monero in the future :/

In the past I commented many explanations when people asked for help and I don't want someone to find a thread with a question and deleted comment with a "Thanks!" reply. I guess a script to change all my past comments into something along the lines of "Removed. In case this was a support-related comment, feel free to ask for help on monero.town" could work?

Some people used a script that edited all their comments to forward to a new instance (in this case it could forward to Lemmy). Perhaps that would be a solution?
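A rough sketch of what such a script could look like. It deliberately avoids calling any real Reddit client: `EditableComment` is a hypothetical interface (anything with a `body` attribute and an `edit` method), and you would need to check your client library's docs to see whether its comment objects fit it. PRAW's appear to, as far as I know, but treat that as an assumption.

```python
from typing import Iterable, Protocol

# Placeholder text taken from the comment above.
PLACEHOLDER = (
    "Removed. In case this was a support-related comment, "
    "feel free to ask for help on monero.town"
)


class EditableComment(Protocol):
    """Hypothetical shape of a comment object: readable body, editable in place."""
    body: str

    def edit(self, body: str) -> object: ...


def replace_comments(comments: Iterable[EditableComment], text: str = PLACEHOLDER,
                     dry_run: bool = True) -> int:
    """Overwrite each comment with `text`; returns how many needed changing.

    With dry_run=True nothing is edited, so you can check the count first.
    """
    changed = 0
    for comment in comments:
        if comment.body == text:
            continue  # already replaced, skip it
        if not dry_run:
            comment.edit(text)
        changed += 1
    return changed
```

Running it once with `dry_run=True` before the real pass seems prudent, since edits like this are hard to undo.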

I'm considering switching to Kagi because of this. Its results are impressive.

This is the first I've heard of Kagi, how does it compare to duckduckgo?

I was amped for Kagi when I first heard about it. But they bumped the price up after the LLM boom. Still might have to bite the bullet as part of my desire to use paid ad-free services.

Depending on how much you use it, it might not be that much worse though... The old price was 10$/mo for unlimited searches. Now they offer different tiers starting at 5$/mo for 300 searches.

Personally, I use about 300-500 searches per month, so my monthly bill is actually less than it used to be (5-7$).
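The numbers above ($5 for 300 searches, $5-7 for 300-500 searches) suggest roughly one cent per search beyond the included 300. That overage rate is inferred from this comment, not taken from any price list, so the sketch below is only as good as that assumption.

```python
def monthly_bill_cents(searches: int, base_cents: int = 500, included: int = 300,
                       overage_cents: int = 1) -> int:
    """Estimated monthly bill in cents: base fee plus an assumed per-search overage."""
    extra = max(0, searches - included)
    return base_cents + extra * overage_cents


OLD_FLAT_PRICE_CENTS = 1000  # the old $10/month unlimited plan mentioned above

for searches in (100, 300, 500, 800):
    bill = monthly_bill_cents(searches)
    print(searches, bill / 100, bill < OLD_FLAT_PRICE_CENTS)
```

Under these assumptions the tiered plan only stays cheaper than the old flat price for lighter users.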

The cost is why I'm probably not going to plug it into my Searx install.

I switched and I'm happy. I rarely use the !g bang to see if google has anything more useful, and it rarely has.

I've been tinkering with it a bit. It's okay so far. For work stuff it's been somewhat helpful (though the problems I'm solving appear to have nothing to do with the code I was debugging). Considering getting a personal subscription to kick the tires for a month or so.

Seems nice based on my trial but they are really pushing the envelope on my price tolerance.

5/mo is too much I think.

I'm not sure what to think about the price. I can't really imagine life without a search engine, even though I was alive for a couple of decades before search engines existed. I pay $400/month for my car, but my search engine arguably gives me more value (I am lucky not to need to drive a lot). I wouldn't pay $400/month for a search engine. But $5-10 to have a degree of freedom from the tracking and results that aren't just trying to get my money? I am intrigued.

Idk, maybe it should be usage based. I feel like 60/yr is too much. I'd be fine with 19.99/yr but idk what their costs are. Otherwise, I do like the idea. Confession, I haven't used it yet but I plan to sign up and try the first 100 searches free.

I just add “forum” to the back of my search

If you've ever owned an older car, you know that this is the absolute best approach.

Good luck getting exceptionally niche advice for things like that on reddit. Forums get so much more specific, you get an entire forum dedicated to one car model that was only built for 5 years and a bunch of people there know literally everything about it, like the fact that you're better off getting an aftermarket PCV valve because those are built a bit better and don't fail early, or the fact that the shifter cable has the tendency to get water in it so you better be careful shifting out of park on a really cold morning, you might just snap the cable if it's old.

Yeah the problem there is VerticalScope has a hidden hand in all those pies and draconian monetization policies.

Are lemmy instances indexed properly as well? Would it be enough to put "lemmy" into the search

The federated nature of instances unfortunately might nerf the SEO because they're from different domains. Google wouldn't value instance_1.com more because the clicks to related_instance_2.com are higher.

I'd imagine if/when the fediverse becomes popular, search engines will account for this.

I thought links between domains helped pagerank score? Mind you, it's been a while since I learned SEO. A lot of the content, especially the federated stuff, seems to be loaded via javascript. I wonder if that affects what can be indexed.

There's more to it than that, but it does help. However, the base issue here I think is that they just don't crawl the federated space yet.

You can prepend a link with "cache:" to view Google's cached version of the site. This works automatically with the url bar in at least Firefox and Chrome (likely other browsers as well). If your browser doesn't support that you can enter it in the google search bar and the result will be the cached version of the site (if available)

I've made a bad habit of attaching the word "reddit" to the end of too many of my searches, even for questions where I should be looking for answers in trusted sources instead of taking them from random redditors. The blackout has helped a little with avoiding that.

So... How is Lemmy set for SEO?

SEO as a concept needs to die. don't get me wrong, i want Lemmy to show up in google results, but doing that by spamming keywords and unrelated “related” posts is not the move

Then what is the correct move? How do I locate content related to keywords?

wish i knew, maybe something with AI?

the reality is that current-day SEO isn't even Optimization, it's Lying. try googling a recipe or an alternative to a popular software and see how the first four pages are all ad-ridden, useless spam-bot articles designed to retain users rather than give them the information they are looking for

recipe

based.cooking

alternatives

alternativeto.net

exactly my point--neither of these (excellent) sites will show up in your standard google search because they have the integrity not to abuse SEO

It is - but you can still access via archive.org and similar resources.

Doesn't help for searches though

You can copy the address of the search result into the way back machine or Google cache
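For the Wayback Machine half of that, the copy-paste step can be reduced to building a URL. This sketch assumes the `https://web.archive.org/web/<url>` pattern, which as far as I know sends you to the most recent snapshot if one exists; the optional timestamp prefix is my reading of how snapshot URLs look, so double-check it before relying on it.

```python
def wayback_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine lookup URL for a page.

    `timestamp` is optional digits like "2023" or "20230612"; when given,
    the archive is expected to pick the snapshot closest to that date.
    """
    base = "https://web.archive.org/web/"
    if timestamp:
        return base + timestamp + "/" + page_url
    return base + page_url


# Hypothetical dead search result.
print(wayback_url("https://www.reddit.com/r/example/comments/abc123/"))
```

It only builds the link. Whether a snapshot exists is something you find out when you open it.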

You're absolutely right, true, but that will work for you and me, but not for your typical user, even the more advanced ones will be stumped at that point

Google should just redirect to the archived page if the link to Reddit is dead.

It's also a super clunky way to search. If I'm skimming posts for technical issues that I need a quick turnaround for, I'm probably not going through that hassle unless I'm desperate.

I've definitely felt like my Google searches have been lackluster after a lot of subreddits went dark. From advice to game communities, it sucks to check other forums you have no experience browsing or, worse, *shudders* Quora

100% had this happen today. Wanted an answer, the only answer was on Reddit, and the Google link was busted.

Kinda think we need a search engine that can index federated sites like Lemmy, Mastodon, Pleroma, etc.

Searching for help with technical/specific things has become a nightmare, as all the useful subreddits have gone dark due to the ongoing protest, making Google not so helpful to use.

Didn't notice since I use Kagi...

I did notice that Kagi now informs us about how much tracking and shit the sites are using. It's an info badge for each url.

Never heard about Kagi before, thanks for mentioning it! How is your experience with it? I tried DuckDuckGo for a while and wasn't too happy about it. Is it comparable?

Much better. With ddg I was using !g all the time and it wasn't finding a lot of things. It got very frustrating.

With Kagi, I started using it and I never switched back to Google. I haven't used Google search for six months. It's amazing. Absolutely go try it!

I was looking up traefik labels for my new Lemmy docker-compose setup and had to reference reddit.