Parts of Reddit are staying dark. Our search results may suffer for it.

hedge@beehaw.org to Technology@beehaw.org – 158 points –
archive.is

Like it or not, years of insight, experience and expertise live in Reddit threads. But accessing some of them just got harder.


This is how protests work. You inconvenience other people so that they pressure the target of the protests to give in to the protesters. Never understand why people from that country do not get this

Absolutely! A lot of people seem to think a protest is shooting yourself in the foot and complaining about it. No, a protest is causing a ruckus so that everyone, protestors or not, gets frustrated with the target of the protest. The point is to screw up search results on Google. The point is to make the "front page of the internet" an empty shell.

I went on reddit briefly to see if anything I subscribe to is polling to extend their blackout. r/DCcomics had a poll filled to the brim with "stay open, I'm slightly inconvenienced!" comments. These guys have clearly never been a part of or needed to protest for their basic rights before.

The mainstream view has lost a lot of that spirit, but plenty of Americans go just as hard as the French. Our corporate media downplays or slants the perception of protestors to make them seem like a noisy misguided minority when all we're usually asking for is basic dignity.

Then the news media goes off and makes any anti-protest vehicular homicide a celebrity, and right wing nuts flock to their go fund me pages.

It's not that we're as bad as we look, mostly.

I think we need to start thinking about the hard work of moving a lot of that essential information from Reddit to open community wikis.

I would be down to help if such an effort becomes reality.

Me as well, we could automate a good chunk of it using web archive

So would I. Particularly for the "how do I solve this super specific problem?" troubleshooting posts. Those are treasures.

This has already been affecting me a bit. Now I'm not complaining because I fully support it. But I've recently been looking up product suggestions, tech help, etc. and many of the reddit links in the search results were private communities. I was like "oh so this is actually having an impact at least."

I actually wish more subs would stay dark, especially since the CEO was basically like "they'll get over it soon"

Yup, I've already been really frustrated by this... Google's search results are so useless, full of advertisements, blogspam, astroturfing, etc, the only way to read about genuine reviews and experiences about stuff is to add " reddit" to the end of my search queries.

I figured out yesterday that if you go to the Google cached version, you can still see old posts. If I try it on mobile, the cached option isn't there, but on my PC I can click the (...) next to the search result, and click the cached option. Trying to figure out how to do that with mobile.

I fully support the blackout and I am trying to keep my Reddit traffic to a minimum, but I was trying to figure out a technical problem yesterday and it was a huge pain to find anything useful. Way too much SEO crap to wade through.

Well that’s our fault for letting information get congregated in a centralized service to be fair. Any information that is stored without redundancy on a single service should be considered already lost.

The Fediverse doesn’t fix this by the way, as far as I know. The data can be accessed from other instances, but as I understand it the data still lives on the instance. The day an instance dies, poof, all the information it contains goes away.

But! It makes it easier to make information redundant, by having an instance that automatically archives information for example.

We had a problem, many people knew that we had a problem but we did nothing to fix it. We have the same issue on StackOverflow or even GitHub, by the way (although the latter is a bit mitigated by people having local copies of the repositories for example). It will come bite us in the arse one day.

RIP to everything lost on Geocities.

It will never be possible to preserve all information forever, nor do we need to, but we could certainly do better than the usual thus far.

Hopefully those communities that choose to stay dark indefinitely will migrate at least some of their information to external platforms for non-reddit access.

I doubt they'd be able/go so far as to export all the threads, but I'm thinking that it'd be nice if the communities with robust and informative wikis would at least make those available elsewhere. Same with the Fediverse too; I feel like any compilation of information like a wiki ought to be hosted elsewhere for some form of redundancy if possible.

Migrating the knowledge is one part but it doesn’t fix the dead links in the search results from major search providers. And, unfortunately, that is a hard problem to solve because a static (or nearly static) page like a wiki on a niche website doesn’t necessarily get the same ranking in the indexer as a community on Reddit would.

Yeah that's true. The only hope at that point would be to copy the search result and plug it into the wayback machine and cross your fingers. If this keeps up, I wonder if the algorithms at Google et al. would start to de-prioritize reddit links over time.

that's why I hope that some subs go read-only. keeps the information that has been gathered over the last few years, while making it so people mostly don't interact with it in their feeds anymore

This would be the perfect balance. Prevent Reddit from monetizing the subs any further, but keep a record of all the information that was shared since the creation of that sub.

exactly. hurts reddit without affecting the community as a whole lol.

Funnily enough, this would make my move to Lemmy/KBin easier.

I've been trying to compile a list of the subreddits I followed so I can find their Lemmy/KBin equivalents. But if a sub goes private (instead of read-only), it disappears from your subscribed list until it's re-opened.

And since I both subscribed to a ton of subs and had a terrible memory, I'm constantly worried that my list is incomplete.

It is unfortunate for sure. I've come across this issue already.
But that experience hasn't been great for a while anyway. Reading through comment chains is a nightmare on new desktop reddit. Looking forward to hopefully replacing 'reddit' with 'lemmy' in my search queries, hopefully sooner rather than later.

Yep, this just demonstrates how we shouldn't rely on one entity to be the arbiter of community information. It should get better over time.

Same here, every time I saw reddit in my search results I'd audibly sigh. How they've managed to make their user experience so extremely hostile is beyond me.

Good. They should stay closed. Sadly a lot of subs are capitulating and opening up again.

Fewer than I thought though, I would have thought the whole thing would have evaporated by now.

I'm seeing the amount of closed subs decrease every hour. This morning there were over 7000 still closed and now it's just 6510.

That's because America has woken up. Given that the blackout was planned for 2 days I'm actually quite encouraged that there are still 6500 after then. Even many of the ones that have opened have opened pending further discussion on the next steps. I'm not optimistic about the overall outcome for Reddit, but the more people that can be driven to alternatives the better.

It's only 6170 now :(. I'm losing hope.

Regardless if they come back or not, it doesn't mean you have to go back. :)

Not the worst and not the best reporting. I am surprised how many people apparently use reddit as a search engine given how many posts I saw in various subs that implied the poster never heard of a search engine given that there was another thread asking the same thing like 5 hours beforehand.

It is interesting they point out that Twitter style short form posts do not actually contain information people would be searching for. Also kind of sad that useful discussion is seen as old fashioned and "modern" is short videos. I hate video results when I'm searching for something because if it even actually addresses the question it's 3-10 minutes of what is actually 2 sentences of answer. Such a waste of time.

@jmp242 @hedge I find that users prefer textual search results more the better trained they are at searching. When things are being surfaced *for* them, they don't build the skills needed to evaluate search results and refine terms.

Maybe search results should link to archives rather than live urls

It's one extra step to put archive.is/ in front of any dark reddit url.
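For anyone who wants to script that extra step, here is a minimal sketch of the prefix trick described above. The function name is made up for illustration, and it assumes the archive site accepts the original URL appended directly after its domain, as the comment says.

```python
def archive_is_url(reddit_url: str) -> str:
    """Put archive.is/ in front of a URL, as described in the comment above.

    Hypothetical helper: it only builds the string, it does not check
    whether an archived copy actually exists.
    """
    return "https://archive.is/" + reddit_url.strip()


if __name__ == "__main__":
    print(archive_is_url("https://www.reddit.com/r/DIY/"))
```

Whether a given thread was ever archived is a separate question; this only saves the typing.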

Yea this is definitely going to be a thing for tech questions especially. But to be fair we were always going to reckon with the issue sooner or later as long as a single private company is the sole owner of a site that ate all the specialized forums which would have previously housed such information. The best time to rip this bandaid off would have been before reddit was big, but there will be no better time than now.

I wonder if there is an import script that can migrate threads and comments over to Lemmy

There are import scripts - the problem is that Reddit disabled the Pushshift API at the end of March, which makes data exporting significantly harder. There are some archives from before that available as torrents, and there has been an effort from r/datahoarders to archive the data and submit it to archive.org before the shutdown.

I've been looking into that for our sub, and concluded it's currently not sensible - so in case we decide to reopen restricted for archive access I created a bot that re-posts Lemmy postings into the locked subreddit for discoverability, and adds a comment to drive users out for commenting.

It is the technical help that hurts the most. Raspberry Pi, Linux and Steam Deck are the big ones I find help for on Reddit.

But to be fair it just encourages me to search harder elsewhere, or better yet forces me to tinker more myself to find the solutions.

Regardless, it is a wealth of knowledge that the users have given for free for so long.

I just click the cached google link. usually works except comments under a few levels won't expand

Is that really a thing? When I search for anything I very, very rarely see a Reddit link in the results. The last time I remember was a question about i3wm.

Reddit frequently pops up when I look for answers to tech questions.

Yeah I've noticed this a few times already. Cached pages have worked so far for me.

I often specifically used to search eg. site:reddit.com turboencabulator review or whatever, because on many topics just searching the entire web with Google will give you pages and pages of absolute garbage by sites that mainly exist to churn out (often stolen) low quality search keyword-laden content – especially on tech stuff – to generate ad views

I think they might mean folks who are using google to search reddit, for example, the search term "migrating to lemmy reddit". I know I append "reddit" to a lot of my google searches and now with the blackout, those search results will only take you to a private page for the subreddit.

I was trying to lookup stuff about mechanical keyboards on ddg earlier today, and several results on the first page were reddit threads. So it can definitely happen with some other hobbies/topics I imagine.

Reddit pops up all the time when I'm searching for questions I have about DIY projects around my house, as inevitably someone else has asked the exact same question in the DIY or HomeImprovement subreddits. Same for technology questions.

It's very useful compared to the usual SEO dreck that Google throws up as the top search results.

Kagi can filter out reddit automatically with its lenses. You can do something sort of similar manually with -site:reddit.com in your query.
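The site: and -site: operators mentioned in this thread are easy to bolt onto a query programmatically. A rough sketch follows; the helper names are made up for illustration, and the Google search URL format is an assumption of this example rather than anything stated in the thread.

```python
from urllib.parse import urlencode


def with_site(query: str, site: str = "reddit.com", exclude: bool = False) -> str:
    """Prepend site:<site> (or -site:<site> to exclude it) to a search query."""
    operator = "-site:" if exclude else "site:"
    return f"{operator}{site} {query.strip()}"


def search_url(query: str) -> str:
    """Build a search URL for the query (assumed Google URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Only reddit results, as in the turboencabulator example above.
    print(search_url(with_site("turboencabulator review")))
    # Everything except reddit, the manual version of the filter.
    print(search_url(with_site("mechanical keyboards", exclude=True)))
```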

Another alternative is just using wayback machine to access reddit. That way they don't get your traffic!
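If you want to check for a Wayback Machine copy from a script, something like the sketch below might do. It assumes the archive.org availability endpoint and the JSON shape it is documented to return (an "archived_snapshots" object with a "closest" entry). The function names are hypothetical, and no network request is made here, so fetching the URL is left to you.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability lookup URL for a page."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot URL from an availability response, if any.

    Assumes the response looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

A response with an empty "archived_snapshots" object simply yields None, which is the "cross your fingers" case.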

Yeah it's been inconvenient to Google stuff and have private subreddits come up, but that's life. Hopefully that information will begin moving to Lemmy instances as time goes on.

Create a read only lemmy instance populated with the data from the Reddit data dump. Make sure Google indexes it. Comply with DMCA requests made by users who want their content removed.

The number of times I have received a "this sub is private" link over the past few days is super saddening.

It shouldn't be saddening, Reddit was a hellhole anyways, now we have a chance to start anew.

That’s the intended outcome of the protest! Glad it’s working.

Investing time and effort sharing know how and knowledge on a corporate social media was a mistake.

The Internet is intrinsically ephemeral. Data is always a few pulled wires away from going offline. Digital support lifespan is surprisingly short. Those aren’t stone slabs. Even paper lasts longer. The Internet’s strength is the distribution. For the data to endure, you need dedicated resources and individuals. Enthusiasts. Guardians. Professionals. If the responsible organization’s goal is profit, it’s doomed from the start.

I installed the ublacklist extension on chrome and immediately added reddit. There's nothing I can't find on other forums. Fuck reddit.

Your search results will recover when the information is reposted elsewhere. Not something to really get bent out of shape over.

Anything regarding pc building is f-u-c-k fucked rn. Trying to get some finer info on bios settings yesterday was fucked lol

Can't the powers that be at reddit just flip a switch somewhere and remove the mods' ability to make subreddits private? Presumably if they could they would have done so by now (but if not, 🤫!!!)

If I remember correctly they did replace the mod teams for a few bigger subreddits and made these public again.

I presume they could but then those subs would be unmoderated which would be a huge legal risk.

This is going to be the biggest blow to me tbh. So many nuanced problems have been solved on Reddit. Hoping some of the data can be transferred elsewhere vs completely deleted but it isn't looking that way.

Most of the data has been preserved via the Pushshift data dump/archive. They seem to end at February 2023, and the entire archive (including the separate 2023-01 and 2023-02 archives) is >2TB with zstd compression so it's not exactly easy to search unless you have a few terabytes to spare. Luckily, much of the data still seems to exist.

https://academictorrents.com/browse.php?search=pushshift

https://the-eye.eu/redarcs/
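Since the archive is too big to load at once, one way to search it is to stream it line by line. The sketch below assumes the decompressed dumps are newline-delimited JSON with one post or comment per line, and that the text lives in fields called "title", "selftext" or "body". Those field names and the function name are assumptions for illustration, so check them against a sample of the data first.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def matching_records(
    lines: Iterable[str],
    keyword: str,
    fields: Tuple[str, ...] = ("title", "selftext", "body"),
) -> Iterator[Dict]:
    """Yield records whose text fields contain the keyword (case-insensitive).

    `lines` can be any iterable of already-decompressed text lines, for
    example a file object or sys.stdin fed by a zstd decompressor.
    """
    needle = keyword.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip damaged or partial lines instead of stopping
        if not isinstance(record, dict):
            continue
        text = " ".join(str(record.get(field, "")) for field in fields)
        if needle in text.lower():
            yield record
```

Because it is a generator, memory use stays flat no matter how large the dump is; the slow part is the decompression, not the search.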

Not unless we all opt out of the data collection like I did :) can’t train on me creepy AIs!