The New York Times tried to block the Internet Archive: another reason to value the latter

psychothumbs@lemmy.world to Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ@lemmy.dbzer0.com – 1152 points –
walledculture.org

If I controlled a paper, I’d force a git control system with publicly viewable edits made after publication.

Imagine the goodwill and trust that would instill in the public toward your paper.

Edit: I’ve thought the same thing about proposed legislation for a long time.

I think many people have also been wondering about version control for legislation and legal documents for some time. But I've never understood why it hasn't been implemented yet.

Because the people who would implement that system would be the same people it would hold to account.

Same reason insider trading laws don't affect congress members...

Probably because companies don't want to be held accountable.

a git control system

And be able to actually be held responsible for your actions? You're crazy!

I'm genuinely impressed by this being upvoted here. Big tech and powerful corporate/government interests are destroying our societies. This information needs to be checked and tracked.

I assumed the piracy sub would be a safe space for this sort of thing

Piracy is data preservation after all. How many books, series, TV shows, and video games would be inaccessible if not for pirated copies?

Definitely can't rely on companies to archive their own stuff effectively.

This article sent me down a Brewster Kahle rabbit hole, so...

Who remembers when Alexa was simply a web traffic rating site? I forgot that Amazon named its assistant after that property.

holy shit

How have I never connected those dots?

Alexa used to be a great marketing tool before Amazon got their grubby meathooks on it.

This sounds like a great excuse to launch an archive with a bunch of proxies that automatically captures new New York Times articles and tracks changes over an exponential amount of time. Preferably with a built-in algorithm that diffs the articles.
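For anyone curious what the diffing part could look like, here is a minimal sketch using Python's standard difflib. The function names, the snapshot labels, and the sample text are all made up for illustration, and reading "exponential amount of time" as re-checking at doubling intervals is only one interpretation. Fetching the articles and the proxy setup are left out entirely.

```python
import difflib


def diff_snapshots(old: str, new: str) -> list[str]:
    """Return unified-diff lines between two captured versions of an article."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="snapshot_1",  # hypothetical labels for the two captures
            tofile="snapshot_2",
            lineterm="",
        )
    )


def recheck_delays(base_minutes: int, count: int) -> list[int]:
    """Delays before each re-capture, doubling every time (assumed schedule)."""
    return [base_minutes * 2**i for i in range(count)]


if __name__ == "__main__":
    # Made-up article text, not taken from any real capture.
    first = "Headline\nOriginal heading"
    second = "Headline\nRevised heading"
    for line in diff_snapshots(first, second):
        print(line)
    print(recheck_delays(15, 4))
```

Lines starting with "-" are what was removed and lines starting with "+" are what was added, so a quiet heading change shows up as one of each.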

When I tried to open this article about the importance of allowing bots to archive content, I got this "Robot Challenge Screen":

😭

Do you support war, state propaganda and policing of speech or do you support things like freedom of information, speech and the internet archive? You can't do both, fake progressives.

I'm all for taking molotovs and whatever else we can manage to scrounge up to bring the heat to any company who opposes the Internet Archive. I'm willing to perform terroristic acts to show these people that we care about our digital freedom.

this article is 7 years old lol

It came out yesterday. You are probably looking at the date on the screenshot of an article that it starts with rather than the date of this article at the top.

This is useful for pointing out if a news site is manipulating a narrative, but for other things, I think news sites should get the privacy they need to make stealth edits.

Like:

More recently, the Times stealth-edited an article that originally listed “death” as one of six ways “you can still cancel your federal student loan debt.” Following the edit, the “death” section title was changed to a more opaque heading of “debt won’t carry on.”

This was just poor wording. No reason sites shouldn't have the peace of mind to change poor wording without being called out.

...... What? No, if you need to edit poor wording you add a note establishing that the editor missed a section of poor wording, and that section has been revised.

You want to do stealth edits? We call those first drafts, and they aren't published. Want to hide your edit history? Edit before you post.

People can make mistakes and miss things you know.

And there is nothing wrong with that, nor is there anything wrong in admitting your mistakes

Nothing wrong with admitting your mistakes, but also seems to me that you should be able to fix them without publicly announcing it.

Not in the news world. Corrections need to be made so people don’t go around spewing nonsense.

EDIT: And those corrections need to be bold and assert themselves. You can’t simply change your words and expect people to find the corrections themselves. That is too much work for the reader, and stating corrections is VERY easy for the publisher.

This. My national news agency publishes corrections like in ye olden days with ye olde telex: separate issue

example would be:

CORRECTION - President denounces war in Israel

BULLETIN - President denounces war in Isral

listed separately, added in their own archives etc.

also seems to me that you should be able to fix them without publicly announcing it.

You would seem to be wrong then lol. News has standards higher than Uncle Joe's Truckin' Blog™ or someone's Aunt's Facebook post.

There is no whiteout.

You will strike thru the error using a single line, leaving the error legible. Then amend the document with the valid information and initial the change as authorized.

You then submit the new draft, with visible corrections, to be published.

That's exactly how they taught us in nursing school. If you try and hide the mistake by "scratching" it out, it's assumed that you're hiding something. A single strike thru with an initial; owning your mistake. Mistakes are expected, and so is being honest about it. Makes you think twice before writing anything half-assed

Granted, most of us don't do paper-charting anymore; but the EMR still tracks any addendum. Don't go writing bullshit that you're unable to explain

Same with engineering type courses too. And all my science labs. And any contracts, job forms, etc. I'm constantly trying to get apprentices to break the habit of scratching things out. We don't destroy information. What if you were wrong about being wrong? And write units for things and not just numbers dammit.

It's the New York Times, not someone's personal blog. If they are publishing sloppy work, that is their fault.

You should add an edit to this comment:
Like this:

Edit:

People can make mistakes and miss things you know.

This is an example where I am objectively wrong and I apologize.

They wouldn't be called out if they had left editorial notes, that is what the article is about.

Horseshit. If your editor doesn't catch the article that says "have the peasants considered suicide as a way out of debt bondage?" then you as a news outlet should absolutely have to live with what you published.

But how do you determine what's just 'fixing poor wording' and what's actively hiding major bias or retcons of history?

Radio NZ got caught a year or so ago with a staffer who was editing articles syndicated from Reuters to be more pro-Russian. Should they be able to sweep that under the rug and claim it was only ever the one article they got caught on?

Likewise, bin Laden was originally hailed as an anti-Soviet freedom fighter. The articles relating to that are part of the historical record and kinda important.

Allowing the historical record to be retconned with impunity was probably the defining trait of 1984. It's really not a path you want to go down.

You don't and there's no good way to reconcile my two opinions. I don't disagree the archive should exist, I'm just saying, manipulating information is a valid reason, but the author's bullying publishers for mistakes isn't.

Acknowledging literally every change after any news content is published in any context isn't bullying anyone.

It's the absolute bare minimum to not be a piece of shit.

There's an easy way to reconcile them... The opinions are "articles should be backed up to prevent information manipulation, a threat to democracy" and "they should be able to hide their mistakes so they don't get made fun of"

You reconcile them by not letting them stealth edit, and you stand up for them when they made an honest mistake and are being blasted for it

While I agree in theory, it's hard practically to give the ability to make private wording and typo edits without giving the ability to make more insidious changes - like pushing a certain narrative and then quietly changing words here and there to erase evidence of that after most people have read it, etc.

If news websites kept their own visible audit trail, much like Wikipedia, I could see the argument that Internet Archive doesn't need to capture these articles immediately, maybe it should be time bound to a year after publication or somesuch, and therefore recent news could retain its paywall by the NYT without being sidestepped by Internet Archive. (While it's annoying that articles are paywalled, news sites do need to make money and pay for actual news reporters.)

Yeah I'm surprised the archive hasn't worked out a deal with publishers simply to delay showing articles.

It exists, it's called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.

And therein lies the issue: if you place a robots.txt out for the content, all bots will ignore the content, including search engine indexers.

So huge publishers want it both ways, they want to be indexed, but they don't want the content to be archived.

If the NYT is serious about not wanting to have their content on the webarchive but still want humans to see it, the solution is simple: Put that content behind a login! But the NYT doesn't want to do that, since then they'll lose out on the ad revenue of having regular people load their website.

I think in the case of the article here though, the motivation is a bit more nefarious, in that the NYT et al simply don't want to be held accountable. So there's a choice to be had for them, either retain the privilege of being regarded as serious journalism, or act like a bunch of hacks that can't be relied upon.

It exists, it's called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.

the internet archive doesn't respect robots.txt:

Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes.

the only way to stay out of the internet archive is to follow the process they created and hope they agree to remove you. or firewall them.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
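For reference, rules in a robots.txt file are grouped under User-agent lines, so a file can address one crawler by name and leave the rest alone. Here is a minimal sketch using Python's standard urllib.robotparser. The crawler names ExampleArchiveBot and ExampleSearchBot and the example.com URL are hypothetical, and whether a given crawler actually honors the file is up to that crawler, as the quote above shows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-article"  # placeholder URL
    print(may_fetch("ExampleArchiveBot", url))
    print(may_fetch("ExampleSearchBot", url))
```

This only shows how a well-behaved parser reads the file. It says nothing about enforcement, which is why the firewall option comes up.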

When a news provider publishes something they should be able to be held to what they’ve said. That’s the nature of both publication and the responsibility that the press should be held to

Editing news should, by law, require an editor's note at the bottom stating what was changed to what, like a GitHub commit.

If you cite that shit literally somewhere you could get in trouble for citing wrongly.

At the top.

A note at the top stating that changes were made, with an auto-scroll link to the footnote listing the changes.

This is actually a perfect example of why we need to archive these things. Don't let corporations try to rewrite history wtf

I don't care how many times you edit your comment, but I also don't trust you at all. Now, I don't have to trust you because clearly I am not going to learn anything of value from you.

If you don't care whether I trust you or not, this shouldn't bother you.

Most Newspapers trade on their credibility. They should want to be trusted that they aren't making material changes to their articles. Are you suggesting we leave it to them to decide for themselves what constitutes a material change?

Most Newspapers trade on their credibility

BWAAAAAAHAHAHA!

It's sad but it's true. People out there really believe newspapers as if they were sacred texts.

So, how much nuance do you think exists between the idea of "credible" and the idea of "sacred texts"?

You'd be a great employee of the Ministry of Truth

‼️‼️HOLY FUCKING SHIT‼️‼️‼️‼️ IS THAT A MOTHERFUCKING 1984 REFERENCE??????!!!!!!!!!!11!1!1!1!1!1!1! 😱😱😱😱😱😱😱 1984 IS THE BEST FUCKING NOVEL 🔥🔥🔥🔥💯💯💯💯 O'BRIEN IS SO BASED 😎😎😎😎😎😎😎👊👊👊👊👊 DOUBLEPLUSGOOD DOUBLEPLUSGOOD DOUBLEPLUSGOOD DOUBLEPLUSGOOD DOUBLEPLUSGOOD DOUBLEPLUSGOOD DOUBLEPLUSGOOD 😩😩😩😩😩😩😩😩 😩😩😩😩 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 2+2=5 🤬😡🤬😡🤬😡🤬🤬😡🤬🤬😡 WE WERE ALWAYS AT WAR WITH EURASIA 🇷🇺🇷🇺🇷🇺 🇷🇺🇷🇺🇷🇺‼️‼️‼️‼️‼️‼️ Hey Winston ❓❓❓❓❓❓❓ I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 I love Big Brother🗿 🗿 WAR IS PEACE 😎😎😎😎😎😎😎👊👊 FREEDOM IS SLAVERY 😎😎😎😎😎😎😎👊👊 IGNORANCE IS STRENGTH 😎😎😎😎😎😎😎👊👊 Big Brother is watching you❓❓❓❓❓❓❓❓❓❓ Miniluv Room 101 😱😱😱😱😱😱😱😱😱😱😱😱 WE WERE ALWAYS AT WAR WITH EASTASIA 🇷🇺🇷🇺🇷🇺 🇷🇺🇷🇺🇷🇺‼️‼️‼️‼️‼️‼️ Big Brother is always right😂🤣😂🤣😂🤣😂😂😂🤣🤣🤣😂😂😂 r/politicalcompassmemes r/unexpectedpcm r/expectedpcm perfectly balanced as all things should be r/unexpectedthanos r/expectedthanos for balance

WE WERE ALWAYS AT WAR WITH EURASIA

Oh hey, there's the relevant bit. This is literally what you're suggesting.