Internet Archive is continuing to face DDoS attacks after several days, says “this attack has been sustained, impactful, targeted, adaptive, and importantly, mean”

ForgottenFlux@lemmy.world to Technology@lemmy.world – 1658 points –
neowin.net

You gotta be a special kind of sad to DDoS archive.org...

Foreign government, moneyed interests, or domestic dipshits, taking all bets.

Barnes & Noble going rouge.

Really? I thought they were more of a chartreuse myself...

Someone facing an enormous lawsuit who realized their tweets / claims were accessible and needs to buy time for their legal team.

-me a day or two ago

who was trying to sue it out of existence recently? probably them.

Wasn't that Pearson or some other shitty "educational" book publisher?

Domestic Roskomnadzor paid by China, orchestrated by the USA. Or paid by the USA and orchestrated by China. Either one.

Maybe a rogue entity trying to anonymously use it to train an AI or LLM. Either for its data or to learn how to more effectively attack.

Across social, economic, and political spectra, you can always tell the good guys from the bad guys by their stance on access to knowledge.

Had an argument with my FIL where he argued that since his last child is out of school, he votes against school taxes. I'm like, you know that pays for the people you and your family will interact with. His response was "I want them as ignorant as me". Even as a joke, it lacks wisdom. He had just complained about doctors being uneducated an hour before.

Ffffuck that's depressing.

I don't even have kids. I'm actually pretty against having them in general. But education is an existential requirement to a functioning democracy, and even a basic education is so broadening.

The only reason to want people ignorant is if you're trying to swindle them, which honestly benefits no one in the long run.

Not even democracy per se; it's a basic requirement for a society that functions at more than a medieval level.

Complains without solutions and distrusts legitimate experts, with a dash of “fuck other people.” So you’re just saying your FIL is a typical Republican.

Losing the internet archive would be such a huge loss... I really hope they have a backup plan in case things go bad legally.

yeah, it's definitely going to be one of the most important things to have ever happened in human history, if it does.

Library of Alexandria burning down for the modern era

Damn. I hadn’t even thought of it. Isn’t it crazy that some people among us would see things like that burn and not even wince. Hell, some would even celebrate. Our lives are so short. It blows my mind that anyone would want to destroy something like that for any reason.

All of the files on the archive have torrents available. If they just released all of the torrent files or their URLs, people could start seeding and downloading them. It would be a lot of data, though.

A quick search indicates that they’ve archived ~100PB of data.

Now I’m trying to come up with a way to archive the internet archive in a peer-to-peer/federated fashion while maintaining fidelity as much as possible…

That’s what IPFS is for. It’s ideal for that kind of stuff

Can DDoS attacks actually erase/corrupt stored data though? There’s no way they’re running all of this on a single server, with hundreds of PBs worth of storage, right?

No. It affects availability. Not integrity or confidentiality.

DDoS attacks block connections to the servers; they don't actually harm the data itself. You could probably overload a server to the point of it shutting down, which might affect data in transit, but data at rest usually wouldn't be harmed in any way, unless through some freak accident a server crash rendered a drive unusable. But even then, servers are usually fully redundant and have RAID systems in place that mirror the data, so it's a kind of dual redundancy. Plus actual backups on top of that, though with that amount of data they might have a priority system in place, so not everything is necessarily fully backed up.

From what I've learned, a DDoS attack can overload a system and cause a reset or fault, which can create a vulnerability. At that point, it's possible to inject code and initiate a breach or takeover.

I can't find the documentation on it so... Take it with a grain of salt. I thought I learned about it in college. Unsure.

Torrent?

That wouldn't distribute the load of storing it though. Anyone on the torrent would need to set aside 100PBs of storage for it, which is clearly never going to happen.

You'd want a federated (or otherwise distributed) storage scheme where thousands of people could each contribute a smaller portion of storage, while also being accessible to any federated client. 100,000 clients each contributing 1TB of storage would be enough to get you one copy of the full data set with no redundancy. Ideally you'd have more than that so that a single node going down doesn't mean permanent data loss.

Not sure you'd be able to find 100k people to host a 1TB server though. Plus, redundancy would be better anyway since it would provide more download avenues in case some node is slow or has gone down.

Yes, it's a big ask, because it's a lot of data. Any distributed solution will require either a large number of people or a huge commitment of storage capacity. Both 100,000 people and 1TB per node is a lot to ask for, but that's basically the minimum viable level for that much data. Ten million people each committing 50GB would be great, and offer sufficient redundancy that you could lose 80% of the nodes before losing data, but that's not a realistic number to expect to participate.
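A quick check of the storage arithmetic above, assuming decimal units and the ~100PB total mentioned earlier in the thread; the function names are illustrative. The third function adds an assumption the comment doesn't state: if each chunk is stored on `replicas` different nodes and nodes drop out independently at random, a chunk is lost only when all of its replicas are gone. So the "lose 80% of the nodes" figure describes total capacity; whether every individual chunk survives depends on which nodes go down.

```python
import math

# Decimal units, matching the "~100PB" figure quoted in the thread.
GB = 10**9
TB = 10**12
PB = 10**15

def min_nodes(bytes_per_node: int, dataset_bytes: int, copies: int = 1) -> int:
    """Smallest number of nodes whose pooled storage fits `copies` full copies."""
    return math.ceil(copies * dataset_bytes / bytes_per_node)

def full_copies(nodes: int, bytes_per_node: int, dataset_bytes: int) -> float:
    """How many complete copies the pooled storage could hold (ideal placement)."""
    return nodes * bytes_per_node / dataset_bytes

def chunk_loss_probability(failed_fraction: float, replicas: int) -> float:
    """Chance that all replicas of one chunk are gone, assuming each replica lives
    on a different node and nodes fail independently of each other."""
    return failed_fraction ** replicas

dataset = 100 * PB
print(min_nodes(1 * TB, dataset))                  # 100000
print(full_copies(10_000_000, 50 * GB, dataset))   # 5.0
print(chunk_loss_probability(0.8, 5))              # roughly 0.33
```

In other words, pooled capacity tells you how many copies fit, but surviving heavy node loss also needs a placement scheme that keeps replicas spread out and re-replicates when nodes disappear.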

That wouldn't distribute the load of storing it though. Anyone on the torrent would need to set aside 100PBs of storage for it, which is clearly never going to happen.

Torrents are designed for incomplete storage of data. You can store and verify a few chunks without any problem.

You'd want a federated (or otherwise distributed) storage scheme where thousands of people could each contribute a smaller portion of storage, while also being accessible to any federated client.

Torrents. You may not have the entirety of the data, but you can request what you need from the swarm. The only limitation is that you need to know which chunk contains the data you need.
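A minimal sketch of "knowing which chunk you need", assuming the torrent's content is treated as one flat byte stream split into fixed-size pieces, as BitTorrent v1 does (v2 aligns each file to piece boundaries instead). The function name and the 16 MiB piece size are illustrative, not taken from any real archive.org torrent.

```python
def pieces_for_range(offset: int, length: int, piece_length: int) -> range:
    """Indices of the pieces that overlap bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // piece_length
    last = (offset + length - 1) // piece_length  # piece holding the final byte
    return range(first, last + 1)

MiB = 1024 * 1024

# Hypothetical request: 1 MiB starting at the 40 MiB mark, with 16 MiB pieces.
print(list(pieces_for_range(40 * MiB, 1 * MiB, 16 * MiB)))  # [2]

# A range that straddles a piece boundary needs both pieces.
print(list(pieces_for_range(15 * MiB, 2 * MiB, 16 * MiB)))  # [0, 1]
```

A client that did this mapping under the hood could fetch only the needed pieces, which is roughly the "bespoke torrent client" idea discussed further down.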

Ideally you'd have more than that so that a single node going down doesn't mean permanent data loss.

True.

True. Until you responded I actually completely forgot that you can selectively download torrents. Would be nice to not have to manually manage that at the user level though.

Some kind of bespoke torrent client that managed it under the hood could probably work without having to invent your own peer-to-peer protocol for it. I wonder how long it would take to compute the torrent hash values for 100PB of data? :D

~300MB/s on one core of a 13-year-old i5 with SHA-256 (used in BitTorrent v2). Newer cores can do about half a gigabyte per second each. For 100PB that still works out to roughly six years on one core, or about two years on 3 cores.*

* assuming no additional performance penalty for increased power consumption and memory bandwidth usage
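A back-of-the-envelope sketch of the hashing estimate, assuming decimal units (100PB = 10^17 bytes), the per-core SHA-256 rates quoted in the comment, and perfectly linear scaling across cores, i.e. the footnote's caveat plus storage that can keep up. The function name is illustrative.

```python
PB = 10**15
MB = 10**6
SECONDS_PER_DAY = 86_400

def hashing_time_seconds(total_bytes: int, mb_per_sec_per_core: float, cores: int = 1) -> float:
    """Wall-clock time to hash `total_bytes`, assuming throughput scales linearly
    with cores and the disks are never the bottleneck."""
    return total_bytes / (mb_per_sec_per_core * MB * cores)

# Per-core rates taken from the comment; core counts are just examples.
for rate, cores in [(300, 1), (500, 1), (500, 3)]:
    days = hashing_time_seconds(100 * PB, rate, cores) / SECONDS_PER_DAY
    print(f"{rate} MB/s x {cores} core(s): {days:,.0f} days")
```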

My guess is that storage bandwidth would be the biggest bottleneck.

Found a relatively old article (in Russian; just search for openssl and look at the graph that mentions SHA-512, which is also SHA-2) that says the i7-2500's all-core throughput is slightly over 1GB/s.

It’d be a lot more complicated than that, I think, if one wanted to effectively be able to address it like a file system, as well as holistically verify the integrity of the data and prevent unintentional and unwanted tampering.

as well as holistically verify the integrity of the data and prevent unintentional and unwanted tampering

Torrents. Their hashes are derived from hashes of chunks. Just verify chunks.
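A toy sketch of "just verify chunks" in the style of BitTorrent v1, where the .torrent's info dictionary carries a SHA-1 digest for every fixed-size piece (and the torrent's own infohash is computed over that dictionary). The function names and the tiny 4-byte piece size are purely illustrative; real torrents use much larger pieces, and v2 uses per-file SHA-256 Merkle trees instead.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check one downloaded piece on its own, without needing any other piece."""
    return hashlib.sha1(piece).digest() == expected[index]

piece_len = 4
original = b"internet archive"
manifest = piece_hashes(original, piece_len)  # what the .torrent would carry

print(verify_piece(1, b"rnet", manifest))  # True: bytes 4..7 of the original
print(verify_piece(1, b"RNET", manifest))  # False: tampered piece is rejected
```

Because each piece is checked independently, a peer holding only a small slice of the data can still prove that slice is intact.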

if one wanted to effectively be able to address it like a file system

https://github.com/johang/btfs

if you have a spare corner in your server, host the archive warrior and help them out.

Thanks, I’ll try to use it from time to time.

It's ArchiveTeam, not archive.org. Both are good anyway.

That last sentence though...

"The cyberattacks share the timeline with the legal battle Internet Archive is facing from US book publishers, claiming copyright infringement and seeking combined damages of hundreds of millions of dollars from all libraries."

i wonder why print is dead

How are print books dead?

https://www.statista.com/chart/24709/e-book-and-printed-book-penetration/

And that's only units; in terms of revenue, ebooks are still pocket change in comparison.

i wasn't speaking in comparison to ebooks. ebooks suck in every way imaginable.

What other long-form text format has beaten print books?

why are you coming up with these categories? "print is dead" doesn't mean "because there's print 2.0 now"

- radio is dead
- excuse me, but internet radio is nothing compared to AM stations
- yeah, obviously people who don't listen to radio don't want to listen to radio with extra steps
- what other forms of radio have beaten radio?

what are you even

I am trying to understand the argument behind your statement. I mean, there are more books being published than ever and there are more readers than ever, so I fail to see how books are dead. That's why I am asking these questions.

The argument is that no one reads books anymore. Most media consumed today is in modern video and audio formats like YouTube and podcasts. You shouldn't compare paper books to ebooks, you should compare them to views on YouTube.

YouTube is video; it replaced TV. Podcasts and music streaming replaced the radio. Why should I compare books to another medium? In fact, back in the TV and radio era, more people consumed that kind of media instead of books, and that stays true today, yes. More people watch YouTube than read books. I bet more people play games than read a book. But that's comparing different kinds of media. It would be like saying podcasts are dead because more people consume pictures and video on Instagram.

you're wrong. TV replaced the radio, not podcasts. we're not comparing different kinds of media, we're saying new media replaces the old, regardless of form. it's not about numbers; it's about migration. if people moved on from listening to podcasts to consume pictures and video on Instagram, then you could totally say that, but they didn't, so we don't.

The Internet Archive needs to be distributed somehow. We can't have a single point of failure like this or we've learned nothing since Alexandria.

I've got several terabytes just lying around that I'd happily devote to ancient copies of web pages.

As of January 2024, archive.org claims to have over 99 Petabytes of data stored.

This is why we need more websites to adopt secure client-side scripting.

JavaScript may or may not be it, but the web needs to be reachable/archivable. It should also have attribution, but that’s a tangent.

They could do this with the Bank of America instead

Or AP? Nobody gets paid and so they get more attention!

Banks are evil, nonprofits like archive.org are not.

Lolwut?

Describing a high intensity DDOS attack on one of the world's most important resources as simply "mean" is unironically one of the funniest things I've read this year.

Hope they get some support soon.

Do they have any idea who’s perpetrating the attack?

If I go into conspiracy mode, I would say record labels (they tend to have small peepees when it comes to, well, everything) or some DICKtator country that doesn't like archived text of some sort.

If it's an entity, my money would be on China, having just discovered it exists, since it diametrically opposes its propaganda machine. But it could very well just be dark web shitheads whose seasonal drug binge just spiked again. There are plenty of people around who make accusations and spread propaganda they know are false and can't simply walk it back because of archive.org, and it doesn't take much to disrupt an Internet that still runs too largely on implicit trust.

Wasn’t there some controversy involving Internet Archive just recently?

Whoever’s behind this is trying to get rid of the fact that Internet Archive creates memory of the internet’s contents. Somebody wants to be able to control what people see on the internet.

Heck it could be Google doing it, since that would be in line with their recent push to change the way search works. Both of those act as components of a larger drive to control what people see and hear.

i honestly really hope this shit gets taken care of so internet archive can still keep going

I'm not good with computers and stuff. If somebody finds these scumbags who are DDoS'ing Internet Archive, I'd be very grateful. Fucking them up in the process is also good.

people are shitty

when you enshittify
facebook looks ugly
when you’re a drone

women seem wicked
when you’re a want ad
default instructions … so unclear
when you’re down

when you’re AI
prompts just appear in your brain
as AI
humans are nothing but pain
as AI
as AI
when you’re A-A-A-I

Well, Google's search method was just leaked... Wonder if this picked that up before they pulled it.

Can you tell me more about this? Or just a link would be amazing.

I'm worried about what this could mean about further SEO enshittification.

Is it possible that someone is conducting some operation and doesn’t want it to be randomly documented?

Some state, maybe? Eh, I just have a hard time thinking of motives for this attack

I really doubt ddos is affecting whatever crawling internet archive does, just blocking the public from viewing the website.

If this party is benefiting from a temporary outage of the IA, then that means their exposure window is temporary. That makes me think they’re doing something where the evidence will appear on some website temporarily, but not permanently. Don’t know what that might be, but that would be the profile of a thing which would benefit from DDoSing the IA.

The alternative is they’re trying to kill IA permanently. Enough time of its having zero utility to the world will eventually kill it. Could take years though.

Could be a rogue AI. It is a strange thing to see.

But generally speaking, I don’t feel confused when I see beautiful things attacked. I’ve seen a lot of things get attacked because they’re beautiful and useful, and it doesn’t surprise me any more.

There is no way a DDoS on the website is affecting the crawler. Also, running a DDoS attack of this size costs a lot of money (if you rent the network; if you own it, it costs money in lost sales). No one is giving AI control over a DDoS network to just fuck around.

The way it breaks utility is in the inability to read from the service. If that goes away for long enough, the Archive will die.

It would be crazy expensive to run an attack of this size for years.

Sure but usually those who attack pretty things for no reason are morons barely able to articulate themselves let alone coordinate a massive DDoS.

Wrong. Intelligent, competent people attack beautiful things.

There is highly organized evil in the world. People who aren’t just trying to win. They’re trying to make people lose.

Capitalists don't like libraries because it means open access to resources which reduces the market size.

Can someone explain why they're not able to protect against this? Couldn't they put request limits in place, or monitor for spikes and ban these attempts?

Without knowing how, not really. If it's a massive multi-device botnet, like Mirai, for example, that's millions of individual devices across millions of addresses, so it isn't as simple as just blocking a domain. Trying to block all of them might well just block legitimate users.

Request limits also wouldn't work if it's millions of devices making a few requests at once, and an overall limit would have a similar locking-out effect as blocking everything. Especially if the DDoS is taking up most/all of that limit.
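A minimal sketch of why per-address request limits don't help much against a botnet. It assumes a simple token-bucket limiter keyed by client address; the class name, the rate/burst values, and the "bot-N" stand-in addresses are all hypothetical and say nothing about what archive.org actually runs.

```python
class PerIpLimiter:
    """Hypothetical per-address token bucket: each address may send `burst`
    requests at once and earns `rate` more per second."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        # address -> (tokens remaining, time of last update)
        self.state: dict[str, tuple[float, float]] = {}

    def allow(self, ip: str, now: float) -> bool:
        tokens, last = self.state.get(ip, (self.burst, now))
        # Refill for the time elapsed since this address was last seen, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.state[ip] = (tokens, now)
        return allowed

limiter = PerIpLimiter(rate=1.0, burst=5)

# One noisy client hammering the site: only its first 5 requests get through.
single = sum(limiter.allow("203.0.113.7", now=0.0) for _ in range(1000))

# A botnet: 100,000 distinct addresses sending just 3 requests each.
botnet = sum(limiter.allow(f"bot-{i}", now=0.0)
             for i in range(100_000) for _ in range(3))

print(single)  # 5
print(botnet)  # 300000: every request is under its own per-address limit
```

The single heavy client is throttled, but the distributed traffic sails through untouched; tightening the limit enough to stop it would start rejecting ordinary visitors too.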

Just so crazy to me the scale.

Is there any range for how many "a few requests" would be needed to ddos a site like this?

Alright let's put in our bets.

I've got $50 on JIDF behind the DDoS attack.

Go offline for a couple of days until they lose interest in DDoS'ing? Would that work?

That just means the DDOSer is taking Internet Archive down without any further work required.

True. That's not something you want. They could use that downtime for extensive maintenance to roll out a more robust system (they are probably already working on that in the background). For the end user, I thought, it doesn't really make a difference whether it's down because of a DDoS or because of maintenance.

Couldn't they just use Cloudflare or something?

Maybe temporarily switch to a different address? And leave fake addresses to catch the ddos. Then just keep changing addresses using an IPFS system to front-end the new address?

There's no way to do this and let visitors know what the new addresses are, without also giving the new addresses to the attackers.

IPFS is a real solution though

Lol, no, the Blockchain has never been a "real solution", and it never will be.

How is anyone still on the Web3 hype train?

IPFS is not built on a blockchain

Yeah, it's just a modern peer-to-peer content distribution network

Can someone ELI5 why it’s hard to track down these dipshits? Even if it’s a distributed attack, picking a single IP, doing a lookup for the domain name, and checking with the registrar might actually reveal their identity, right? Of course, I’m guessing law enforcement needs to be involved to force registrars to give up that info if it’s not publicly available? Are there laws that say a DDoS is illegal?

There is no domain name associated with the IPs.

Most importantly, usually, DDoS attacks use infected devices (PCs, mobile phones, smart fridges, shady browser addons etc...) to get many ip addresses and devices/locations and attack from everywhere at once.

DDoS attacks are performed by botnets. What is a botnet? Well, you know about viruses etc, right? Your PC gets infected and it becomes a part of the botnet. Now police do the investigation, they look up IPs and they see YOUR IP and come to YOUR house. See what the problem is?

And, frankly, your PC doesn't even have to be infected to become a part of an attack. There are plenty of hacked web sites which still look like nothing has changed, but they contain hidden JavaScript code that forces your browser to flood the victim. Again, the police will only find YOU.

It might be Trump's squad trying to make it so that his trial outcome can't get into the archive

Court documents are already open record and stored indefinitely. Internet archive wouldn't be needed for that.