Please check your kids' Halloween candy, everyone

Masimatutu@lemm.ee to Lemmy Shitpost@lemmy.world – 1338 points –


UTF-8 and ASCII already store roughly 1 character per byte. With good file compression you could probably reach 2 characters per byte, i.e. 4 bits per character. One character per bit is probably impossible. Maybe with some sort of AI compression, using a model's knowledge of the English language to predict the text.

Edit: Wow, apparently that already exists, and it can achieve an even higher compression ratio, almost 10:1 (tested on 1 GB of UTF-8 text from Wikipedia): bellard.org/nncp/
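For scale, here's what that ratio implies per character. A quick sketch; the 8.6:1 figure is my assumption, read off the "almost 10:1" result above:

```python
# Bits per character implied by the NNCP-style compression ratio.
# ASSUMPTION: a ratio of roughly 8.6:1, inferred from "almost 10:1" on 1 GB of text.
ratio = 8.6
bits_per_char = 8 / ratio  # each input character starts as 8 bits
print(f"~{bits_per_char:.2f} bits per character")  # just under 1 bit/char
```

So "one character per bit" turns out to be nearly achievable on English text after all.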

If an average book has 70k five-character words (~350 KB of raw text), this could compress it to around 40 KB, meaning you could fit about 1.6 million books in 64 GB.

You can get a 2 TB SSD for around $70. With this compression scheme you could fit 52 million books on it.
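The arithmetic behind those counts, as a sketch (the ~8.6:1 compression ratio is my assumption from the "almost 10:1" figure; the book size comes from the 70k-word estimate above):

```python
# Back-of-the-envelope storage math for the book counts above.
RATIO = 8.6                      # assumed compression ratio ("almost 10:1")
book_bytes = 70_000 * 5          # 70k words x 5 chars/word ~= 350 KB per book
compressed = book_bytes / RATIO  # ~40 KB per compressed book

books_64gb = 64e9 / compressed   # ~1.6 million books in 64 GB
books_2tb = 2e12 / compressed    # ~50 million books on a 2 TB SSD

print(f"{compressed/1e3:.0f} KB/book, "
      f"{books_64gb/1e6:.1f}M books in 64 GB, "
      f"{books_2tb/1e6:.0f}M books on 2 TB")
```

The 2 TB figure comes out near 50 million with these inputs, so the 52 million quoted above is the same ballpark.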

I'm not sure if I've interpreted the speed data right, but it looks like it would take around a minute to decode each book on a 3090. It would take about a year to encode all of the books on the 2 TB SSD if you used 50 A100s (~$9,000 each). You could also use 100 3090s to reach around the same speed (~$1,000 each).
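A quick sanity check on that encode estimate, using only the numbers quoted above (52 million books, 50 GPUs, one year) to back out the implied per-book time:

```python
# Implied per-book encode time from the figures above:
# 52 million books, 50 GPUs, roughly one year of wall clock.
books = 52e6
gpus = 50
year_s = 365 * 24 * 3600             # seconds in a year

per_book_s = gpus * year_s / books   # one GPU's time spent per book
print(f"~{per_book_s:.0f} s of A100 time per book")  # about 30 s
```

About 30 seconds of A100 time per book to encode, versus roughly a minute on a 3090 to decode, which is at least internally consistent.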

52 million books is around the number of books written in the past 20 years, worldwide. All stored for $70 (+$100k of graphics cards)

There's something comical about the low low price of $70 (+$100k of graphics cards) still leaving out the year of time it will take.

Well I guess you could sacrifice a portion for an index system and just decode the one you're trying to read.