For Data-Guzzling AI Companies, the Internet Is Too Small

0nekoneko7@lemmy.world to Technology@lemmy.world – 36 points –
tech.slashdot.org
12

Idk, I find this hard to believe. I would think the challenge is more access to the information (gates, bandwidth), a speedy vault to store that information, and improving their models.

Think about what's available on the internet: how much of human knowledge and propaganda is out there. With good enough (Deus Ex level) tech, there's no way AI shouldn't be able to learn most of anything from the knowledge available, given the right trainers.

Yes, it's BS, like most of the AI takes here.

The kernel of truth is scaling laws:

[T]he Chinchilla scaling law for training Transformer language models suggests that when given an increased budget (in FLOPs), to achieve compute-optimal, the number of model parameters (N) and the number of tokens for training the model (D) should scale in approximately equal proportions.
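
To put rough numbers on that quote, here's a minimal sketch. It leans on two common approximations that are not stated in the quote itself, so treat both as assumptions: training compute C ≈ 6·N·D, and the "about 20 tokens per parameter" rule of thumb often attributed to the Chinchilla results. The function name `compute_optimal` is made up for illustration.

```python
import math

# Assumption: training compute is roughly C = 6 * N * D FLOPs.
FLOPS_PER_PARAM_TOKEN = 6
# Assumption: compute-optimal training uses about 20 tokens per parameter.
TOKENS_PER_PARAM = 20


def compute_optimal(flops_budget):
    """Return (N, D) for a FLOP budget, given C = 6*N*D and D = 20*N."""
    # Substituting D = 20*N into C = 6*N*D gives C = 120*N^2.
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal(budget)
        print(f"C={budget:.0e} FLOPs -> N~{n:.2e} params, D~{d:.2e} tokens")
```

Under these assumptions, 100x more compute means roughly 10x more parameters and 10x more training tokens. That's the sense in which data supply becomes a question at all: the token count has to grow along with the model.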

and propaganda

Well, that's the rub, right? Garbage in, garbage out. For an LLM, the value is predicting the next token, but we've seen how racist current datasets can be. If you filter it, there's not as much high-quality data left.

So yes, we have a remarkable amount of (often wrong) information to pull from.

Mhm, I wonder when we'll have the resources to build one that can tell the truth from other lies. I suppose you have to learn to crawl before you learn to walk, but these things are still having trouble rolling over.