“CSAM generated by AI is still CSAM,” DOJ says after rare arrest

jeffw@lemmy.world (mod) to News@lemmy.world – 291 points
arstechnica.com

Do a Google Image search for "child" or "teenager" or other such innocent terms, you'll find plenty of such.

I think you're underestimating just how well AI is able to learn basic concepts from images. A lot of people imagine these AIs as being some sort of collage machine that pastes together little chunks of existing images, but that's not what's going on under the hood of modern generative art AIs. They learn the underlying concepts and characteristics of what things are, and are able to remix them conceptually.

And conceptually, if I had never seen my cousin in the nude, I'd never know what young people look like naked.

No that's not a concept, that's a fact. AI has seen inappropriate things, and it doesn't fully know the difference.

You can't blame the AI itself, but you can and should blame any and all users that have knowingly fed it bad data.

I don't believe you're fully arguing in good faith here.

I'm assuming you've seen a naked adult, and if you had never seen a naked young person, I don't believe for one second you would be unable to infer what a naked young person might look like. You might not know for certain, but your best guess would likely be very accurate.

Generative AI can absolutely make those same inferences, so it does not need inappropriate training material in order to generate inappropriate output.

The AI knows what a young person looks like.
It knows what a clothed adult looks like.
It knows what an unclothed adult looks like.

An AI trained on 100% legal material could make that inappropriate inference without even trying.

Now, have all the popular AI models actually been trained on 100% legal material? I have no way of knowing that answer, but you're incorrect to assume that, just because a model can output inappropriate images, such data must have been included in its training input. Edit: never mind, it definitely has been trained on inappropriate material, but that doesn't change the fact that it doesn't need to be.

Well, how do you train an AI model on any set of information without the risk of it confusing good information with bad info...?