“CSAM generated by AI is still CSAM,” DOJ says after rare arrest

Posted by jeffw@lemmy.world (mod) to News@lemmy.world – 291 points – 6 months ago

arstechnica.com

Then we should be able to charge AI (the developers more so) for the same disgusting crime, and shut AI down.

Camera-makers, too. And people who make pencils. Lock the whole lot up, the sickos.

Camera makers and pencil makers (and the users of those devices) aren't making massive server farms that spy on every drop of information they can get ahold of.

If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.

Now when that's the case, well where did the devs get the training data?.. 🤔

If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.

That's not how generative AI works. It's capable of creating images that include novel elements that weren't in the training set.

Go ahead and ask one to generate a bonkers image description that doesn't exist in its training data and there's a good chance it'll be able to make one for you. The classic example is an "avocado chair", which an early image generator was able to produce many plausible images of despite only having been trained on images of avocados and chairs. It understood the two general concepts and was able to figure out how to meld them into a common depiction.

Yes, I've tried similar silly things. I've asked AI to render an image of Mr. Bean hugging Pennywise the clown. And it delivered something randomly silly-looking, but still not far off base.

But when it comes to inappropriate material, well the AI shouldn't be able to generate any such thing in the first place, unless the developers have allowed it to train from inappropriate sources..

The trainers didn't train the image generator on images of Mr. Bean hugging Pennywise, and yet it's able to generate images of Mr. Bean hugging Pennywise. Yet you insist that it can't generate inappropriate images without having been specifically trained on inappropriate images? Why is that suddenly different?

The trainers taught it what Mr. Bean looks like and what Pennywise looks like - it took those concepts and combined them to create your image. To make CSAM it was, unfortunately, trained on CSAM https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse

3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.

Who is responsible then? Cuz the devs basically gotta let the AI go to town on many websites and documents for any sort of training set.

So you mean to say, you can't blame the developers, because they just made a tool (one that scrapes data from everywhere possible), can't blame the tool (don't mind that AI is scraping all your data), and can't blame the end users, because some dirty minded people search or post inappropriate things..?

So where's the blame go?

First, you need to figure out exactly what it is that the "blame" is for.

If the problem is the abuse of children, well, none of that actually happened in this case so there's no blame to begin with.

If the problem is possession of CSAM, then that's on the guy who generated them, since they didn't exist at any point before then. The trainers wouldn't have needed to have any of that in the training set, so if you want to blame them you're going to need to do a completely separate investigation into that; the ability of the AI to generate images like that doesn't prove anything.

If the problem is the creation of CSAM, then again, it's the guy who generated them.

If it's the provision of general-purpose art tools that were later used to create CSAM, then sure, the AI trainers are in trouble. As are the camera makers and the pencil makers, as I mentioned sarcastically in my first comment.

You obviously don't understand squat about AI.

AI only knows what has gone through its training data, both from the developers and the end users.

Hell, back in 2003 I wrote an adaptive AI for optical character recognition (OCR). I designed it for English, but also with a crude ability to learn.

I could have taught that thing hieroglyphics if I wanted to. But AI will never generate things that it's never seen before.

Funny that AI has an easier time rendering inappropriate material than it does human hands..

You obviously don't understand squat about AI.

Ha.

AI only knows what has gone through its training data, both from the developers and the end users.

Yes, and as I've said repeatedly, it's able to synthesize novel images from the things it has learned.

If you train an AI with pictures of green cars and pictures of red apples, it'll be able to figure out how to generate images of red cars and green apples for you.

Exactly. And if you ask it for the opposite of an older MILF, then how does it know what younger ladies look like?

It's possible to legally photograph young people. Completely ordinary legal photographs of young people exist, from which an AI can learn the concept of what a young person looks like.

The only example I can think of with what you said is just a couple brief innocent scenes from The Blue Lagoon.

Short of that, I don't know (nor care for any references to) any other legal public images or video of anything as such.

I dunno, I'm just bumfuzzled how AI, whether public or private, could have sufficient information to generate such things these days.

Is an image of a child inappropriate? Fully clothed, nothing going on.

Is the image of an adult engaging in sexual activity inappropriate?

Based on those two concepts, it can generate inappropriate child sexual imagery.

You may have done OCR work a while ago, but that is not the same type of machine learning that goes into typical generative AI systems in the modern world. It very much seems as though you are profoundly misunderstanding how this technology operates if you think it can't generate a novel combination of previously trained concepts without a prior example.

I'm referring to the inappropriate photography and videos out there. Please learn to read.

....no

That'd be like outlawing hammers because someone figured out they make a great murder weapon.

Just because you can use a tool for crime, doesn't mean that tool was designed/intended for crime.

Sadly, that's what most gun laws are designed around. Book banning and anti-abortion laws are both limiting tools because of what a small minority choose to do with the tool.

AI image generation shouldn't be considered in obscenity laws. His distribution of pornography to a minor should be the issue, because not everyone stuck with that disease should be deprived of tools that can be used to keep them away from hurting others.

Using AI images to increase charges should be wrong. A pedophile contacting and distributing pornography to children should be all that it takes to charge a person. This will just set up a new precedent that is beyond the scope of the judiciary.

That’d be like outlawing hammers because someone figured out they make a great murder weapon.

Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.

Not exactly. This would be more akin to a company that will 3D print metal parts and assemble them for you. You use this service and have them create and assemble a gun for you. Then you use that weapon in a violent crime. Should the company have known better that you were having them create an illegal weapon on your behalf?

The person who was charged was using Stable Diffusion to generate the images on their own computer, entirely with their own resources. So it's akin to a company that sells 3D printers selling a printer to someone, who then uses it to build a gun.

It would be more like outlawing ivory grand pianos because they require dead elephants to make - the AI models under question here were trained on abuse.

A person (the arrested software engineer from the article) acquired a tool (a copy of Stable Diffusion, available on GitHub) and used it to commit a crime (trained it to generate CSAM + used it to generate CSAM).

That has nothing to do with the developer of the AI, and everything to do with the person using it. (hence the arrest...)

I stand by my analogy.

Unfortunately the developer trained it on some CSAM which I think means they're not free of guilt - we really need to rebuild these models from the ground up to be free of that taint.

Reading that article:

Given it's a public dataset not owned or maintained by the developers of Stable Diffusion, I wouldn't consider that their fault either.

I think it's reasonable to expect a dataset like that should have had screening measures to prevent that kind of data being imported in the first place. It shouldn't be on users (here meaning the devs of Stable Diffusion) of that data to ensure there's no illegal content within the billions of images in a public dataset.

That's a different story now that users have been informed of the content within this particular data, but I don't think it should have been assumed to be their responsibility from the beginning.

Sounds to me it would be more like outlawing grand pianos because of all of the dead elephants - while some people are claiming that it is possible to make a grand piano without killing elephants.

There's CSAM in the training set[1] used for these models so some elephants have been murdered to make this piano.

https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse

3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.

I know. So to confirm, you're saying that you're okay with AI generated CSAM as long as the training data for the model didn't include any CSAM?

No, I'm not - I still have ethical objections and I don't believe CSAM could be generated without some CSAM in the training set. I think it's generally problematic to sexually fantasize about underage persons though I know that's an extremely unpopular opinion here.

So why are you posting all over this thread about how CSAM was included in the training set if that is in your opinion ultimately irrelevant with regards to the topic of the post and discussion, the morality of using AI to generate CSAM?

Because all over this thread are claims that AI CSAM doesn't need actual CSAM to generate. We currently don't have AI CSAM that is taint free and it's unlikely we ever will due to how generative AI works.

So at best we don't know whether or not AI CSAM without CSAM training data is possible. "This AI used CSAM training data" is not an answer to that question. It is even less of an answer to the question "Should AI generated CSAM be illegal?" Just like "elephants get killed for their ivory" is not an answer to "should pianos be illegal?"

If your argument is that yes, all AI CSAM should be illegal whether or not the training used real CSAM, then argue that point. Whether or not any specific AI used CSAM to train is an irrelevant non sequitur. A lot of what you're doing now is replying to "pencils should not be illegal just because some people write bad stuff" with the equivalent of "this one guy did some bad stuff before writing it down". That is completely unrelated to the argument being made.

That's not the point. You don't train a hammer from millions of user inputs.

You gotta ask, if the AI can produce inappropriate material, then where did the developers get the training data, and what exactly did they train those AI models for?

Do... Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?

Or are you arguing that we should be allowed to do what's been done in the article? (arrest and charge the individual responsible for training their copy of an AI model to generate CSAM)

One, AI image generators can and will spit out content vastly different from anything in the training dataset (this, of course, can be influenced greatly by user input). This can be fed back into the training data to push the model towards the desired outcome. Examples of the desired outcome are not required at all. (I.e., you don't have to feed it CSAM to get CSAM, you just have to consistently push it more and more towards that goal.)

Two, anyone can host an AI model; it's not reserved for big corporations and their server farms. You can host your own copy and train it however you'd like on whatever material you've got. (that's literally how Stable Diffusion is used) This kind of explicit material is being created by individuals using AI software they've downloaded/purchased/stolen and then trained themselves. They aren't buying a CSAM generator ready to use off the open market... (nor are they getting this material from publicly operating AI models)

They are acquiring a tool and moulding it into a weapon of their own volition.

Some tools you can just use immediately, others have a setup process first. AI is just a tool, like a hammer. It can be used appropriately, or not. The developer isn't responsible for how you decide to use it.

Then that settles it. It's whoever allows bad data into the training data.

Do... Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?

Yes. Because they did (not intentionally though)

https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse

3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.

I think that’s a bit of a stretch. If it was being marketed as “make your fantasy, no matter how illegal it is,” then yeah. But just because I use a tool someone else made doesn’t mean they should be held liable.

Check my other comments. My thought was compared to a hammer.

Hammers aren't trained to act or respond on their own from millions of user inputs.

Image AIs also don't act or respond on their own. You have to prompt them.

And if I prompted AI for something inappropriate, and it gave me a relevant image, then that means the AI had inappropriate material in its training data.

No, you keep repeating this but it remains untrue no matter how many times you say it. An image generator is able to create novel images that are not directly taken from its training data. That's the whole point of image AIs.

An image generator is able to create novel images that are not directly taken from its training data. That's the whole point of image AIs.

I just want to clarify that you've bought the Silicon Valley hype for AI, but that is very much not the truth. It can create nothing novel - it can merely combine concepts and themes and styles in an incredibly complex manner... but it can never create anything novel.

What it's able and intended to do is beside the point, if it's also capable of generating inappropriate material.

Let me spell it more clearly. AI wouldn't know what a pussy looked like if it was never exposed to that sort of data set. It wouldn't know other inappropriate things if it wasn't exposed to that data set either.

Do you see where I'm going with this? AI only knows what people allow it to learn...

You realize that there are perfectly legal photographs of female genitals out there? I've heard it's actually a rather popular photography subject on the Internet.

Do you see where I'm going with this? AI only knows what people allow it to learn...

Yes, but the point here is that the AI doesn't need to learn from any actually illegal images. You can train it on perfectly legal images of adults in pornographic situations, and also perfectly legal images of children in non-pornographic situations, and then when you ask it to generate child porn it has all the concepts it needs to generate novel images of child porn for you. The fact that it's capable of that does not in any way imply that the trainers fed it child porn in the training set, or had any intention of it being used in that specific way.

As others have analogized in this thread, if you murder someone with a hammer that doesn't make the people who manufactured the hammer guilty of anything. Hammers are perfectly legal. It's how you used it that is illegal.

Yes, I get all that, duh. Did you read the original post title? CSAM?

I thought you could catch a clue when I said inappropriate.

Yes. You're saying that the AI trainers must have had CSAM in their training data in order to produce an AI that is able to generate CSAM. That's simply not the case.

You also implied earlier on that these AIs "act or respond on their own", which is also not true. They only generate images when prompted to by a user.

The fact that an AI is able to generate inappropriate material just means it's a versatile tool.

The AI had CSAM in its training data:

https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse

3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.

Alright, well let's play an innocent hypothetical here.

Let's pretend you only know some magic word model (doesn't exist without thousands or millions of images by the way).

But anyways, let's say you're the AI. Now, with no vision of the world, what would you, as an AI, say if I asked you about how crescent wrenches and channel locks reproduced?

Now try the same hypothetical question again. This time, you actually have a genuine set of images of clean new tools, plus information that tools can't reproduce.

And now let's go to the modern day. Where AI has zillions of images of rusty redneck toolboxes, and a bunch of janky dialogue..

After all that, then where do crowbars come from?

AI is just as dumb as the people using it.

I learned how to write by reading. The AI did the same, more or less, no?

The AI didn't learn to draw or generate photos from blind words though...

Oh, it learned from art? Like how human artists learn?

AI hasn't exactly kicked out a Picasso with a naked young girl missing an ear yet has it?

I sure hope not!

But if it can, then that seriously indicates it must have some bad training data in the system..

I won't be testing these hypotheses.

It in fact does have bad training data! https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse

Thank you for posting a relevant link. It's disappointing that such data is any part of any public AI systems.. ☹️

Can we do guns next?

I'd rather not fart bullets, but thank you for inviting me to the party.

I'm not sure why you're picking this situation for an anti-AI rant. Of course there are a lot of ways that large companies will try to use AI that will harm society. But this is a situation where we already have laws on the books to lock up the people who are specifically doing terrible things. Good.

If you want to try to stand up and tell us about how AI is going to damage society, pick an area where people are using it legally and show us the harms there. Find something that's legal but immoral and unethical, and then you'll get a lot of support.

Totally dismissing inappropriate usage, AI can be funny and entertaining, but on the flip side it's also taking people's jobs.

It shouldn't take a book, let alone 3 seconds of common sense thought, to realize that.
