Reddit Will License Its Data to Train LLMs, So We Made a Firefox Extension That Lets You Replace Your Comments

Blaze@lemmy.blahaj.zone to Reddit@lemmy.world – 321 points – 6 months ago

theluddite.org

cross-posted from: https://lemmy.ca/post/19946388

An anticapitalist tech blog. Embrace the technology that liberates us. Smash that which does not.

I think you missed the part where it was strongly suggested that you "not" use copyrighted text.

The point is not to get rid of the original text. It's to "poison" the training data.

If the AI trainers have the original text then "poisoning" the live site's content isn't going to do anything at all.

You can't touch the original text. It's already been archived.

If they scrape the updated comments again and ingest copyrighted text, you are poisoning the data.

That's my point. They won't.

And even if they did, it's unclear that copyright has anything to say about AI training anyway.

The NYT is currently suing over copyright infringement.

https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

it’s unclear that copyright has anything to say about AI training anyway

Although lawmakers worldwide slept while AI advanced and therefore failed to pass some important laws, they are catching up. Europe recently passed its first AI Act. As far as I've seen, it also requires companies to disclose a detailed summary of their training data.

https://www.ml6.eu/blogpost/ai-models-compliance-eu-ai-act

You can sue over anything you want in the United States; it remains to be seen whether the courts will side with them. I think it's unlikely they'll get much of a win out of it.

A law that requires disclosing a summary of training data isn't going to stop anyone from using that training data.
