Authors Are Furious After Finding Their Works on List of Books Used To Train AI – 430 points

Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.



Using it to (create a tool to) create derivatives of the work on a massive scale.

An AI model is not a derivative work. It does not contain the copyrighted expression, just information about the copyrighted expression.

Wikipedia: In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work.

I think you may be a bit off on what a derivative work is. I don't see LLMs spouting out major copyrightable elements of books. They can give a summary, sure, but CliffsNotes would like to have a word if you think that's copyright infringement.

Well, when that happens, we have laws. So no problems.

Would you be okay with applying that argument for any crime?

I would be, and I don't understand why you think this would be a problem. I wouldn't want the government to be preventing activities that there weren't any actual laws prohibiting.

Ever heard of the early 21st century classic Minority Report?

You're missing the point. I'll make your example more specific.

Well when fraud/rape/murder happens we have laws. So no problems.

Those things happen. Creating an LLM based on copyrighted material without permission happens; it's not a hypothetical. But even then, punishing it after the fact does not make the initial crime "no problem", as you put it.
