Training Generative AI Models on Copyrighted Works Is Fair Use - Change My Mind

commie@lemmy.dbzer0.com to Technology@lemmy.world – 47 points –
mastodon.lawprofs.org

I fucked with the title a bit. What I linked to was actually a Mastodon post linking to an actual thing. But in my defense, I found it because Cory Doctorow boosted it, so, in a way, I am providing the original source here.

please argue. please do not remove.

Every web request costs someone money. If you aren't paying them you are being provided a service. They've given you knowledge/material in their possession free of charge. You are taking advantage of that good will by using the content for purposes not intended. That is a moral failing.

To be clear, the ownership of the material is not important; just the access is immoral, as the harm is already done.

I'll add the caveat that it can be moral if they've specifically told you that you can via the website's robots.txt file, which websites of consequence all have. But the assumption has to be they don't intend this, because that is how consent works.

They've given you knowledge/material in their possession free of charge.

this is a very common human activity

You asked if it's moral, this is irrelevant

You asked if it’s moral

I did not

The original post in this chain talked about ethics, I was continuing that conversation.

In terms of fair use, I feel the collection/aggregation of the data is a work in itself. You are taking a greater portion than the author specified you can take. Courts have ruled this does not constitute fair use when people used Yahoo's market data. How is it any different now, when people are using orders of magnitude more data?

You are taking advantage of that good will by using the content for purposes not intended. That is a moral failing.

only if there were some sort of agreement about which uses are acceptable and which are not.

That's exactly what robots.txt is... they spell out that they don't want you to access this site with an automated system.
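For anyone following along, here is a rough sketch of how an automated client could check those rules before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration. Nothing enforces the file; it only has an effect if the crawler chooses to check it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download this
# from the site's /robots.txt path instead of hard-coding it.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    # The made-up AI crawler is disallowed everywhere on this site.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleAIBot", page))
    # Any other agent falls under the "*" rules and may read public pages.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "SomeOtherAgent", page))
```

A crawler that respects the file would call something like `is_allowed` before every request and skip any URL for which it returns `False`.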

right. so hiring 50 college kids to manually visit every page and cache it for study is fine.

That would probably be more expensive than just paying companies. But it is morally different because a human did visit their website, so their good will was not violated: they expressed this consent when they published the website.

the assumption has to be they don't intend this

why? if someone publishes something on port 80, why should I ever assume they mean anything but for me to have and use that data?

Because there is a standard way for people to make their consent known. Just because you ignore someone withholding their consent doesn't mean you are morally in the clear.

I'd say it is immoral not to share useful information with other people.

If you aren't paying them you are being provided a service.

if you ARE paying them, you're being provided a service, too

Yes, I agree your use could be immoral depending on what the agreement for your transaction specifies. But if you've agreed that your payment is for access to their material, then you have consent.