AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To)

Technology@lemmy.ml – 53 points – 1 year ago

Some argue that bots should be entitled to ingest any content they see, because people can.

You all really want a Terminator-like future, don't you? Let's use the most inflated possible wording; certainly there will be no issues.

Get a grip. Skynet didn't evolve cuz it scanned Harry Potter and Watership Down.

This is the best summary I could come up with:

Unfortunately, many people believe that AI bots should be allowed to grab, ingest and repurpose any data that’s available on the public Internet whether they own it or not, because they are “just learning like a human would.” Once a person reads an article, they can use the ideas they just absorbed in their speech or even their drawings for free.

Iris van Rooij, a professor of computational cognitive science at Radboud University Nijmegen in The Netherlands, posits that it’s impossible to build a machine to reproduce human-style thinking by using even larger and more complex LLMs than we have today.

NY Times Tech Columnist Farhad Manjoo made this point in a recent op-ed, positing that writers should not be compensated when their work is used for machine learning because the bots are merely drawing “inspiration” from the words like a person does.

“When a machine is trained to understand language and culture by poring over a lot of stuff online, it is acting, philosophically at least, just like a human being who draws inspiration from existing works,” Manjoo wrote.

In his testimony before a U.S. Senate subcommittee hearing this past July, Emory Law Professor Matthew Sag used the metaphor of a student learning to explain why he believes training on copyrighted material is usually fair use.

In fact, Microsoft, which is a major investor in OpenAI and uses GPT-4 for its Bing Chat tools, released a paper in March claiming that GPT-4 has “sparks of Artificial General Intelligence” – the endpoint where the machine is able to learn any human task thanks to it having “emergent” abilities that weren’t in the original model.

The original article contains 4,088 words, the summary contains 274 words. Saved 93%. I'm a bot and I'm open source!

This article acts like it is a privilege to read a book or hear a song and it can be revoked...lol

Rights are irrelevant.

You made something. That doesn't give u the right to say what can or can't ingest it.

Under these rules all fanfic would be illegal.

Search engines would be illegal...can't scan my website that's copyrighted. Radio would be illegal. Random ppl listening to ur songs...that's a nono.

AI does not learn as we do when ingesting information.

I read an article about a subject. I will forget some of it. I will misunderstand some of it. I will not understand some of it. (These two are different because in misunderstanding I think I understand but I am wrong. In simply not understanding the information I cannot make heads or tails of that portion.)

Later when I make use of what I may have learned these same effects will happen again to whatever it was I correctly understood.

Another issue: I, as a natural intelligence, know what I can quote and what I should not, due to copyright, social mores, and law. AI regurgitates everything that might match regardless of source.

The third issue: The AI does not understand even with copious training data. It does not know that dogs bark, it does not have a concept of a dog.

I once wrote a simpler program that took a body of text and, for each pair of letters, noted the letter that followed, building probability tables from the pair of letters plus the next letter. After ingesting what little training information I was able to give it, it would choose two letters at random and then generate the following letter using the statistics it had learned. It had no concept of words, much less the meaning of any words it might form.
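The letter-pair program described above can be sketched in a few lines of Python. This is a minimal reconstruction, not the commenter's original code: the names `build_table` and `generate`, the sample corpus, and the choice to start from a pair that actually occurred in the training text are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_table(text):
    """Count which letter follows each pair of adjacent letters."""
    table = defaultdict(Counter)
    for i in range(len(text) - 2):
        table[text[i:i + 2]][text[i + 2]] += 1
    return table


def generate(table, length, rng):
    """Start from a random seen pair, then sample one letter at a time."""
    out = rng.choice(sorted(table))  # assumption: seed with a pair from the text
    while len(out) < length:
        followers = table.get(out[-2:])
        if not followers:  # this pair was never followed by anything
            break
        letters = list(followers)
        weights = list(followers.values())
        # Pick the next letter in proportion to how often it followed the pair.
        out += rng.choices(letters, weights=weights)[0]
    return out


# Hypothetical toy corpus; any body of text would do.
corpus = "the dog barks at the cat and the cat runs from the dog"
table = build_table(corpus)
sample = generate(table, 40, random.Random(0))
print(sample)
```

Nothing in `table` refers to words or meanings. It only records how often one character followed two others, which is the point the comment is making.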

I read an article about a subject. I will forget some of it. I will misunderstand some of it. I will not understand some of it. (These two are different because in misunderstanding I think I understand but I am wrong. In simply not understanding the information I cannot make heads or tails of that portion.)

Just because you're worse at comprehension or have worse memory doesn't make you any more real. And AIs also "forget" things, they also get stuff imperfectly, because they don't store any actual "full length texts" or anything. It's just separate words (more or less) and the likelihood of what should come next.

Another issue: I, as a natural intelligence, know what I can quote and what I should not, due to copyright, social mores, and law. AI regurgitates everything that might match regardless of source.

Except you don't, not perfectly. You can be absolutely sure that you often say something someone else has said or written, which means they technically have a copyright to it... But no one cares for the most part.

And it goes the other way too - you can quote something imperfectly.

Both actually can/do happen already with AIs, though it would be great if we could train them with proper attribution - at least for the clear-cut cases.

The third issue: The AI does not understand even with copious training data. It does not know that dogs bark, it does not have a concept of a dog.

A sufficiently advanced artificial intelligence would be indistinguishable from natural intelligence. What sets them apart then?

You can look at animals, too. They also have intelligence, and yet there are many concepts that are incomprehensible to them.

The thing is though, how can you actually tell that you don't work the exact same way? Sure the AI is more primitive, has fewer inputs - text only, no other outside stimuli - but the basis isn't all that different.

When creating art do you get to make rules about who or what experiences it? Or is that a selfish asshole take?

Paint a picture but only some ppl get to see it. Sing a song but only some get to hear it.

What planet do you live on where those things are true?

Well, that's the question at hand. Who? Definitely not: people have an innate right to think about what they observe, whether or not that thing was made by someone else.

What? I'd argue that's a much different question.

Let's take an extreme case. Entertainment industry producers tried to write language into the SAG-AFTRA contract that said that, if an extra is hired for a production, they can use that extra's image -- including 3D spatial body scans -- in perpetuity, for any purpose, and that privilege of eternal image storage and re-use was included in the price of hiring an extra for 1 day of work.

The producers would make precisely the same argument you are -- how dare you tell them how they can use the images that they captured, even if it's to use and re-use a person's image and shape in visual media, forever. The actors argue that their physiognomy is part of their brand and copyright, and using their image without their express permission (and, should they require it, compensation) is a violation of their rights.

Or, I could just take pictures of somebody in public places without their consent and feed them into an AI to create pictures of the subject flashing children. They were my pictures, taken by me, and how dare anybody get to make rules about who or what experiences them, right?

The fact is, we have rules about the capture and re-use of created works that have applied to society for a very long time. I don't think we should give copyright holders eternal locks on their work, but neither is it clear that a 100% free use policy on created work is the right answer. It is reasonable to propose something in between.

"What" is not a different question. As a creator you don't get to say what or who can ingest your creation. If you did, Google image search wouldn't exist.

The thing you're failing to realize is that this isn't the first time a computer has been used to ingest info. The rules you assert have never been true to this point. Crawlers have been scanning web pages and images since the dawn of the Internet.

You act like this just started happening so now you get to put rules on what gets to look at that image. Too late, there's decades of precedent.

But there are absolutely rules on whether Google -- or anything else -- can use that search index to create a product that competes with the original content creators.

For example, https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.

Google indexing of copyrighted works was considered "fair use" only because they only offered a few preview pages associated with each work. Google's web page excerpts and image thumbnails are widely believed to pass fair use under the same concept.

Now, let's say Google wants to integrate the content of multiple copyrighted works into an AI, and then give away or sell access to that AI which can spit out the content (paraphrased, in some capacity) of any copyrighted work it's ever seen. You'll even be able to ask it questions, like "What did Jeff Guinn say about David Koresh's religious beliefs in his 2023 book, Waco?" and in all likelihood it will cough up a summary of Mr. Guinn's uniquely discovered research and journalism.

I don't think the legal questions there are settled at all.

You just proved my point: there is nothing stopping Google from scanning all of those; they just have to limit what they show of what they scanned. There it is easy to prove because the content is verbatim.

In the case of AI it is not verbatim. How do you prove the results are directly derived from, say, reading Harry Potter vs ingesting a forum's worth of content regarding HP? I don't think as a plaintiff u can show damages or that your works were even used... The only reason this is even an issue is because ChatGPT's creators admitted they scanned books etc.

How do you prove the results are directly derived

Mathematically? It's a computer algorithm. Its output is deterministic, and both reproducible and traceable.

Give the AI two copies of its training dataset, one with the copyrighted work, one without it. Now give it the same prompt and compare the outputs.

The difference is the contribution of the copyrighted work.
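As a toy illustration of that with/without comparison, here is a minimal Python sketch using a simple count-based letter model. Everything in it is an assumption for illustration: the function names, the two "shared" training texts, and the "disputed" text standing in for the copyrighted work. Whether the same procedure is practical for a large neural model, where retraining is costly, is not something this sketch addresses.

```python
from collections import Counter, defaultdict


def build_table(texts):
    """Count which letter follows each pair of letters, across all texts."""
    table = defaultdict(Counter)
    for text in texts:
        for i in range(len(text) - 2):
            table[text[i:i + 2]][text[i + 2]] += 1
    return table


def next_letter_probs(table, pair):
    """Turn the raw counts for one pair into probabilities."""
    counts = table.get(pair, Counter())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


# Hypothetical training data: two shared texts plus one disputed work.
shared = ["the cat sat on the mat", "the dog sat on the log"]
disputed = "the wizard waved the wand"

with_work = build_table(shared + [disputed])
without_work = build_table(shared)

# Same "prompt" (the pair "e ") given to both models.
p_with = next_letter_probs(with_work, "e ")
p_without = next_letter_probs(without_work, "e ")

# Per-letter change in probability attributable to the disputed work.
shift = {
    ch: p_with.get(ch, 0.0) - p_without.get(ch, 0.0)
    for ch in set(p_with) | set(p_without)
}
print(sorted(shift.items()))
```

In this toy case the letter "w" can only follow "e " in the model that saw the disputed text, so the shift for "w" is positive and the shifts for the other letters are negative.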

You mention Harry Potter. In Warner Bros. Entertainment, Inc. v. RDR Books, Warner Brothers lawyers argued that a reference encyclopedia for the Harry Potter literary universe was a derivative work. The court disagreed, on the argument that the human authors of the reference book had to perform significant creative work in extracting, summarizing, indexing and organizing the information from JK Rowling's original works.

I wonder if the court would use the same reasoning to defend the work of an AI?