Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

0x815@feddit.de to Technology@beehaw.org – 371 points –
Google says AI systems should be able to mine publishers’ work unless companies opt out
theguardian.com

In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.



It’s not turning copyright law on its head; in fact, asserting that copyright needs to be expanded to cover training a data set IS turning it on its head. This is not a reproduction of the original work; it’s learning about that work and making a transformative use of it. A generative work using a trained dataset isn’t copying the original; it’s learning about the relationships that original has to the other pieces in the data set.

This is artificial pseudointelligence, not a person. It doesn't learn about or transform anything.

To take those statements seriously, you will need to:

  • define and describe in detail the processes by which "a person" learns
  • define and describe in detail how "a person" transforms anything
  • define and describe in detail what is "intelligence"
  • define and describe in detail what these "artificial pseudointelligences" are doing
  • define and describe in detail the differences between the latter and the previous points

Otherwise, I'll claim that "a person" is running exactly the same processes (neural networks, LLMs, hallucinations), and that calling these AIs "artificial pseudointelligences" is nothing other than dehumanizing a minority just because you feel threatened by them.

The lines between learning and copying are being blurred with AI. Imagine if you could replay a movie any time you like in your head just from watching it once. Current copyright law wasn’t written with that in mind. It’s going to be interesting how this goes.

Imagine being able to recall the important parts of a movie, its overall feel, and significant themes and attributes after only watching it one time.

That's significantly closer to what current AI models do. It's not copyright infringement that I can play back significant chunks of some movies precisely in my head: first because the thought of memory being owned by someone else is horrifying, and second because it's not a distributable copy.

Agreed, the thought of human memory being owned is horrifying. But we're talking about AI, and this is a paradigm shift; new laws are inevitable. Do we want AI to be able to replicate small creators' work and ruin their chances at profitability? If we aren't careful, we are looking at yet another extinction wave, where only the richest, who can afford the AI, can make anything. I don't think it's hyperbole to be concerned.

The question to me is how you define what the AI is doing in a way that isn't hilariously overbroad to the point of saying "Disney can copyright the style of having big eyes and ears", or "computers can't analyze images".

Any law expanding copyright protections will be 90% used by large IP holders to prevent small creators from doing anything.

What exactly should be protected that isn't?

If I had the answer I’d be writing my congresswoman immediately. All I know is allowing AI unfettered access to just have all content is going to be a huge problem.

How many movies are based on each other? It's a lot, even counting ones that are only loosely based on another. If you stopped allowing that, you would run out of new things to do.

Let me ask you this: do you think our brains and LLMs are, overall, pretty distinct? This is not a trick or bait or something; I'm just going through this methodically in hopes that my position - which is shared by some others in this thread, it seems - is better understood.

I don't think they work the same way, but I think they work in ways that are close enough in function that they can be treated the same for the purposes of this conversation.

Pen and pencil are "the same", and either of those and printed paper are "basically the same".
The relationship between a typical modern AI system and the human mind is like that between a pencil written document and a word document: entirely dissimilar in essentially every way, except for the central issue of the discussion, namely as a means to convey the written word.

Both the human mind and a modern AI take in input data, and extract relationships and correlations from that data and store those patterns in a batched fashion with other data.
Some data is stored with a lot of weight, which is why I can quote a movie at you, and the AI can produce a watermark: they've been used as inputs a lot. Likewise, the AI can't perfectly recreate those watermarks and I can't tell you every detail from the scene: only the important bits are extracted. Less important details are too intermingled with data from other sources to be extracted with high fidelity.
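The effect described above can be illustrated with a toy sketch (this is a deliberately simplified stand-in, not a real neural network; all names and data here are hypothetical): a model that stores only aggregate statistics of its inputs can recall heavily-repeated patterns near-verbatim, while one-off details get drowned out.

```python
from collections import Counter

# A "watermark" phrase appears constantly in the training data;
# unique details appear only once each.
training_data = (
    ["the quick brown fox"] * 1000
    + ["a unique plot detail", "another one-off line"]
)

# The "model": per-position word frequencies - a batched summary,
# not a stored copy of any individual input.
position_counts = [Counter() for _ in range(4)]
for sentence in training_data:
    for i, word in enumerate(sentence.split()):
        position_counts[i][word] += 1

# "Recall" = emit the most likely word at each position.
# The repeated pattern dominates; rare details are unrecoverable.
recalled = " ".join(c.most_common(1)[0][0] for c in position_counts)
print(recalled)  # the quick brown fox
```

The one-off sentences still influenced the counts, but they can no longer be extracted with fidelity, which is roughly the asymmetry the comment describes between watermarks and scene details.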

my head [...] not a distributable copy.

There has been an interesting counter-proposal to that: make all copies "non-distributable" by replacing 1:1 copying with AI:AI learning, so the new AI never holds a 1:1 copy of the original.

It's in part embodied in the concept of "perishable software", where instead of having a 1:1 copy of an OS installed on your smartphone/PC, a neural network hardware would "learn how to be a smartphone/PC".

Reinstalling would mean "killing" the previous software, and training the device again.
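The AI:AI learning idea resembles what's usually called knowledge distillation. A minimal sketch, under the assumption that the "teacher" stands in for a trained model whose original data is never exposed (the linear model and all values here are hypothetical):

```python
import random

def teacher(x):
    # Stand-in for a trained model; whatever it was built from
    # is never shown to the student.
    return 2.0 * x + 1.0

# Student: a linear model w*x + b fitted by stochastic gradient
# descent, using only the teacher's outputs as its training signal.
w, b = 0.0, 0.0
lr = 0.01
random.seed(0)
for _ in range(5000):
    x = random.uniform(-1.0, 1.0)
    y = teacher(x)              # the only signal the student ever sees
    err = (w * x + b) - y
    w -= lr * err * x
    b -= lr * err

print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```

The student ends up behaving like the teacher without ever containing a byte-for-byte copy of anything, which is the property the proposal leans on.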

Right, because the cool part of upgrading your phone is trying to make it feel like it's your phone, from scratch. Perishable software is anything but desirable, unless you enjoy having the very air you breathe sold to you.

Well, depends on desirable "by whom".

Imagine being a phone manufacturer and having all your users run a black box that only you have the means to re-flash or upgrade, with software developers having to go through you so you can train users' phones to "behave like they have the software installed".

It's a dictatorial phone manufacturer's wet dream.

Imagine if you could replay a movie any time you like in your head just from watching it once.

Two points:

  1. These AIs can't do that; they need thousands or millions of repetitions to "learn" the movie, and every time they "replay" the movie it is different from the original.

  2. "learning by rote" is something fleshbags can do, and are actually required to by most education systems.

So either humans have been infringing copyright all this time, or the machines aren't infringing it either.

You have one brain. You could have as many instances of AI as you can afford. In a general sense, it’s different, and acting like it’s not is going to hit you like a freight train if you don’t prepare for it.

That's a different goalpost. I get the difference between 8 billion brains, and 8 billion instances of the same AI. That has nothing to do with whether there is a difference in copyright infringement, though.

If you want another goalpost, that IMHO is more interesting: let's discuss the difference between 8 billion brains with up to 100 years life experience each, vs. just a million copies of an AI with the experience of all human knowledge each.

(That's still not really what's happening, which is tending more towards several billion copies of AIs with vast slices of human knowledge each).

It’s all theoretical at this stage, but like everything else that society waits until it’s too late for, I think it’s reasonable to be cautious and not just let AI go unregulated.

It's not reasonable to regulate stuff before it gets developed. Regulation means establishing limits and controls on something, which can't reasonably be defined before that "something" even exists, much less tested to decide whether the regulation has the effects it intends.

For what it's worth, a "theoretical regulation" already exists: Asimov's Three Laws of Robotics. Turns out current AIs are not robots, and that regulation is nonsense when applied to Stable Diffusion or LLMs.

I disagree. Over the last twenty years or so, we have plenty of examples of things that should have been regulated from the start but weren't, and now it's very difficult to do so. Every "gig economy" business, for example.

Well, fleshbags have to pay several years' worth of salary to get their education, so by your comparison, Google's AI should too.

Imagine thinking Public Education doesn't count. Or that no one without a college degree ever invented anything useful. That's before we get to your notion of "College SHOULD be expensive, for everyone, always".

The problem with education is NOT that some people pay less for theirs, or nothing at all, nor that some even have the audacity to learn quickly. AI could help everyone to have a chance to learn cheaply, even quickly.

You're just off on your own little rant now, arguing points I never even implied.

That's wrong on so many levels:

  1. Go check Project Gutenberg and the patent registry; come back when you've learned them all. They're 100% free for everyone.
  2. Fleshbags have to pay for "dumbed down" educational material just to have a chance at learning anything during their lifespan; AIs don't.
  3. The lion's share of "paying for education" isn't even paid for education, but for certification. AIs would have to pay the same... if any were dumb enough to spend "several years worth of salary" on some diploma.
  4. The only part worth paying for, is "hands on experience", which right now is far more expensive for AIs (need simulations and robots built).
  5. Training AIs already isn't free, they need thousands to millions of repetitions to learn the stuff, which means quite a buck in server costs.

So just because fleshbags are really bad at learning doesn't mean Google's AI has to pay for the same shortcomings; it already pays for its own.
