Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

Technology@beehaw.org – 371 points – 1 year ago

Google says AI systems should be able to mine publishers’ work unless companies opt out

In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.

I agree with Google, only I go a step further and say any AI model trained on public data should likewise be public for all and have its data sources public as well. Can't have it both ways, Google.

To be fair, Google releases a lot of models as open source: https://huggingface.co/google

Using public content to create public models is also fine in my book.

But since it's Google I'm also sure they are doing a lot of shady stuff behind closed doors.

I hope that too, but I'm less optimistic. We live in a capitalistic world.

Copyright law already allows generative AI systems to scrape the internet. You need to change the law to forbid something, it isn't forbidden by default. Currently, if something is published publicly then it can be read and learned from by anyone (or anything) that can see it. Copyright law only prevents making copies of it, which a large language model does not do when trained on it.

A lot of licensing prevents or constrains creating derivative works and monetizing them. The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

If yes, Github Copilot is illegal as it produces code that should comply with multiple conflicting license requirements. If no, I can write some simple AI that is "trained" to regurgitate its output on a prompt, and run a leaked copy of Windows through it, then go around selling Binbows and MSFT can't do anything about it.

The truth is mostly between the two, this is just piracy, which always has been a gray area because of the difficulty of prosecuting it, previously because the perpetrators were many and hard to find, now it's because the perpetrators are billion dollar companies with expensive lawyer teams.

The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

This question is completely independent of whether the code was generated by an AI or a human. You compare code A with code B, and if the judge and jury agree that code A is a derivative work of code B then you win the case. If the two bodies of work don't have sufficient similarities then they aren't derivative.

If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt

You've reinvented copy-and-paste, not an "AI." AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

Filtering the code through something you call an AI isn't going to have any impact on whether you get sued. If the resulting code looks like copyrighted code, then you're in trouble. If it doesn't look like copyrighted code then you're fine.

AIs are deliberately designed to not copy-and-paste.

AI is a marketing term, not a technical one. You can call anything "AI", but it's usually predictive models that get called that.

AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

For example if the powers that be decided to say licenses don't apply once you feed material through an "AI", and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively using Microsoft proprietary code. Their licenses and copyright and stuff doesn't apply to AI training data so you could sell that new code your AI just created.

It doesn't even have to be 100% identical to Windows source code. What if it's just 80%? 50%? 20%? 5%? Where is the bar where the author can claim "that's my code!"?
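To make the "80%? 50%? 20%?" question concrete, here is a minimal sketch of one way such a percentage could be measured. It is only an illustration: the function name `similarity_percent` is hypothetical, the line-by-line comparison is an assumption of mine, and nothing in this thread (or in copyright law, as far as the comments say) defines a numeric bar. A court weighs similarity qualitatively, not with a script.

```python
import difflib


def similarity_percent(a: str, b: str) -> float:
    """Return a rough 0-100 similarity score between two source texts.

    Hypothetical helper for illustration only; not a legal test.
    """
    # Compare line by line, ignoring indentation and blank lines.
    a_lines = [ln.strip() for ln in a.splitlines() if ln.strip()]
    b_lines = [ln.strip() for ln in b.splitlines() if ln.strip()]
    # ratio() is 2 * (matching lines) / (total lines in both texts).
    return 100.0 * difflib.SequenceMatcher(None, a_lines, b_lines).ratio()


original = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
candidate = "def add(a, b):\n    return a + b\n\ndef mul(a, b):\n    return a * b\n"

print(similarity_percent(original, original))
print(similarity_percent(original, candidate))
```

A score like this depends heavily on how the text is split and normalized, which is part of why "where is the bar?" has no simple numeric answer.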

Just to compare, the guys who set out to reimplement Win32 APIs for use in Linux (the thing that made it into MacOS as well now) deliberately would not accept help from anyone who ever saw any Microsoft source code for fear of being sued. The bar was that high when it was a small FOSS organization doing it. It was 0%, proven beyond a doubt.

Now that Microsoft is the author, it's not a problem when Github Copilot spits out GPL code word for word, ironically together with its license.

AI is a marketing term, not a technical one.

The reverse, actually. Artificial intelligence is a field of research that includes things like machine learning, as well as lots of even more mundane applications. It's pop culture that has hijacked it to mean "a thing exactly as capable as a human brain, but in computer form."

For example if the powers that be decided to say licenses don’t apply once you feed material through an “AI”, and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively using Microsoft proprietary code.

Once again, it doesn't matter what you "feed code through." Copyright applies to the tangible result. If the output from the AI matches closely to something that's already copyrighted then that copyright applies to it. If it doesn't match closely then that copyright doesn't apply to it. The actual process by which the code was produced doesn't matter one whit. If I took a Harry Potter book, put its pages through a shredder, randomly glued the particles of paper back together and it just so happened to closely replicate Lord of the Rings then the Tolkien estate has a case against me but the Rowling estate does not.

If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

^^ Very much this.

Loads of people are treating the process of AI creating works as either violating copyright or not. But that is not how copyright works. It applies to the output of a process, not the process itself. If someone ends up writing something that happens to be a copy of something they read before, that is a violation of copyright law. If someone uses various works and creates something new and unique then that is not a violation. It does not - at this point in time at least - matter if that someone is a real person or an AI.

AI can violate copyright with one work and not with another. Each case is independent and would need to be adjudicated separately. But AI can produce so much content so quickly that it creates a real problem for a case-by-case analysis of copyright infringement. So it is quite likely the laws will need to change to account for this and will likely need to treat AI works differently from human-created works. Which is a very hard thing to actually deal with.

Now, one could also argue the model itself is a violation of copyright. But that IMO is a stretch - a model is nothing like the original work and the copyright law also does not cover this case. It would need to be taken to court to really decide on if this is allowed or not.

Personally I don't think the conversation should be about what the laws currently allow - they were not designed for this - but instead about what the laws should allow, so we can steer the conversation towards a better future. Lots of artists are expressing their distaste for AI models being trained on their works - if enough people do this, laws can be crafted to back up this view.

then go around selling Binbows and MSFT can't do anything about it

I think this already happens. A very practical example: the Windows GUI has been copied by many Linux distros. And with Windows 11 there's clearly a reference to the Apple macOS GUI with a sprinkling of Google Material Design.

Should Apple and Google be able to sue Microsoft because it "copied" their work? Should Google be able to sue Apple because they "copied" the notification drop-down in iOS?

As you say it's really a grey area because the only reason we consider AI code to be "regurgitated" while human code to be "inspired" is only because we give humans more recognition of their intellectual abilities.

You should read this.

An AI model is a derivative work of its training data and thus a copyright violation if the training data is copyrighted.

A human is a derivative work of its training data, thus a copyright violation if the training data is copyrighted.

The difference between a human and ai is getting much smaller all the time. The training process is essentially the same at this point, show them a bunch of examples and then have them practice and provide feedback.

If that human is trained to draw on Disney art, then goes on to create similar style art for sale that isn't a copyright infringement. Nor should it be.

This is stupid and I'll tell you why.
As humans, we have a perception filter. This filter is unique to every individual because it's fed by our experiences and emotions. Artists make great use of this by producing art which leverages their view of the world, it's why Van Gogh or Picasso is interesting because they had a unique view of the world that is shown through their work.
These bots do not have perception filters. They're designed to break down whatever they're trained on into numbers and decipher how the style is constructed so it can replicate it. It has no intention or purpose behind any of its decisions beyond straight replication.
You would be correct if a human's only goal was to replicate Van Gogh's style but that's not every artist. With these art bots, that's the only goal that they will ever have.

I have to repeat this every time there's a discussion on LLM or art bots:
The imitation of intelligence does not equate to actual intelligence.

Absolutely agreed! I think if the proponents of AI artwork actually had any knowledge of art history, they'd understand that humans don't just iterate the same ideas over and over again. Van Gogh, Picasso, and many others, did work that was genuinely unique and not just a derivative of what had come before, because they brought more to the process than just looking at other artworks.

Yup. There seems to be a strong motive in many to not understand this concept as it makes their practices clearly ethically questionable.

My feeling is that the vast majority of pro-AI techbros come from a computer science, finance, or business background; undoubtedly intelligent people, but completely and utterly lacking in any appreciation or understanding of what actually goes into creative work. I'm sure they genuinely believe that there's no difference between what a human does and what an AI does, because they think art (or writing, music, etc) are just the product of an algorithm.

Ironically, my background is in mathematics but I also happen to be a writer so I see both sides of the argument. I just see the utter lack of compassion people have for those who produce creative work and the same people believe that if it can be automated, it should be automated.

Likely. Which is weird because algorithms are only a subset of software engineering, which requires abstract and creative thought to perform well.

I really, really, really wish people would understand this.

AI can only create a synthesis of exactly what it's fed. It has no life experience, no emotional experience, no nurture-related experiences, no cultural experiences that color its thinking, because it isn't thinking.

The "AI are only doing what humans do" is such a brain-dead line of thinking, to the point that it almost feels like it's 100% in bad faith whenever it's brought up.

You're completely wrong, and I'll tell you why.

None of what you said matters, perception filters, intent, intelligence... it's all irrelevant to the discussion.

Copyright only grants certain rights, and at least here in Canada, control over using works to generate a model isn't one of them. The rights cover things like distribution, reproduction, public performance, communication, and exhibition. US law says you can't "Prepare derivative works based upon the work." but the model isn't a derivative work because it's not really a work at all, you can't even visually look at the model. You can't copyright an algorithm in the US or Canada.

Only the created art should be scrutinized for copyright infringement, and these systems can generate both (just like a human can).

Any enforcement should then be handled when that protected work is then used to infringe on the actual rights of the copyright holder.

I wasn't talking about copyright law in regards to the model itself.

I was talking about what is/isn't grounds for plagiarism. I strongly disagree with the idea that artists and art bots go through the same process. They don't and it's reductive to claim otherwise. It negatively impacts the perception of artists' work to assert that these models can automate a creative process which might not even involve looking at other artists' work because humans are able to create on their own.

A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.

This is why ChatGPT required the internet to do what it does (the privacy violation is another big concern there). The model needed vast quantities of information to be sufficiently trained because language is difficult to decipher. Languages evolved by getting in contact with other languages and organically making new words. ChatGPT will never invent a new word because it's not intelligent, it is merely imitating intelligence.

"A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate."

No, they really can't. Go look at a 1-year-old's first attempt at "art", because it's nothing more than random smashing of colour on paper. A computer could easily generate such "work" as well with no training data at all. They've seen art at that point, and still can't replicate it because they need much more training first.

Humans require books (or teachers who read books) to learn how to read and write. That is "vast quantities of information" being consumed to learn how to do it. If you had never seen or heard of a book, you wouldn't be able to write a novel. It's also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice)

Once again, being reductive about artists' work. Jackson Pollock's entire career was smashing colours on a canvas. If you want to argue that Pollock had to look at thousands of paintings before making his, I honestly can't take you seriously at that point.

A computer could easily generate such “work” as well with no training data at all.

Yes and in the eyes of its creators, that was deemed a failure which is why Midjourney and Dall-E are the way they are. These bots don't want to create art, they want to imitate it.

Children have barely any experiences and can still create something. You might not deem it worthy of calling it art but they created something despite their limited knowledge and life experience.

Of course, you'd need books to read and write. The words have to be written and you need to see the words in written form if you also want to write them. But one thing you don't take into account is handwriting. Another thing that is unique to every individual. Some have worse handwriting than others and with practice (like any muscle) it can be improved but you haven't had to have seen handwritten text before writing it yourself. You only need to be taught how to hold a pen and you can write.

Novels are complex structures of language just like poetry. In order to write novels, you have to consume novels because it's well understood that to find your own narrative voice you must see how others express theirs. Stories are told in unique ways and it's crucial as a writer to understand and break these concepts down. Intention and purpose form a core part of storytelling and an LLM cannot and will not be able to express those things.

They're written in certain ways because the author intended them to be that way, such as Cormac McCarthy deciding to be very minimalist with his punctuation.
I would love to see you make a point that an LLM without being specifically prompted to do so would make that stylistic decision. An LLM can't make that decision because unless you specify a style it is aware of, it won't organically do it.

I am also a writer. I've written a short story. One of my stylistic choices is that I don't use dialogue tags like "said". An LLM won't make that choice because it isn't designed to do so, it won't decide to minimise its use of dialogue tags to improve the flow of the narrative unless you told it to.

It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice).

Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way.

Jackson Pollock didn't create paintings, Jackson Pollock's art was story telling and showmanship.

Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way.

Just like different models have their own patterns of writing...

You're thinking about LLMs like they're equivalent to multiple people(or groups of people) but each LLM is equivalent to a single person. The training and resulting function of each one is as distinct as an individual human.

I could raise one of my children to perform the exact same functions as an LLM or art creation tool. Give them exactly the same image/text sets that these models are trained on, and have them practice for a decade or two. Then I could tell them "Hey I need a picture of an orange rabbit riding a bike" and they could draw me one, or write a story about the same topic. There's clearly no copyright infringement in that process, so why would it be different for creating a machine to do the same thing?

An LLM or art creation tool is barely equatable to one person. The difference between a child and an art creation tool is that you could show a child a single picture of a bunny, a bike and a carrot then ask them to draw an orange bunny riding a bike and they could draw something resembling that. An art bot would require hundreds to thousands of images of each object to understand what it is before it can even make a reasonable attempt. The level of training required isn't even comparable.

At least the child's drawing will have some personality in it, every output from an art bot ends up looking soulless. The reason for that is the simple concept that an art bot only imitates what it's been trained on and an artist draws on inspiration before applying the two things an art bot will never have; intent or purpose.

You're missing the training even a child has received to reach the state where they could do that. If you raised a child to 5 years completely by themselves in an empty room they wouldn't be able to draw anything at all, let alone something based on pictures. The act of drawing a variation on a bunny from a picture requires they learn and practice fine motor skills, and it requires them to have an understanding of animals.

Humans get literally 150,000+ hours of training time before we even let them try to become an adult.

Sure but the training isn't an algorithm deciding probabilities. Children do not 100% express themselves based on environment. On one side you have nature and the other you have nurture.

An example:
The FBI's studies into serial killers uncovered that these people, even though they have been influenced by their environment to become what they are, respond to external stimuli in an abnormal way, which is what leads them down that path to begin with.

A child learns how language and creativity is expressed before attempting to express themselves. These bots aren't built to deal with this expression because at their core, they are statistical models. It looks at a sentence like a series of variables to determine what comes next. The sentence itself could be nonsensical but the bot doesn't know that, it's using the probabilities it's been trained on to construct the sentence.
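The "probabilities determine what comes next" idea in the comment above can be sketched with a toy bigram model. This is a deliberately tiny illustration and an assumption on my part, not how ChatGPT or any real LLM is implemented: real systems use neural networks over tokens and far more context. The function names and the sample corpus are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        counts[cur][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn follower counts into probabilities for the next word."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


def generate(counts, start, length, seed=0):
    """Sample a word sequence; the model never checks whether it makes sense."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for this word
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)


model = train_bigram("the cat sat on the mat the cat ate the fish")
print(next_word_probs(model, "the"))
print(generate(model, "the", 8))
```

Every word the toy model emits is chosen only because it followed the previous word somewhere in the training text, which is the limited sense in which the sentence "could be nonsensical but the bot doesn't know that."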

You might say bots have their own way of expressing themselves but I would say that's something we're applying to the bot rather than something it is demonstrating itself. I'm sure it's very cute when it apologises for making a mistake but that apology isn't sincere, it's been programmed to respond that way when it thinks you're pointing out its mistakes. It's merely imitating a sense of remorse rather than displaying actual remorse.

this is stupid I’ll tell you why

Not sure why you think anyone would read anything if that’s how you start it.

a human does not copy previous work exactly like these algorithms, whats this shit take?

A human can absolutely copy previous works, and they do it all the time. Disney themselves license books teaching you how to do just that. https://www.barnesandnoble.com/w/learn-to-draw-disney-celebrated-characters-collection-disney-storybook-artists/1124097227

Not to mention the amount of porn online based on characters from copyrighted works. Porn that is often done as a paid commission, expressly violating copyright laws.

Neither does AI?

But considering that humans do get copyright strikes when they do something too similar, that should also apply to AI; it doesn't matter if it's not exact.

That should tell you something about how companies act. They're fine with these LLMs plagiarising content but when someone gets marginally close to their own trademarks, they get slammed.

Humans and AI are not the same and an equivalence should never be drawn.

Your feelings don't really matter, the fact of the matter is that the goal of ai is literally to replicate the function of a human brain. The way we're building them is often mimicking the same processes.

And LLMs and related technologies, by themselves, are artificial but not intelligent. So, the facts are not in favor of your argument to allow commercial parasitism on creative works.

I think you're missing a point here. If someone uses these models to produce and distribute copyright-infringing works, the original rights holder could go after the infringer.

The model itself isn't infringing though, and the process of creating the model isn't either.

It's a similar kind of argument to the laws that protect gun manufacturers from culpability from someone using their weapon to commit a crime. The user is the one doing the bad thing, they just produce a tool.

Otherwise, could Disney go after a pencil company because someone used one of their pencils to infringe on their copyright? Even if that pencil company had designed the pencil to be extremely good at producing Disney imagery by looking at a whole bunch of Disney images and movies to make sure it matches the size, colour, etc? No, because a pencil isn't a copyright infringement of art, regardless of the process used to design it.

Nah. You're missing the forest for the trees. Let's get abstract:

Person A makes a living by making product X and selling it.

Person B makes a living by making product Y and selling it.

Both A and B are in the same industry.

Person C uses a machine to extract the essence of product X and Y and blend them. Person C then claims authorship and sells it as product Z, which they sell in competition to X and Y.

Person C has not created anything. Their machine does not have value in the absence of products X and Y, yet received no permission, offers no credit nor compensation. In addition, they are competing for the same customers and harming the livelihoods of A and B. Person C is acting in a purely parasitic manner that cannot be seen as ethical in any widely accepted definition of the word.

You're missing something even more basic.

The machine Person C has created is not infringing on anything by itself. Its creation was not an infringement. "Extracting essence" isn't a protected right provided by the copyright frameworks. Only the actual art it is used to create could infringe (which most of the generated images do not).

If the final art created is an infringement, the existing copyright system handles that situation just like an infringing piece of art created by a human. The person at fault is the person who used the machine to create an infringing work, not the creator of the machine.

In your scenario, if a human C came along and looked at the art from Person A and B, blended them together into their own style, there wouldn't be any problem either. Even though they received no permission, and offered no credit nor compensation to the original creators. They would only get in trouble if they created an actual piece of art that was too similar to either of the specific artists works and therefore found to be infringing upon the copyright.

First, feeding something into a machine is not the same as looking at it. Person C literally creates nothing. They are a parasite. There's far more to creating than using statistical modeling algorithms. One cannot claim that that's what people studying a style and then creating something are doing because it is empirically false.

Second, the scope of the discussion is not just "can someone legally get in trouble".

"Feeding something into a machine is not the same as looking at it" Most scientists would vehemently disagree. Human brains are just a complex and squishy computer. The fact that they're biological makes no difference to how we function. Input goes in, processing occurs, output comes out. Even the term "Computer" started as a job title for a human prior to the invention of mechanical and electric devices.

The scope of the discussion is absolutely what would get you in trouble. That's literally the entire post we're commenting on. We're not arguing if this SHOULD be allowed or not, we're arguing about whether current laws prohibit it.

You keep harping on about parasites, is every person who creates a machine to do a task that competes with humans parasitical in your fucked up world logic? If we want to make a machine to build widgets, an engineer will study how widgets get built, design a machine to do it instead, produce the machine, then a company will use it to outcompete the original manual widget makers. Same process for essentially every machine we've ever invented.

"Feeding something into a machine is not the same as looking at it" Most scientists would vehemently disagree. Human brains are just a complex and squishy computer.

In that aspect, we are absolutely in agreement. We are meat computers in meat cages containing necessary support systems. That statement was, perhaps, an oversimplification.

Things like LLMs are attempts to model how the human brain works but are not identical, nor are LLMs, by themselves, capable of intelligence. If one argues contrarily that feeding data into an LLM and using it to produce something is the same, then the one using the LLM is clearly not the author and claiming so is plagiarism of the work of either the creator of the LLM or the LLM itself.

The argument that, legally, IP owners cannot specify that their works may not be used as feedstock for competing commercial products is rather absurd itself and would invalidate all but the most permissive open-source licenses as well as proprietary licenses. As pointed out elsewhere, this line of thought would allow one to steal leaked source code and use it to effectively clone existing software. Use of the source in this manner would be infringing on the owner's IP rights.

Perhaps a good way to think about LLMs is as automated reverse engineering. They take data and statistically model it in order to characterize it. There is substantial case law there and the EFF has a great FAQ on the topic: https://www.eff.org/issues/coders/reverse-engineering-faq

The scope here is not limited to "can someone legally get in trouble under current law" (which, seems likely but is still working its way through courts). The discussion is specifically discussing ethics. Person C has created nothing. They should have no product to sell, if not for persons A and B. Their competition with those that their product is derived from is a parasitic relationship, plain and simple. They are performing an act of exploitation with measurable harm both to persons A and B but also to further development of their craft by destroying any incentive to continue it.

Now, in some sort of alternate economic system, where one's livelihood is not tied to their vocation, sure, it's possibly not problematic because the economic harm is removed. However, in current capitalist systems that are in place where LLMs are heavily hyped, it's an ethically bankrupt action to take.

ETA: No amount of mental gymnastics can change the fact that use of others' works without their consent to train a model, then claiming authorship and competing IS plainly theft of the labor that went into creating the original works.

That's not to say that LLMs and the like don't have value or don't often require effort to produce something worthwhile. Just that they need to be used in an ethical manner that improves the human condition, not as another tool to rob others of the fruit of their labors.

I'll remind you the original article title literally contains the words "copyright law"

This discussion is entirely about legality, not ethics.

By your stupid logic, I have created nothing in my job designing automation systems, since I just look at what people currently do, program a computer to do those tasks instead, and I profit off those people no longer needing to do that job.

You want to keep everyone fully employed in needless tasks? Go join the Mennonites.

I feel that you're being deliberately obtuse here in order to avoid the ethics dilemma.

A design is a "thing", software is a "thing" even if it is physically intangible. Designing automation systems requires more than just looking at existing processes or algorithmic modeling. It requires synthetic and abstract thought. Nor is it a parasitic process; the automation has value by itself and is not dependent upon the outputs of those whose tasks it automates. Automation, in theory, also improves the human condition by reducing the amount of labor required of a given individual (though this particular good has largely been stolen since the 80s).

The goal of AI is fictional, and there's no solid evidence today that it will ever stop being fiction.

What we have today are stupid learning algorithms that are surprisingly good at mimicking intelligent people.

The most apt comparison today is a particularly clever parrot.

I'm all for having the discussion about how to handle AI when we have it, but it's bad faith to apply it to what we have today.

Critically, what we have today will never ever go on strike, or really make any kind of correct moral decision on its own. We must treat it like dumb automation, because it is dumb automation.

the fact of the matter is that the goal of AI is literally to replicate the function of a human brain

…says who? That’s absolutely your feeling and not facts.

Derivative works are only copyright violations when they replicate substantial portions of the original without changes.

The entirety of human civilization is derivative works. Derivative works aren't infringement.

That's just not true

It absolutely is. There's nothing out there in the past thousand years that isn't based on other prior art. Copyright law only applies to direct copies, and there are explicit carve-outs past that that allow you to directly copy some things if your work is transformative.

It is not a derivative work, the model does not contain any recognizable part of the original material that it was trained on.

Except when it produces exact copies of existing works, or when it includes a recognisable signature or watermark?

Ah, this old paper again. When it first came out it got raked over the coals pretty thoroughly. The authors used an older, poorly-trained version of Stable Diffusion that had been trained on only 160 million images and identified 350,000 images from the training set that had many duplicates and therefore could potentially be overfitted. They then generated 175 million images using tags commonly associated with those duplicate images.

After all that, they found 109 images in the output that looked like fuzzy versions of the input images. This is hardly a triumph of plagiarism.

As for the watermark, look closely at it. The AI clearly just replicated the idea of a Getty-like watermark, it's barely legible. What else would you expect when you train an AI on millions of images that contain a common feature, though? It's like any other common object - it thinks photographs often just naturally have a grey rectangle with those white squiggles in it, and so it tries putting them in there when it generates photographs.

These are extreme stretches and they get dredged up every time by AI opponents. Training techniques have been refined over time to reduce overfitting (since what's the point in spending enormous amounts of GPU power to produce a badly-artefacted copy of an image you already have?) so it's little wonder there aren't any newer, better papers showing problems like these.

Nevertheless, the Getty watermark is a recognisable element from the images the model was trained on, therefore you cannot state that the models don't spit out images with recognisable elements from the training data.

Take a close look at the "watermark" on the AI-generated image. It's so badly mangled that you wouldn't have a clue what it says if you didn't already know what it was "supposed" to say. If that's really something you'd consider "copyrightable" then the whole world's in violation.

The only reason this is coming up in a copyright lawsuit is because Getty is using it as evidence that Stability AI used Getty images in the training set, not that they're alleging the AI is producing copyrighted images.

I said "recognisable", and it is clearly recognisable as Getty's watermark, by virtue of the fact that many people, not only I, recognise it as such. You said that the models don't use any "recognizable part of the original material that it was trained on", and that is clearly false because people do recognise parts of the original material. You can't argue away other people's ability to recognise the parts of the original works that they recognise.

I said that models don't contain any recognizable part of the original material. They might be able to produce recognizable versions of parts of the original material, as we're seeing here. That's an important distinction. The model itself does not "contain" the images from the training set. It only contains concepts about those images, and concepts are not something that can be copyrighted.

If you want to claim copyright violations over specific output images, sure, that's valid. If I were to hit on exactly the right set of prompts and pseudorandom seed values to get a model to spit out an image that was a dead ringer for a copyrighted work and I was to distribute copies of that resulting image, that's copyright violation. But the model itself is not a copyright violation. No more than an artist is inherently violating copyright because he could potentially pick up his paint brush and produce a copy of an existing work that he's previously seen.

In any event, as I said, Getty isn't suing over the copyright to their watermark.

To be honest I'm fine with it in isolation, copyright is bullshit and the internet is a quasi-socialist utopia where information (an infinitely-copyable resource which thus has infinite supply and 0 value under capitalist economics) is free and humanity can collaborate as a species. The problem becomes that companies like Google are parasites that take and don't give back, or even make life actively worse for everyone else. The demand for compensation isn't so much because people deserve compensation for IP per se, it's an implicit understanding of the inherent unfairness of Google claiming ownership of other people's information while hoarding it and the wealth it generates with no compensation for the people who actually made that wealth. "If you're going to steal from us, at least pay us a fraction of the wealth like a normal capitalist".

If they made the models open source then it'd at least be debatable, though still suss since there's a huge push for companies to replace all cognitive labor with AI whether or not it's even ready for that (which itself is only a problem insofar as people need to work to live, professionally created media is art insofar as humans make it for a purpose but corporations only care about it as media/content so AI fits the bill perfectly). Corporations are artificial metaintelligences with misaligned terminal goals so this is a match made in superhell. There's a nonzero chance corporations might actually replace all human employees and even shareholders and just become their own version of skynet.

Really what I'm saying is we should eat the rich, burn down the googleplex, and take back the means of production.

Or, if it was some non-profit doing the work for the good of everyone :')

If only there were some kind of open AI research lab lmao. In all seriousness Anthropic is pretty close to that, though it appears to be a public benefit corporation rather than a nonprofit. Luckily the open source community in general is really picking up the slack even without a centralized organization, I wouldn't be surprised if we get something like the Linux Foundation eventually.

Okay so I took back the means of production but it says it's a subscription basis now

That's late-stage capitalism for you – even revolution comes with a subscription fee

Probably shoulda read the Revolution TOS before clicking "I Agree".

It’s not turning copyright law on its head; in fact, asserting that copyright needs to be expanded to cover training on a data set IS turning it on its head. This is not a reproduction of the original work; it’s learning about that work and making a transformative use of it. A generative work using a trained dataset isn’t copying the original; it’s learning about the relationships that original has to the other pieces in the data set.

This is artificial pseudointelligence, not a person. It doesn't learn about or transform anything.

I'm not the one anthropomorphising the technology here.

To take those statements seriously, you will need to:

define and describe in detail the processes by which "a person" learns
define and describe in detail how "a person" transforms anything
define and describe in detail what is "intelligence"
define and describe in detail what these "artificial pseudointelligences" are doing
define and describe in detail the differences between the latter and the previous points

Otherwise, I'll claim that "a person" is running exactly the same processes (neural networks, LLMs, hallucinations), and that calling these AIs "artificial pseudointelligences" is nothing other than dehumanizing a minority just because you feel threatened by them.

The lines between learning and copying are being blurred with AI. Imagine if you could replay a movie any time you like in your head just from watching it once. Current copyright law wasn’t written with that in mind. It’s going to be interesting how this goes.

Imagine being able to recall the important parts of a movie, its overall feel, and significant themes and attributes after only watching it one time.

That's significantly closer to what current AI models do. It's not copyright infringement that there are significant chunks of some movies that I can play back in my head precisely. First because memory being owned by someone else is a horrifying thought, and second because it's not a distributable copy.

The thought of human memory being owned is horrifying. We’re talking about AI. This is a paradigm shift. New laws are inevitable. Do we want AI to be able to replicate small creators' work and ruin their chances at profitability? If we aren’t careful, we are looking at yet another extinction wave where only the richest who can afford the AI can make anything. I don’t think it’s hyperbole to be concerned.

The question to me is how you define what the AI is doing in a way that isn't hilariously overbroad to the point of saying "Disney can copyright the style of having big eyes and ears", or "computers can't analyze images".

Any law expanding copyright protections will be 90% used by large IP holders to prevent small creators from doing anything.

What exactly should be protected that isn't?

If I had the answer I’d be writing my congresswoman immediately. All I know is allowing AI unfettered access to just have all content is going to be a huge problem.

How many movies are based on each other? It's a lot, even if it's just loosely based on it. If you stopped allowing that then you would run out of new things to do.

my head [...] not a distributable copy.

There has been an interesting counter-proposal to that: make all copies "non-distributable" by replacing the 1:1 copying, by AI:AI learning, so the new AI would never have a 1:1 copy of the original.

It's in part embodied in the concept of "perishable software", where instead of having a 1:1 copy of an OS installed on your smartphone/PC, a neural network hardware would "learn how to be a smartphone/PC".

Reinstalling would mean "killing" the previous software, and training the device again.

Right, because the cool part of upgrading your phone is trying to make it feel like it's your phone, from scratch. Perishable software is anything but desirable, unless you enjoy having the very air you breathe sold to you.

Well, depends on desirable "by whom".

Imagine being a phone manufacturer and having all your users running a black box only you have the means to re-flash or upgrade, with software developers having to go through you so you can train users' phones to "behave like they have the software installed"

It's a dictatorial phone manufacturer's wet dream.

Yes, that's exactly my problem with it.

Let me ask you this: do you think our brains and LLMs are, overall, pretty distinct? This is not a trick or bait or something, I’m just going through this methodically in hopes my position - which is shared by some others in this thread it seems - is better understood.

I don't think they work the same way, but I think they work in ways that are close enough in function that they can be treated the same for the purposes of this conversation.

Pen and pencil are "the same", and either of those and printed paper are "basically the same".
The relationship between a typical modern AI system and the human mind is like that between a pencil-written document and a Word document: entirely dissimilar in essentially every way, except for the central issue of the discussion, namely as a means to convey the written word.

Both the human mind and a modern AI take in input data, and extract relationships and correlations from that data and store those patterns in a batched fashion with other data.
Some data is stored with a lot of weight, which is why I can quote a movie at you, and the AI can produce a watermark: they've been used as inputs a lot. Likewise, the AI can't perfectly recreate those watermarks and I can't tell you every detail from the scene: only the important bits are extracted. Less important details are too intermingled with data from other sources to be extracted with high fidelity.

Imagine if you could replay a movie any time you like in your head just from watching it once.

Two points:

These AIs can't do that; they need thousands or millions of repetitions to "learn" the movie, and every time they "replay" the movie it is different from the original.
"learning by rote" is something fleshbags can do, and are actually required to by most education systems.

So either humans have been breaking the copyright all this time, or the machines aren't breaking it either.

You have one brain. You could have as many instances of AI as you can afford. In a general sense, it’s different, and acting like it’s not is going to hit you like a freight train if you don’t prepare for it.

That's a different goalpost. I get the difference between 8 billion brains, and 8 billion instances of the same AI. That has nothing to do with whether there is a difference in copyright infringement, though.

If you want another goalpost, that IMHO is more interesting: let's discuss the difference between 8 billion brains with up to 100 years life experience each, vs. just a million copies of an AI with the experience of all human knowledge each.

(That's still not really what's happening, which is tending more towards several billion copies of AIs with vast slices of human knowledge each).

It’s all theoretical at this stage, but like everything else that society waits until it’s too late for, I think it’s reasonable to be cautious and not just let AI go unregulated.

It's not reasonable to regulate stuff before it gets developed. Regulation means establishing some limits and controls on something, which can't be reasonably defined before that "something" even exists, much less tested to see whether the regulation has the effects it intends.

For what it's worth, a "theoretical regulation" already exists: Asimov's Three Laws of Robotics. Turns out current AIs are not robots, and that regulation is nonsense when applied to Stable Diffusion or LLMs.

I disagree. Over the last twenty years or so we have plenty of examples of things that should have been regulated from the start and weren’t, and now it’s very difficult to do so. Every “gig economy” business, for example.

Well fleshbags have to pay several years worth of salary to get their education, so by your comparison, Google's AI should too.

Imagine thinking Public Education doesn't count. Or that no one without a college degree ever invented anything useful. That's before we get to your notion of "College SHOULD be expensive, for everyone, always".

The problem with education is NOT that some people pay less for theirs, or nothing at all, nor that some even have the audacity to learn quickly. AI could help everyone to have a chance to learn cheaply, even quickly.

You're just off on your own little rant now, arguing points I never even implied.

That's wrong on so many levels:

Go check Project Gutenberg and the patent registry, come back when you've learned them all, they're 100% free for everyone.
Fleshbags have to pay for "dumbed down" educational material just to have a chance at learning anything during their lifespan, AIs don't.
The lion's share of "paying for education" isn't even paid for education, but for certification. AIs would have to pay the same... if any were dumb enough to spend "several years worth of salary" on some diploma.
The only part worth paying for, is "hands on experience", which right now is far more expensive for AIs (need simulations and robots built).
Training AIs already isn't free, they need thousands to millions of repetitions to learn the stuff, which means quite a buck in server costs.

So just because fleshbags are really bad at learning, does not mean Google's AI has to pay for the same shortcomings, they already pay for their own.

Copyright law is gaslighting at this point. Piracy being extremely illegal but then this kind of shit being allowed by default is insane.

We really are living under the boot of the ruling classes.

If you want "this kind of stuff" (by which I assume you mean the training of AI) to not be allowed by default, then you are basically asking for a world in which the only legal generative AIs belong to giant well-established copyright holders like Adobe and Getty. That path leads deeper underneath the boots of those ruling classes, not out from under them.

I don't think it should be allowed to be trained off any of this stuff for entertainment/art/etc. at all. Like the dream future of AI was all the shitty boring stuff handled for us so we could sit back, chill and focus on arts, real scientific research, general individual betterment etc.

Instead we have these companies trying to get them doing all the art and interesting things whilst we all either have no job, money, or good standard of living, or the dangerous / shitty jobs.

So to avoid being "under the boot of the ruling classes" you want the government to be in charge of deciding what is and is not the correct way to produce our entertainment and art?

I use Stable Diffusion to generate illustrations for tabletop roleplaying game adventures that I run for my friends. I use ChatGPT to brainstorm ideas for those adventures and come up with dialogue or descriptive text. How big a fine would I be facing under these laws?

I mean there has to be a price to pay here, we can't have our cake and eat it unfortunately. Caveats like "individual use" could allow this type of use while preventing companies from taking the piss.

You seem to be implying that the government is the ruling class too, which (I grant you) may at least in part be the case but at least they're voted into place. Would you rather have companies that we have no control over realistically use it without limit?

Honest question, what would you see as a fair way to handle the situation?

I mean there has to be a price to pay here,

Why, because you say so?

Would you rather have companies that we have no control over realistically use it without limit?

Yes, because that means I can also use it without limit. And I see no reason to apply special restrictions to AI specifically, companies are already bound by lots of laws governing their behaviour and ultimately it's their behaviour that is what's important to control.

Honest question, what would you see as a fair way to handle the situation?

Handle it the way we already handle it. People are allowed to analyze publicly available data however they want. Training an AI is just a special case of analyzing that data, you're using a program to find patterns in it that the AI is later able to make use of when generating new material.

Why, because you say so?

This is just being obtuse and a bit of a cunt. You can't expect not to have negative repercussions as an effect of companies being allowed to just churn out as much AI generated shit as they can. Especially since you also say:

companies are already bound by lots of laws governing their behaviour and ultimately it's their behaviour that is what's important to control.

Please read what you've written again, but slowly this time. You're saying you're fine with all the other regulation, but it shouldn't be done here because of individual liberties, when I've clearly stated free use can be specifically allowed for here...

Yes, because that means I can also use it without limit.

You've again stated your problem when I've given a more than sensible solution. Individual free use is fine, why would anyone want to stop you, individually or even with your friends, being creative? The problem comes when companies with huge resources, influence, and nefarious motives decide to use it. How about this time we get ahead of it instead of letting things get out of control then trying to do something about it?

This is just being obtuse and a bit of a cunt.

No, I'm seriously asking. You said that there has to be a price to pay, but I really don't see why. Why can't people be free to do these things? It doesn't harm anyone else.

It's reasonable to create laws to restrict behaviour that harms other people, but that requires the person proposing those laws to show that this is actually the case. And that the restrictions placed by those laws are reasonable and proportionate, not causing more harm than they prevent.

Individual free use is fine, why would anyone want to stop you, individually or even with your friends, being creative? The problem comes when companies with huge resources, influence, and nefarious motives decide to use it.

There is no sharp dividing line between these things. What if one of the adventures I create turns out so good that I decide to publish it? What if it becomes the basis for a roleplaying system that becomes popular enough that I start a publishing company for it?

The problem comes when companies with huge resources, influence, and nefarious motives decide to use it.

How about if one of those huge companies just wants to produce some entertainment that will sell really well and that I would enjoy?

You're not really making an argument for banning AI, here. You're making an argument for banning nefariousness. That's fine, but that's kind of a bigger separate issue.

The ruling class is seeing the end of capitalism. They're getting desperate and making it obvious.

Can we get some young politicians elected who have a degree in IT? Boomers don't understand technology, that's why these companies keep screwing the people.

It's because they're corrupt, and young people are just as susceptible to lobbyists' bribes, unfortunately. The gerontocracy doesn't make things better though, that's for sure.

True but that doesn't mean it wouldn't be better to have politicians who have a better understanding of the systems they're legislating. "People can be bribed" isn't a good excuse to not change anything.

Definitely, I didn't mean to sound too defeatist.

True. Human beings are the worst

This is more true than anything.

Personally I’d rather stop posting creative endeavours entirely than simply let it be stolen and regurgitated by every single company who’s built a thing on the internet.

I just take comfort in the fact that my art will never be good enough for a generative AI to steal.

If it's on any major platform, these companies will probably still use it since I doubt at that point if they were allowed to scrape the whole internet they'd have any human looking over the art used.

It'll just be thrown in with everything else similar to how I always seem to find paper towels in the dryer after doing laundry.

Then I take comfort in the fact it might serve to sabotage whatever it generates.

"Bad" art is still useful in training these models because it can be illustrative of what not to do. When prompting image generators it's common to include "negative prompts" along with your regular one, telling the AI what sorts of things it should avoid putting in the output image. If I stuck "by Roundcat" into the negative prompts it would try to do things other than the things you did.

I think the topic is more complex than that.

Otherwise you could say you'd rather stop posting creative endeavours entirely than simply let them be stolen and regurgitated by every single artist who uses the internet for references and inspiration.

The argument "but companies do so for profit" isn't enough on its own, because many artists do the same: maybe they are designers, illustrators, or others, and your work will give them ideas for their commissions.

Voluntary obscurity is always an option, I suppose.

We need to actively start sabotaging the data sources these LLMs are based on. Make AI worthless.

Your comment right here provides useful training data for LLMs that might use Fediverse data as part of their training set. How would you propose "sabotaging" it?

Books will start needing to add a robots.txt page to the back of the book

Which will be ignored by search engines, as is tradition?

… which was the style at the time.

OK, so I shall create a new thread, because I was harassed. Why bother publishing anything if it's original if it's just going to be subsumed by these corporations? Why bother being an original human being with thoughts to share that are significant to the world if, in the end, they're just something to be sucked up and exploited? I'm pretty smart. Keeping my thoughts to myself.

This is a tendency I've heard about that I haven't been able to understand. What is the new risk of expressing your thoughts, prose, or poetry online that didn't exist before and currently exists with LLMs scraping them? How would the corporations exploit your work through data scraping in a way that would demotivate you from expressing it at all? Because I know tone doesn't come across well in text, I want to clarify that these are genuine questions, because my answers to these questions seem to be very different from many people's and I'd like to understand where that difference in perspective comes from.

I think this largely boils down to the time scales required. A person copying your work has a minimum amount of time it takes them to do that, even when it's just copy and paste. An LLM can copy thousands of different developers' code, for instance, and completely launder the license. That's not ok. Why would we allow machines to commit fraud when we don't allow people to?

Except that isn't exactly how neural networks learn. They aren't exactly copying work, they're learning patterns in how humans make those works in order to imitate them. The legal argument these companies are making is that the results from using AI are transformative enough that they qualify as totally new and unique works, and it looks as if that might end up becoming law, depending on how the lawsuits currently going through the courts turn out.

To be clear, technically an LLM doesn't copy any of the data, nor does it store any data from the works it learns from.

Yes, they probably would, so long as the work is transformative enough. You wouldn't be the first, or last, author to copy LoTR in their own works.

This is why you can go on Instagram and find people selling presets that give photos the look of a famous photographer. They advertise them as such. But even though they are trying to sell something that supposedly allows you to copy the style of someone else, it's still legal, because it's transformative enough.

It doesn't have to make sense, and we don't have to agree with it, but that's how the law works.

The problem is if I wholesale copy a paragraph word for word, then yes, I am engaging in plagiarism. The line is not as clear as you think. The difference is I can’t hide what I took as well as AI can and I can’t do it to 10,000 people in an instant.

Just because I engage in plagiarism at scale and hide it better does not mean I did not engage in plagiarism.

Except, what it produces is very similar or identical to some copyrighted works, licensed under the LGPL, like in this case. You don't have to copy a whole program to plagiarize someone

This is very interesting for me to think about, since I have so many issues with proprietary technology in general. An LLM copying the code from thousands of proprietary projects is kind of an interesting loophole considering that it would be difficult for any of the individual businesses to prove that their proprietary code was infringed unless the LLM does copy and paste the code exactly. That could cause major changes in the tech industry which I'm not able to predict. Optimally I would like technological development more in the hands of people than behind legal barriers such as with Open Source code and I am not a programmer, so take my musings with a grain of salt.

With each day I hate the internet and these fucking companies even more.

Google can go suck on a lemon!

Lemons are delicious af though. Why reward them for their bs?

Worth considering that this is already the law in the EU. Specifically, the Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market has exceptions for text and data mining.

Article 3 has a very broad exception for scientific research: "Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access." There is no opt-out clause to this.

Article 4 has a narrower exception for text and data mining in general: "Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining." This one's narrower because it also provides that, "The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online."

So, effectively, this means scientific research can data mine freely without rights holders being able to opt out, and other uses for data mining such as commercial applications can data mine provided there has not been an opt out through machine-readable means.
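
As a rough illustration of what an opt out "through machine-readable means" could look like in practice, here is a minimal sketch that assumes robots.txt is the signal being used. The Directive does not name a specific mechanism, so that choice is an assumption, and the crawler name `ExampleAIBot`, the helper `may_mine`, and the example.com URLs are all made up for the example. It only uses Python's standard `urllib.robotparser` and does no network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a portfolio site: it reserves its content
# against one (made-up) AI crawler while leaving other crawlers alone.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt does not reserve `url` against `user_agent`.

    Assumption: robots.txt is treated as the machine-readable opt-out.
    Whether that satisfies Article 4 is a legal question this code
    does not answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no fetching
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/gallery/painting-1"
    print(may_mine(ROBOTS_TXT, "ExampleAIBot", page))
    print(may_mine(ROBOTS_TXT, "SomeOtherBot", page))
```

A miner that respects the opt-out would run a check like this before fetching each page and skip anything that comes back as reserved. Under Article 3 (scientific research) the check would not be legally required, which is exactly the gap discussed below.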

I think the key problem with a lot of the models right now is that they were developed for "research", without the rights holders having the option to opt out when the models were switched to for-profit. The portfolio and gallery websites, from which the bulk of the artwork came, didn't even have opt out options until a couple of months ago. Artists were therefore considered to have opted in to their work being used commercially because they were never presented with the option to opt out.

So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project. I would favour a complete rebuild of the models that only draws from works that are either in the public domain or whose rights holders have explicitly opted in to their work being used for commercial models.

Basically, you can't deny rights holders an ability to opt out, and then say "hey, it's not our fault that you didn't opt out, now we can use your stuff to profit ourselves".

Common sense would surely say that becoming a for-profit company or whatever they did would mean they've breached that law. I assume they figured out a way around it or I've misunderstood something though.

I think they just blatantly ignored the law, to be honest. The UK's copyright law is similar, where "fair dealing" allows use for research purposes (legal when the data scrapes were for research), but fair dealing explicitly does not apply when the purpose is commercial in nature and intended to compete with the rights holder. The common sense interpretation is that as soon as the AI models became commercial and were being promoted as a replacement for human-made work, they were intended to be for-profit competition to the rights holders.

If we get to a point where opt outs have full legal weight, I still expect the AI companies to use the data "for research" and then ship the model as a commercial enterprise without any attempt to strip out the works that were only valid to use for research.

So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project.

If you do this, you limit access to AI tools exclusively to big companies. They already employ enough artists to create a useful AI generator, so they'll simply add a clause to the employment contract saying the artist agrees to their work being used in training. After a while, the only people who have access to reasonably good AI are those major corporations, and they'll leverage that to depress wages and control employees.

The WGA's idea that the direct output of an AI is uncopyrightable doesn't distort things so heavily in favor of Disney and Hasbro. It's also more legally actionable. You don't name Microsoft Word as the editor of a novel because you used spell check even if it corrected the spelling and grammar of every word. Naturally you don't name generative AI as an author or creator.

Though the above argument only really applies when you have strong unions willing to fight for workers, and with how gutted they are in the US, I don't think that will be the standard.

The solution to only big companies having access to AI by using enough artists to create a useful generator isn't to deny all artists globally any ability to control their work, though. If all works can be scraped and added to commercial AI models without any payment to artists, you completely obliterate all artists except for the small handful working for Disney, Hasbro, and the like.

AI models actually require a constant input of new human-made artworks, because they cannot create anything new or unique themselves, and feeding an AI content produced by AI ends up with very distorted results pretty quickly. So it's simply not viable to expect the 99% of artists who don't work for big companies to continuously provide new works for AI models, for free, so that others can profit from them. Therefore, artists need either the ability to opt out or they need to be paid.

(The word "artist" here is used to refer to everyone in the creative industries. Writing and music are art just like paintings and drawings are.)

Unfortunately, copyright protection doesn't extend that far. AI training is almost certainly fair use if it is copying at all. Styles and the like cannot be copyrighted, so even if an AI creates a work in the style of someone else, it is extremely unlikely that the output would be so similar as to be in violation of copyright. Though I do feel that it is unethical to intentionally try to reproduce someone's style, especially if you're doing it for commercial gain. But that is not illegal unless you try to say that you are that artist.

Copyright law on this varies, actually! In the UK, "fair dealing" actually has an exclusion for using copyrighted material for the purpose of commercially competing with the creator. This also includes derivative works. This does therefore cover style to a certain extent, because works imitating a style of an artist are generally intended to commercially compete with them. From that perspective, taking an artist's entire portfolio, feeding it into an AI, and producing work in their style at a lower price than the artist does (because an AI produces something in seconds which takes the artist weeks), is pretty obviously an attempt to compete with the artist commercially.

While people like to draw comparisons between AIs and humans copying another artist's style, the big difference here is that a human artist needs to spend hundreds of hours learning to imitate another artist's style, at the expense of developing their own style, while the original artist is also continually developing their style. It is bloody hard to imitate another human's art style. But an AI can do it in minutes, and I haven't yet seen any valid arguments for how that's not intended to commercially compete with human artists on a massive scale.

True, I wrote this from a US law perspective, where that kind of behavior is expressly protected. US law is also written specifically to protect things like search engines and aggregators to prevent services like Google from getting sued for their blurbs, but it's likely also a defense for AI.

Regardless of whether it should be illegal, I feel that AI training and use are legal under current US law. And since OpenAI is a US company, dragging it into UK courts and extracting payment from it would be difficult for all but the most monied artists.

For the moment, US companies do actually care what the UK courts and regulatory bodies say, because the trifecta of US-UK-EU is what tends to form a base of what the rest of the world decides. It's why Microsoft have been so unhappy about the UK's Competition and Markets Authority initially blocking the merger with Blizzard: even with the US and EU antitrust bodies agreeing to it, it did actually matter if the UK didn't agree (I am so disappointed in the CMA finally capitulating). And some of the lawsuits against the AI companies are taking place in the UK courts, with no indications that the AI companies are refusing to engage. Obviously at this point it's hard to say what the outcome will be, but the UK legal system does actually have enough clout globally that it won't be a meaningless result.

Practically, you would have to separate the model architecture from the weights. The weights are licensed for research use only, while the architecture is the actual scientific contribution. Maybe with some instructions on how best to train the model.

Only problem is that you can't really prove if someone just retrained research weights or trained from scratch using randomized weights. Also certain alterations to the architecture are possible, so only the "headless" models are used.

I think there's some research into detecting retraining, but I can imagine it's not foolproof.
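As a rough illustration of why this is hard, here is a toy sketch of one naive check: comparing a suspect model's weights against the research weights with cosine similarity. This is an assumption for illustration only, not the research mentioned above. The "weights" are plain lists of random numbers and `fine_tune` is a made-up stand-in for retraining.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two flat weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def random_weights(n, rng):
    """Stand-in for a freshly initialised model (training from scratch)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def fine_tune(weights, rng, step=0.05):
    """Hypothetical stand-in for retraining: nudge every weight slightly."""
    return [w + rng.gauss(0.0, step) for w in weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
research = random_weights(10_000, rng)
retrained = fine_tune(research, rng)
scratch = random_weights(10_000, rng)

# Lightly retrained weights stay close to the originals, while an
# independent random initialisation is nearly orthogonal to them.
sim_retrained = cosine_similarity(research, retrained)
sim_scratch = cosine_similarity(research, scratch)

if __name__ == "__main__":
    print(f"research vs retrained: {sim_retrained:.3f}")
    print(f"research vs scratch:   {sim_scratch:.3f}")
```

In this toy setup the two cases are easy to tell apart, but with a real network, heavy retraining, architecture changes, or simply reordering neurons can push the similarity down, which is why a check like this can't serve as proof either way.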

I kind of think that as proof-of-concepts, the AI models are kind of interesting. I don't like the content they produce much, because it is just so utterly same-y, so I haven't yet seen anything that made me go "wow, that's amazing". But the actual architecture behind them is pretty cool.

But at this point, they've gone beyond researching an interesting idea into full on commercial enterprises. If we don't have an effective means of retraining the existing models to remove the data that isn't licenced for commercial use (which is most of it), then it seems the only ethical way to move forward would be to start again with more selective training data, including only what is commercially licenced. Now the research has been done in how to create these models, it should be quicker to build new ones with more ethically sourced training data.

The standard needs to be opt-in, not opt-out. You can't take people's stuff without their permission. Just because they didn't contact you and tell you directly that you're not allowed to take their lawn ornaments doesn't make them free.

Why not? Copyright is a monopoly. Generally society benefits from having it as weak as possible.

This is like the beginning of The Hitchhiker's Guide to the Galaxy, where they put the responsibility on the main character to go to the department of transportation basement and see that they had posted a notice that they're going to destroy his house. No, Google, you don't get to dictate that people come to your dark pattern website and tell you you're not allowed to use their content. Disapproval is implied until people OPT-IN! It's a good thing Google changed their motto from Don't Be Evil or we'd have quite the conundrum.

🤖 I'm a bot that provides automatic summaries for articles: ::: spoiler Click here to see the summary The company has called for Australian policymakers to promote “copyright systems that enable appropriate and fair use of copyrighted content to enable the training of AI models in Australia on a broad and diverse range of data, while supporting workable opt-outs for entities that prefer their data not to be trained in using AI systems”.

The call for a fair use exception for AI systems is a view the company has expressed to the Australian government in the past, but the notion of an opt-out option for publishers is a new argument from Google.

Dr Kayleen Manwaring, a senior lecturer at UNSW Law and Justice, told Guardian Australia that copyright would be one of the big problems facing generative AI systems in the coming years.

“The general rule is that you need millions of data points to be able to produce useful outcomes … which means that there’s going to be copying, which is prima facie a breach of a whole lot of people’s copyright.”

“If you want to reproduce something that’s held by a copyright owner, you have to get their consent, not an opt out type of arrangement … what they’re suggesting is a wholesale revamp of the way that exceptions work.”

Toby Murray, associate professor at the University of Melbourne’s computing and information systems school, said Google’s proposal would put the onus on content creators to specify whether AI systems could absorb their content or not, but he indicated existing licensing schemes such as Creative Commons already allowed creators to mark how their works can be used. :::

Google is smoking that pack.

Me, twenty years ago: i wish the word web 2.0 could disappear forever

Me, like 8 years ago: i wish the word web 3.0 could disappear forever

Monkeys paw: 👆

AI, coming crashing through the window and blindsiding me upside the head: surprise bitch

If my data is worth scraping then it is worth Google paying me for it.

How exactly did books.google.com turn out?