AI rule

Masimatutu@lemm.ee to 196@lemmy.blahaj.zone – 1344 points –
192

I agree, but only if it goes both ways. We should be allowed to use big corpo's IPs however we want.

You are allowed to use copyrighted content for training. I recommend reading this article by Kit Walsh, a senior staff attorney at the EFF, if you haven't already. The EFF is a digital rights group that most recently won a historic case: border guards now need a warrant to search your phone.

I know. AI is capable of recreating many ideas it sees in the training data even if it doesn't recreate the exact images. For example, if you ask for Mario, you get Mario. Even if you can't use these images of Mario without committing copyright infringement, AI companies are allowed to sell you access to the AI and those images, thereby monetizing them. What I am saying is that if AI companies can do that, we should be allowed to use our own depictions of Mario that aren't AI generated however we want.

AI companies can sell you Mario pics, but you can't make a Mario fan game without hearing from Nintendo's lawyers. I think you should be allowed to.

The comparison doesn't work, because the AI is replacing the pencil or other drawing tool. We aren't saying pencil companies are selling you Mario pics because you can draw a Mario picture with a pencil, either. Just because the process of how the drawing is made differs doesn't change the concept behind it.

An AI tool that advertises Mario pictures would break copyright/trademark laws and hear from Nintendo quickly.

Except that you interact with the "tool" in pretty much the same way you'd interact with a human that you're commissioning for art, minus a few pleasantries. A pencil doesn't know how to draw Mario.

AI tools implicitly advertise Mario pictures because you know that:

  1. The AI was trained on lots of images, including Mario.
  2. The AI can give you pictures of stuff it was trained on.

An animation studio commissioned to make a cartoon about Mario would still get in trouble, even if they had never explicitly advertised the ability to draw Mario.

I don't think how you interact with a tool matters. Typing what you want, drawing it yourself, or clicking through options is all the same. There are even other programs that allow you to draw by typing. They are way more difficult but again, I don't think the difficulty matters.

There are other tools that allow you to recreate copyrighted material fairly easily. Character creators being on the top of the list. Games like Sims are well known for having tons of Sims that are characters from copyrighted IP. Everyone can recreate Barbie or any Disney Princess in the Sims. Heck, you can even download pre made characters on the official mod site. Yet we aren't calling out the Sims for selling these characters. Because it doesn't make sense.

Just so we're clear, my position is that it should all be okay. Copyright infringement by copying ideas is a bullshit capitalist social construct.

I don't buy the pencil comparison. If I have a painting in my basement that has a distinctive style, but has never been digitized and trained upon, I'd wager you wouldn't be able to recreate either that image or its style. What gives? Because AI is not a pencil but more like a data mixer you throw complete works into and it spews out collages. Maybe collages of very finely shredded pieces, to the point you couldn't even tell, but pieces of original works nonetheless. If you put any non-free works in it, they definitely contaminate the output, and so the act of putting them in in the first place should be a copyright violation in itself. The same as if I were to show you the image in question and you decided to recreate it: I can sue you and I will win.

That is a fundamental misunderstanding of how AI works. It does not shred the art and recreate things with the pieces. It doesn't even store the art in the algorithm. One of the biggest methods right now is basically taking an image of purely random pixels. You show it a piece of art with a whole lot of tags attached. It then semi-randomly changes pixel colors until it matches the training image. That set of instructions is associated with the tags, and the two are combined into a series of tiny weights that the randomizer uses. Then the next image modifies the weights. Then the next, then the next. It's all just teeny tiny modifications to random number generation. Even if you trained an AI on only a single image, it would be almost impossible for it to produce it again perfectly because each generation starts with a truly (as truly as a computer can get, an unweighted) random image of pixels. Even if you force fed it the same starting image of noise that it trained on, it is still only weighting random numbers and still probably won't create the original art, though it may be more or less undistinguishable at a glance.

AI is just another tool. Like many digital art tools before it, it has been maligned from the start. But the truth is what it produces is the issue, not how. Stealing others' art by manually reproducing it or using AI is just as bad. Using art you're familiar with to inspire your own creation, or using an AI trained on known art to make your own creation, should be fine.

As a side note, because it wasn't too clear from your writing: the weights are only tweaked a tiny, tiny bit by each training image. Unless the trainer sees the same image a shitload of times (Mona Lisa, that one stock photo used to show off phone cases, etc.) then the image can't be recreated by the AI at all. Elements of the image that are shared with lots of other images (shading style, poses, Mario's general character design, etc.) could, but you're never getting that one original image or even any particular identifiable element from it out of the AI. The AI learns concepts and how they interact because the amount of influence it takes from each individual image and its caption is so incredibly tiny but it trains on hundreds of millions of images and captions. The goal of AI image generation is to be able to create a vast variety of images directed by prompts; generating lots of images which directly resemble anything in the training set is undesirable, and in the field it's called over-fitting.

Anyways, the end result is that AI isn't photo-bashing, it's more like concept-bashing. And lots of methods exist now to better control the outputs, from ControlNet, to fine-tuning on a smaller set of images, to Dalle-3 which can follow complex natural language prompts better than older methods.

Regardless, lots of people find that training generative AI using a mass of otherwise copyrighted data (images, fan fiction, news articles, ebooks, what have you) without prior consent just really icky.

You show it a piece of art with a whole lot of tags attached. It then semi-randomly changes pixel colors until it matches the training image. That set of instructions is associated with the tags, and the two are combined into a series of tiny weights that the randomizer uses. Anyways, the end result is that AI isn’t photo-bashing, it’s more like concept-bashing

That's what I meant by "very finely shredded pieces". I oversimplified it, yes. What I mean is not that it's literally taking a pixel off an image and putting it into the output, but that using the original image in any way is just copying with extra steps.

Say we forgo AI entirely and talk real-world copyright. If I were to record a movie theater screen with a camcorder, I would commit copyright infringement, even though it's transformed by my camera lens. Same as if I were to distribute the copyrighted work in a ZIP file, invert colors, or trace every frame and paint it with watercolors.

What if I were to distribute the work's name alongside its SHA-1 hash? You might argue that such a transformation destroys the original work, can no longer be used to retrieve the original, and therefore should be legal. But if that was the case, torrent site owners could sleep peacefully knowing that they are safe from prosecution. The real world has shown that it's not the case.
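For anyone unfamiliar with what a SHA-1 hash is, here's a minimal sketch using Python's standard `hashlib`. The `fingerprint` helper and the sample byte strings are just made up for illustration. It shows that the hash is a fixed-size fingerprint: it identifies a work without containing it.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-1 digest of some bytes as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()

short_work = b"abc"
long_work = b"a" * 1_000_000  # stand-in for a much larger file

# The digest is always 160 bits (40 hex characters), whatever the input size,
# so the original bytes cannot be read back out of it.
print(fingerprint(short_work))      # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(fingerprint(long_work)))  # 40
```

The hash is still useful for finding and verifying a copy of the work, which is the role it plays in the torrent example.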

Now, what if we take some hashing function and brute force the seed until we get one which outputs the SHA-1s of certain works given their names? That'd be a terrible version of AI, acting exactly like an over-trained model would: spouting random numbers except for works it was "trained" upon. Is distributing such a seed/weight a copyright violation? I'd argue that'd be an overly complicated way to conceal piracy, but yes, it would be. Because those seeds/weights are still based on the original works, even if not strictly a direct result of their transformation.

Anyways, the end result is that AI isn’t photo-bashing, it’s more like concept-bashing

Copying concepts is also a copyright infringement, though

Regardless, lots of people find that training generative AI using a mass of otherwise copyrighted data (images, fan fiction, news articles, ebooks, what have you) without prior consent just really icky.

It shouldn't be just "icky", it should be illegal and be prosecuted ASAP. The longer it goes on like this, the more the entire internet is going to be filled with those kind-of-copyrighted things, and eventually turn into a lawsuit shitstorm.

Heads up, this is a long fucking comment. I don't care if you love or hate AI art, what it represents, or how it's trained. I'm here to inform, refine your understanding of the tools (and how exactly that might fit in the current legal landscape), and nothing more. I make no judgements about whether you should or shouldn't like AI art or generative AI in general. You may disagree about some of the legal standpoints too, but please be aware of how the tools actually work because grossly oversimplifying them creates serious confusion and frustration when discussing it.

Just know that, because these tools are open source and publicly available to use offline, Pandora's box has been opened.

copying concepts is also copyright infringement

Except it really isn't in many cases, and even in the cases where it could be, there can be rather important exceptions. How this all applies to AI tools/companies themselves is honestly still up for debate.

Copyright protects actual works (aka "specific expression"), not mere ideas.

The concept of a descending blocks puzzle game isn't copyrighted, but the very specific mechanics of Tetris are copyrighted. The concept of a cartoon mouse isn't copyrighted, but Mickey Mouse's visual design is. The concept of a brown-haired girl with wolf ears/tail and red eyes is not copyrighted, but the exact depiction of Holo from Spice and Wolf is (though that's more complicated due to weaker trademark and stronger copyright laws in Japan). A particular chord progression is not copyrightable (or at least it shouldn't be) but a song or performance created with it is.

A mere concept is not copyrightable. Once the concept is specific enough and you have copyrighted visual depictions of it, then you start to run more into trademark law territory and start to gain a copyright case. I really feel like these cases are kinda exceptions though, at least for the core models like stable diffusion itself, because there's just so much existing art (both official and even moreso copyright/trademark infringing fan art) of characters like Mickey Mouse anyways.

The thing the AI does is distill concepts and interactions between concepts shared between many input images, and it can do so in a generalized way that allows concepts never before seen together to be mixed together easily. You aren't getting transformations of specific images out of the AI, or even small pieces of each trained image; you're instead getting transformations of learned concepts shared across many many many works. This is why the shredding analogy just doesn't work. The AI generally doesn't, and is not designed to, mimic individual training images. A single image changes the weights of the AI by such a minuscule amount, and those exact same weights are also changed by many other images the AI trains on. Generative AI is very distinctly different from tracing, or distributing mass information that's precisely specific enough to pirate content, or from transforming copyrighted works to make them less detectable.

To drive the point home, I'd like to expand on how the AI and its training is actually implemented, because I think that might clear some things up for anyone reading. I feel like the actual way in which the AI training uses images matters.

A diffusion model, which is what current AI art uses, is a giant neural network that we want to guess the noise pattern of an image. To train it on an image, we add some random amount of noise to the whole image (could be a small amount like film grain, or it could be enough to make the image completely noise, but it's random each time), then pass that image and its caption through the AI to get the noise pattern the AI guesses is in the image. Now we take the difference between the noise pattern it guessed and the noise pattern we actually added to the training image to calculate the error. Finally, we tweak the AI weights based on that error. Of note, we don't tweak the AI to perfectly guess the noise pattern or reduce the error to zero, we barely tweak the AI to guess ever so slightly better (like, 0.001% better). Because the AI is never supposed to see the same image many times, it has to learn to interpret the captions (and thus concepts) provided alongside each image to direct its noise guesses. The AI still ends up being really bad at guessing high noise or completely random noise anyways, which is yet another reason why it can't generally reproduce existing trained images from nothing.
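To make the training step above a bit more concrete, here's a toy sketch in plain Python. Everything in it is an assumption for illustration: the "model" is a single linear layer with a per-caption bias instead of a giant neural network, the "image" is a short list of numbers, and names like `make_model` and `train_step` are made up. Real training code is far more involved, but the four steps are the ones described above.

```python
import math
import random

def make_model(num_pixels, captions):
    """A toy 'noise guesser': one linear layer plus a per-caption bias."""
    return {
        "W": [[0.0] * num_pixels for _ in range(num_pixels)],
        "b": {c: [0.0] * num_pixels for c in captions},
    }

def predict_noise(model, noisy_image, caption):
    bias = model["b"][caption]
    return [
        sum(w * x for w, x in zip(row, noisy_image)) + bias[i]
        for i, row in enumerate(model["W"])
    ]

def train_step(model, image, caption, rng, learning_rate=1e-3):
    # 1. Pick a random noise level and mix that much noise into the image.
    level = rng.random()  # 0 = clean image, 1 = pure noise
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    noisy = [math.sqrt(1 - level) * p + math.sqrt(level) * n
             for p, n in zip(image, noise)]
    # 2. Ask the model which noise it thinks was added.
    guess = predict_noise(model, noisy, caption)
    # 3. The error is the difference between the guess and the real noise.
    error = [g - n for g, n in zip(guess, noise)]
    loss = sum(e * e for e in error) / len(error)
    # 4. Nudge the weights a tiny step against the gradient of the loss.
    scale = 2.0 * learning_rate / len(error)
    bias = model["b"][caption]
    for i, e in enumerate(error):
        for j, x in enumerate(noisy):
            model["W"][i][j] -= scale * e * x
        bias[i] -= scale * e
    return loss

rng = random.Random(0)
model = make_model(4, ["mario"])
loss = train_step(model, [0.2, 0.8, 0.5, 0.1], "mario", rng)
```

Note that the image itself is never stored anywhere: after the step, the only thing that has changed is a handful of weights, each by a very small amount.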

Now let's talk about generation (aka "inference"). So we have an AI that's decent at guessing noise patterns in existing images as long as we provide captions. This works even for images that it didn't train on. That's great for denoising and upscaling existing images, but how do we get it to generate new unique images? By asking it to denoise random noise and giving it a caption! It's still really shitty at this though; the image just looks like some blobby splotches of color with no form, else it probably wouldn't work at denoising existing images anyways. We have a hack though: add some random noise back into the generated image and send it through the AI again. Every time we do this, the image gets sharper and more refined, and looks more and more like the caption we provided. After doing this 10-20 times we end up with a completely original image that isn't identifiable in the training set but looks conceptually similar to existing images that share similar concepts. The AI has learned not to copy images while training, but actually learned visual concepts. Concepts which are generally not copyrighted. Some very specific depictions which it learns are technically copyrighted, e.g. Mickey Mouse's character design, but the problem with that claim too is that there are fair use exceptions, legitimate use cases, which can often cover someone who uses the AI in this capacity (parody, educational, not for profit, etc.). Whether providing a tool that can just straight up allow anyone to create infringing depictions of common characters or designs is legal is up for debate, but when you use generative AI it's up to you to know the legality of publishing the content you create with it, just like with hand-made art. And besides, if you ask an AI model or another artist to draw Mickey Mouse for you, you know what you're asking for, it's not a surprise, and many artists would be happy to oblige so long as their work doesn't get construed as official Disney company art.
(I guess that's sorta a point of contention about this whole topic though, isn't it? If artists could get takedowns on their Mickey Mouse art, why wouldn't an AI model get takedowns too for trivially being able to create it?)
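The generation loop described above (denoise, add a bit of noise back, repeat) can be sketched like this. Big caveat: `toy_predict_noise` and the `CONCEPTS` table are hypothetical stand-ins I made up so the loop has something to call. A real model is a trained neural network with no per-caption target pattern, and real samplers use more careful noise schedules. Only the shape of the loop is the point here.

```python
import random

# Hypothetical "learned concepts": one target pattern per caption (toy only).
CONCEPTS = {"bright left, dark right": [1.0, 1.0, 0.0, 0.0]}

def toy_predict_noise(image, caption, sigma):
    """Stand-in denoiser: a deliberately rough guess at the noise in `image`."""
    target = CONCEPTS[caption]
    return [0.5 * (x - t) / sigma for x, t in zip(image, target)]

def generate(predict_noise, caption, num_pixels, steps=15, seed=None):
    rng = random.Random(seed)
    # Noise levels run from 1.0 (pure noise) down to 0.0 (no noise).
    sigmas = [1.0 - k / steps for k in range(steps + 1)]
    # Start from nothing but random noise.
    image = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for k in range(steps):
        guess = predict_noise(image, caption, sigmas[k])
        # Remove the guessed noise...
        denoised = [x - sigmas[k] * g for x, g in zip(image, guess)]
        # ...then add a smaller amount of fresh noise back and go again.
        image = [d + sigmas[k + 1] * rng.gauss(0.0, 1.0) for d in denoised]
    return image

picture = generate(toy_predict_noise, "bright left, dark right", 4, seed=1)
```

Each pass only gets part of the way there, which is why the loop runs 10-20 times, and a different starting seed gives a different picture for the same caption.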

Anyways, if you want this sort of training or model release to be a copyright violation, as many do, I'm unconvinced current copyright/IP laws could handle it gracefully, because even if the precise method by which AIs and humans learn and execute is different, the end result is basically the same. We have to draw new more specific lines on what is and isn't allowed, decide how AI tools should be regulated while taking care not to harm real artists, and few will agree on where the lines should be drawn.

Also though, Stable Diffusion and its many many descendants are already released publicly and open source (same with Llama for text generation), and it's been disseminated to so many people that you can no longer stop it from existing. That fact doesn't give StabilityAI a pass, nor do other AI companies who keep their models private get a pass, but it's still worth remembering that Pandora's box has already been opened.

The problem is that you might technically be allowed to, but that doesn't mean you have the funds to fight every court case from someone insisting that you can't or shouldn't be allowed to. There are some very deep pockets on both sides of this.

That's not it going both ways. You shouldn't be allowed to use anyone's IP against the copyright holder's wishes. Regardless of size.

Nah all information should be freely available to as many people as practically possible. Information is the most important part of being human. All copyright is inherently immoral.

I'd agree with you if people didn't need to earn money to live. You can't enact communism by destroying the supporters of it. I fully support communism in theory, we should strive for a community based government. I'm also a game developer and when I make things I need to be able to pay my bills because we still live in capitalism.

If copyright was vigorously enforced a lot more people would starve than would be fed

It's already vigorously enforced. Maybe you can expand on what you mean.

Go on Etsy, search Mickey Mouse, and go report all the hundreds of artists whose whole career is violating the copyright of the most litigious company on the planet.

As long as I'm not pretending to be Nintendo, can you quantify how exactly releasing, for example, a fan Mario game is unethical? It having the potential to hurt their sales if you make a better game than them doesn't count because otherwise that would imply that out-competing anyone in any market must be unethical, which is absurd.

No, it's about if you make a game that's worse or off-brand. If you make a bunch of Mario games into horror games and then everyone thinks of horror when they think of Mario then good or bad, you've ruined their branding and image. Equally, if you make a bunch of trash and people see Mario as just a trash franchise (like how most people see Sonic games) then it ruins Nintendo's ability to capitalize on their own work.

No one is worried about a fan-made Mario game being better.

Wouldn't that only be a problem if you pretended to be Nintendo? There are a lot of fan Sonic games and I don't think it's affected Sonic's image very much. There's shit like Sonic.EXE, which is absolutely a horror game, but people still don't think of horror when they think of Sonic. The reason Sonic is a trash franchise is because of SEGA consistently releasing shitty Sonic games. Hell, some of the fan games actually do beat out the official games in quality and polish.

Hypothetically, assuming you're right and fan games actually do hurt their branding, there's still no reason for it to be illegal. We don't ban shitty mobile games for giving good mobile games a bad rep, and neither Nintendo nor mobile devs can claim libel or defamation.

Actually we do. If your game is so terrible, it can get removed from the Google Play store. It has to be really bad and they rarely do it. Steam does this as well. Epic avoids this by vetting the games they put on their platform first. It's why the term asset flip exists.

Ban in a legal sense, not ban from a proprietary store.

Sure, that's fair. Either way bad or good, it's illegal to take someone's existing IP and add on to it. Good or bad. No one is legally banning games based on quality and if they were good or bad it doesn't matter. It matters that the original story owner has a vision and they have the right to make money off of their work without other people trying to take or add on to the story themselves. Brand fatigue is a real thing and while all of the CoD games are great and fairly high quality, the reason they get a bad rap is exactly brand fatigue. The Assassin's Creed games were the same for a bit but they resolved that by spreading releases out.

Either way, the arguments to let people just add on to what they want seem to fall flat. Why not just build a universe like Mario and call it Maryo? Or The Great Giana Sisters? Why involve someone else's IP at all? Because you are profiting off of the popularity that someone else built with quality products. Even in a communist society I'd want that banned because it's lazy, misrepresents the original vision, and overall it's completely avoidable without a problem. In fact, why not force people to create new things instead of letting them be lazy and stealing the popularity of a well-made IP?

Brand fatigue is definitely a real thing, I will agree with you there. In the same boat, so is genre fatigue. If you play a lot of platformers, you'll eventually get bored and move on to something else. Nintendo clearly has no right to go after other platforming games that have copied from Mario's mechanics, so why should they have the right to go after games that use Mario's aesthetic or name?

Fan games are not normally morally wrong, but I do think it's kind of trashy to try and make money off of someone else's brand if you're not doing it out of passion. I just don't see it as a legal problem, much like how crappy off-brand ripoffs and lazily made games aren't a legal problem. It's also worth noting that the people making the most money off of the IP are just executives that likely had little to no part in creating the characters and care less about appreciating the artistic work than most fan game makers.

Look at Sonic P-06, a fan-made remake of Sonic '06. It's highly polished, has fleshed out features that never made it into the original game, and is absolutely a labor of love. The original came out a buggy pile of garbage because it was forced out by those aforementioned business-people before it was ready. Most companies' strict protection of their IP just prevent works of art like P-06 from seeing the light of day. I think that SEGA's no Sonic monetization policy being enshrined into law would be a reasonable compromise.

Nintendo clearly has no right to go after other platforming games that have copied from Mario’s mechanics, so why should they have the right to go after games that use Mario’s aesthetic or name?

This argument doesn't hold water to me. How are you conflating a brand with all games in a genre? IP and brands are a signifier of what to expect. Sherlock Holmes is a great example of what used to be an IP known for a very specific style of mystery story but now just means "genius problem solver maybe with a drug habit."

Fan games are not normally morally wrong, but I do think it’s kind of trashy to try and make money off of someone else’s brand if you’re not doing it out of passion. I just don’t see it as a legal problem

We are going to just have to agree to disagree. Spiritual successors happen all the time. The reason people make fan games is that there is a lexicon built into their project already. It's a shortcut instead of building (and considering) what is meaningful to your game in a lexicon. Additionally, a lot of people do not consider how their changes affect the lexicon throughout all games. So what you are left with is mostly people who don't truly understand, never talked to the creators, never worked with them, assume everything from a product perspective, pushing out something that adds to the brand without any true coherency or consideration of future titles.

I for one see it as wrong to attempt to take someone's work and not only pass it off as your own but also potentially break their ability to make future iterations of their work.

Look at Sonic P-06, a fan-made remake of Sonic ‘06. It’s highly polished, has fleshed out features that never made it into the original game, and is absolutely a labor of love.

Perhaps a labor of love. From my initial parsing of reviews, looks like they didn't attempt to change anything but kept close to the source material.

Most companies’ strict protection of their IP just prevent works of art like P-06 from seeing the light of day

I feel like that's okay. There are plenty of original and better ideas out there. It doesn't prevent things from being made. P-06 being "still terrible but better" doesn't really say anything. Games like Black Mesa, Sonic Mania, Skywind, and very specifically Zera: Myths Awaken also exist. That last one started as Spyro 4 until Activision sent a C&D, so they rebranded. These things are easily fixed and honestly, it's great if fans want to attempt to build a game off of someone's IP and then ask the IP holder if they can continue with it. If not, they can just rebrand and create their own universe. Temtem is a great example of what happens when people are forced to create their own thing. It becomes more impactful and allows for a more interesting product. Not just a cookie cutter game.

I get what you're saying. The problem is that you can't argue your case just by giving examples of how IP enforcement can lead to people innovating more in the way that I can. The debate is asymmetrical because all I have to do is show that IP violation doesn't uniquely negatively affect IP holders in ways that legal activities can't, demonstrating precedent for that kind of competition/harm being legal. You need to justify forcibly imposing limitations on what people are allowed to do, which has a higher "burden of proof" if you get what I'm saying.

I don't think that's all you have to show. In fact I feel like a lot of your examples have missed the point entirely. The point isn't that there are other ways that could maybe impact the sales or recognition a game gets. This conversation also jumps between copyright and trademark protections. Copyright is about protecting actual assets. You take my assets and make something else with them without my consent, you've stolen my work. That's a bad, immoral thing. The same applies to the copyright of characters and story. You take the building blocks I've made, you've stolen my work. Trademark is about protecting brand recognition and deals with IP violations.

With that covered, the point of copyright is to ensure the people who did the work get paid for their work. That large or small companies don't have to worry about someone stealing their works and allowing them to innovate.

So when you say, whatever people will just make another game in the genre, that's innovation. What if it's bad? Not everything is good, that's still positive innovation. You always mentioned large IPs but the truth is that the law is equal and that large corporations are already taking people's work without asking. We do not need more of that.

For fan-made games, there isn't a huge point to doing them without the blessing of the IP holder. You might point to large studios vs small fans, but think of it at every scale, especially middle to small studios being stolen from. "It's just a fan game" doesn't really hold water when it's potentially putting people out of business because of the issues I've already shown. Imagine if copyright wasn't enforced and someone just re-uploaded existing games. Where do you draw the line on the changes a fan game needs to make in order to qualify? I can tell you right now 99% of players wouldn't buy a game on Steam if there was a fan game that was exactly the same but free. So then you have to draw up all these lines that are frankly unfair to the creators. So just let them choose. Their works belong to them. Not to just people who might like the game.

So I don't see any point for which a looser copyright law would be overall more helpful to society. We need courts to allow for smaller creators to justify fair use but that doesn't cover anything we talked about.

Where current copyright seems to draw the line is on recognizable characters. Where I draw the line is on direct asset or code flipping. I can't see any functional difference between saying you own, for example, the platformer genre VS saying you own short, plump plumbers in red caps named Mario. I think we might fundamentally disagree in an irreconcilable way there.
