Assassin's Creed Voice Actor Calls AI-Generated Mods the 'Invisible Enemy We're Fighting Right Now' - IGN

stopthatgirl7@kbin.social to Gaming@kbin.social – 0 points –
ign.com

Victoria Atkin, who played Evie Frye in 2015's Assassin's Creed Syndicate, tells IGN how the video game industry needs to change to protect its performers.


"I've done so many games I've lost count, and there's so much of my voice out there that I would never be able to keep track of… There's credits that are not even on my IMDb that I've done. It's just frightening… It's kind of dangerous what they can do with it without my say."

This isn't really relevant to the larger point of the article, but a technical nitpick: I seriously doubt that anyone wants to generate voice that sounds like a voice actor so much as a character in a specific game. It's not that someone's likely going to take all the voice acting for different characters and produce some aggregate from that.

Take Mel Blanc. He's a famous voice actor.

https://en.wikipedia.org/wiki/Mel_Blanc

He voiced Bugs Bunny, Foghorn Leghorn, and Barney Rubble, among others.

There is not now, and I suspect will not be for the foreseeable future, any useful voice model that works by merging information from the voices of those three characters.

Purely theoretically, okay, yeah, you could maybe statistically infer some data about the physical characteristics of the speaker that spans multiple characters, like I don't know, the size of vocal cords. Though I suspect that post-processing specific to individual characters probably mucks with even that. Most of what defines those characters is character-specific. The amount of useful information that you can derive across characters is gonna be pretty limited.

So I doubt that the number of different works has much impact on the accuracy of a voice model.

I'll also add that my experience playing around with Tortoise TTS, and what I've seen of online "voice cloning" services, suggests that the training set for a new voice doesn't need to be all that large: the information these tools can presently learn about a voice doesn't extend much beyond what's in a relatively small set of clips.

https://github.com/neonbjb/tortoise-tts

Cut your clips into ~10 second segments. You want at least 3 clips. More is better, but I only experimented with up to 5 in my testing.
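
For anyone wanting to try that clip-cutting step, here's a rough sketch using only Python's standard library. It assumes the source recording is an uncompressed PCM WAV file, which is what the `wave` module reads. The function names, the `clip_000.wav` naming, and the choice to keep a shorter final clip are my own for illustration and are not part of Tortoise TTS. Converting to whatever sample rate or format the tool actually expects is out of scope here.

```python
import wave
from pathlib import Path


def segment_bounds(total_frames, frame_rate, seconds=10.0):
    """Return (start, end) frame indices for consecutive fixed-length segments.

    The final segment may be shorter than the others.
    """
    step = int(frame_rate * seconds)
    if step <= 0:
        raise ValueError("segment length must be positive")
    return [(start, min(start + step, total_frames))
            for start in range(0, total_frames, step)]


def split_wav(src, out_dir, seconds=10.0):
    """Split a PCM WAV file into ~`seconds`-long clips; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        bounds = segment_bounds(reader.getnframes(), reader.getframerate(), seconds)
        for index, (start, end) in enumerate(bounds):
            # Frames are read sequentially, so each read picks up where the last ended.
            frames = reader.readframes(end - start)
            target = out_dir / f"clip_{index:03d}.wav"  # hypothetical naming scheme
            with wave.open(str(target), "wb") as writer:
                writer.setparams(params)
                # The frame count in the header is corrected when the file is closed.
                writer.writeframes(frames)
            written.append(target)
    return written
```

Calling something like `split_wav("evie_lines.wav", "clips")` (a made-up filename) would leave a folder of roughly 10-second clips. You'd still want to pick the cleanest few by ear rather than feed everything in.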

Now, I will believe that maybe that's a limitation of Tortoise TTS, and that a future, more-sophisticated generative AI could find useful data spanning larger datasets -- I recall once seeing someone British complaining that Tortoise TTS tended to make American-sounding voices, I presume because it was trained on American speakers -- but as things stand, I don't think that the difference between many hours of speech and a relatively small amount has a massive impact. That is, most of the useful information comes from the model's training on pre-existing voices, and the new voice mostly determines where the new voice lies relative to those.

I'd call AI-generated mods one of the best applications for AI-generated voice samples. It is extremely unlikely that a fan-made mod is going to ever get the original voice actor onboard, and without those voice actors, any mod necessarily cannot fit in with the rest of the game.

We could have a world where modding just doesn't happen, in general. But if we're going to have mods, that voice synth makes it practical to extend games that otherwise could not realistically be extended by third parties in a seamless way.

It's possible to make textures that fit in with original environments, or to model things that do so. Or to write text. But people are pretty good at distinguishing between voices, and so without the ability to do computer-synthesized voices for mods, one can't really create modified speech for existing characters.

I'll also add that I'm skeptical that at least the US is going to treat AI models trained on something as intrinsically creating copyright-infringing derivative works, though I don't know for sure what the EU will do. However, even if one assumes that some jurisdiction does decide to treat models as a derivative work, there's a fairly straightforward way to continue to distribute mods that I would expect should remain legal, and has happened in the past to avoid distributing copyrighted assets: distribute them as a patch against the original work.

It is legal for the end user to modify a copyrighted work that he owns. So if I distribute a patch that takes Voice Actor X's base-game audio as input and generates new lines from it, well, that's not a legal problem for copyright. Copyright only deals with distribution from one person to another. I can create all the derivative works I want myself -- as the end user -- as long as I don't myself distribute them.

In fact, while it's probably not a very CPU-efficient way to distribute it -- going to waste the world's electricity, do another Bitcoin -- one approach might be to just distribute Tortoise TTS or whatever it is that people are using to generate the audio, as well as any marked-up text to regenerate, then just have the regeneration run on the end-user's computer to generate the mod using the original voice assets. Tortoise TTS has expensive generation, but unlike, say, Stable Diffusion, where the training process requires a lot of computational capacity, has very rapid training time on a new voice. Would be bandwidth-efficient, at any rate.

But point is, that is unquestionably legal, and still winds up in a place where the end user has the mod with the same new voice data on his computer.

And given that, I don't really see the point in trying to prohibit distributing the AI-generated speech files, from the standpoint of someone who is trying to block someone from playing a mod for a game using AI generated voice, because that player is going to wind up in basically the same place regardless of which route they take. It's maybe marginally more-obnoxious to take the full regeneration route, maybe have to run the "regenerate the mod voices" process overnight, but it's not going to generally stop the player from getting and playing the mod.