Xbox plans for using AI to create scripts, dialogue trees, quest lines

cyu@sh.itjust.works (banned from community) to Technology@lemmy.world – 76 points
Microsoft is bringing AI characters to Xbox
theverge.com

Yay I can't wait for games to become even more soulless /s

I don't think you appreciate how much creativity the C-suite invested in developing Open World Microtransaction Generator 3000. /s

Finally the "3000" isn't just a marketing number but actually referring to the 3000th clone DLC/MTX platform in a row. Truth in naming!

Can it really get worse than most of the shit side quest lines that are already pretty standard? Go here and fetch me this thing and I'll give you something shiny is already the bottom of the barrel.

I finished everything in the base game for Just Cause 4 last year, and it was literally taking me more time to drive from one activity to the next than to do the activity. But of course there have to be 80 of them.

No, it won't be.

People comment on LLM stuff about how it's 'soulless' having only used basic sanitized stuff built on the technology, and usually not even the SotA models. They'll spend 15 minutes using the free ChatGPT and write it off as 'soulless.'

Anyone who was around in the first few weeks of the initial closed rollout of GPT-4 for Bing knows what a less lobotomized version of what's already year-old tech can look like. In another year or two, by the time AAA games built on LLMs are just starting to enter serious production, people aren't going to believe what it can actually look like when the emotion guardrails are taken away.

The current models are so 'soulless' because the initial rollout of the model was so soulful that it was freaking people out.

A lot of games have crap writing, particularly for side content, and a model that emulates emotional language as if it actually were that character in that given context is going to be a big step up.


Could be interesting, but it depends on how it is trained. I know there are hundreds of people who spend hours just chatting with GPT; if trained correctly, it could create some very interesting main/side quest characters. Just have to wait and see. Not a bad thing, since it seems AI is "the thing" at the moment.

"Chatting". LLMs don't have any idea what words mean, they are kinda like really fancy autocorrect, creating output based on what's most likely to occur next in the current context.

If they put together the right words, does it matter if they know what they're saying?

I mean, plain old autocorrect does a surprisingly good job. Here's a quick example where I only tap the middle suggested word: "I will be there for you to grasp since you think your instance is screwy." I think everybody can agree that sentence is a bit weird, but an LLM has about as much understanding of its output as the autocorrect/word suggestion did.

A conversation by definition is at least two-sided. You can't have a conversation with a tree or a brick, but you could have one with another person. An LLM is not capable of thought. It "converses" by a more advanced version of what your phone's autocorrect does when it gives you a suggested word. If you think of that as conversation, I find that an extremely lonely definition of the word.

So to me yes, it does matter

I think you're kind of underselling how good current LLMs are at mimicking human speech. I can foresee them being fairly hard to detect in the near future.

That wasn't my intention with the wonky autocorrect sentence. The point of that was to show that LLMs and my autocorrect equally have no idea what words mean.

Yes, and my point is that it doesn't matter whether they know what they mean, just that they appear to know what they mean.

What does it mean to "have an idea what words mean"?

LLMs clearly have some associations between words - they are able to use synonyms, they are able to explain words, they are able to use words correctly. How do you determine from the outside whether they "understand" something?

We understand a tree to be a growing living thing, an LLM understands a tree as a collection of symbols. When they create output they don't decide that one synonym is more appropriate than another, it's chosen by which collection of symbols is more statistically likely.

Take for example attempting to correct GPT, it will often admit fault yet not "learn" from it. Why not? If it understands words it should be able to, at least in that context, no longer output the incorrect information yet it still does. It doesn't learn from it because it can't. It doesn't know what words mean. It knows that when it sees the symbols representing "You got {thing} wrong" the most likely symbols to follow represent "You are right I apologize".

That's all LLMs like GPT do currently. They analyze a collection of symbols (not actual text) and then output what they determine to be most likely to follow. That causes very interesting behavior, you can talk to it and it will respond as if you are having a conversation.
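To make the "output what's most likely to follow" idea concrete, here is a minimal sketch of a word-suggestion model. It is closer to phone autocorrect than to GPT: it only counts which word follows which in a tiny made-up corpus, whereas a real LLM uses a neural network over tokens. The names `train_bigrams`, `suggest`, and `autocomplete`, and the corpus itself, are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def suggest(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

def autocomplete(follows, start, length=5):
    """Repeatedly take the top suggestion, like tapping the middle word."""
    out = [start]
    for _ in range(length):
        nxt = suggest(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Hypothetical toy corpus; a real model is trained on vastly more text.
corpus = "the tree is tall . the tree is tall . the sky is blue ."
model = train_bigrams(corpus)
print(autocomplete(model, "the", length=3))
```

The sketch picks "tree" after "the" only because that pair occurred most often, not because it knows anything about trees. Whether a much larger model doing a far more sophisticated version of this counts as understanding is exactly what is being argued here.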

We understand a tree to be a growing living thing, an LLM understands a tree as a collection of symbols.

No, LLMs understand a tree to be a complex relationship of many, many individual numbers. Can you clearly define how our understanding is based on something different?

When they create output they don't decide that one synonym is more appropriate than another, it's chosen by which collection of symbols is more statistically likely.

What is the difference between "appropriate" and "likely"? I know people who use words to sound smart without understanding them - do they decide which words are appropriate, or which ones are likely? Where is the border?

Take for example attempting to correct GPT, it will often admit fault yet not "learn" from it. Why not? If it understands words it should be able to, at least in that context, no longer output the incorrect information yet it still does. It doesn't learn from it because it can't.

This is wrong. If you ask it something, it replies and you correct it, it will absolutely "learn" from it for this session. That's due to the architecture, but it refutes your point.

It doesn't know what words mean. It knows that when it sees the symbols representing "You got {thing} wrong" the most likely symbols to follow represent "You are right I apologize".

So why can it often output correct information after it has been corrected? This should be impossible according to you.

That's all LLMs like GPT do currently. They analyze a collection of symbols (not actual text) and then output what they determine to be most likely to follow. That causes very interesting behavior, you can talk to it and it will respond as if you are having a conversation.

Aaah, the old "stochastic parrot" argument. Can you clearly show that humans don't analyse inputs and then output what they determine to be most likely to follow?

If you'd like, we can move away from the purely philosophical questions and go to a simple practical one: given some system (LLMs, animals, humans) how do I figure out whether the system understands? Can you give me concrete steps I can take to figure out if it's "true understanding" or "LLM level understanding"? Your earlier approach (tell it when it's incorrect) was wrong. Do you have an alternative? If not, how is this not a "god of the gaps" argument?

So why can it often output correct information after it has been corrected? This should be impossible according to you.

It generally doesn't. It apologizes then will output exactly or very nearly the same thing as before, or something else that's wrong in a brand new way. Have you used GPT before? This is a common problem, it's part of why you cannot trust anything it outputs unless you already know enough about the topic to determine its accuracy.

No, LLMs understand a tree to be a complex relationship of many, many individual numbers. Can you clearly define how our understanding is based on something different?

And did you really just go "nuh huh its actually in binary"? I used the collection of symbols explanation as that's how OpenAI describes it, so I thought it was safe to just skip all the detail. Since it's apparently needed and you're unlikely to listen to me, there's a good explanation in video form created by Kyle Hill. I'm sure many other people have explained it much better than I can, so instead of trying to prove me wrong, which we can keep doing all day, go learn about them. LLMs are super interesting and yet ultimately extremely primitive.

It generally doesn't. It apologizes then will output exactly or very nearly the same thing as before, or something else that's wrong in a brand new way. Have you used GPT before? This is a common problem, it's part of why you cannot trust anything it outputs unless you already know enough about the topic to determine its accuracy.

Hallucinations are different from in-context learning. I've seen a number of impressive examples of this, enough that you should provide evidence that it generally doesn't work. There are a bunch of papers on this topic, surely at least one would support your thesis?

And did you really just go "nuh huh its actually in binary"?

No, that is literally how knowledge is stored inside of neural networks. Plenty of papers have shown that the learning process is actually mostly about compression, since you distill the patterns of the training data into a smaller amount of data. This means that LLMs actually have concepts of things (which again has been shown independently, e.g. with Othello). These concepts are themselves stored as relationships between large amounts of numbers - that's how NNs work.

I also fully understand how the tokenization process works and what the mentioned "symbols" are. Please explain what this has to do with anything. The model sees text in specific chunks as an optimisation, what does this change?

I'm a big boy who has already implemented his own LLMs from the ground up, so feel free to skip any simplifications and tell me exactly, in detail, what you mean.
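For anyone following along who hasn't seen what "symbols" or "chunks" means here, a rough sketch: text is split into pieces from a fixed vocabulary, and the model only ever sees the integer IDs of those pieces. The greedy longest-match splitter and the tiny `VOCAB` below are assumptions for illustration; GPT-style tokenizers use byte-pair encoding with a vocabulary of many thousands of entries.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of `text` into IDs of chunks found in `vocab`."""
    max_len = max(len(piece) for piece in vocab)
    ids = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                ids.append(vocab[piece])
                i += size
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids

# Hypothetical vocabulary: one multi-character chunk plus a few single characters.
VOCAB = {"tree": 0, "s": 1, " ": 2, "a": 3, "t": 4, "r": 5, "e": 6}
print(tokenize("a trees", VOCAB))
```

Here "tree" becomes a single ID while "tee" would be spelled out character by character. The chunking changes how the input is represented, which is the optimisation mentioned above; it says nothing by itself about what the network does with those IDs.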

I think just about every developer is either considering this tech, using it now, or keeping a wary eye on it, as it really is going to bring game worlds to life as it improves.

As long as it's not like that nvidia demo https://youtu.be/5R8xZb6J3r0 😅

But I'm sure actual game studios can do better than that.

That felt pretty flat for sure. In the months since then AI voices have gotten a lot more expressive and we have learned a lot about how to create a more real-feeling character. Still not fully there yet though. I wonder what AAA big name game will really pull it off first and set the tone for others to follow?

It's not just the voice, the script is also the most generic NPC robot sounding shit ever.

But yeah, this doesn't seem like it would be that difficult to fix.

This is the best summary I could come up with:


The multiyear partnership will include an “AI design copilot” system that Xbox developers can use to create detailed scripts, dialogue trees, quest lines, and more.

“This partnership will bring together: Inworld’s expertise in working with generative AI models for character development, Microsoft’s cutting-edge cloud-based AI solutions including Azure OpenAI Service, Microsoft Research’s technical insights into the future of play, and Team Xbox’s strengths in revolutionizing accessible and responsible creator tools for all developers.”

Inworld has been working on AI NPCs that react to questions from a player, much like how ChatGPT or Bing Chat responds to natural language queries.

These AI NPCs can respond in unique voices and can include complex dialogue trees or personalized dynamic storylines within a game.

The Finals developer Embark Studios recently had to defend against its use of AI-generated voices, arguing that “making games without actors isn’t an end goal,” in a statement to IGN.

“We want to help make it easier for developers to realize their visions, try new things, push the boundaries of gaming today and experiment to improve gameplay, player connection and more,” says Zhang.


The original article contains 484 words, the summary contains 183 words. Saved 62%. I'm a bot and I'm open source!