OpenAI introduces Sora, its text-to-video AI model

catculation@lemmy.zip to

Technology@lemmy.world – 424 points – 4 months ago

theverge.com

https://openai.com/sora

Archive https://archive.is/V8Fv3

This is still so bizarre to me. I've worked on 3D rendering engines trying to create realistic lighting and even the most advanced 3D games are pretty artificial. And now all of a sudden this stuff is just BAM super realistic. Not just that, but as a game designer you could create an entire game by writing text and some logic.

In my experience as a game designer, the code that LLMs spit out is pretty shit. It won't even compile half the time, and when it does, it won't do what you want without significant changes.

The correct usage of LLMs in coding, imo, is for a single use case at a time, building up to what you need from scratch. It takes skill: talking to the AI so that it gives you what you want, knowing how to build up to it, reading the code it spits out so that you know when it goes south, and actually knowing how to build the bigger-picture software from little pieces. But if you are an intermediate dev who is stuck on something, it is a great help.

That, or for rubber ducky debugging; it's also great at that.

That sounds like more effort than just... writing the code.

It's situationally useful

ChatGPT once insisted my JSON was actually YAML

Technically it is (YAML 1.2 is designed as a superset of JSON), but I agree that is imprecise and nobody would say so IRL. Unless they are being a pedantic nerd, like I am right now.

Keep in mind that this isn't creating 3d Billy volumes at all. While immensely impressive, the thing being created by this architecture is a series of 2d frames.

Because it's trained on videos of the real world, not on 3d renderings.

Lol you don't know how cruel that is. For decades programmers have devoted their passion to creating hyperrealistic games and 3D graphics in general, and now poof it's here like with a magic wand and people say "yeah well you should have made your 3D engine look like the real world, not to look like shit" :D

Welcome to the club my friend... Expert after expert is having this experience as AI develops in the past couple years and we discover that the job can be automated way more than we thought.

First it was the customer service chat agents. Then it was the writers. Then it was the programmers. Then it was the graphic design artists. Now it's the animators.

Another programmer here. The bottleneck in most jobs isn't in getting boilerplate out, which is where AI excels; it's in that first and/or last 10-20%, alongside dictating what patterns are suitable for your problem, what proprietary tooling you'll need to use, what APIs you're hitting and what has changed in recent weeks/months.

What AI is achieving is impressive, but as someone that works in AI, I think that we're seeing a two-fold problem: we're seeing a limit of what these models can accomplish with their training data, and we're seeing employers hedge their bets on weaker output with AI over specialist workers.

The former is a great problem, because this tooling could be adjusted to make workers' lives far easier/faster, in the same way that many tools have done so already. The latter is a huge problem, as in many skilled worker industries we've seen waves of layoffs, and years of enshittification resulting in poorer products.

The latter is also where I think we'll see a huge change in culture. IMO, we'll see existing companies bet it all on AI over people and die, and a new wave of companies focus on putting out work of a certain standard to take on larger companies.

This is a really balanced take, thank you

Writer here, absolutely not having this experience. Generative AI tools are bad at writing, but people generally have a pretty low bar for what they think is good enough.

These things are great if you care about tech demos and not quality of output. If you actually need the end result to be good though, you’re gonna be waiting a while.

If you actually need the end result to be good though, you’re gonna be waiting a while.

I agree with everything you said, but it seems in the context of AI development "a while" is like, a few years.

That remains to be seen. We have yet to see one of these things actually get good at anything, so we don’t know how hard that last part is to do. I don’t think we can assume there will be continuous linear progress. Maybe it’ll take one year, maybe it’ll take 10, maybe it’ll just never reach that point.

Yeah a real problem here is how you get an AI which doesn't understand what it is doing to create something complete and still coherent. These clips are cool and all, and so are the tiny essays put out by LLMs, but what you see is literally all you are getting; there are no thoughts, ideas or abstract concepts underlying any of it. There is no meaning or narrative to be found which connects one scene or paragraph to another. It's a puzzle laid out by an idiot following generic instructions.

That which created the woman walking down that street doesn't know what either of those things are, and so it can simply not use those concepts to create a coherent narrative. That job still falls onto the human instructing the AI, and nothing suggests that we are anywhere close to replacing that human glue.

Current AI can not conceptualise -- much less realise -- ideas, and so they can not be creative or create art by any sensible definition. That isn't to say that what is produced using AI can't be posed as, mistaken for, or used to make art. I'd like to see more of that last part and less of the former two, personally.

Current AI can not conceptualise – much less realise – ideas, and so they can not be creative or create art by any sensible definition.

I kinda 100% agree with you on the art part since it can't understand what it's doing... On the other hand, I could swear that if you look at some AI-generated images it's kind of mocking us. It's a reflection of our society in a weird mirror. Like a completely mad or autistic artist that is creating interesting imagery but has no clue what it means. Of course that exists only in my perception.

But in the sense of "inventive" or "imaginative" or "fertile" I find AI images absolutely creative. As such it's telling us something about the nature of the creative process, about the "limits" of human creativity, which is in itself art.

When you sit there thinking up or refining prompts you're basically outsourcing the imaginative visualizing part of your brain. An "AI artist" might not be able to draw well or even have the imagination, but he might have a purpose or meaning that he's trying to visualize with the help of AI. So AI generation is at least some portion of the artistic or creative process but not all of it.

Imagine we could have a brain-computer interface that lets us perceive virtual reality as if with an extra pair of eyes. It could scan our thoughts and allow us to "write text" with our brain, and then immediately feed back a visual AI-generated stream that we "see". You'd be a kind of creative superman. Seeing or imagining things in their head is of course what many people do their whole life, but not in that quantity or breadth. You'd hear a joke and you would not just imagine it, you'd see it visualized in many different ways. Or you'd hear a tragedy and...

Like a completely mad or autistic artist that is creating interesting imagery but has no clue what it means.

Autists usually have no trouble understanding the world around them. Many are just unable to interface with it the way people normally do.

It’s a reflection of our society in a weird mirror.

Well yes, it's trained on human output. Cultural biases and shortcomings in our species will be reflected in what such an AI spits out.

When you sit there thinking up or refining prompts you’re basically outsourcing the imaginative visualizing part of your brain. [...] So AI generation is at least some portion of the artistic or creative process but not all of it.

We use a lot of devices in our daily lives, whether for creative purposes or practical. Every such device is an extension of ourselves; some supplement our intellectual shortcomings, others physical. That doesn't make the devices capable of doing any of the things we do. We just don't attribute actions or agency to our tools the way we do to living things. Current AI possess no more agency than a keyboard does, and since we don't consider our keyboards to be capable of authoring an essay, I don't think one can reasonably say that current AI is, either.

A keyboard doesn't understand the content of our essay, it's just there to translate physical action into digital signals representing keypresses; likewise, an LLM doesn't understand the content of our essay, it's just translating a small body of text into a statistically related (often larger) body of text. An LLM can't create a story any more than our keyboard can create characters on a screen.

Only once/if ever we observe AI behaviour indicative of agency can we start to use words like "creative" in describing its behaviour. For now (and I suspect for quite some time into the future), all we have is sophisticated statistical random content generators.

Still waiting on the programmer part. In a nutshell, AI being, say, 90% perfect means you have 90% working code, i.e. 10% broken code. Images and video (but not sound) are way easier because human eyes kinda just suck. A couple of the videos they've released pass even at a pretty long glance. You only notice the funny business once you look closer.
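
To put rough numbers on why "90% perfect" hurts more in code than in video, here is a minimal sketch. It rests on an assumption that is not in the comment above: that a program is made of independent parts, each correct with the same probability, and that the program only works if every part is correct. The function name `chance_all_correct` is made up for illustration.

```python
def chance_all_correct(per_part_accuracy: float, parts: int) -> float:
    """Probability that all `parts` pieces are correct, assuming each piece
    is independently correct with probability `per_part_accuracy`."""
    return per_part_accuracy ** parts


# Print how the chance of a fully working program shrinks as it grows.
for parts in (1, 10, 50):
    print(parts, round(chance_all_correct(0.9, parts), 3))
```

Under that assumption, ten parts at 90% each leave roughly a one-in-three chance that the whole thing works, whereas a video frame with a few wrong details can still pass at a glance.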

I can't imagine that digital artists/animators have reason to worry. At the upper end, animated movies will simply get flashier, eating up all the productivity gains. In live action, more effects will be pure CGI. At the bottom end, we may see productions hiring VFX artists, just as naturally as they hire makeup artists now.

When something becomes cheaper, people buy more of it, until their demand is satisfied. With food, we are well past that point. I don't think we are anywhere near that point with visual effects.

It seems to me that AI won't completely replace jobs yet (though it will in 10-20 years), but it will reduce demand through oversaturation and AI-driven ultraproductivity. Moreover, AI will continue to improve. The work of a team of 30 people will be done by just 3.

Yeah. And it's not just how good the images look, it's also the creativity. Everyone tries to downplay this, but I've read the texts and watched those videos, and just from the prompts there is a "creative spark" there. It's not a very bright spark lol, but it's there.

I should get into this stuff but I feel old lol. I imagine you could generate interesting levels with obstacles and riddles and "story beats" too.

Because sometimes the generator just replicates bits of its training data wholesale. The "creative spark" isn't its own, it's from a human artist left uncredited and uncompensated.

Artists are "inspired" by existing art or things they see in real life all the time. So the fact that generators can replicate art doesn't mean they can't generate art. It's a non sequitur. But I'm sure people are going to keep insisting on this, so let's not argue back and forth on this :D
