OpenAI, following in the footsteps of startups like Runway and tech giants like Google and Meta, is entering video generation.
OpenAI today unveiled Sora, a generative AI model that creates video from text. OpenAI claims that, given a brief or detailed description or a still image, Sora can generate 1080p movie-like scenes with multiple characters, different types of motion and background detail.
Sora can also “expand” existing video clips – doing its best to fill in missing details.
“Sora has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vivid emotions,” OpenAI writes in a blog post. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”
Now, there is plenty of hype on OpenAI’s demo page for Sora, and the statement above is an example. But the cherry-picked samples from the model do look rather impressive, at least compared to the other text-to-video technologies we’ve seen.
For starters, Sora can produce videos in a range of styles (for example, photorealistic, animated, black and white) up to a minute long, far longer than most text-to-video models. And these videos maintain reasonable coherence in the sense that they don’t always succumb to what I like to call “AI weirdness,” such as objects moving in physically impossible directions.
Check out this tour of an art gallery, all generated by Sora (ignore the graininess from my video-to-GIF conversion tool):
Or this animation of a flower blooming:
I will say that some of Sora’s videos with human subjects, for example a robot standing amid a cityscape or a person walking down a snowy path, have a video game-y quality to them, probably because there isn’t much going on in the background. AI weirdness also creeps into several clips, like cars driving in one direction and then suddenly turning around, or arms melting into a duvet cover.
OpenAI, for all its boasting, admits that the model is not perfect. It writes:
“[Sora] may have difficulty accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might bite a piece off a cookie, but afterward the cookie may have no bite mark. The model may also confuse the spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.”
OpenAI is positioning Sora as a research preview, declining to reveal what data was used to train the model (beyond ~10,000 hours of “high quality” video) and refraining from making Sora generally available. Its rationale is the potential for abuse; OpenAI rightly points out that bad actors could misuse a model like Sora in a myriad of ways.
OpenAI says it is working with experts to probe the model for exploits and is building tools to detect whether a video was generated by Sora. The company also says that, if it chooses to build the model into a public-facing product, it will ensure that provenance metadata is included in the generated output.
“We will engage policymakers, educators, and artists from around the world to understand their concerns and identify positive use cases for this new technology,” OpenAI writes. “Despite extensive research and testing, we cannot predict all the beneficial ways people will use our technology, nor all the ways people will misuse it. That’s why we believe learning from real-world usage is a critical component of building and releasing increasingly safe AI systems over time.”