OpenAI Sora Created Image Of Two Women In A Video

by Staff Writer @lauriesullivan, March 14, 2024

The image of two women sitting next to each other in a room, which appeared in a media story this week, is not real. The women were created by Sora, OpenAI’s text-to-video artificial intelligence (AI) imaging technology.

The image appeared this week in a Wall Street Journal article. 

Google, Microsoft and other companies are scrambling to integrate generative artificial intelligence (GAI) — text and video — into their products.

In a recent post on X, Mikhail Parakhin, CEO of advertising and web services at Microsoft, hinted that OpenAI’s text-to-video model Sora will eventually be integrated into Copilot, but said the process will take time.

Sora is a text-to-video generative AI model developed by OpenAI. Given a detailed text prompt, the model turns the words into a highly detailed video of up to 60 seconds.

OpenAI CTO Mira Murati runs all the technology at the company, including Sora. She revealed to The Wall Street Journal that OpenAI will release the text-to-video AI platform this year, perhaps in a few months.

Based on a text prompt, Sora will create hyper-realistic and detailed video images of up to 60 seconds.

She describes it as a type of generative AI (GAI) model that starts from random noise and gradually refines it into a detailed image.

The model has analyzed many videos and learned how to identify objects and their actions. When given a text prompt, it creates a scene by defining a timeline and adding details for each frame.
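The refine-from-noise process Murati describes can be illustrated with a highly simplified, hypothetical sketch. This is not OpenAI’s actual method (real diffusion models learn to predict and subtract noise at each step); it only shows the general direction of the process, with `target` standing in for the pixel values the model is steering toward:

```python
import random

def denoise(target, steps=50, seed=0):
    """Toy illustration of diffusion-style generation: start from
    random noise and repeatedly nudge it toward a target signal.
    (Real models predict the noise to remove at each step; this
    sketch just interpolates to show the idea.)"""
    rng = random.Random(seed)
    frame = [rng.gauss(0, 1) for _ in target]  # begin as pure noise
    for step in range(1, steps + 1):
        t = step / steps                       # refinement progress, 0 -> 1
        frame = [(1 - t) * f + t * x for f, x in zip(frame, target)]
    return frame

target = [0.2, 0.8, 0.5]   # hypothetical stand-in for pixel values
result = denoise(target)   # noise has been refined into the target
```

For video, a model runs a process like this across every frame in the timeline while keeping the frames consistent with one another, which is part of what makes it computationally expensive.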

There are glitches in the technology, as seen in a prompt the WSJ journalist gave Murati. She asked the technology to create a video of a “female video producer on a sidewalk in New York City holding a high-end cinema camera. Suddenly a robot yanks the camera out of her hand.”

In the video, the model doesn’t follow the prompt: the woman morphs into the robot rather than the robot walking up to her and grabbing the camera. And in the background, the cars change color as they travel down the street.

In another example, a video of a bull in a china shop, the bull stomps on the china, but it doesn’t break.

This is a tool intended to extend people’s creativity, and OpenAI wants people and creators everywhere to help inform how the technology is developed and deployed, she added.

One of the more difficult things to simulate, Murati said, is the motion of hands. In the clip of the AI-generated women, their mouths move, but there is no sound, a capability OpenAI still must add. The licensed data used to train the models does include content from Shutterstock, but the company declined to elaborate further.

The amount of time it takes to create a 60-second video varies based on the complexity of the images. It also takes a lot of computing power. Today, it is more expensive to produce Sora’s video clips than images from DALL-E, the company’s image generator.

The image of two women in a room talking is not real. OpenAI CTO Mira Murati explains the rollout plans for the text-to-video AI platform.
 
 
