OpenAI Sora

Indul Hassan, March 30, 2024 (updated June 9, 2024)

On February 15, 2024, OpenAI introduced Sora by sharing several impressive AI-generated videos and a research paper on X. While not the first AI video model, Sora stood out for its exceptional consistency, duration, and photorealism. To date, only OpenAI staff have shared Sora-generated videos on X or TikTok, some using fan-suggested prompts. No release date has been announced, nor any details on output limitations before a potential integration into a tool like ChatGPT.

WHAT IS OPENAI SORA?

Sora is a generative video model similar to Runway's Gen-2, Pika Labs' Pika 1.0, and Stability AI's Stable Video Diffusion, creating AI video content from text, images, or video. Named after the Japanese word for "sky" to signify its limitless creative potential, Sora was first showcased in a handful of videos, including one of two people walking through snowy Tokyo. It can generate clips up to one minute long with consistent characters and motion, surpassing earlier models.

WHAT IS THE TECHNOLOGY BEHIND SORA?

Sora's technology is adapted from DALL-E 3, OpenAI's generative image platform, with added features for fine-tuned control. As a diffusion transformer model, Sora combines diffusion-based image generation, as in Stable Diffusion, with the token-based transformer approach behind ChatGPT. Videos are generated in latent space, "denoised" in 3D patches, and then passed through a video decompressor for standard viewing.

WHAT DATA WAS SORA TRAINED ON?

OpenAI trained Sora on publicly available videos, public-domain content, and licensed copyrighted videos. The exact number of videos used remains undisclosed but is believed to be in the millions. A video-to-text engine generated captions and labels from the ingested videos to fine-tune Sora on real-world content. Speculation suggests that synthetic video, such as footage rendered in Unreal Engine 5, was also used to help the model learn video physics.

WHY DID SORA SURPRISE ITS DEVELOPERS?

AI models often exhibit unexpected behavior.
During post-training, Tim Brooks, a Sora researcher, noted that the model produced 3D-consistent graphics from its dataset without any additional training. Bill Peebles observed Sora generating shots from different camera angles unprompted, apparently inferring that they were needed.

WHAT ABOUT CONTENT RESTRICTIONS AND PRIVACY?

Red teamers and safety experts worked during training to label and prohibit misinformation, hateful content, and bias. Generated videos include metadata tags indicating AI creation, and text classifiers check that prompts adhere to usage policies. As with DALL-E 3, Sora will have content restrictions, including a ban on generating depictions of real people, extreme violence, sexual content, hateful imagery, celebrity likenesses, and intellectual property such as logos.

HOW CAN I ACCESS SORA?

Currently, Sora is not publicly accessible. Only the videos shared by OpenAI offer insight into the model while the company works on safety measures to distinguish AI-generated videos from real ones. Tim Brooks, Sora's research lead, emphasized the focus on safety and public confidence before release. Creating a video clip with Sora also takes considerable time, further delaying access. Sora is expected to be integrated into ChatGPT like DALL-E 3 rather than released as a standalone product. An API will eventually allow third-party developers to incorporate Sora's functionality into their own products, as with DALL-E 3.

WHEN WILL SORA BE RELEASED?

Although no specific release date has been set, OpenAI's CTO Mira Murati indicated Sora would be available in 2024, possibly before summer. It will likely be priced similarly to DALL-E, as part of ChatGPT's premium version.

Generative AI
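As a footnote for the technically curious, the diffusion-transformer idea described in the technology section, where video is generated in latent space and iteratively "denoised" as 3D spacetime patches, can be sketched at a very high level. This is an illustrative toy under stated assumptions, not OpenAI's implementation: the latent shape, patch sizes, step count, and the stand-in denoiser are all hypothetical, and a real model would condition each step on text embeddings.

```python
import numpy as np

def to_patches(latent, pt=2, ph=4, pw=4):
    """Split a latent video of shape (T, H, W, C) into flattened 3D
    spacetime patches of size pt x ph x pw (sizes are assumptions)."""
    T, H, W, C = latent.shape
    patches = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(-1, pt * ph * pw * C)  # (num_patches, patch_dim)

def denoise_step(patches, noise_level, rng):
    """Stand-in for the transformer: a real model would predict the noise
    in each patch (conditioned on the text prompt) and remove it."""
    predicted_noise = noise_level * rng.standard_normal(patches.shape)
    return patches - 0.1 * predicted_noise

def generate(steps=10, shape=(8, 16, 16, 4), seed=0):
    """Start from pure noise in latent space and iteratively denoise."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(shape)       # noise in latent space
    patches = to_patches(latent)              # 3D spacetime patches
    for i in range(steps):
        noise_level = 1.0 - i / steps         # toy noise schedule
        patches = denoise_step(patches, noise_level, rng)
    return patches  # a video decompressor would map this back to pixels

out = generate()
print(out.shape)  # (64, 128): 4*4*4 patches, each 2*4*4*4 latent values
```

The final step in the article's description, the video decompressor, corresponds to a learned decoder that maps the denoised latent back to viewable frames; it is omitted here.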