The Dawn of Photorealistic Video Creation
Hartmann Capital Weekly, Friday February 23rd, 2024
If you would like exposure to the biggest opportunities in Crypto, VR, and AI, speak with us this week about getting an allocation in one of our industry-leading funds. Book a call today, or meet us next at GDC in San Francisco for a 1-1 meeting to learn more about our funds.
Written by AI & Metaverse Associate Daniel Derzic
In December 2023, we published an article called "Is This the Breakout Moment for AI Videos?" showcasing the rising tide of AI video tools and how artists could use them. It was evident that we were nearing the tipping point for AI films. Initially, these models were limited to just a few seconds of motion. Then, in January, Google introduced Lumiere, which can generate clips of up to five seconds and supports a range of content creation and editing features, including image-to-video, video inpainting, and stylized generation.
Last week, the company introduced Gemini 1.5, with a context window of up to 1 million tokens and the ability to use video as an input, an enormous upgrade.
In response, OpenAI unveiled Sora, a groundbreaking text-to-video model named after the Japanese word for sky, chosen to suggest 'limitless creative possibilities,' according to the engineering team.
A New Era
While Lumiere is limited to 5 seconds and 512 × 512 pixels, the Sora text-to-video model can produce up to 60 seconds of highly realistic video at 1920 × 1080 resolution. OpenAI also released a technical report and demo videos demonstrating Sora's capabilities, such as generating videos from image prompts and smoothly merging separate videos. Here are some examples:
As of the announcement, access to Sora is limited to a select group of creators and red teamers. This phase is intended to carefully test and refine the model before making it accessible to everyone.
How does it work?
At its core is the Transformer architecture, introduced by Google in 2017, which dramatically improved AI's ability to read and generate text. OpenAI has successfully built on this foundation. Sora's training reportedly uses copyright-free data. Generation begins with video frames of static noise, which the model progressively denoises through machine learning until the frames match the description in the prompt.
Sora's key innovation is that it evaluates many video frames simultaneously, which addresses the challenge of keeping objects consistent as they move in and out of view. This has the potential to disrupt the entertainment and media industries, not to mention advertising, social media, and e-commerce.
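To make the denoising idea concrete, here is a deliberately simplified sketch in Python. This is not Sora's actual architecture: the neural network that would predict noise from a text prompt is replaced by the exact residual so the loop is runnable, and the "video" dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": 8 frames, each a flattened 16x16 grayscale image.
# A real model would be conditioned on a text prompt; here the
# target pattern is fixed so the example is self-contained.
target = np.tile(np.linspace(0.0, 1.0, 16 * 16), (8, 1))

frames = rng.normal(size=target.shape)  # start from pure static noise
for step in range(50):
    # A real diffusion model uses a neural net to *predict* the noise;
    # this toy substitutes the exact residual so convergence is guaranteed.
    predicted_noise = frames - target
    # Denoise all frames jointly: each update touches the whole clip,
    # which is the intuition behind cross-frame consistency.
    frames = frames - 0.1 * predicted_noise

# After enough steps, the noisy clip has converged to the target pattern.
```

The point of the sketch is the loop structure: start from noise, repeatedly subtract a predicted noise component, and operate on the whole clip at once rather than frame by frame.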
Potential Applications
Video production, traditionally a pricey and slow process involving real-world filming or special effects, could see a revolution with Sora if it's affordably priced. OpenAI's technical report, "Video Generation Models as World Simulators," suggests that advanced versions could offer a promising path toward building general-purpose simulators of the physical world. Such simulators could be used for scientific modeling, like assessing tsunami impacts on infrastructure and human well-being. Beyond that, there are several other potential use cases:
Prototyping and concept visualization
Even if such AI videos aren't part of the end product, they can be useful for quickly showcasing concepts. Filmmakers could use them to create scene mockups before shooting, and designers could develop early product videos.
Virtual Environments and 3D Modeling
3D Gaussian Splatting is a computer graphics and visualization technique that converts point cloud data into a continuous volume or surface. Combined with Sora, it lets users convert photorealistic videos into detailed 3D models, paving the way for more advanced VR and AR apps. The approach involves generating a video with Sora and reverse-engineering it with 3D Gaussian Splatting into a 'splat', a specialized 3D file. The video below shows a 3D Gaussian Splat created from the Sora footage on the left.
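The core rendering idea behind splatting can be shown with a 2D toy: an image is built by accumulating weighted Gaussians onto a pixel grid. Real 3D Gaussian Splatting optimizes millions of anisotropic 3D Gaussians with view-dependent color from multi-view footage; the two splats and their parameters below are invented purely for illustration.

```python
import numpy as np

H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W].astype(float)

# Hypothetical splats: (center_y, center_x, sigma, weight).
# Real 3DGS uses anisotropic 3D covariances and view-dependent color.
splats = [(20.0, 20.0, 4.0, 1.0), (40.0, 44.0, 6.0, 0.6)]

image = np.zeros((H, W))
for cy, cx, sigma, weight in splats:
    # Evaluate an isotropic 2D Gaussian over the pixel grid and
    # accumulate it additively -- the essence of "splatting".
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    image += weight * g

image = np.clip(image, 0.0, 1.0)  # final rendered intensity image
```

Each Gaussian contributes a soft blob of brightness; with enough of them, fitted to real footage, the accumulated blobs reproduce a photorealistic scene that can be re-rendered from new viewpoints.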
Stock Footage and Asset Creation
Traditional licensing and limited availability of materials have long been obstacles for video editors and 3D artists. Sora could sidestep these constraints, allowing artists to generate unique video material tailored to their requirements. Paired with tools like Adobe's Firefly AI, Sora might enable real-time video content modifications. This includes producing advertisements, promotional films, and product demos that would otherwise be costly to shoot.
Limitations
However, the present version of Sora has significant limitations. The spatial positioning of objects can shift unpredictably, and the model's weak grasp of physics leads it to ignore real-world physical rules; it fails to understand cause and effect. For example, in the video below of an explosion on a basketball hoop, the net appears intact again after the hoop explodes.
The Takeaway
OpenAI decided not to open Sora to the public, citing the need for more safety testing. This caution is appropriate, especially given the ethical concerns around creating lifelike video in an election year. OpenAI is also developing a detection classifier for identifying videos generated by Sora, much as it deployed a text classifier for ChatGPT output. That text classifier, however, was eventually switched off due to its unreliability.
Despite its potential, Sora and similar technologies face a significant hurdle: making precise, minor edits without regenerating entire scenes. This limitation is far from minor; it directly affects customer satisfaction and the practical application of AI in professional content creation. Innovations such as Runway ML's multi-motion brush and ByteDance's Boximator are promising, suggesting a future where detailed adjustments could be made more easily and seamlessly.
Sora unlocks immense potential for creativity and storytelling, yet it also challenges our conventional views on authenticity and trust in the digital age.
Recent Coverage:
Battlefin 2024: Catalysts driving the Crypto Markets
Bitkraft Summit 2023: VR, AR and MR
Disclaimers:
This is not an offering. This is not financial advice. Always do your own research.
Our discussion may include predictions, estimates or other information that might be considered forward-looking. While these forward-looking statements represent our current judgment on what the future holds, they are subject to risks and uncertainties that could cause actual results to differ materially. You are cautioned not to place undue reliance on these forward-looking statements, which reflect our opinions only as of the date of this presentation. Please keep in mind that we are not obligating ourselves to revise or publicly release the results of any revision to these forward-looking statements in light of new information or future events.