The Google I/O 2024 keynote was a 112-minute affair in which the company made several big announcements focused on artificial intelligence (AI). The announcements ranged from new AI models to AI integrations across Google products, but one of the most interesting introductions was Veo, an AI video generation model that can generate 1080p videos. The tech giant said the tool can generate videos longer than one minute. Notably, OpenAI introduced its own video generation model, Sora, in February.

During the event, Demis Hassabis, co-founder and CEO of Google DeepMind, revealed Veo. Announcing the AI model, he said: “Today I am excited to announce our newest and most capable generative video model called Veo. Veo creates high quality 1080p videos from text, images and video prompts. It can capture the details of your instructions in a variety of visual and cinematic styles.”

Google claims that Veo can closely follow prompts, capturing the nuance and tone of a phrase and generating a video that reflects it. The model can produce videos in different styles, such as timelapses, close-ups, fast tracking shots, and aerial shots, as well as shots with varied lighting and depth of field. Besides generating video, Veo can edit existing footage: the user provides an initial video and a prompt describing what to add or remove. It can also generate videos over a minute long, either from a single prompt or from a sequence of prompts.

To address the consistency problem common in video generation models, Veo uses latent diffusion transformers. This helps reduce unexpected flickering, jumping, or morphing of characters, objects, or entire scenes between frames. Google emphasized that videos created by Veo will be watermarked with SynthID, the company’s in-house tool for watermarking and identifying AI-generated content. The model will soon be available to select creators through the VideoFX tool in Google Labs.

Veo’s similarities to OpenAI’s Sora

Although neither AI model is available to the public yet, the two share several similarities. Veo can generate 1080p videos longer than one minute, while OpenAI’s Sora can generate videos of up to 60 seconds. Both models accept text, image, and video prompts. Both are based on diffusion models and can produce videos spanning multiple shots, visual styles, and cinematographic techniques. Both also label their output as AI-generated: Sora uses the Coalition for Content Provenance and Authenticity (C2PA) standard, while Veo uses Google’s own SynthID.
