AI & Video | Product Launch | May 25, 2026

Google’s Gemini Omni targets video creation from scripts, sketches

Google has unveiled Gemini Omni, a multimodal AI model designed for advanced video creation and editing. The model accepts text prompts, structural scripts, images, hand-drawn sketches, and existing video clips as inputs for generating and refining video content.

Key Takeaways

Gemini Omni accepts text prompts and structural scripts as inputs for video generation and editing.
The model also uses images, hand-drawn sketches, and illustrations as input material.
Existing video clips can serve as style or structural references for Gemini Omni.
Google positions Gemini Omni specifically for advanced video creation and editing.

Why It Matters

Google is pushing video generation beyond text-only prompting by combining scripts, sketches, images, and reference clips in one multimodal model. For streaming and video workflows, that points to a more flexible production toolchain inside Google’s AI stack, one suited to iterative editing as well as first-draft generation. The key signal to watch is whether Google shows Gemini Omni handling both style references and structural references across multiple media types in more detailed demonstrations or product documentation.

Read full article at adgully.com
