Kling VIDEO 3.0 adds multi-shot, audio, and 15-second generation
Kling has released Kling VIDEO 3.0, an updated version of its generative AI video model. The update introduces multi-shot narrative generation with storyboard control, stronger subject consistency via element referencing, native audio output in five languages, and a longer maximum video duration of 15 seconds. The announcement also details a credit-based pricing model in which rates vary by resolution, native audio, and voice control.
Key Takeaways
- Multi-shot generation is new in VIDEO 3.0, with automatic shot planning and a Custom Multi-Shot mode for per-shot control.
- Element referencing can lock in characters, items, and scene details across camera moves, including start-frame plus element reference workflows.
- Native audio now supports Chinese, English, Japanese, Korean, and Spanish, along with dialects, accents, and multilingual code-switching.
- VIDEO 3.0 extends output duration to 15 seconds, with flexible lengths from 3 to 15 seconds.
- Pricing is per second: 1080p native audio costs 12 credits per second, 720p native audio costs 9, and voice control adds 2 credits per second.
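The per-second pricing above lends itself to a quick cost estimate. The sketch below is illustrative only: the function and constant names are hypothetical, it assumes the voice control surcharge stacks on top of the native audio rate, and it assumes the 3-to-15-second range applies to every clip. Rates for generation without native audio are not given in this summary, so they are left out.

```python
# Per-second credit rates quoted in the summary (native audio enabled).
NATIVE_AUDIO_RATES = {"1080p": 12, "720p": 9}
# Assumption: voice control adds 2 credits per second on top of the base rate.
VOICE_CONTROL_SURCHARGE = 2
# Duration range quoted in the summary.
MIN_SECONDS, MAX_SECONDS = 3, 15


def estimate_credits(seconds, resolution, voice_control=False):
    """Estimate the credits for one native-audio clip (hypothetical helper)."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    if resolution not in NATIVE_AUDIO_RATES:
        raise ValueError(f"no quoted rate for resolution {resolution!r}")
    rate = NATIVE_AUDIO_RATES[resolution]
    if voice_control:
        rate += VOICE_CONTROL_SURCHARGE
    return seconds * rate


print(estimate_credits(15, "1080p"))                      # 180
print(estimate_credits(15, "1080p", voice_control=True))  # 210
print(estimate_credits(3, "720p"))                        # 27
```

Under these assumptions, a maximum-length 1080p clip with native audio and voice control comes to 210 credits, against 27 credits for the shortest 720p clip with native audio alone.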
Why It Matters
Kling is pushing its model beyond single-shot clips toward longer, more controlled video generation with audio and multi-character dialogue. That matters because the feature set now covers narrative structure, subject consistency, and localized speech in one workflow, which is what creative teams need when a short prompt-to-clip output is not enough. The pricing table also makes the tradeoffs explicit: higher resolution, native audio, and voice control each add cost. The next thing to watch is how Kling’s API and app users adopt the new Multi-Shot and Element Reference modes, since adoption is the clearest signal of whether VIDEO 3.0 is being used for production-style work or only for demos.
Read full article at kling.ai
