Netflix open-sources VOID: physics-aware object removal for video
INSAIT Institute and Netflix announced VOID, an AI video model that removes objects from video and realistically reconstructs how the scene would have played out without them (e.g., how the remaining objects would move if a person were removed). The model is built on CogVideoX and uses a “quadmask” method plus simulated training data generated in Blender. It is released as open source, with the code, paper, and demos publicly available.
Key Takeaways
- VOID targets a harder problem than classic inpainting: it models post-removal scene dynamics (interaction + motion), not just pixel fill.
- Technique stack: a CogVideoX backbone, a “quadmask” that separates the object, its interaction zones, and the background, and simulated Blender data to compensate for the scarcity of real examples.
- Netflix released the model openly (code on GitHub, a demo on Hugging Face, a paper on arXiv), which invites external iteration and could help set a standard for this kind of model.
- Practical implications span post-production, localization tweaks, and cleanup workflows where continuity and physics realism matter.
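To make the “quadmask” idea from the takeaways more concrete, here is a minimal sketch of how two binary masks could be merged into a single four-valued mask. It is an illustration only, not VOID's actual implementation: the function name build_quadmask, the label values, and the fourth "overlap" class are all assumptions, since the summary above names only the object, its interaction zones, and the background.

```python
# Hypothetical label values, chosen for illustration; VOID's real encoding may differ.
BACKGROUND = 0   # pixels to keep unchanged
OBJECT = 1       # pixels of the object to remove
INTERACTION = 2  # pixels whose motion depends on the removed object
OVERLAP = 3      # assumed fourth class: pixels in both object and interaction masks


def build_quadmask(object_mask, interaction_mask):
    """Merge two binary masks (lists of rows of 0/1) into one four-valued mask."""
    if len(object_mask) != len(interaction_mask):
        raise ValueError("masks must have the same height")
    quadmask = []
    for obj_row, inter_row in zip(object_mask, interaction_mask):
        if len(obj_row) != len(inter_row):
            raise ValueError("masks must have the same width")
        row = []
        for obj, inter in zip(obj_row, inter_row):
            if obj and inter:
                row.append(OVERLAP)
            elif obj:
                row.append(OBJECT)
            elif inter:
                row.append(INTERACTION)
            else:
                row.append(BACKGROUND)
        quadmask.append(row)
    return quadmask


# Toy 2x4 frame: the object covers the middle columns, and the
# interaction zone extends to its right.
object_mask = [[0, 1, 1, 0],
               [0, 1, 1, 0]]
interaction_mask = [[0, 0, 1, 1],
                    [0, 0, 0, 1]]
quadmask = build_quadmask(object_mask, interaction_mask)
```

In a real pipeline a mask like this would be computed per frame and passed to the model alongside the video. How VOID actually encodes and consumes it is documented in the released code and paper.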
Why It Matters
This is a quiet shift from “beauty fixes” to “counterfactual editing”: tools that can plausibly rewrite what would have happened if something had not been there. For streaming companies, that points to faster, cheaper post-production pipelines (fewer reshoots, cleaner plates, more automation) and new flexibility in versioning, such as last-minute prop removals or brand and legal edits that do not break continuity. The open-source release also matters strategically: Netflix can seed an ecosystem, attract research talent, and influence how physics-aware video models are evaluated and integrated into professional tools.
Read full article at insait.ai