May 18, 2026

DeltaToken cuts video tokens from 180K to under 1,000

Qiang Zhang announced 'DeltaToken', a new video tokenizer that cuts the number of VAE tokens for video models by up to 192x while keeping the same number of channels. According to the announcement, this lowers training costs, makes real-time video generation cheaper at inference time, and could extend the video context length of AI models from seconds to minutes.

Key Takeaways

DeltaToken is a new video tokenizer for world models and video models that keeps the same number of channels while cutting VAE tokens by up to 192x.
One example in the post shows token count falling from 180K to under 1,000.
The project claims 10–100x lower training cost, with a video foundation model trained from scratch for under $4,000 in compute.
The post says the compression could extend context length from 10–15 seconds to 5–10 minutes for native cross-shot consistency.
Qiang Zhang says the encoder focuses on what changes in a video, which he argues improves physical grounding for embodied world models.
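The headline figures can be checked against one another with simple arithmetic. The sketch below assumes the 180K-token example uses the full 192x reduction factor, which the post does not state explicitly, and pairs the quoted context lengths in their most conservative and most optimistic combinations. The inputs are the post's own numbers. The function names are illustrative only and are not part of any DeltaToken API.

```python
# Sanity-check the headline numbers quoted in the announcement.
# All inputs are figures from the post; nothing here is measured.

BASELINE_TOKENS = 180_000  # "180K" tokens in the post's example
MAX_REDUCTION = 192        # "up to 192x" fewer VAE tokens (assumed to apply to the example)


def tokens_after(baseline: int, reduction: float) -> float:
    """Token count remaining after dividing the baseline by a reduction factor."""
    return baseline / reduction


def context_multiplier(old_seconds: float, new_seconds: float) -> float:
    """How many times longer the new context window is than the old one."""
    return new_seconds / old_seconds


compressed = tokens_after(BASELINE_TOKENS, MAX_REDUCTION)
print(f"{BASELINE_TOKENS} tokens / {MAX_REDUCTION} = {compressed:.1f} tokens")

# The post cites 10-15 s of context today versus 5-10 min with DeltaToken.
low = context_multiplier(15, 5 * 60)    # most conservative pairing: 15 s -> 5 min
high = context_multiplier(10, 10 * 60)  # most optimistic pairing: 10 s -> 10 min
print(f"context gain: {low:.0f}x to {high:.0f}x")
```

With these inputs, 180,000 / 192 = 937.5 tokens, which is consistent with "under 1,000". The claimed context extension works out to roughly 20x to 60x. That range sits below the 192x ceiling, so the two claims do not contradict each other.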

Why It Matters

If the claims hold up, DeltaToken shrinks the number of tokens that every stage after the tokenizer has to process: training, inference, and longer-context generation. That matters most for systems that want to generate video inside LLMs, VLMs, and VLAs, since the post argues the compression makes native integration possible without architectural compromise. The most immediate signal is cost: the post calls out both training from scratch for under $4,000 and real-time on-device generation. Watch for the details of the released demo, and for whether the 180K-to-under-1,000 token reduction holds across different video workloads.

Read full article at linkedin.com