Pull, Pace, Burst: The Server-Side Loop Behind “Instant Play”
This technical post describes implementing pull-based audio/video streaming loops using Swift structured concurrency and GStreamer appsinks, including real-time software pacing, startup buffering via a burst phase, and maintainability via a “state-struct” pattern. It details queue-pressure-aware startup boost logic, transport buffer sizing, PTS resolution/interpolation (including fallback to frame-count timelines for problematic audio timestamps), and startup gating rules such as keyframe requirements and audio/video alignment. The article also covers Annex B to AVCC conversion for Apple VideoToolbox compatibility, audio packet coalescing to reduce per-packet overhead, and detection of realtime transcoding failure based on achieved pacing ratio.
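The burst-then-pace idea can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the `Pacer` class, its `burst_media_seconds` parameter, and the `queue_fill` input are hypothetical names, and the re-anchoring shown is one plausible reading of the technique (pacing restarts from the moment the burst ends rather than from stream start). Only the roughly 80% high-water mark comes from the article.

```python
from dataclasses import dataclass
from typing import Optional

# Fraction of queue capacity that ends the burst early (the article cites ~80%).
HIGH_WATER = 0.80


@dataclass
class Pacer:
    """Decides how long to wait before sending each frame (hypothetical helper)."""
    burst_media_seconds: float           # assumed knob: media to send with no sleep
    in_burst: bool = True
    first_pts: Optional[float] = None    # PTS of the first frame seen
    anchor_wall: Optional[float] = None  # wall-clock time when pacing was anchored
    anchor_pts: Optional[float] = None   # media time when pacing was anchored

    def delay_before_send(self, pts: float, now: float, queue_fill: float) -> float:
        """Return seconds to sleep before sending the frame with this PTS."""
        if self.first_pts is None:
            self.first_pts = pts
        if self.in_burst:
            burst_done = (pts - self.first_pts) >= self.burst_media_seconds
            if not burst_done and queue_fill < HIGH_WATER:
                return 0.0  # still bursting: send immediately
            # Burst is over (budget spent or queue under pressure).
            # Re-anchor so later frames are paced relative to this moment.
            self.in_burst = False
            self.anchor_wall = now
            self.anchor_pts = pts
            return 0.0
        due = self.anchor_wall + (pts - self.anchor_pts)
        return max(0.0, due - now)
```

In a pull loop, the caller would pull a frame, ask the pacer for a delay, sleep for that long, and then send. If pacing were anchored at stream start instead, the first post-burst frame would be due several seconds in the future, which is the stall the re-anchor step is meant to avoid.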
Key Takeaways
- Pull-based appsink loops (vs push callbacks) give the server explicit control over when frames leave, which is critical for predictable pacing and network utilization.
- Startup buffering is treated as a first-class phase: a no-sleep burst floods the client buffer, but ends early when AsyncStream queue pressure hits a high-water mark (~80%).
- Robust timestamping is a survival skill: the design interpolates missing PTS and can switch audio to a synthetic frame-count timeline when demuxer timestamps drift or jump backward.
- Startup gates prevent “first-frame failure”: drop video until a keyframe arrives, and hold/drop audio so it doesn’t run ahead of video during decoder initialization.
- Interop and overhead matter: on-the-fly Annex B→AVCC conversion for Apple VideoToolbox, plus audio packet coalescing (e.g., 200 ms batches) to cut WebSocket/protobuf per-packet costs.
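To make the Annex B→AVCC step from the last takeaway concrete, here is a minimal sketch of the byte-level transformation. Annex B separates NAL units with start codes (`00 00 01` or `00 00 00 01`), while AVCC prefixes each NAL unit with its length. The function names are hypothetical, the 4-byte big-endian length prefix is an assumption (it is the common choice, but the length size is configurable in AVCC), and extracting SPS/PPS for the decoder's format description is out of scope here.

```python
import struct

START_CODE = b"\x00\x00\x01"


def split_annexb(data: bytes) -> list[bytes]:
    """Split an Annex B byte stream into raw NAL units (start codes removed)."""
    nals = []
    pos = data.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # A NAL unit never ends in a zero byte, so trailing zeros here are
        # either padding or the leading zero of a 4-byte start code.
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def annexb_to_avcc(data: bytes) -> bytes:
    """Re-frame NAL units with a 4-byte big-endian length prefix (assumed size)."""
    return b"".join(struct.pack(">I", len(nal)) + nal for nal in split_annexb(data))
```

Because emulation-prevention bytes keep `00 00 01` from appearing inside a NAL payload, scanning for the 3-byte pattern is enough to find both start-code variants.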
Why It Matters
In low-latency streaming and interactive playback, pacing isn’t an implementation detail; it’s the product. This design reframes the server as an explicit timing authority: burst to reduce time-to-first-frame, then “re-anchor” to avoid the classic post-burst stall that looks like buffering to users. The queue-pressure-aware boost and realtime-transcode failure detection (pacing ratio below 0.70) also translate engineering signals into operational guardrails: when to back off, when to drop, and when to fail fast. Expect more stacks to adopt “pull + structured concurrency” as the control plane for quality-of-experience.
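The pacing-ratio guardrail can be illustrated with a short sketch. The 0.70 threshold is from the summary above; everything else is an assumption made for illustration: the ratio is taken to be media seconds delivered divided by wall-clock seconds elapsed, and the `min_window` parameter (which avoids judging the stream on a few noisy startup frames) is hypothetical.

```python
# Threshold from the article: below this, realtime transcoding is considered failed.
REALTIME_FAILURE_RATIO = 0.70


def pacing_ratio(media_seconds_sent: float, wall_seconds_elapsed: float) -> float:
    """Assumed definition: media time delivered per unit of wall-clock time."""
    if wall_seconds_elapsed <= 0:
        return 1.0  # nothing measured yet; treat as on pace
    return media_seconds_sent / wall_seconds_elapsed


def is_realtime_failure(
    media_seconds_sent: float,
    wall_seconds_elapsed: float,
    min_window: float = 5.0,  # hypothetical: minimum observation time in seconds
) -> bool:
    """Return True when the stream has fallen too far behind realtime."""
    if wall_seconds_elapsed < min_window:
        return False  # too early to judge
    return pacing_ratio(media_seconds_sent, wall_seconds_elapsed) < REALTIME_FAILURE_RATIO
```

For example, delivering 6 seconds of media in 10 seconds of wall time gives a ratio of 0.6, which would trip the check and let the server fail fast instead of leaving the client buffering indefinitely.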
Read full article at acgao.com