🚀 New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting.
We introduce stateful token reduction for long-video VLMs.
✅ Keeps only 25% of visual tokens
🚀 3.8–4.2× faster prefilling (time to first token, TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)