Vision Agents is an open-source Video AI framework for building real-time voice and video applications. It ships with Stream Video as its default low-latency transport, powered by our global edge network. The framework is edge- and transport-agnostic, so developers can also bring any edge layer they like.

What can you build?

Vision Agents makes it simple to prototype and scale a wide range of AI-powered video apps, including:
  • Coaching & Training — live sports coaching, guided workouts
  • Collaboration — meeting assistants, note-taking, transcription
  • Automation & Robotics — IoT control, surveillance, manufacturing workflows
  • Video AI — video avatars, character agents

Built-in AI integrations

Out of the box, Vision Agents supports 23+ providers across the AI stack. Each integration is built on extensible base classes: by building on BaseProcessor or VideoProcessorMixin, you can plug in custom computer-vision models. See Create Your Own Plugin for details.
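To make the base-class-plus-mixin idea concrete, here is a minimal, self-contained sketch of the pattern. It does not import Vision Agents: the two classes below are local stand-ins that only borrow the names BaseProcessor and VideoProcessorMixin from the text, and the process_frame hook, the Frame type, and BrightnessProcessor are all hypothetical. The real classes and their method signatures may differ, so treat Create Your Own Plugin as the authoritative reference.

```python
from abc import ABC, abstractmethod
from typing import List

# Hypothetical frame type for this sketch: a grayscale image as rows of
# 0-255 pixel values. The framework's real frame type is not shown here.
Frame = List[List[int]]


class BaseProcessor(ABC):
    """Local stand-in (NOT the framework class): gives a processor a name."""

    name: str = "base"


class VideoProcessorMixin(ABC):
    """Local stand-in (NOT the framework class): adds a per-frame hook."""

    @abstractmethod
    def process_frame(self, frame: Frame) -> dict:
        """Analyse one frame and return a result dictionary (assumed hook)."""


class BrightnessProcessor(BaseProcessor, VideoProcessorMixin):
    """Toy 'computer-vision model': flags frames whose mean pixel is bright."""

    name = "brightness"

    def __init__(self, threshold: float = 128.0) -> None:
        self.threshold = threshold

    def process_frame(self, frame: Frame) -> dict:
        # Flatten the rows and average the pixel values; empty frames give 0.0.
        pixels = [p for row in frame for p in row]
        mean = sum(pixels) / len(pixels) if pixels else 0.0
        return {
            "processor": self.name,
            "mean": mean,
            "bright": mean > self.threshold,
        }


if __name__ == "__main__":
    proc = BrightnessProcessor(threshold=100)
    print(proc.process_frame([[0, 200], [200, 200]]))
```

The point of the split is that the base class carries what every processor shares, while the mixin contributes only the video-specific hook, so a custom model has to implement just the one method that touches frames.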

Explore the Documentation