Vision Agents Documentation

Vision Agents is an open-source Video AI framework for building real-time voice and video applications. It ships with Stream Video as its default low-latency transport, powered by our global edge network. The framework is edge- and transport-agnostic, so developers can also bring their own edge layer.

What can you build?

Vision Agents makes it simple to prototype and scale a wide range of AI-powered video apps, including:

Coaching & Training: live sports coaching, guided workouts
Collaboration: meeting assistants, note-taking, transcription
Automation & Robotics: IoT control, surveillance, manufacturing workflows
Video AI: video avatars, character agents

Get Started

Installation

Install Vision Agents and set up your first project

Voice Agents

Build real-time voice agents with AI

Video Agents

Create AI-powered video applications

Integrations

Connect with popular AI providers

Built-in AI integrations

Out of the box, Vision Agents supports 23+ providers across the AI stack:

LLMs: OpenAI, Gemini, xAI, OpenRouter (Anthropic, GPT, Gemini & more)
Realtime APIs: OpenAI (WebRTC), Gemini, AWS Bedrock, Qwen
Speech-to-Text: Deepgram, Fast-Whisper, Wizper, Fish Audio
Text-to-Speech: ElevenLabs, Cartesia, AWS Polly, Inworld, Kokoro
Turn Detection: Smart Turn, Vogent
Video Processing: Ultralytics (YOLO), Moondream, Roboflow, Decart, HeyGen
Memory & Context: In-memory, Stream Chat

Each integration is built on extensible base classes. With BaseProcessor or VideoProcessorMixin, you can plug in custom computer-vision models. See Create Your Own Plugin for details.
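
To make the "extensible base class" idea concrete, here is a minimal, dependency-free sketch of the general pattern. It does not use the Vision Agents API: this page does not show the signatures of BaseProcessor or VideoProcessorMixin, so `SketchProcessor`, `BrightnessProcessor`, and `run_processors` are hypothetical names invented purely for illustration, and a frame is assumed to be a plain grid of grayscale values. See Create Your Own Plugin for the real interfaces.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Assumption: a frame is a plain grid of grayscale pixel values (0-255).
Frame = List[List[int]]


class SketchProcessor(ABC):
    """Hypothetical stand-in for a processor base class (not the Vision Agents API)."""

    @abstractmethod
    def process(self, frame: Frame) -> Dict[str, Any]:
        """Return a dictionary of results for one frame."""


class BrightnessProcessor(SketchProcessor):
    """Toy 'model': reports mean brightness and whether the frame is dark."""

    def __init__(self, dark_threshold: float = 50.0) -> None:
        self.dark_threshold = dark_threshold

    def process(self, frame: Frame) -> Dict[str, Any]:
        # Flatten the grid and average it; an empty frame counts as brightness 0.
        pixels = [value for row in frame for value in row]
        mean = sum(pixels) / len(pixels) if pixels else 0.0
        return {"mean_brightness": mean, "is_dark": mean < self.dark_threshold}


def run_processors(frame: Frame, processors: List[SketchProcessor]) -> Dict[str, Any]:
    """Run every processor on the frame and merge their results into one dict."""
    results: Dict[str, Any] = {}
    for processor in processors:
        results.update(processor.process(frame))
    return results


if __name__ == "__main__":
    print(run_processors([[0, 10], [20, 30]], [BrightnessProcessor()]))
```

The point of the pattern is that the calling code depends only on the base class, so a custom computer-vision model can be swapped in by writing one subclass. In the actual framework, that role is played by BaseProcessor or VideoProcessorMixin.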

Explore the Documentation

AI Technologies

Learn about TTS, STT, VAD, and more

Core Architecture

Understand the framework architecture

Guides

Step-by-step implementation guides

Cookbook

Ready-to-use examples and recipes
