Vision Agents


Build low-latency voice and video AI agents using any model. Vision Agents is an open-source, edge-agnostic Python framework with 25+ integrations, production-ready deployment, and Stream’s global edge network for sub-500ms latency.

Quickstart

Build your first agent in under 5 minutes

GitHub

Star the project and explore examples

What You Can Build

AI Golf Coach

YOLO pose detection watches your swing via camera while Gemini gives real-time coaching feedback.

Phone Support Agent

Twilio-powered agent answers inbound calls with RAG-backed knowledge bases via TurboPuffer.

Smart Security Camera

Face recognition and package detection with YOLO, sending automated alerts in real time.

Live Sports Commentator

Roboflow object detection tracks the players and the ball while an LLM delivers play-by-play commentary.

Live Video Restyler

Camera feed transformed into narrated stories with Decart video style transfer.

Interactive Avatar

HeyGen avatars that see, hear, and respond with real-time voice and video.

Capabilities

25+ integrations — OpenAI, Gemini, Anthropic, Deepgram, ElevenLabs, YOLO, and more
Two modes — Realtime APIs (WebRTC/WebSocket) or custom STT → LLM → TTS pipelines
Video processing — Run YOLO, Roboflow, or custom models on every frame
Phone support — Twilio integration for voice calls with bi-directional audio
RAG — TurboPuffer vector search and Gemini FileSearch for knowledge retrieval
Production ready — HTTP server, Prometheus metrics, Docker and Kubernetes deployment
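
To make the custom STT → LLM → TTS mode more concrete, here is a minimal sketch of how three stages might be chained. It does not use the Vision Agents API: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names invented for illustration, and the stub stages stand in for real providers so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; these are NOT the Vision Agents API.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Chains the three stages of a custom STT -> LLM -> TTS pipeline."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # audio -> text
        reply = self.llm(transcript)     # text -> reply text
        return self.tts(reply)           # reply text -> audio


# Stub stages (assumptions for illustration only): they treat the audio
# payload as UTF-8 text so the sketch is runnable offline.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
```

Because each stage is just a callable, any one provider can be swapped without touching the other two, which is the main reason to choose a custom pipeline over a single realtime API.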

Next Steps

Quickstart

Install and build your first agent

Integrations

Browse 25+ supported AI providers

Guides

Deploy to production with Docker and metrics

Try Stream Video

Get 333,000 free participant minutes
