LiveKit Agents Overview

Purpose and Scope

This document provides an overview of the LiveKit Agents framework, explaining its purpose, core architecture, and how the major components work together to enable real-time voice AI applications. For detailed information on specific subsystems, see:

System architecture details: System Architecture
Building your first agent: Getting Started
Core concepts and terminology: Core Concepts
Voice agent framework details: Voice Agent Framework
Worker and job management: Worker and Job Management

What is LiveKit Agents?

LiveKit Agents is a Python framework for building real-time voice AI agents that connect to LiveKit rooms via WebRTC. It provides:

Worker Management: Spin up agent processes that connect to LiveKit server and handle job assignments
Voice Pipeline: Orchestrate STT (Speech-to-Text), LLM (Language Model), and TTS (Text-to-Speech) in a streaming pipeline
Plugin Ecosystem: Provider-agnostic interfaces for AI services (OpenAI, Google, Anthropic, AWS, etc.)
Job Lifecycle: Manage agent lifecycle from room connection through conversation to shutdown
Tool Execution: Enable function calling and multi-step reasoning
Development Tools: Hot reload, console mode, telemetry, and debugging utilities

The framework handles the complex orchestration of real-time audio streaming, turn detection, interruption handling, and conversation state management, allowing developers to focus on agent behavior and business logic.

Sources: livekit-agents/livekit/agents/__init__.py:15-19, livekit-agents/livekit/agents/worker.py:1-257

Core Architecture Components

[Diagram: main classes and their relationships in the LiveKit Agents framework]

Architecture Overview: The framework follows a three-tier architecture. The Application Layer contains user code (entrypoint functions and Agent subclasses). The Core Classes provide orchestration and voice pipeline management. The Plugin System implements provider-specific integrations. External services handle WebRTC connectivity and AI inference.

Sources: livekit-agents/livekit/agents/worker.py:256-472, livekit-agents/livekit/agents/job.py:131-184, livekit-agents/livekit/agents/voice/agent_session.py:136-367, livekit-agents/livekit/agents/voice/agent_activity.py:105-166

Component Responsibilities

AgentServer

The AgentServer class is the main worker process that:

Connects to LiveKit server via WebSocket
Receives job assignments when agents are needed
Manages a pool of idle processes for quick job startup
Monitors worker load and capacity
Exposes HTTP health check endpoints
Handles graceful shutdown

Sources: livekit-agents/livekit/agents/worker.py:256-354
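
The idle-process pool mentioned above can be illustrated with a small standard-library sketch. It is not the AgentServer implementation: `IdlePool`, `num_idle`, and `spawn` are hypothetical names, and plain strings stand in for real operating-system processes. The sketch only shows the idea of handing out a pre-warmed worker and topping the pool back up.

```python
from collections import deque
from itertools import count
from typing import Callable, Deque


class IdlePool:
    """Hypothetical pre-warmed pool; strings stand in for processes."""

    def __init__(self, num_idle: int, spawn: Callable[[], str]) -> None:
        self._num_idle = num_idle
        self._spawn = spawn
        self._idle: Deque[str] = deque()
        self._refill()

    def _refill(self) -> None:
        # Top the pool back up to the target number of idle workers.
        while len(self._idle) < self._num_idle:
            self._idle.append(self._spawn())

    def acquire(self) -> str:
        # Prefer a warm worker; fall back to a cold start if none is idle.
        worker = self._idle.popleft() if self._idle else self._spawn()
        self._refill()
        return worker

    @property
    def idle_count(self) -> int:
        return len(self._idle)


_ids = count(1)
pool = IdlePool(num_idle=2, spawn=lambda: f"proc-{next(_ids)}")
first = pool.acquire()  # taken from the warm pool, which is then refilled
```

Keeping the refill inside `acquire()` keeps the sketch short; a real worker would more plausibly refill in the background so that job startup never waits on a spawn.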

JobContext

The JobContext provides the execution environment for each job:

Room connection (rtc.Room object)
LiveKit API client
Job metadata (job ID, room name, etc.)
Shutdown callback registration
Participant management helpers
Session directory for temporary files

Sources: livekit-agents/livekit/agents/job.py:131-246
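
Shutdown callback registration can be pictured with the following sketch. `MiniJobContext` is a hypothetical stand-in written for this page, not the real JobContext, and its method names and signatures are assumptions. It shows one reasonable policy: run callbacks in registration order and keep going if one of them fails.

```python
import asyncio
from typing import Awaitable, Callable, List

ShutdownCallback = Callable[[str], Awaitable[None]]


class MiniJobContext:
    """Hypothetical stand-in for a job context; not the LiveKit class."""

    def __init__(self, job_id: str, room_name: str) -> None:
        self.job_id = job_id
        self.room_name = room_name
        self._shutdown_callbacks: List[ShutdownCallback] = []

    def add_shutdown_callback(self, callback: ShutdownCallback) -> None:
        self._shutdown_callbacks.append(callback)

    async def shutdown(self, reason: str) -> None:
        # Run callbacks in registration order; a failing callback
        # must not prevent the remaining ones from running.
        for callback in self._shutdown_callbacks:
            try:
                await callback(reason)
            except Exception as exc:
                print(f"shutdown callback failed: {exc!r}")


async def demo() -> List[str]:
    ctx = MiniJobContext(job_id="job-1", room_name="demo-room")
    log: List[str] = []

    async def flush_transcript(reason: str) -> None:
        log.append(f"flush:{reason}")

    async def close_storage(reason: str) -> None:
        log.append(f"close:{reason}")

    ctx.add_shutdown_callback(flush_transcript)
    ctx.add_shutdown_callback(close_storage)
    await ctx.shutdown("room_closed")
    return log
```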

AgentSession

The AgentSession is the voice agent runtime that:

Manages the complete agent lifecycle (start, pause, resume, close)
Orchestrates I/O streams (audio input/output, video input, transcription output)
Handles agent transitions and handoffs
Manages conversation state and history
Emits events (state changes, transcriptions, errors)
Tracks user/agent states (listening, speaking, thinking)

Sources: livekit-agents/livekit/agents/voice/agent_session.py:136-367
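
To make the state tracking and event emission concrete, here is a minimal sketch of a session that emits an event whenever the agent state changes. `MiniSession`, the event name, and the payload keys are assumptions chosen for illustration. Only the three state names come from the list above.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

AGENT_STATES = ("listening", "thinking", "speaking")
Handler = Callable[[Dict[str, Any]], None]


class MiniSession:
    """Hypothetical session that tracks agent state and emits events."""

    def __init__(self) -> None:
        self.agent_state = "listening"
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def _emit(self, event: str, payload: Dict[str, Any]) -> None:
        for handler in self._handlers[event]:
            handler(payload)

    def set_agent_state(self, new_state: str) -> None:
        if new_state not in AGENT_STATES:
            raise ValueError(f"unknown state: {new_state}")
        if new_state == self.agent_state:
            return  # a no-op transition emits nothing
        old_state, self.agent_state = self.agent_state, new_state
        self._emit("agent_state_changed", {"old": old_state, "new": new_state})


def demo() -> List[Dict[str, Any]]:
    session = MiniSession()
    seen: List[Dict[str, Any]] = []
    session.on("agent_state_changed", seen.append)
    for state in ("thinking", "speaking", "speaking", "listening"):
        session.set_agent_state(state)
    return seen
```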

Agent

The Agent class defines agent behavior:

Instructions (system prompt)
Available tools (function definitions)
Model configuration (STT, LLM, TTS, VAD)
Pipeline node implementations (customizable processing stages)
Lifecycle callbacks (on_enter, on_exit, on_user_turn_completed)

Sources: livekit-agents/livekit/agents/voice/agent.py:34-89
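
The sketch below shows how instructions, tools, and lifecycle callbacks might fit together on one object. `MiniAgent` is hypothetical and deliberately simplified: the callback names follow the list above, but their signatures, the `call_tools` helper, and the tool registry keyed by function name are assumptions of this example rather than the framework's API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Tool = Callable[..., Awaitable[Any]]


class MiniAgent:
    """Hypothetical agent: instructions, a tool registry, and callbacks."""

    def __init__(self, instructions: str, tools: List[Tool]) -> None:
        self.instructions = instructions
        self.tools: Dict[str, Tool] = {tool.__name__: tool for tool in tools}
        self.log: List[str] = []

    async def on_enter(self) -> None:
        self.log.append("enter")

    async def on_user_turn_completed(self, transcript: str) -> None:
        self.log.append(f"turn:{transcript}")

    async def on_exit(self) -> None:
        self.log.append("exit")

    async def call_tools(self, calls: List[Tuple[str, Dict[str, Any]]]) -> List[Any]:
        # Run all requested tool calls concurrently; results keep call order.
        return await asyncio.gather(
            *(self.tools[name](**args) for name, args in calls)
        )


async def get_weather(city: str) -> str:
    return f"sunny in {city}"


async def add(a: int, b: int) -> int:
    return a + b


async def demo() -> Tuple[List[str], List[Any]]:
    agent = MiniAgent("You are a helpful voice assistant.", [get_weather, add])
    await agent.on_enter()
    await agent.on_user_turn_completed("weather in Oslo, and 2 plus 3?")
    results = await agent.call_tools(
        [("get_weather", {"city": "Oslo"}), ("add", {"a": 2, "b": 3})]
    )
    await agent.on_exit()
    return agent.log, results
```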

Plugin System

Plugins implement provider-specific integrations:

Abstract base classes (llm.LLM, stt.STT, tts.TTS, llm.RealtimeModel)
Streaming abstractions (LLMStream, RecognizeStream, SynthesizeStream)
Provider implementations (OpenAI, Google, Anthropic, AWS, Deepgram, etc.)
Connection management and error handling

Sources: livekit-agents/livekit/agents/__init__.py:23

Request Flow Example

[Diagram: how a typical voice interaction flows through the system]

Request Flow: Audio flows from user → room → AgentSession → AgentActivity, where it's transcribed and turn detection occurs. After end-of-turn, the agent generates a reply via LLM, executes any tool calls in parallel, and synthesizes speech via TTS. Audio output flows back through AgentSession → room → user.

Sources: livekit-agents/livekit/agents/voice/agent_activity.py:778-803, livekit-agents/livekit/agents/voice/agent_activity.py:856-958, livekit-agents/livekit/agents/voice/generation.py:57-180
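
The flow described above can be approximated with chained async generators. Everything in this sketch is a stand-in: `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical functions, bytes stand in for audio frames, and end-of-turn is assumed to be the point where the input stream ends. It shows only the shape of the streaming chain, not the framework's internals.

```python
import asyncio
from typing import AsyncIterator, Tuple


async def microphone() -> AsyncIterator[bytes]:
    # Stand-in for audio frames arriving from the room.
    for frame in (b"hello", b"world"):
        await asyncio.sleep(0)
        yield frame


async def fake_stt(frames: AsyncIterator[bytes]) -> AsyncIterator[str]:
    async for frame in frames:
        yield frame.decode()


async def fake_llm(transcript: str) -> AsyncIterator[str]:
    # Stream the reply token by token, as a real LLM stream would.
    for token in f"You said: {transcript}".split():
        await asyncio.sleep(0)
        yield token


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    async for token in tokens:
        yield token.encode()


async def handle_turn(frames: AsyncIterator[bytes]) -> Tuple[str, bytes]:
    # Transcribe until the input ends (assumed end-of-turn), then reply.
    words = [word async for word in fake_stt(frames)]
    transcript = " ".join(words)
    audio = [chunk async for chunk in fake_tts(fake_llm(transcript))]
    return transcript, b" ".join(audio)
```

Because `fake_tts` consumes `fake_llm` lazily, synthesis can begin on the first token instead of waiting for the full reply, which is the main reason to stream between stages.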

Minimal Example

The minimal worker in examples/minimal_worker.py exercises the main API surface. The mapping below ties each concept to its code entity.

Class/Function Mapping:

Concept	Code Entity	Location
Worker	`AgentServer`	livekit-agents/livekit/agents/worker.py:256
Entrypoint decorator	`@server.rtc_session()`	livekit-agents/livekit/agents/worker.py:430-471
Job execution context	`JobContext`	livekit-agents/livekit/agents/job.py:131
Room connection	`ctx.connect()`	livekit-agents/livekit/agents/job.py:415-445
Room object	`ctx.room`	livekit-agents/livekit/agents/job.py:331-337
CLI runner	`cli.run_app()`	livekit-agents/livekit/agents/cli/cli.py

Sources: examples/minimal_worker.py:1-23, livekit-agents/livekit/agents/worker.py:430-471, livekit-agents/livekit/agents/job.py:131-445

Voice Agent Example

A complete voice agent combines two key components, an Agent subclass and an AgentSession.

Voice Agent Flow: An Agent subclass defines behavior (instructions, tools, models). AgentSession manages the runtime, orchestrating the pipeline: audio input → STT → turn detection → LLM → tool execution → TTS → audio output.

Sources: livekit-agents/livekit/agents/voice/agent.py:34-201, livekit-agents/livekit/agents/voice/agent_session.py:471-693

Two Operating Modes

LiveKit Agents supports two fundamentally different operating modes:

Pipeline Mode

Uses separate components for each stage:

STT: Converts speech to text
VAD: Detects speech boundaries (optional)
LLM: Generates text responses with function calling
TTS: Synthesizes speech from text

Characteristics:

Maximum flexibility and provider choice
Explicit control over each pipeline stage
Turn detection via VAD, STT, or manual control
Supports any combination of providers

Code: The pipeline is orchestrated by AgentActivity, which manages the flow through stt_node(), llm_node(), and tts_node().

Sources: livekit-agents/livekit/agents/voice/agent_activity.py:105-237, livekit-agents/livekit/agents/voice/generation.py:57-297
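
Explicit control over each stage usually means overriding one node and leaving the others alone. The sketch below illustrates that pattern with a hypothetical `MiniPipeline` class. The node names match those mentioned above, but the signatures, the canned LLM reply, and the `run_turn` helper are assumptions made for this example.

```python
import asyncio
from typing import AsyncIterator


class MiniPipeline:
    """Hypothetical pipeline with one overridable method per stage."""

    async def stt_node(self, audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
        async for frame in audio:
            yield frame.decode()

    async def llm_node(self, transcript: str) -> AsyncIterator[str]:
        # Canned reply; a real node would stream tokens from a model.
        for token in ("Sure,", "the", "code", "is", "1234"):
            yield token

    async def tts_node(self, text: AsyncIterator[str]) -> AsyncIterator[bytes]:
        async for token in text:
            yield token.encode()

    async def run_turn(self, audio: AsyncIterator[bytes]) -> bytes:
        transcript = " ".join([t async for t in self.stt_node(audio)])
        chunks = [c async for c in self.tts_node(self.llm_node(transcript))]
        return b" ".join(chunks)


class SpellDigits(MiniPipeline):
    """Overrides only the TTS stage so digit strings are read one by one."""

    async def tts_node(self, text: AsyncIterator[str]) -> AsyncIterator[bytes]:
        async def spaced() -> AsyncIterator[str]:
            async for token in text:
                yield " ".join(token) if token.isdigit() else token

        # Delegate to the default stage after rewriting the text stream.
        async for chunk in super().tts_node(spaced()):
            yield chunk


async def one_utterance() -> AsyncIterator[bytes]:
    yield b"what is the code"
```

Running `SpellDigits().run_turn(one_utterance())` differs from the base class only in the last token, which shows that the STT and LLM stages are untouched by the override.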

Realtime Mode

Uses a single multimodal model that handles both audio input and output:

RealtimeModel: Single model for speech-to-speech (e.g., OpenAI Realtime API, Google Gemini Live)

Characteristics:

Lower latency (no STT/TTS overhead)
Server-side turn detection
Native audio understanding
More limited provider options

Code: When Agent.llm is a RealtimeModel, AgentActivity uses RealtimeSession instead of the pipeline.

Sources: livekit-agents/livekit/agents/voice/agent_activity.py:553-593, livekit-agents/livekit/agents/llm/realtime.py
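
The mode switch described above amounts to a type check on the configured model. The classes in this sketch are empty hypothetical placeholders that only borrow the names LLM, RealtimeModel, STT, and TTS. `AgentModels` and `select_runtime` are invented for illustration and do not exist in the framework.

```python
from dataclasses import dataclass
from typing import Optional, Union


class LLM:
    """Placeholder for a text-in, text-out model."""


class RealtimeModel:
    """Placeholder for a speech-to-speech model."""


class STT:
    """Placeholder speech-to-text component."""


class TTS:
    """Placeholder text-to-speech component."""


@dataclass
class AgentModels:
    llm: Union[LLM, RealtimeModel]
    stt: Optional[STT] = None
    tts: Optional[TTS] = None


def select_runtime(models: AgentModels) -> str:
    # A realtime model handles audio in and out itself, so the
    # separate STT and TTS stages are bypassed.
    if isinstance(models.llm, RealtimeModel):
        return "realtime_session"
    return "pipeline"
```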

Plugin Architecture

The plugin system decouples the framework from specific AI providers.

Plugin Pattern: Abstract base classes define interfaces that return streaming objects. Provider plugins implement these interfaces using provider-specific protocols. The framework operates against the abstract interfaces, making providers interchangeable.

Sources: livekit-agents/livekit/agents/llm/__init__.py, livekit-agents/livekit/agents/stt/__init__.py, livekit-agents/livekit/agents/tts/__init__.py
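
The plugin pattern can be shown in a few lines with Python's abc module. The `TTS` base class here is a simplified, synchronous stand-in for the framework's interface, and `ProviderA` and `ProviderB` are made-up providers. The point of the sketch is that `speak` depends only on the abstract interface, so providers can be swapped without touching it.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class TTS(ABC):
    """Simplified abstract interface that returns a stream of chunks."""

    @abstractmethod
    def synthesize(self, text: str) -> Iterator[bytes]:
        """Yield audio chunks for the given text."""


class ProviderA(TTS):
    # Hypothetical provider that streams one chunk per word.
    def synthesize(self, text: str) -> Iterator[bytes]:
        for word in text.split():
            yield b"A:" + word.encode()


class ProviderB(TTS):
    # Hypothetical provider that returns the whole utterance at once.
    def synthesize(self, text: str) -> Iterator[bytes]:
        yield b"B:" + text.encode()


def speak(tts: TTS, text: str) -> List[bytes]:
    # Framework-side code sees only the abstract interface.
    return list(tts.synthesize(text))
```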

Job Lifecycle

The complete lifecycle of an agent job:

Phase	Component	Key Methods/Events
Assignment	`AgentServer`	Receives job from LiveKit server
Process Launch	`JobProcess`	Spawns new process or uses idle process
Initialization	`JobContext`	`entrypoint_fnc()` called with `JobContext`
Room Connection	`JobContext`	`ctx.connect()` joins LiveKit room
Agent Start	`AgentSession`	`session.start(agent)` initializes voice pipeline
Agent Enter	`Agent`	`agent.on_enter()` callback
Conversation	`AgentActivity`	Audio I/O, turn detection, generation
Agent Exit	`Agent`	`agent.on_exit()` callback
Shutdown	`JobContext`	Shutdown callbacks, cleanup
Process Cleanup	`JobProcess`	Process terminated or returned to pool

Sources: livekit-agents/livekit/agents/worker.py:480-755, livekit-agents/livekit/agents/job.py:131-304, livekit-agents/livekit/agents/voice/agent_session.py:471-693
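
The phases in the table can be modeled as a linear state machine. This sketch assumes the happy path, where each phase follows the previous one in table order. `Phase` and `JobLifecycle` are hypothetical names, and a real job can presumably jump to shutdown early on errors, which is not modeled here.

```python
from enum import IntEnum
from typing import List


class Phase(IntEnum):
    # Phase names follow the lifecycle table, in order.
    ASSIGNMENT = 0
    PROCESS_LAUNCH = 1
    INITIALIZATION = 2
    ROOM_CONNECTION = 3
    AGENT_START = 4
    AGENT_ENTER = 5
    CONVERSATION = 6
    AGENT_EXIT = 7
    SHUTDOWN = 8
    PROCESS_CLEANUP = 9


class JobLifecycle:
    """Hypothetical tracker that only allows moving to the next phase."""

    def __init__(self) -> None:
        self.phase = Phase.ASSIGNMENT
        self.history: List[Phase] = [self.phase]

    def advance(self, to: Phase) -> None:
        if to != self.phase + 1:
            raise RuntimeError(
                f"cannot go from {self.phase.name} to {to.name}"
            )
        self.phase = to
        self.history.append(to)


def run_happy_path() -> JobLifecycle:
    job = JobLifecycle()
    for phase in list(Phase)[1:]:
        job.advance(phase)
    return job
```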

Development and Deployment

Development Mode

The framework provides several development tools:

Console Mode: Run agents locally with microphone/speaker I/O via lk-agents console
Hot Reload: Pass --reload to pick up code changes automatically without restarting jobs
Session Recording: Capture audio and conversation history to files
Local Testing: Test agents without connecting to LiveKit server

Sources: livekit-agents/livekit/agents/cli/cli.py:252-453, livekit-agents/livekit/agents/cli/watcher.py:1-50

Production Deployment

For production, the framework provides:

Process Pool: Pre-warm processes for faster job startup
Load Management: Automatic capacity monitoring and job assignment
Telemetry: OpenTelemetry traces and Prometheus metrics
Cloud Integration: Session reports uploaded to LiveKit Cloud
Health Checks: HTTP endpoints for health monitoring

Sources: livekit-agents/livekit/agents/worker.py:560-693, livekit-agents/livekit/agents/telemetry/traces.py:117-238
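
Capacity monitoring can be sketched as a moving average compared against a threshold. `LoadMonitor`, the 0.75 threshold, and the five-sample window are all assumptions chosen for this example. The framework's actual load function and default values are not stated on this page.

```python
from collections import deque
from statistics import fmean
from typing import Deque


class LoadMonitor:
    """Hypothetical monitor: moving average of load samples in [0, 1]."""

    def __init__(self, threshold: float = 0.75, window: int = 5) -> None:
        self.threshold = threshold
        self._samples: Deque[float] = deque(maxlen=window)

    def record(self, sample: float) -> None:
        if not 0.0 <= sample <= 1.0:
            raise ValueError("load sample must be between 0 and 1")
        self._samples.append(sample)

    @property
    def load(self) -> float:
        # With no samples yet, report an idle worker.
        return fmean(self._samples) if self._samples else 0.0

    def available(self) -> bool:
        # The worker accepts new jobs only while below the threshold.
        return self.load < self.threshold


monitor = LoadMonitor()
for sample in (0.5, 0.25):
    monitor.record(sample)
accepting_when_light = monitor.available()
for sample in (1.0, 1.0, 1.0):
    monitor.record(sample)
accepting_when_busy = monitor.available()
```

Averaging over a window avoids flapping between available and unavailable on a single spike, at the cost of reacting a few samples late.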

Next Steps

For more detailed information:

Architecture deep-dive: See System Architecture for detailed component interactions
Building agents: See Getting Started for step-by-step tutorials
Core concepts: See Core Concepts for terminology and design patterns
Voice pipeline: See Voice Agent Framework for conversation management
Worker configuration: See Worker and Job Management for deployment options
