This document provides an overview of the LiveKit Agents framework, explaining its purpose, core architecture, and how the major components work together to enable real-time voice AI applications.
LiveKit Agents is a Python framework for building real-time voice AI agents that connect to LiveKit rooms via WebRTC.
The framework handles the complex orchestration of real-time audio streaming, turn detection, interruption handling, and conversation state management, allowing developers to focus on agent behavior and business logic.
Sources: livekit-agents/livekit/agents/__init__.py:15-19, livekit-agents/livekit/agents/worker.py:1-257
The following diagram shows the main classes and their relationships in the LiveKit Agents framework:
Architecture Overview: The framework follows a three-tier architecture. The Application Layer contains user code (entrypoint functions and Agent subclasses). The Core Classes provide orchestration and voice pipeline management. The Plugin System implements provider-specific integrations. External services handle WebRTC connectivity and AI inference.
Sources: livekit-agents/livekit/agents/worker.py:256-472, livekit-agents/livekit/agents/job.py:131-184, livekit-agents/livekit/agents/voice/agent_session.py:136-367, livekit-agents/livekit/agents/voice/agent_activity.py:105-166
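The AgentServer class is the main worker process. It receives job assignments from the LiveKit server and runs each job in a JobProcess, either a newly spawned process or an idle one.
Sources: livekit-agents/livekit/agents/worker.py:256-354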
The JobContext provides the execution environment for each job, including the room connection (ctx.connect()) and access to the room (ctx.room, an rtc.Room object).
Sources: livekit-agents/livekit/agents/job.py:131-246
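The AgentSession is the voice agent runtime. It orchestrates the pipeline from audio input through STT, turn detection, LLM, tool execution, and TTS to audio output.
Sources: livekit-agents/livekit/agents/voice/agent_session.py:136-367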
The Agent class defines agent behavior: its instructions, tools, and models, plus lifecycle hooks (on_enter, on_exit, on_user_turn_completed).
Sources: livekit-agents/livekit/agents/voice/agent.py:34-89
Plugins implement provider-specific integrations. They implement the abstract base classes (llm.LLM, stt.STT, tts.TTS, llm.RealtimeModel), which return streaming objects (LLMStream, RecognizeStream, SynthesizeStream).
Sources: livekit-agents/livekit/agents/__init__.py:23
The following diagram illustrates how a typical voice interaction flows through the system:
Request Flow: Audio flows from user → room → AgentSession → AgentActivity, where it's transcribed and turn detection occurs. After end-of-turn, the agent generates a reply via LLM, executes any tool calls in parallel, and synthesizes speech via TTS. Audio output flows back through AgentSession → room → user.
Sources: livekit-agents/livekit/agents/voice/agent_activity.py:778-803, livekit-agents/livekit/agents/voice/agent_activity.py:856-958, livekit-agents/livekit/agents/voice/generation.py:57-180
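The request flow described above can be approximated in plain Python. This is a simplified model built only on asyncio, not the framework's implementation: transcribe, llm_reply, run_tool, and synthesize are hypothetical stand-ins for the STT, LLM, tool, and TTS stages. It shows one turn being handled after end-of-turn, with the tool calls running concurrently.

```python
import asyncio

# Hypothetical stand-ins for the STT, LLM, tool, and TTS stages (not LiveKit APIs).
async def transcribe(audio_frames: list[bytes]) -> str:
    return " ".join(frame.decode() for frame in audio_frames)

async def llm_reply(transcript: str) -> tuple[str, list[str]]:
    # Returns the reply text plus the names of tools the model wants to call.
    return f"You said: {transcript}", ["get_weather", "get_time"]

async def run_tool(name: str) -> str:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"{name}:ok"

async def synthesize(text: str) -> bytes:
    return text.encode()

async def handle_turn(audio_frames: list[bytes]) -> tuple[bytes, list[str]]:
    transcript = await transcribe(audio_frames)  # runs once the turn has ended
    text, tool_names = await llm_reply(transcript)
    # Tool calls run concurrently; gather preserves the input order.
    tool_results = await asyncio.gather(*(run_tool(n) for n in tool_names))
    audio_out = await synthesize(text)
    return audio_out, list(tool_results)

audio, tools = asyncio.run(handle_turn([b"hello", b"agent"]))
```

A real session would stream audio incrementally and handle interruptions; this sketch only captures the ordering of the stages.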
Here's a minimal working example showing the main API surface:
Class/Function Mapping:
| Concept | Code Entity | Location |
|---|---|---|
| Worker | AgentServer | livekit-agents/livekit/agents/worker.py:256 |
| Entrypoint decorator | @server.rtc_session() | livekit-agents/livekit/agents/worker.py:430-471 |
| Job execution context | JobContext | livekit-agents/livekit/agents/job.py:131 |
| Room connection | ctx.connect() | livekit-agents/livekit/agents/job.py:415-445 |
| Room object | ctx.room | livekit-agents/livekit/agents/job.py:331-337 |
| CLI runner | cli.run_app() | livekit-agents/livekit/agents/cli/cli.py |
Sources: examples/minimal_worker.py:1-23, livekit-agents/livekit/agents/worker.py:430-471, livekit-agents/livekit/agents/job.py:131-445
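To make the mapping above concrete, here is a rough stdlib-only model of how a decorator can register an entrypoint that later receives a job context. SketchServer and SketchContext are hypothetical stand-ins invented for this illustration. They borrow the method names from the table but are not the framework's classes, and run_job is an assumed helper with no counterpart in the table.

```python
import asyncio
from typing import Awaitable, Callable, Optional

class SketchContext:
    """Hypothetical stand-in for a job context bound to one room."""
    def __init__(self, room_name: str) -> None:
        self.room_name = room_name
        self.connected = False

    async def connect(self) -> None:
        self.connected = True  # a real context would join the room here

Entrypoint = Callable[[SketchContext], Awaitable[None]]

class SketchServer:
    """Hypothetical stand-in for a worker that holds one entrypoint."""
    def __init__(self) -> None:
        self.entrypoint: Optional[Entrypoint] = None

    def rtc_session(self) -> Callable[[Entrypoint], Entrypoint]:
        def decorator(fn: Entrypoint) -> Entrypoint:
            self.entrypoint = fn  # remember the function for later jobs
            return fn
        return decorator

    def run_job(self, room_name: str) -> SketchContext:
        # Assumed helper: build a context and run the entrypoint once.
        assert self.entrypoint is not None, "no entrypoint registered"
        ctx = SketchContext(room_name)
        asyncio.run(self.entrypoint(ctx))
        return ctx

server = SketchServer()

@server.rtc_session()
async def entrypoint(ctx: SketchContext) -> None:
    await ctx.connect()

ctx = server.run_job("demo-room")
```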
Here's a complete voice agent example showing the key components:
Voice Agent Flow: An Agent subclass defines behavior (instructions, tools, models). AgentSession manages the runtime, orchestrating the pipeline: audio input → STT → turn detection → LLM → tool execution → TTS → audio output.
Sources: livekit-agents/livekit/agents/voice/agent.py:34-201, livekit-agents/livekit/agents/voice/agent_session.py:471-693
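The split between an agent that defines behavior and a session that runs it can be sketched as follows. SketchAgent and SketchSession are hypothetical stand-ins, and the close method is an assumption made for the illustration. Only the hook names on_enter and on_exit come from this document.

```python
import asyncio
from typing import Optional

class SketchAgent:
    """Hypothetical base class: instructions plus overridable lifecycle hooks."""
    def __init__(self, instructions: str) -> None:
        self.instructions = instructions

    async def on_enter(self) -> None:
        pass

    async def on_exit(self) -> None:
        pass

class GreeterAgent(SketchAgent):
    def __init__(self) -> None:
        super().__init__(instructions="Greet the caller briefly.")
        self.events: list[str] = []

    async def on_enter(self) -> None:
        self.events.append("enter")

    async def on_exit(self) -> None:
        self.events.append("exit")

class SketchSession:
    """Hypothetical runtime that owns the active agent."""
    def __init__(self) -> None:
        self.agent: Optional[SketchAgent] = None

    async def start(self, agent: SketchAgent) -> None:
        self.agent = agent
        await agent.on_enter()  # hook fires when the agent becomes active

    async def close(self) -> None:
        if self.agent is not None:
            await self.agent.on_exit()  # hook fires before the agent is dropped
            self.agent = None

async def main() -> list[str]:
    agent = GreeterAgent()
    session = SketchSession()
    await session.start(agent)
    await session.close()
    return agent.events

events = asyncio.run(main())
```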
LiveKit Agents supports two fundamentally different operating modes:
The pipeline mode uses separate components for each stage: STT, LLM, and TTS. The pipeline is orchestrated by AgentActivity, which manages the flow through stt_node(), llm_node(), and tts_node().
Sources: livekit-agents/livekit/agents/voice/agent_activity.py:105-237, livekit-agents/livekit/agents/voice/generation.py:57-297
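One way to picture a staged pipeline like the one described above is as a chain of async generators, each consuming the previous stage's stream. The functions below reuse the node names for readability only. They are free functions with assumed signatures and toy behavior, not the framework's methods.

```python
import asyncio
from typing import AsyncIterator

async def audio_source() -> AsyncIterator[str]:
    # Toy "audio": each frame is already a word.
    for word in ("what", "time", "is", "it"):
        yield word

async def stt_node(audio: AsyncIterator[str]) -> AsyncIterator[str]:
    # Collect all frames and emit a single final transcript.
    words = [frame async for frame in audio]
    yield " ".join(words)

async def llm_node(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stream the reply token by token.
    async for text in transcripts:
        for token in ("Echo:", *text.split()):
            yield token

async def tts_node(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Turn each token into one "audio" chunk.
    async for token in tokens:
        yield token.encode()

async def run_pipeline() -> list[bytes]:
    stream = tts_node(llm_node(stt_node(audio_source())))
    return [chunk async for chunk in stream]

chunks = asyncio.run(run_pipeline())
```

Because every stage is a stream, downstream stages can start work before upstream stages finish, which is the property that matters for latency in a voice pipeline.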
The realtime mode uses a single multimodal model that handles both audio input and output. When Agent.llm is a RealtimeModel, AgentActivity uses a RealtimeSession instead of the pipeline.
Sources: livekit-agents/livekit/agents/voice/agent_activity.py:553-593, livekit-agents/livekit/agents/llm/realtime.py
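The mode selection rule above amounts to a type check on the configured model. A minimal sketch, assuming two hypothetical model classes (SketchLLM and SketchRealtimeModel) in place of the framework's types:

```python
class SketchLLM:
    """Hypothetical text-only model used by the staged pipeline."""

class SketchRealtimeModel:
    """Hypothetical multimodal model that consumes and produces audio."""

def select_mode(llm: object) -> str:
    # A realtime model bypasses the separate STT/LLM/TTS stages.
    if isinstance(llm, SketchRealtimeModel):
        return "realtime"
    return "pipeline"

modes = [select_mode(SketchLLM()), select_mode(SketchRealtimeModel())]
```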
The plugin system decouples the framework from specific AI providers:
Plugin Pattern: Abstract base classes define interfaces that return streaming objects. Provider plugins implement these interfaces using provider-specific protocols. The framework operates against the abstract interfaces, making providers interchangeable.
Sources: livekit-agents/livekit/agents/llm/__init__.py livekit-agents/livekit/agents/stt/__init__.py livekit-agents/livekit/agents/tts/__init__.py
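The plugin pattern described above can be illustrated with an abstract base class and two interchangeable implementations. SketchTTS, UpperTTS, and ReverseTTS are hypothetical names invented for this sketch, and the toy "synthesis" only transforms text into bytes.

```python
from abc import ABC, abstractmethod
from typing import Iterator

class SketchTTS(ABC):
    """Hypothetical abstract interface that returns a stream of chunks."""
    @abstractmethod
    def synthesize(self, text: str) -> Iterator[bytes]:
        """Return a stream of audio chunks for the given text."""

class UpperTTS(SketchTTS):
    def synthesize(self, text: str) -> Iterator[bytes]:
        for word in text.split():
            yield word.upper().encode()

class ReverseTTS(SketchTTS):
    def synthesize(self, text: str) -> Iterator[bytes]:
        for word in text.split():
            yield word[::-1].encode()

def speak(tts: SketchTTS, text: str) -> list[bytes]:
    # Framework-side code depends only on the abstract interface.
    return list(tts.synthesize(text))

upper = speak(UpperTTS(), "hi there")
reverse = speak(ReverseTTS(), "hi there")
```

Swapping providers changes only which object is passed to speak; the calling code stays the same.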
The complete lifecycle of an agent job:
| Phase | Component | Key Methods/Events |
|---|---|---|
| Assignment | AgentServer | Receives job from LiveKit server |
| Process Launch | JobProcess | Spawns new process or uses idle process |
| Initialization | JobContext | entrypoint_fnc() called with JobContext |
| Room Connection | JobContext | ctx.connect() joins LiveKit room |
| Agent Start | AgentSession | session.start(agent) initializes voice pipeline |
| Agent Enter | Agent | agent.on_enter() callback |
| Conversation | AgentActivity | Audio I/O, turn detection, generation |
| Agent Exit | Agent | agent.on_exit() callback |
| Shutdown | JobContext | Shutdown callbacks, cleanup |
| Process Cleanup | JobProcess | Process terminated or returned to pool |
Sources: livekit-agents/livekit/agents/worker.py:480-755, livekit-agents/livekit/agents/job.py:131-304, livekit-agents/livekit/agents/voice/agent_session.py:471-693
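The ordering of the phases in the table can be modeled with a small job object that records each step and runs its shutdown callbacks last. SketchJob and its methods are hypothetical, and the phase labels are shorthand for the table rows rather than real event names.

```python
import asyncio
from typing import Awaitable, Callable

ShutdownCallback = Callable[[], Awaitable[None]]

class SketchJob:
    """Hypothetical job context that records lifecycle phases in order."""
    def __init__(self) -> None:
        self.log: list[str] = []
        self._shutdown_callbacks: list[ShutdownCallback] = []

    def add_shutdown_callback(self, callback: ShutdownCallback) -> None:
        self._shutdown_callbacks.append(callback)

    async def connect(self) -> None:
        self.log.append("room_connection")

    async def shutdown(self) -> None:
        for callback in self._shutdown_callbacks:
            await callback()
        self.log.append("shutdown")

async def entrypoint(job: SketchJob) -> None:
    job.log.append("initialization")

    async def flush_metrics() -> None:
        job.log.append("shutdown_callback")

    job.add_shutdown_callback(flush_metrics)
    await job.connect()
    job.log.append("agent_start")
    job.log.append("conversation")

async def run_job() -> list[str]:
    job = SketchJob()
    try:
        await entrypoint(job)
    finally:
        await job.shutdown()  # cleanup runs even if the entrypoint raises
    return job.log

log = asyncio.run(run_job())
```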
The framework provides several development tools, among them the lk-agents console command and the --reload option.
Sources: livekit-agents/livekit/agents/cli/cli.py:252-453, livekit-agents/livekit/agents/cli/watcher.py:1-50
For production:
Sources: livekit-agents/livekit/agents/worker.py:560-693, livekit-agents/livekit/agents/telemetry/traces.py:117-238