Sindarin Persona Docs

Introduction

Overview of the Persona platform.

Persona is an infrastructure platform for configuring and deploying state-of-the-art conversational speech AI personas using LLMs as their backbone.

Persona features industry-leading low latency, turn-taking, interruption handling, and steerability.

If you have not yet interacted with a Persona, you can speak with one here.

Philosophy

Persona strives to be minimally opinionated about the implementation of speech-based AI Personas, aside from a few strongly-held axioms:

  1. Realtime spoken conversation with an LLM is entirely different from text-based conversation and requires different tradeoffs; chief among them is speed versus intelligence, where speed takes priority.
  2. Personas should respond at human-like latencies when appropriate.
  3. The primary conversation loop should never be blocked.

From these axioms arise a number of constraints that do not apply to text-based conversational AI.

For example:

  • Traditional RAG approaches using third-party providers do not work because of their high latency. Retrievals with mainstream vector database providers can take over a second, which ruins the realtime conversational experience. Thus, Persona implements its own RAG system, which runs full retrievals in milliseconds.
  • Chain-of-Thought prompting can improve the quality of LLM responses and function calling, but it adds unacceptable latency to response times. Thus, it is not (yet) supported on the Persona platform, and will remain unsupported until LLM inference speeds make imperceptibly fast CoT processing possible.

Getting Started

Personas can be configured in the Persona Webapp or in code. If they are configured in code, their configurations must be submitted at the beginning of each new conversation.
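To illustrate what submitting a code-based configuration at conversation start might look like, here is a minimal TypeScript sketch. The shape of `PersonaConfig`, its field names, and the `buildConversationInit` helper are all assumptions for illustration, not the actual Persona SDK.

```typescript
// Hypothetical configuration shape; the real Persona API's field names
// may differ. This only demonstrates the "submit config per conversation"
// pattern described above.
interface PersonaConfig {
  name: string;
  voiceId: string;
  systemPrompt: string;
}

// Build the payload a client would send when a new conversation begins.
// (Hypothetical helper; the actual transport and message format are
// defined by the platform, not shown here.)
function buildConversationInit(config: PersonaConfig): string {
  return JSON.stringify({ type: "start_conversation", persona: config });
}

const config: PersonaConfig = {
  name: "support-agent",
  voiceId: "default",
  systemPrompt: "You are a concise, friendly support agent.",
};

// This payload would be submitted at the start of each new conversation,
// since code-configured Personas are not persisted between conversations.
const payload = buildConversationInit(config);
```

The key point is the lifecycle: a Persona configured in the Webapp is stored server-side, while a code-configured Persona must be resent with every new conversation.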
