PyAI quickstart: first key, Speak WAV, and Omni agent

PyAI is telephony-native Voice AI behind one API key. It covers four products: Hear (speech-to-text), Speak (text-to-speech and voice cloning), Cue (turn detection and knowledge-base context for your own pipeline), and Omni, the all-in-one voice agent model. Omni is a hybrid speech-to-speech engine with a fused LLM brain, tool calling, knowledge-base grounding, and emotion-aware voices, all over one WebSocket. The API is OpenAI-compatible at https://api.pyai.com/v1. HTTP auth accepts either Authorization: Bearer <key> or the header alias x-api-key: <key>.

Fastest start: scaffold a complete, runnable example in one command — no clone, no setup ceremony.

npm create pyai-app@latest                   # pick an example interactively
npm create pyai-app@latest openai-drop-in     # already on OpenAI? migrate by changing the base URL

Browse them all at github.com/atomsai/pyai-examples.

Get a key

Sign up at console.pyai.com, click Create API key, and copy it (shown once). Use a pyai_test_ sandbox key to start — it works instantly with hard daily caps and no billing.

export PYAI_API_KEY=pyai_test_...

Verify it in 5 seconds

GET /v1/me needs no special scope, so it’s the fastest possible first call — and it’s self-diagnosing: it echoes back the org, environment, granted scopes, and credit posture the gateway resolved for your key.

curl https://api.pyai.com/v1/me -H "Authorization: Bearer $PYAI_API_KEY"

A 200 means the gateway accepts your key, and the response body shows what it resolved for it. A 401 means the key is wrong. A 402 is the billing gate, never a broken key; switch to a pyai_test_ key to get past it.
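The same check can be scripted. The following is a minimal sketch for Node.js 18+ (which ships a global fetch). The helper names authHeaders, explainStatus, and checkKey are hypothetical and not part of any PyAI SDK, and the /v1/me response body is treated as opaque JSON because its exact shape is not assumed here.

```javascript
const BASE_URL = "https://api.pyai.com/v1";

// Build the auth header in either accepted form:
// "bearer" -> Authorization: Bearer <key>, "x-api-key" -> x-api-key: <key>.
function authHeaders(key, style = "bearer") {
  if (!key) throw new Error("Missing API key (set PYAI_API_KEY)");
  return style === "x-api-key"
    ? { "x-api-key": key }
    : { Authorization: `Bearer ${key}` };
}

// Map the status codes described above to a short diagnosis.
function explainStatus(status) {
  switch (status) {
    case 200: return "ok: key accepted";
    case 401: return "unauthorized: the key is wrong";
    case 402: return "billing gate: not a broken key, try a pyai_test_ key";
    default: return `unexpected status ${status}`;
  }
}

// Call GET /v1/me and return the status, a diagnosis, and the raw JSON body.
async function checkKey(key, style = "bearer") {
  const res = await fetch(`${BASE_URL}/me`, { headers: authHeaders(key, style) });
  const body = res.ok ? await res.json() : null; // body shape is not assumed
  return { status: res.status, diagnosis: explainStatus(res.status), body };
}

// Usage (performs a real network call):
// checkKey(process.env.PYAI_API_KEY).then(console.log);
```

Passing "x-api-key" as the second argument to checkKey exercises the header alias instead of the Bearer form.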

Synthesize speech

curl https://api.pyai.com/v1/audio/speech \
  -H "Authorization: Bearer $PYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"stock_emma_en_gb"}' \
  --output hello.wav
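For Node.js callers, the curl command above translates roughly as follows. This is a sketch for Node.js 18+ (global fetch); buildSpeechRequest and synthesize are hypothetical helper names, and the response is assumed to be raw audio bytes, as the --output hello.wav flag suggests.

```javascript
// Build the same request the curl command sends, without performing it.
function buildSpeechRequest(apiKey, input, voice = "stock_emma_en_gb") {
  return {
    url: "https://api.pyai.com/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "pyai-voice", input, voice }),
    },
  };
}

// Send the request and return the audio bytes as a Buffer.
async function synthesize(apiKey, input, voice) {
  const { url, init } = buildSpeechRequest(apiKey, input, voice);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Speak request failed with HTTP ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

// Usage (real network call; writes hello.wav in the working directory):
// import { writeFile } from "node:fs/promises";
// const wav = await synthesize(process.env.PYAI_API_KEY, "Hello from PyAI.");
// await writeFile("hello.wav", wav);
```

Keeping the request builder separate from the network call makes the payload easy to inspect or unit-test without spending credits.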

List voices

curl https://api.pyai.com/v1/voices -H "Authorization: Bearer $PYAI_API_KEY"

Use any returned id as the voice above, or create your own with voice cloning (/v1/voice/clones). Migrating from OpenAI? The preset names alloy, echo, fable, onyx, nova, and shimmer work as drop-in aliases for PyAI stock voices, so existing code runs unchanged.
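As an illustration of the drop-in path, the sketch below prepares options for the official openai npm package pointed at PyAI. The helper names pyaiClientOptions and speechParams are hypothetical, and the model name is taken from the curl example above; whether OpenAI model names are also accepted is not assumed here.

```javascript
// The OpenAI preset names listed above, accepted as aliases for PyAI stock voices.
const OPENAI_PRESET_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Client options for the openai package: only the key and base URL change.
function pyaiClientOptions(apiKey) {
  return { apiKey, baseURL: "https://api.pyai.com/v1" };
}

// Parameters for a speech request; the voice may be a preset alias or a PyAI voice id.
function speechParams(input, voice = "alloy") {
  return { model: "pyai-voice", voice, input };
}

// True when the voice is one of the OpenAI preset aliases rather than a PyAI id.
function isPresetAlias(voice) {
  return OPENAI_PRESET_VOICES.includes(voice);
}

// Usage with the openai package (npm install openai; real network call):
// import OpenAI from "openai";
// const client = new OpenAI(pyaiClientOptions(process.env.PYAI_API_KEY));
// const res = await client.audio.speech.create(speechParams("Hello from PyAI."));
// const audio = Buffer.from(await res.arrayBuffer());
```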

Talk to a realtime agent (Omni)

Omni is the whole voice agent in one model: it hears, reasons with a fused LLM brain, calls your tools, grounds answers in your knowledge base, and speaks back in emotion-aware voices. It’s zero-state: the session is authorized by your key’s org, so there’s nothing to create first. Just open a WebSocket and pass the key as a subprotocol. This is browser-safe, because the browser WebSocket API cannot set custom headers but can set subprotocols.

One configure frame sets the whole agent for the session: voice_id, persona, kb_endpoint (grounding), and tools[] (function calling). Enable prebuilt hosted tools by name with zero setup (search_knowledge, web_search, weather, currency, unit_convert, math, datetime, geocode, news), or register your own webhook (see Omni tools). session_label is an optional opaque tag echoed to your kb_endpoint.

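import WebSocket from "ws"; // npm install ws
// Audio format and sample rate go in the query string; the key goes in the subprotocol list.
const ws = new WebSocket(
  "wss://api.pyai.com/v1/omni?format=pcm16&rate=24000",
  [`pyai-key.${process.env.PYAI_API_KEY}`],
);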
ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "configure",
    voice_id: "stock_ava_en_us",          // or your cloned / designed voice
    persona: "You are a friendly receptionist.",
    kb_endpoint: "https://your-app.example.com/kb", // grounding (optional)
    tools: [                                // function calling (optional)
      { name: "book_appointment", description: "Book a slot",
        parameters: { type: "object", properties: { time: { type: "string" } } } },
    ],
  }));
});
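ws.on("message", (d) => console.log(d.toString())); // hello + session_started
ws.on("error", (err) => console.error("Omni socket error:", err.message)); // an unhandled "error" event would crash the process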
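The connection details above can be factored into small helpers. The sketch below uses hypothetical names (omniUrl, omniSubprotocols, buildConfigureFrame, decodeFrame). It only emits the configure fields named in this guide, and it assumes, without confirmation from this page, that binary frames carry audio while text frames carry JSON events.

```javascript
// Build the Omni WebSocket URL; format and rate mirror the query string used above.
function omniUrl({ format = "pcm16", rate = 24000 } = {}) {
  const url = new URL("wss://api.pyai.com/v1/omni");
  url.searchParams.set("format", format);
  url.searchParams.set("rate", String(rate));
  return url.toString();
}

// The key travels as a subprotocol of the form pyai-key.<key>.
function omniSubprotocols(apiKey) {
  if (!apiKey) throw new Error("Missing API key (set PYAI_API_KEY)");
  return [`pyai-key.${apiKey}`];
}

// Serialize a configure frame, leaving out optional fields that were not given.
function buildConfigureFrame({ voiceId, persona, kbEndpoint, tools, sessionLabel } = {}) {
  const frame = { type: "configure" };
  if (voiceId) frame.voice_id = voiceId;
  if (persona) frame.persona = persona;
  if (kbEndpoint) frame.kb_endpoint = kbEndpoint;
  if (tools && tools.length > 0) frame.tools = tools;
  if (sessionLabel) frame.session_label = sessionLabel;
  return JSON.stringify(frame);
}

// Decode one ws "message" event: text frames are parsed as JSON when possible,
// binary frames are reported by size only (assumed here to be audio).
function decodeFrame(data, isBinary) {
  if (isBinary) return { kind: "binary", bytes: data.length };
  const text = data.toString();
  try {
    return { kind: "json", value: JSON.parse(text) };
  } catch {
    return { kind: "text", value: text };
  }
}

// Usage with the ws package (real network call):
// const ws = new WebSocket(omniUrl(), omniSubprotocols(process.env.PYAI_API_KEY));
// ws.on("open", () => ws.send(buildConfigureFrame({
//   voiceId: "stock_ava_en_us", persona: "You are a friendly receptionist." })));
// ws.on("message", (data, isBinary) => console.log(decodeFrame(data, isBinary)));
```

The ws library passes isBinary as the second argument of the "message" event, which is what lets the handler avoid printing raw audio bytes as text.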

Prefer the official SDKs — they handle auth, retries, idempotency, and realtime for you: npm install @pyai/sdk or pip install pyai-sdk.

Next steps

Build a browser voice agent

Mic → Omni → speakers in ~10 minutes, all client-side.

Omni tools & function calling

Give your agent real actions: order lookups, bookings, transfers.

Authentication

Keys, environments, rotation, revocation.

Pricing & metering

How usage is measured and billed.

Errors & limits

Error codes, rate limits, idempotency.

API reference

Full request/response schemas, right here in the docs.

Runnable examples

Copy-paste apps — OpenAI drop-in, voice cloning, telephony, call analytics. npm create pyai-app@latest.
