
parakeet-server

A self-hosted, OpenAI-compatible speech-to-text API powered by NVIDIA's Parakeet TDT 0.6B v3 model.

Point Whisper-compatible clients at this server and keep everything local.


Why this project

  • OpenAI Whisper API compatible endpoint: POST /v1/audio/transcriptions
  • Automatic model download and local caching on first use
  • Multiple response formats: json, text, srt, vtt, verbose_json
  • Built-in browser UI at http://localhost:8000
  • Docker-friendly deployment and Rust-native runtime

Quick start

Option 1: Run from GHCR image

docker pull ghcr.io/chand1012/parakeet-server:latest
docker run --rm -it \
  -p 8000:8000 \
  -v "$(pwd)/models:/home/parakeet/models" \
  ghcr.io/chand1012/parakeet-server:latest

Option 2: Build and run locally

Prerequisites: Rust 1.93+, cmake, protobuf-compiler, pkg-config, OpenSSL dev libs.

cargo build --release
./target/release/parakeet-server

The server starts on http://localhost:8000.

Docker Compose (modern)

name: parakeet

services:
  app:
    image: ghcr.io/chand1012/parakeet-server:latest
    pull_policy: always
    init: true
    container_name: parakeet-server
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      ROCKET_PORT: "8000"
      RUST_LOG: info
    volumes:
      - ./models:/home/parakeet/models

Then run:

docker compose up -d

API usage

Basic transcription

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.mp3 \
  -F model=whisper-1

Form fields

Field            Type    Required  Notes
file             file    Yes       Audio upload (max 100 MB)
model            string  No        Default: whisper-1
response_format  string  No        json (default), text, srt, vtt, verbose_json
language         string  No        Default: en
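The defaults in the table above can be sketched as a small helper. This is purely illustrative (the function name and the client-side defaulting are assumptions, not part of the server); the server applies equivalent defaults itself when a field is omitted.

```python
def with_defaults(fields: dict) -> dict:
    """Fill in the documented defaults for omitted optional form fields.

    Hypothetical helper mirroring the table above: file is required,
    everything else falls back to the documented default.
    """
    if "file" not in fields:
        raise ValueError("file is required")
    defaults = {"model": "whisper-1", "response_format": "json", "language": "en"}
    return {**defaults, **fields}
```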

Response examples

json:

{ "text": "Hello, world." }

verbose_json:

{
  "task": "transcribe",
  "language": "en",
  "duration": 1.5,
  "text": "Hello, world.",
  "segments": [
    { "id": 0, "seek": 0, "start": 0.0, "end": 1.5, "text": "Hello, world." }
  ]
}
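A verbose_json response is plain JSON and can be consumed with the standard library. This sketch parses the example payload above and pulls out the per-segment timings (the variable names are illustrative):

```python
import json

# The verbose_json example response from above, as received over the wire.
payload = """
{
  "task": "transcribe",
  "language": "en",
  "duration": 1.5,
  "text": "Hello, world.",
  "segments": [
    { "id": 0, "seek": 0, "start": 0.0, "end": 1.5, "text": "Hello, world." }
  ]
}
"""

result = json.loads(payload)
# One (start, end, text) tuple per segment.
timings = [(seg["start"], seg["end"], seg["text"]) for seg in result["segments"]]
```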

srt:

1
00:00:00,000 --> 00:00:01,500
Hello, world.
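SRT timestamps use the HH:MM:SS,mmm shape shown above, with a comma (not a period) before the milliseconds. A minimal sketch of converting a float of seconds into that form (the function name is an assumption, not part of the server):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

For the segment above, srt_timestamp(0.0) and srt_timestamp(1.5) reproduce the cue line 00:00:00,000 --> 00:00:01,500.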

OpenAI client compatibility

Works with standard OpenAI SDKs by overriding base_url:

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8000/v1",
)

with open("audio.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )

print(result.text)

Browser UI

Open http://localhost:8000 and upload an audio file to test transcription in the built-in web interface.

Model and processing details

  • Model: NVIDIA Parakeet TDT 0.6B v3 (INT8)
  • Runtime: ONNX Runtime via transcribe-rs
  • Audio decoding: Symphonia, with an ffmpeg fallback path
  • Target sample rate: 16 kHz mono
  • Model cache path: models/parakeet-tdt-0.6b-v3-int8/

On first use, the model archive is fetched automatically from:

  • https://blob.handy.computer/parakeet-v3-int8.tar.gz

Development

# build
cargo build
cargo build --release

# test
cargo test
cargo test <test_function_name>

# lint / format
cargo fmt --check
cargo fmt
cargo clippy

There is also a helper script for manual endpoint testing:

bash test_transcribe.sh path/to/audio.mp3

Benchmarking transcription performance

The benchmark measures throughput on the OpenSLR SLR81 test set with a configurable number of parallel workers:

# Download test audio samples (~2.5GB)
./download_samples.sh

# Run benchmark (default: 4 parallel workers)
parakeet-bench

# Custom parallelism
parakeet-bench -j 8

Reports files processed, average duration, and throughput (files/second).
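The reported metrics reduce to simple arithmetic over the run: throughput is files processed divided by wall-clock time, and average duration is the inverse. A sketch with hypothetical numbers (not real benchmark results):

```python
def throughput(files_processed: int, wall_seconds: float) -> float:
    """Throughput in files/second over the whole benchmark run."""
    return files_processed / wall_seconds

# Hypothetical run: 120 files transcribed in 60 s of wall-clock time.
rate = throughput(120, 60.0)   # files/second
avg_duration = 60.0 / 120      # average seconds per file
```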

License

MIT. See LICENSE.
