Doubleword
    Now live!

    What would you build if inference was 100x cheaper?

    Run massive, asynchronous LLM workloads at wholesale compute prices.

    Process your first 20,000,000 tokens for free.

    Doubleword pipeline: 01 input · 02 inference · 03 output · 04 then…

    The async worker for your AI pipelines.

    Built for background queues, nightly cron jobs, and massive offline ETL pipelines. Don't block your user's session with a synchronous LLM call.

    Explore all workbooks

    Great for

    Async Agents · Synthetic Data Generation · Data Processing Pipelines · Embeddings · Model Evals · Bug Detection Ensemble · Dataset Compilation · Structured Extraction · Image Summarization · AI Personal Assistants · Asynchronous Event Listeners · ETL & Pipeline Sanitization
    Not for:
    Chat UIs · Latency-critical paths

    Choose when you want results

    Trade latency for cost. Pick the window that fits your workflow.

    batch_priority.py
    from openai import OpenAI
    
    client = OpenAI(
        api_key="{{apiKey}}",
        base_url="https://api.doubleword.ai/v1/"
    )
    
    # Upload the JSONL batch input file first.
    batch_input_file = client.files.create(
        file=open("batch_input.jsonl", "rb"),
        purpose="batch"
    )
    
    result = client.batches.create(
        input_file_id=batch_input_file.id,
        endpoint="/v1/chat/completions",
        completion_window="priority",
        metadata={
            "description": "Structured Extraction"
        }
    )
    
    print(result)

    Results stream as they are ready. No need to wait for completion!

    Per-token pricing

    Same Intelligence. Fraction of the price.

    Cost to process 1 billion tokens in + 1 billion tokens out at comparable intelligence.

    Provider             Cost
    Anthropic            $30K
    OpenAI               $15.8K
    Industry Average     $2.7K
    Doubleword (Async)   $2.1K

    Intelligence via Artificial Analysis Index v4.0 · Want access to a model you don't see here? Just ask us!

    No credit card required · No minimum spend · Pay only for tokens used

    Start building

    Built for your highest volume use cases

    Async Agents

    Autonomous AI workflows that run without human intervention.

    Classification

    Categorize, label, and detect patterns in your data.

    Data Processing

    Clean, transform, and prepare data at scale.

    Data Enrichment

    Augment datasets with additional context and metadata.

    Embeddings

    Convert text and data into vector representations.

    Image Processing

    Analyze, summarize, and extract insights from images.

    Model Evals

    Benchmark and compare model performance systematically.

    Structured Generation

    Extract and format data into consistent schemas.

    Synthetic Data

    Generate realistic training and test datasets.

    Batch Inference Done Right

    Cheaper? Yes. Better? Definitely.

    Batch-first infrastructure with async SLAs, live streaming results, and predictable economics.

    COST

    Up to 75% lower cost

    Compared with real-time inference for async, high-throughput jobs.

    SPEED CONTROL

    Priority SLAs

    Choose the completion window your job needs, from 1 hour to overnight.

    RELIABILITY

    SLA-backed guarantee

    If we miss the SLA, that job is free. We take this very seriously.

    STREAMING

    Live streaming results

    Process outputs as they complete and keep downstream pipelines moving.

    COMPATIBILITY

    One-line migration

    OpenAI-compatible endpoint, so switching takes minimal code changes.

    /RESPONSES API

    Tool calling + structured outputs

    Built on the /responses endpoint, so you have everything you need to build agent workflows.

    Seen in the wild

    Community Love — From the smallest side projects to the biggest workloads.

    View all

    Used by:

    Applied ML • Data Platform • LLM Infrastructure • Research Engineering

    Got questions?

    Questions, answered honestly

    No marketing speak. Just straight answers.

    New — CLI

    Meet dw — your terminal for batch inference

    Upload files, run batches, stream results, and send real-time inference — all from the terminal. Replaces curl commands and custom scripts with a single tool.

    terminal

    # Install

    $ curl -fsSL https://raw.githubusercontent.com/doublewordai/dw/main/install.sh | sh

    # Or via pip

    $ pip install --user dw-cli

    # Get started

    $ dw login

    $ dw stream batch.jsonl > results.jsonl
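The `batch.jsonl` input above is newline-delimited JSON, one request per line. A minimal sketch of building one in Python, assuming the OpenAI batch request shape (the model name and prompts are placeholders):

```python
import json

def to_batch_line(custom_id: str, model: str, prompt: str) -> str:
    """Serialize one chat-completion request in the batch JSONL format."""
    return json.dumps({
        "custom_id": custom_id,           # your ID for matching results to requests
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["Summarize document A", "Summarize document B"]
with open("batch.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(to_batch_line(f"task-{i}", "example-model", prompt) + "\n")
```

Each result in the output file carries the same `custom_id`, so downstream code can join outputs back to inputs regardless of completion order.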

    Shipping Speed

    Select your delivery preference

    * All tiers stream results as they're ready. No waiting for full completion.

    Stop overpaying for inference.

    Run your background agents and workloads at a fraction of the price and double the scale.

    If you can wait an hour, you can save a lot.