What would you build if inference were 100x cheaper?
Run massive, asynchronous LLM workloads at wholesale compute prices.
Process your first 20,000,000 tokens for free.
The async worker for your AI pipelines.
Built for background queues, nightly cron jobs, and massive offline ETL pipelines. Don't block your user's session with a synchronous LLM call.
Choose when you want results
Trade latency for cost. Pick the window that fits your workflow.
from openai import OpenAI

client = OpenAI(
    api_key="{{apiKey}}",
    base_url="https://api.doubleword.ai/v1/"
)

# Upload the batch input file, then create the batch job
batch_input_file = client.files.create(
    file=open("batch.jsonl", "rb"),
    purpose="batch"
)

result = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="priority",
    metadata={
        "description": "Structure Extraction"
    }
)
print(result)

Results stream as they are ready. No need to wait for completion!
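The input file is JSONL: one request per line. A minimal sketch of building one, assuming the standard OpenAI Batch API request shape (the endpoint is OpenAI-compatible); the model name below is a placeholder, not a real model ID:

```python
import json

# Each line is one request: a custom_id you choose, the HTTP
# method, the target endpoint, and the request body.
# (Shape assumed from the OpenAI Batch API, since the endpoint
# is OpenAI-compatible.)
requests = [
    {
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "YOUR_MODEL",  # placeholder: pick a model from the pricing table
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["First document...", "Second document..."])
]

with open("batch.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

The custom_id is how you match each streamed result back to its original request.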
Same intelligence. Fraction of the price.
Cost to process 1 billion tokens in + 1 billion tokens out at comparable intelligence.
Intelligence via Artificial Analysis Index v4.0 · Hover any bar for full pricing details · Want access to a model you don't see here? Just ask us!
No credit card required · No minimum spend · Pay only for tokens used
Start building

Cheaper, yes. Better, definitely.
Batch-first infrastructure with async SLAs, live streaming results, and predictable economics.
Up to 75% lower cost
Compared with real-time inference for async, high-throughput jobs.
Priority SLAs
Choose the completion window your job needs, from 1 hour to overnight.
SLA-backed guarantee
Miss the SLA? That job is free. We take this very seriously.
Live streaming results
Process outputs as they complete and keep downstream pipelines moving.
One-line migration
OpenAI-compatible endpoint, minimal code changes to switch.
Tool calling + structured outputs
Built on the /responses endpoint, so you have everything you need to build agent workflows.
Used by:
Applied ML • Data Platform • LLM Infrastructure • Research Engineering
Questions, answered honestly
No marketing speak. Just straight answers.
Meet dw — your terminal for batch inference
Upload files, run batches, stream results, and send real-time inference — all from the terminal. Replaces curl commands and custom scripts with a single tool.
# Install
$ curl -fsSL https://raw.githubusercontent.com/doublewordai/dw/main/install.sh | sh
# Or via pip
$ pip install --user dw-cli
# Get started
$ dw login
$ dw stream batch.jsonl > results.jsonl
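The results.jsonl produced above can be consumed line by line as results land. A minimal sketch, assuming each line follows the OpenAI batch output shape (a custom_id plus a response.body chat completion):

```python
import json

def iter_results(path):
    """Yield (custom_id, content) pairs from a batch results file.

    Assumes each line looks like (OpenAI batch output shape):
    {"custom_id": ..., "response": {"body": {"choices": [...]}}}
    """
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            rec = json.loads(line)
            body = rec["response"]["body"]
            content = body["choices"][0]["message"]["content"]
            yield rec["custom_id"], content

# Usage: for custom_id, text in iter_results("results.jsonl"): ...
```

Because each line is self-contained, downstream pipelines can process results as they arrive rather than waiting for the full batch.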
Select your delivery preference
* All tiers stream results as they're ready. No waiting for full completion.
Stop overpaying for inference.
Run your background agents and workloads at a fraction of the price and double the scale.
If you can wait an hour, you can save a lot.


