What would you build if inference were 100x cheaper?
Run massive, asynchronous LLM workloads at wholesale compute prices.
Process your first 20,000,000 tokens for free.
The async worker for your AI pipelines.
Built for background queues, nightly cron jobs, and massive offline ETL pipelines. Don't block your user's session with a synchronous LLM call.
Choose when you want results
Trade latency for cost. Pick the window that fits your workflow.
from openai import OpenAI

client = OpenAI(
    api_key="{{apiKey}}",
    base_url="https://api.doubleword.ai/v1/"
)

# Upload the batch input file, then create the batch job
batch_input_file = client.files.create(
    file=open("batch.jsonl", "rb"),
    purpose="batch"
)

result = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="priority",
    metadata={
        "description": "Structure Extraction"
    }
)
print(result)

Results stream as they are ready. No need to wait for completion!
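The input file is JSONL: one request per line. A minimal sketch of building one, assuming the standard OpenAI Batch API request shape (the endpoint is OpenAI-compatible); the model name below is a placeholder, not a real model ID:

```python
import json

# Each line is one request: a custom_id you choose, the HTTP
# method, the target endpoint, and the request body.
# (Shape assumed from the OpenAI Batch API, since the endpoint
# is OpenAI-compatible.)
requests = [
    {
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "YOUR_MODEL",  # placeholder: pick a model from the pricing table
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["First document...", "Second document..."])
]

with open("batch.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

The custom_id is how you match each streamed result back to its original request.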
Same intelligence. Fraction of the price.
Cost to process 1 billion tokens in + 1 billion tokens out at comparable intelligence.
Intelligence via Artificial Analysis Index v4.0 · Hover any bar for full pricing details · Want access to a model you don't see here? Just ask us!
No credit card required · No minimum spend · Pay only for tokens used
Start building

Cheaper, yes. Better, definitely.
Batch-first infrastructure with async SLAs, live streaming results, and predictable economics.
Up to 75% lower cost
Compared with real-time inference for async, high-throughput jobs.
Priority SLAs
Choose the completion window your job needs, from 1 hour to overnight.
SLA-backed guarantee
Miss the SLA? That job is free. We take this very seriously.
Live streaming results
Process outputs as they complete and keep downstream pipelines moving.
One-line migration
OpenAI-compatible endpoint, minimal code changes to switch.
Tool calling + structured outputs
Built on the /responses endpoint, so you have everything you need to build agent workflows.
Used by:
Applied ML • Data Platform • LLM Infrastructure • Research Engineering
Questions, answered honestly
No marketing speak. Just straight answers.
Meet dw — your terminal for batch inference
Upload files, run batches, stream results, and send real-time inference — all from the terminal. Replaces curl commands and custom scripts with a single tool.
# Install
$ curl -fsSL https://raw.githubusercontent.com/doublewordai/dw/main/install.sh | sh
# Or via pip
$ pip install --user dw-cli
# Get started
$ dw login
$ dw stream batch.jsonl > results.jsonl
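The results.jsonl produced above can be consumed line by line as results land. A minimal sketch, assuming each line follows the OpenAI batch output shape (a custom_id plus a response.body chat completion):

```python
import json

def iter_results(path):
    """Yield (custom_id, content) pairs from a batch results file.

    Assumes each line looks like (OpenAI batch output shape):
    {"custom_id": ..., "response": {"body": {"choices": [...]}}}
    """
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            rec = json.loads(line)
            body = rec["response"]["body"]
            content = body["choices"][0]["message"]["content"]
            yield rec["custom_id"], content

# Usage: for custom_id, text in iter_results("results.jsonl"): ...
```

Because each line is self-contained, downstream pipelines can process results as they arrive rather than waiting for the full batch.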
Select your delivery preference
* All tiers stream results as they're ready. No waiting for full completion.
Stop overpaying for inference.
Run your background agents and workloads at a fraction of the price and double the scale.
If you can wait an hour, you can save a lot.


