OCR and the Bitter Lesson
Recently released open-source OCR models are starting to replace expert-based OCR systems. This post walks through an exercise in evaluating specialist and generalist OCR agents.
Notes on Building AI Systems
Speculative decoding speeds up LLM generation by letting a system propose several “draft” tokens at once, then having the target model verify them in a single forward pass. The usual question is: where do we get good drafts cheaply? In this post, we explore queue speculation (QueueSpec): draft tokens come from a smaller model that runs while a request is queuing, so verification can start immediately once the request is serviced. At Doubleword we combine speculative decoding techniques like this with other throughput-specific optimizations to deliver cheaper inference at scale, at the cost of some end-to-end latency. If you want to get started with some free credits, sign up here: Doubleword Platform
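The draft-then-verify loop described above can be sketched in a few lines. This is a toy greedy-acceptance version, not the QueueSpec implementation: `draft_next` and `target_next` are hypothetical stand-ins for the draft and target models, each mapping a token sequence to its predicted next token, and the verify phase queries the target per position where a real system would score all draft positions in one forward pass.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Propose k draft tokens, then accept the longest prefix the target agrees with."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    drafts = []
    seq = list(prefix)
    for _ in range(k):
        t = draft_next(seq)
        drafts.append(t)
        seq.append(t)

    # Verify phase: accept draft tokens while the target model agrees.
    accepted = []
    seq = list(prefix)
    for t in drafts:
        if target_next(seq) == t:
            accepted.append(t)
            seq.append(t)
        else:
            # First disagreement: emit the target's own token and stop.
            accepted.append(target_next(seq))
            break
    else:
        # Every draft was accepted: the target still contributes one token.
        accepted.append(target_next(seq))
    return accepted
```

With a good draft model most proposals are accepted, so one target forward pass yields several tokens; with a poor one the loop degrades gracefully to emitting a single target token per step.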
Today we’re reducing the price of our highest-intelligence model, Qwen3-235B-A22B-Instruct.
Building a content discovery system using parallel primitives and BST-based ranking with LLM comparisons
A lock-free binary search tree optimized for expensive async comparisons, with a threaded linked list for O(1) sorted iteration
High-throughput inference of LLMs using JIT weight offloading to optimize the KV cache.
Applying parallel primitives to search and rank 2.4 million arXiv papers using LLM judgments
Exploring coordination patterns from parallel computing for multi-agent LLM systems
Researchers face an impossible task in staying up to date within their field. In AI and Machine Learning alone, arXiv publishes 50-100 new papers daily. Multiply that across computer science, physics, biology, and other domains, and you're looking at hundreds of potentially relevant papers flooding in every single day.
The initial wave of Generative AI adoption focused on augmenting human work: chatbots that help developers write cleaner code, assistants that polish our emails, and tools that speed up content creation. These productivity enhancements have proven immensely valuable, as almost every individual has a version of ChatGPT open to assist them during their day. But they represent just the beginning of what's possible with AI.
This episode explores how speculative decoding becomes increasingly valuable in high-throughput, batched inference scenarios, particularly with sparse MoE architectures.
This technical guide explores model parallelism, a critical technique for deploying large language models that exceed single GPU memory capacity.