We've been named to the @wef's 2026 Technology Pioneers community.
Diffusion changed image generation, and Mercury 2 is doing the same for text and code. More from @StefanoErmon below.
Honored that @_inception_ai has been named to the @wef's 2026 Technology Pioneers community.
Diffusion reshaped image generation. With Mercury 2 we're bringing that leap to text and code. Grateful to the team and everyone who backed us early.
Excited to see Mercury 2 recognized by @ArtificialAnlys as the fastest model.
Autoregressive models generate one token at a time, while diffusion LLMs refine many tokens in parallel.
Mercury 2 shows what this unlocks in practice.
artificialanalysis.ai/models
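A rough way to picture the difference is to count sequential forward passes. This is a toy sketch, not Mercury 2's actual algorithm: the function names, `block_size`, and `steps_per_block` are hypothetical, and the numbers are illustrative only. It also ignores that a single pass may cost more in one scheme than the other.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one sequential forward pass per generated token."""
    return num_tokens


def parallel_refinement_passes(num_tokens: int, block_size: int, steps_per_block: int) -> int:
    """Hypothetical parallel scheme: tokens are produced in blocks, and each
    block is refined for a fixed number of steps (assumed values, not
    Mercury 2's real settings)."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * steps_per_block


if __name__ == "__main__":
    n = 512
    ar = autoregressive_passes(n)
    par = parallel_refinement_passes(n, block_size=128, steps_per_block=16)
    print(f"autoregressive: {ar} sequential passes")
    print(f"parallel refinement: {par} sequential passes")
```

Under these made-up settings, the sequential depth depends on the number of refinement steps rather than on the number of tokens, which is the intuition behind the speed claim.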
Autoregressive models generate text one token at a time.
That sequential process becomes a major bottleneck at inference scale with:
- memory-bound workloads
- poor GPU utilization
- growing infrastructure demands
Diffusion LLMs work differently.
Instead of generating tokens one
The question is no longer just which model is the smartest.
It’s which model is most efficient without sacrificing quality.
The highest-volume AI workloads are bottlenecked by latency, token generation speed, and serving cost. Autoregressive models were not designed for that
That’s exactly the bet we’re making at @_inception_ai.
We’re already matching speed-optimized models from frontier labs on quality, while being faster and more cost efficient. That gap will only widen as we continue to scale.
Hiring our first Forward Deployed AI Engineer at Inception.
We built the world's fastest reasoning LLM and the first commercially available diffusion LLM, Mercury 2.
>1,000 tokens/sec on standard GPUs via diffusion, 10x faster than speed-optimized autoregressive models at
Will the next decade of LLMs run on autoregression, or on diffusion?
One of the top questions we got at MLSys this week.
Part 6, the final part of our founder story series with @timt at @MenloVentures.
Featuring @StefanoErmon, @adityagrover_, @volokuleshov
Day 2 at @MLSysConf.
Thanks to everyone who came by yesterday. The conversations on diffusion for language, the future of language models, and what fast inference unlocks have been the highlight.
Come find us at the booth today and meet the team behind Mercury 2. And join us
We're at @MLSysConf in Seattle!
Catch our co-founder and Chief Scientist @volokuleshov on stage today at 2:30pm.
Learn more about diffusion LLMs and how Mercury 2 hits >1,000 tokens/sec on standard GPUs, at comparable quality to speed-optimized autoregressive models.
Swing by the
Today's autoregressive models generate one token at a time.
Mercury 2 generates tokens in parallel: over 1,000 tokens/sec on standard GPUs, at comparable quality to speed-optimized autoregressive models.
Since launch, the community has been showing what diffusion LLMs can unlock. Thanks to the
Inception is heading to #MLSys2026 in Seattle next week.
Two things worth your time:
1️⃣ Mon 5/18 at 2pm: lightning talk from @volokuleshov, co-founder of Inception. Come hear about a new generation of training and inference for diffusion-based language models.
2️⃣ Tues 5/19