Inception (@_inception_ai)

244 posts

Inception

@_inception_ai

Pioneering a new generation of LLMs.

Joined February 2025

Pinned
Inception
@_inception_ai
Apr 23
Mercury 2 is in a league of its own. 1,200 tok/s at comparable quality to speed-optimized autoregressive models, per @ArtificialAnlys.
Inception
@_inception_ai
23h
We've been named to the @wef's 2026 Technology Pioneers community. Diffusion changed image generation, and Mercury 2 is doing the same for text and code. More from @StefanoErmon below.
Stefano Ermon
@StefanoErmon
23h
Honored that @_inception_ai has been named to the @wef's 2026 Technology Pioneers community. Diffusion reshaped image generation. With Mercury 2 we're bringing that leap to text and code. Grateful to the team and everyone who backed us early.
Inception reposted
Stefano Ermon
@StefanoErmon
Jun 10
Excited to see Mercury 2 recognized by @ArtificialAnlys as the fastest model. Autoregressive models generate one token at a time, while diffusion LLMs refine many tokens in parallel. Mercury 2 shows what this unlocks in practice. artificialanalysis.ai/models
Inception
@_inception_ai
Jun 9
Autoregressive models generate text one token at a time. That sequential process becomes a major bottleneck at inference scale, with:
- memory-bound workloads
- poor GPU utilization
- growing infrastructure demands
Diffusion LLMs work differently. Instead of generating tokens one
Inception
@_inception_ai
Jun 2
The question is no longer just which model is the smartest. It’s which model is most efficient without sacrificing quality. The highest-volume AI workloads are bottlenecked by latency, token generation speed, and serving cost. Autoregressive models were not designed for that
Inception reposted
Stefano Ermon
@StefanoErmon
May 27
Replying to @DavidSHolz
That’s exactly the bet we’re making at @_inception_ai. We’re already matching speed-optimized models from frontier labs on quality, while being faster and more cost-efficient. That gap will only widen as we continue to scale.
Inception
@_inception_ai
May 26
Hiring our first Forward Deployed AI Engineer at Inception. We built the world's fastest reasoning LLM and the first commercially available diffusion LLM, Mercury 2. >1,000 tokens/sec on standard GPUs via diffusion, 10x faster than speed-optimized autoregressive models at
Inception
@_inception_ai
May 22
Will the next decade of LLMs run on autoregression, or on diffusion? One of the top questions we got at MLSys this week. Part 6, the final part of our founder story series with @timt at @MenloVentures. Featuring @StefanoErmon, @adityagrover_, @volokuleshov
Inception
@_inception_ai
May 19
Day 2 at @MLSysConf. Thanks to everyone who came by yesterday. The conversations on diffusion for language, the future of language models, and what fast inference unlocks have been the highlight. Come find us at the booth today and meet the team behind Mercury 2. And join us
Inception
@_inception_ai
May 18
We're at @MLSysConf in Seattle! Catch our co-founder and Chief Scientist @volokuleshov on stage today at 2:30pm. Learn more about diffusion LLMs and how Mercury 2 hits >1,000 tok/s on standard GPUs, at comparable quality to speed-optimized autoregressive models. Swing by the
Inception
@_inception_ai
May 15
Today's autoregressive models generate one token at a time. Mercury 2 generates tokens in parallel. Over 1,000 tok/sec on standard GPUs, at comparable quality to speed-optimized models. Since launch, the community has been showing what diffusion LLMs can unlock. Thanks to the
Inception
@_inception_ai
May 14
Replying to @_inception_ai
RSVP here: Inception @ MLSys 2026: Networking Reception · Luma (luma.com)
Inception
@_inception_ai
May 14
Inception is heading to #MLSys2026 in Seattle next week. Two things worth your time:
1️⃣ Mon 5/18 at 2pm: lightning talk from @volokuleshov, co-founder of Inception. Come hear about a new generation of training and inference for diffusion-based language models.
2️⃣ Tues 5/19