A few days ago Kimi K2 Thinking significantly narrowed the capability gap between open and closed LLMs. Today Baseten is the only provider to deliver over 100 tok/sec on this massive 1T-parameter model.
Co-founder @baseten
- We closed our series D at $2.1b. It happened 8 months after our series C, which seems too fast until you consider the facts: 2 years' worth of growth in 8 months, virtually 0 customer churn, healthy margins, and QoQ NDR numbers that are considered top-tier YoY. The market demand
- Today we announced our $75m series C after growing revenue 6x in a year. But this milestone seemed impossible 3 years ago. This post is mostly about that. Baseten is 5.5 years old. The company truly wasn’t working for the first 3 years. Even back then we were an ML infra
- We launched Blueprint today: an easy way for engineers to fine-tune and serve open source foundation models. 🧵
- It's important to support newly released open-weight models on day 1. But it's not noteworthy. What's noteworthy is to have the inference optimization muscle to immediately blow the competition out of the water on latency and throughput. As measured by OpenRouter:
- "Where are your GPUs?" I get this question on sales calls. The answer is 10 different public clouds in 40+ regions. The hard part wasn't acquiring compute; it was using them dynamically to scale a single model across the world. It took us time to build, but the gains are worth
- And this tweet is the reason why: "Where are your GPUs?" I get this question on sales calls. The answer is 10 different public clouds in 40+ regions. The hard part wasn't acquiring compute; it was using them dynamically to scale a single model across the world. It took us time to build, but the gains are worth
- There's an obsession with tok/sec as *the* metric in LLM inference. But in latency-sensitive use cases the metric that matters more is time-to-first-token: - Code edit use cases have short outputs and overall latency is heavily determined by ttft - Voice AI use cases care about
- Product launch with the backstory: Internally we had always said let's do *1 thing* but do it well. For us that was inference. And we said at some point we'll earn the rights to expand the surface area beyond that. That some point is today. The vast majority of our revenue
- Replying to @FactoryAI: Deploy and serve custom models with enterprise-grade infrastructure on @baseten. Special promo for Factory users: receive $500 Model API credits when you fill out this form. baseten.co/talk-to-us/fac…
- Announcement time: today anyone can use @baseten. During the past 2 years we've been busy: ◾Building the product we wish we had in our previous jobs ◾Onboarding customers, getting them to tangible value, and iterating 🧵
- Many highly requested ML infra features are packed in 1 screenshot. I'll unpack some:
- Quick story: our customers kept telling us: "Baseten has the inference layer covered; great. But our workflow isn't 'call a single model and run with the result'. We need to call a series of custom models. And calling model-after-model is adding a) latency, b) egress cost, and
- 🎄🎅 Something fun for holiday family gatherings: restoring old photos. Here's one with my mom and I :) You can try it: app.baseten.co/applications/Q…. It's a @baseten app + GFP GAN.
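
A rough way to see the time-to-first-token point from the post above: a minimal sketch, assuming a simple latency model where a request waits `ttft_s` for the first token and then streams the rest at a constant `tok_per_sec`. The function names and the numbers used with them are hypothetical illustrations, not Baseten measurements.

```python
def total_latency_s(ttft_s: float, output_tokens: int, tok_per_sec: float) -> float:
    """End-to-end latency under an assumed simple model:
    wait for the first token, then stream the remaining tokens at a constant rate."""
    remaining_tokens = max(output_tokens - 1, 0)
    return ttft_s + remaining_tokens / tok_per_sec


def ttft_share(ttft_s: float, output_tokens: int, tok_per_sec: float) -> float:
    """Fraction of end-to-end latency spent waiting for the first token."""
    return ttft_s / total_latency_s(ttft_s, output_tokens, tok_per_sec)


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 s to first token, 100 tok/sec decode speed.
    for n_tokens in (20, 1000):
        share = ttft_share(0.5, n_tokens, 100.0)
        print(f"{n_tokens} output tokens: ttft is {share:.0%} of total latency")
```

Under these assumed numbers, ttft accounts for most of the latency of a 20-token response and only a small fraction of a 1000-token one, which is why short-output use cases are more sensitive to ttft than to tok/sec.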











