Zhuohan Li (@zhuohan123) / X

247 posts

Zhuohan Li

@zhuohan123

building @vllm_project at @meta | ex-openai | cs phd @ 🌁 uc berkeley | machine learning system | the real agi is the friends we made along the way

San Francisco Bay Area

Joined January 2011

Zhuohan Li
@zhuohan123
Aug 5, 2025
I’ve been fortunate to lead the infra and inference work that brings gpt-oss to life. A year ago, I joined OpenAI after building vLLM from scratch — It’s deeply meaningful to now be on the other side of the release, helping share models back with the open-source community.
OpenAI
@OpenAI
Aug 5, 2025
Our open models are here. Both of them. openai.com/open-models
416K
Zhuohan Li
@zhuohan123
Jun 20, 2023
🌟 Thrilled to introduce vLLM with @woosuk_k! 🚀 vLLM is an open-source LLM inference and serving library that accelerates HuggingFace Transformers by 24x and powers @lmsysorg Vicuna and Chatbot Arena. Github: github.com/vllm-project/v… Blog: vllm.ai
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for...
From github.com
398K
Zhuohan Li
@zhuohan123
Aug 5, 2025
Replying to @zhuohan123
And… the biggest lesson I learned along the way? huggingface-cli uploads to a public repo by default — so add --private if you’re not ready to show your work to the world just yet. 😅
12K
Zhuohan Li
@zhuohan123
Apr 19, 2025
Please stop attacking vLLM team’s integrity and being defensive about benchmarks involving SGLang. If you think the results are incorrect, please submit a PR to the open benchmark repo to correct the results.
51K
Zhuohan Li
@zhuohan123
Aug 5, 2025
Launching this model together with the amazing @vllm_project team was a real highlight for me! Follow this guide to launch gpt-oss in vLLM:
OpenAI
@OpenAI
Aug 5, 2025
Our open models are here. Both of them. openai.com/open-models
vLLM Now Supports gpt-oss
From vllm.ai
15K
Zhuohan Li
@zhuohan123
Aug 5, 2025
😎😎😎😎 Check out this cool demo 👇👇👇👇 presented by me and the awesome @DKundel!!
OpenAI
@OpenAI
Aug 5, 2025
Want to see our open models in action? Watch how gpt-oss builds a video game—using tools step-by-step within chain-of-thought reasoning 👾🍓
38K
Zhuohan Li
@zhuohan123
Feb 24, 2023
Unlock the full potential of model parallelism with AlpaServe 🚀: Besides scaling models beyond one GPU, our new paper shows that model parallelism can process NN serving requests 10x faster even if the models fit into 1 GPU! Paper: arxiv.org/abs/2302.11665 👇 [1/8]
21K
Zhuohan Li
@zhuohan123
Sep 22, 2023
We are excited to announce the first vLLM Bay Area meetup at 6pm on 10/5 (Thu)! Please find the event details and RSVP at: lu.ma/first-vllm-mee…. The vLLM team will give a deep dive into vLLM and show the future roadmap. We will also have vLLM users and contributors share their
The First vLLM Meetup · Luma
From luma.com
13K
Zhuohan Li
@zhuohan123
Apr 11, 2025
Replying to @jxmnop
Kaichao is a legend. My single biggest contribution to @vllm_project was convincing him it's a good idea to take a look and work on vLLM during his Berkeley visit. The project wouldn't be where it is today without him.
22K
Zhuohan Li
@zhuohan123
Nov 17, 2023
Excited to see vLLM being the default inference engine for the Microsoft Azure AI model catalog!
> Our default choice for serving models is vLLM, which provides high throughput and efficient memory management with continuous batching and Paged Attention.
Learn more in the blog
11K
Zhuohan Li
@zhuohan123
Aug 30, 2023
Deeply honored to be in the first cohort of the program and a big shout-out to @a16z for setting up the grant and recognizing vLLM! Let's go, open source!
Matt Bornstein
@BornsteinMatt
Aug 30, 2023
[New program] a16z Open Source AI Grants Hackers & independent devs are massively important to the AI ecosystem. We're starting a grant funding program so they can continue their work without pressure to generate financial returns. a16z.com/2023/08/30/sup…
13K
Zhuohan Li
@zhuohan123
Dec 14, 2024
I’m not even angry. I just feel amused and confused about why anyone would make such a statement at a prestigious ML conference where a large portion of the publications and participants are Chinese. @NeurIPSConf is great and there are many talks and conversations that I enjoy
Jiao Sun
@sunjiao123sun_
Dec 14, 2024
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
6.1K
Zhuohan Li
@zhuohan123
Jun 20, 2023
Replying to @zhuohan123
🔥 The core of vLLM is PagedAttention, a novel attention algorithm that brings the classic idea of paging in OS’s virtual memory to LLM serving. Without modifying the model, PagedAttention can batch 5x more sequences together, increasing GPU utilization and thus the throughput.
7K
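The paging idea in the post above can be illustrated with a toy sketch. This is not vLLM's implementation: the names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical, and the code only tracks which fixed-size block each token's cache entry would live in, the way an OS page table maps virtual pages to physical frames. It stores no real keys or values and runs no attention.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK_SIZE = 4  # tokens per block; an arbitrary value chosen for illustration


class BlockAllocator:
    """Hands out fixed-size physical blocks from one shared pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks: List[int] = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free cache blocks")
        return self.free_blocks.pop()

    def release(self, block: int) -> None:
        self.free_blocks.append(block)


@dataclass
class Sequence:
    """One request; its block table maps logical blocks to physical blocks."""

    num_tokens: int = 0
    block_table: List[int] = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # A new physical block is claimed only when the last one is full,
        # so at most BLOCK_SIZE - 1 slots per sequence sit unused.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def locate(self, token_index: int) -> Tuple[int, int]:
        """Return (physical block, offset within block) for a token."""
        if not 0 <= token_index < self.num_tokens:
            raise IndexError(token_index)
        return self.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE

    def free(self, allocator: BlockAllocator) -> None:
        # Finished sequences return their blocks to the pool for reuse.
        for block in self.block_table:
            allocator.release(block)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
seq_a, seq_b = Sequence(), Sequence()
for _ in range(5):
    seq_a.append_token(allocator)
for _ in range(3):
    seq_b.append_token(allocator)
```

Because blocks are claimed on demand rather than reserved up front for a maximum length, sequences of different lengths can share one pool, which is the intuition behind fitting more sequences into a batch.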
Zhuohan Li
@zhuohan123
Dec 11, 2023
Excited to have first-hand official support of the Mixtral MoE model in vLLM from @MistralAI! Get started with Mixtral on the latest vLLM now: github.com/vllm-project/v…. Be sure to check out their announcement blog post: mistral.ai/news/mixtral-o… Joint with @woosuk_k @PierreStock
Guillaume Lample @ NeurIPS 2024
@GuillaumeLample
Dec 11, 2023
Very excited to release our second model, Mixtral 8x7B, an open weight mixture of experts model. Mixtral matches or outperforms Llama 2 70B and GPT3.5 on most benchmarks, and has the inference speed of a 12B dense model. It supports a context length of 32k tokens. (1/n)
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for...
From github.com
18K