Zhuohan Li
247 posts
Zhuohan Li
@zhuohan123
building @vllm_project at @meta | ex-openai | cs phd @ 🌁 uc berkeley | machine learning system | the real agi is the friends we made along the way
San Francisco Bay Area
zhuohan.li
Joined January 2011
937
Following
10.5K
Followers

  • Zhuohan Li
    @zhuohan123
    Aug 5, 2025
    I’ve been fortunate to lead the infra and inference work that brings gpt-oss to life. A year ago, I joined OpenAI after building vLLM from scratch — It’s deeply meaningful to now be on the other side of the release, helping share models back with the open-source community.
    OpenAI
    @OpenAI
    Aug 5, 2025
    Our open models are here. Both of them. openai.com/open-models
    416K
  • Zhuohan Li
    @zhuohan123
    Jun 20, 2023
    🌟 Thrilled to introduce vLLM with @woosuk_k! 🚀 vLLM is an open-source LLM inference and serving library that accelerates HuggingFace Transformers by 24x and powers @lmsysorg Vicuna and Chatbot Arena. Github: github.com/vllm-project/v… Blog: vllm.ai
    GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for...
    From github.com
    398K
  • Zhuohan Li
    @zhuohan123
    Aug 5, 2025
    Replying to @zhuohan123
    And… the biggest lesson I learned along the way? huggingface-cli uploads to a public repo by default — so add --private if you’re not ready to show your work to the world just yet. 😅
    12K
  • Zhuohan Li
    @zhuohan123
    Apr 19, 2025
    Please stop attacking vLLM team’s integrity and being defensive about benchmarks involving SGLang. If you think the results are incorrect, please submit a PR to the open benchmark repo to correct the results.
    51K
  • Zhuohan Li
    @zhuohan123
    Aug 5, 2025
    Launching this model together with the amazing @vllm_project team was a real highlight for me! Follow this guide to launch gpt-oss in vLLM:
    OpenAI
    @OpenAI
    Aug 5, 2025
    Our open models are here. Both of them. openai.com/open-models
    vLLM Now Supports gpt-oss
    From vllm.ai
    15K
  • Zhuohan Li
    @zhuohan123
    Aug 5, 2025
    😎😎😎😎 Check out this cool demo 👇👇👇👇 presented by me and the awesome @DKundel!!
    OpenAI
    @OpenAI
    Aug 5, 2025
    Want to see our open models in action? Watch how gpt-oss builds a video game—using tools step-by-step within chain-of-thought reasoning 👾🍓
    38K
  • Zhuohan Li
    @zhuohan123
    Feb 24, 2023
    Unlock the full potential of model parallelism with AlpaServe 🚀: Besides scaling models beyond one GPU, our new paper shows that model parallelism can process NN serving requests 10x faster even if the models fit into 1 GPU! Paper: arxiv.org/abs/2302.11665 👇 [1/8]
    21K
  • Zhuohan Li
    @zhuohan123
    Sep 22, 2023
    We are excited to announce the first vLLM Bay Area meetup at 6pm on 10/5 (Thu)! Please find the event details and RSVP at: lu.ma/first-vllm-mee…. The vLLM team will give a deep dive of vLLM and show the future roadmap. We will also have vLLM users and contributors share their
    The First vLLM Meetup · Luma
    From luma.com
    13K
  • Zhuohan Li
    @zhuohan123
    Apr 11, 2025
    Replying to @jxmnop
    Kaichao is a legend. My single biggest contribution to @vllm_project was convincing him it's a good idea to take a look and work on vLLM during his Berkeley visit. The project wouldn't be where it is today without him.
    22K
  • Zhuohan Li
    @zhuohan123
    Nov 17, 2023
    Excited to see vLLM being the default inference engine for the Microsoft Azure AI model catalog! > Our default choice for serving models is vLLM, which provides high throughput and efficient memory management with continuous batching and Paged Attention. Learn more in the blog
    11K
  • Zhuohan Li
    @zhuohan123
    Aug 30, 2023
    Deeply honored to be in the first cohort of the program and a big shout-out to @a16z for setting up the grant and recognizing vLLM! Let's go, open source!
    Matt Bornstein
    a16z
    @BornsteinMatt
    Aug 30, 2023
    [New program] a16z Open Source AI Grants Hackers & independent devs are massively important to the AI ecosystem. We're starting a grant funding program so they can continue their work without pressure to generate financial returns. a16z.com/2023/08/30/sup…
    13K
  • Zhuohan Li
    @zhuohan123
    Dec 14, 2024
    I’m not even angry. I just find it funny and confusing that anyone would make such a statement at a prestigious ML conference where a large portion of the publications and participants are Chinese. @NeurIPSConf is great and there are many talks and conversations that I enjoy
    Jiao Sun
    @sunjiao123sun_
    Dec 14, 2024
    Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
    6.1K
  • Zhuohan Li
    @zhuohan123
    Jun 20, 2023
    Replying to @zhuohan123
    🔥 The core of vLLM is PagedAttention, a novel attention algorithm that brings the classic idea of paging in OS’s virtual memory to LLM serving. Without modifying the model, PagedAttention can batch 5x more sequences together, increasing GPU utilization and thus the throughput.
    7K
  • Zhuohan Li
    @zhuohan123
    Dec 11, 2023
    Excited to have first-hand official support of the Mixtral MoE model in vLLM from @MistralAI! Get started with Mixtral on the latest vLLM now: github.com/vllm-project/v…. Be sure to check out their announcement blog: mistral.ai/news/mixtral-o… Joint with @woosuk_k @PierreStock
    Guillaume Lample @ NeurIPS 2024
    @GuillaumeLample
    Dec 11, 2023
    Very excited to release our second model, Mixtral 8x7B, an open weight mixture of experts model. Mixtral matches or outperforms Llama 2 70B and GPT3.5 on most benchmarks, and has the inference speed of a 12B dense model. It supports a context length of 32k tokens. (1/n)
    GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for...
    From github.com
    18K