Woosuk Kwon
334 posts
Woosuk Kwon
@woosuk_k
@inferact | @vllm_project | prev: PhD @Berkeley_EECS
woosuk.me
Joined April 2023
782 Following · 8,172 Followers
  • Pinned
    Woosuk Kwon
    @woosuk_k
    Jan 22
    Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper
    484K
  • Woosuk Kwon
    @woosuk_k
    Sep 5, 2024
    Developing @vllm_project taught me a tough lesson: to keep the GPU fully utilized, we need to pay close attention to everything happening on the CPU. Over the past month, the vLLM community conducted an in-depth study and made key optimizations, leading to significant
    vLLM
    @vllm_project
    Sep 5, 2024
    A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and is 5x faster for output latency on Llama 8B, and 1.8x higher throughput and 2x faster on Llama 70B for H100s. blog.vllm.ai/2024/09/05/per…
    37K
  • Woosuk Kwon
    @woosuk_k
    Nov 14, 2023
    We’ve just released a new blog post comparing vLLM with DeepSpeed-FastGen. While we are happy to see the open-source technology advancements from the DeepSpeed team, we’ve got different results with more extensive performance benchmarks. vLLM is actually faster than DeepSpeed in
    45K
  • Woosuk Kwon
    @woosuk_k
    Jan 27, 2025
    As one of the fastest-growing OSS projects, vLLM inevitably accumulated some technical debt. We noticed it and re-architected vLLM's core with careful engineering. Enjoy simpler code & higher performance with vLLM V1!
    vLLM
    @vllm_project
    Jan 27, 2025
    🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more.
    31K
  • Woosuk Kwon
    @woosuk_k
    Sep 13, 2023
    Exciting news! 🎉 Our PagedAttention paper is now up on arXiv! Dive in to learn why it's an indispensable technique for all major LLM serving frameworks. @zhuohan123 and I will present it at @sospconf next month. Blog post: vllm.ai Paper: arxiv.org/abs/2309.06180
    26K
  • Woosuk Kwon
    @woosuk_k
    Aug 19, 2024
    This Wednesday (8/21) I will be speaking about the diverse hardware support in vLLM, with a focus on AMD GPUs and Google TPUs. Sign up to learn more about vLLM! neuralmagic.com/community-offi…
    14K
  • Woosuk Kwon
    @woosuk_k
    Mar 18, 2025
    vLLM ❤️ @nvidia Dynamo
    vLLM
    @vllm_project
    Mar 18, 2025
    Replying to @vllm_project
    We are grateful for the trust in vLLM ❤️
    9.8K
  • Woosuk Kwon
    @woosuk_k
    Feb 21, 2025
    Let's make B200 go brrr 🚀 Huge thanks @nvidia for supporting us!
    vLLM
    @vllm_project
    Feb 21, 2025
    We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!
    9.5K
  • Woosuk Kwon
    @woosuk_k
    Jun 27, 2024
    Gemma 2 is also available in vLLM! 🎉 github.com/vllm-project/v… Check out the update in the main branch and stay tuned for the next release, coming soon.
    Google DeepMind
    @GoogleDeepMind
    Jun 27, 2024
    We're excited to unveil Gemma 2. 🛠️ Available in both 9B and 27B parameters, it delivers the best performance for its size - unlocking more possibilities for developers to build and deploy with AI. → dpmd.ai/45Q6yba
    [Model] Add Gemma 2 by WoosukKwon · Pull Request #5908 · vllm-project/vllm
    From github.com
    15K
  • Woosuk Kwon
    @woosuk_k
    Apr 11, 2025
    Replying to @jxmnop
    He’s a legend. Never seen anyone so focused, productive, and kind🔥
    26K
  • Woosuk Kwon
    @woosuk_k
    Dec 18, 2023
    In vLLM v0.2.6, we've introduced CUDA/HIP graphs for faster model execution and added GPTQ support (finally!). More optimizations and features are coming... so stay tuned!
    Release v0.2.6 · vllm-project/vllm
    From github.com
    10K
  • Woosuk Kwon
    @woosuk_k
    Dec 15, 2023
    vLLM + AMD MI300X = Blazingly-fast LLM serving! 🚀🚀🚀
    AMD
    @AMD
    Dec 15, 2023
    Update: Let's look at some new inference performance data on AMD Instinct MI300X community.amd.com/t5/instinct-ac…
    13K
  • Woosuk Kwon
    @woosuk_k
    Mar 19, 2025
    We are super excited to host an inference night with @ollama next Thursday! See you all there!!
    ollama
    @ollama
    Mar 19, 2025
    .@vllm_project and Ollama are hosting an inference night at @ycombinator San Francisco. ❤️ Let's go open source! Come meet: vLLM project leads (@simon_mo_ and @woosuk_k), Ollama maintainers, and startup founders / engineers. RSVP required 👇👇👇
    vLLM and Ollama driving fast to serve you.
    10K
  • Woosuk Kwon
    @woosuk_k
    Dec 11, 2023
    Check out Mistral's official inference code in vLLM! github.com/vllm-project/v…
    Zhuohan Li
    @zhuohan123
    Dec 11, 2023
    Excited to have official, first-hand support for the Mixtral MoE model in vLLM from @MistralAI! Get started with Mixtral on the latest vLLM now: github.com/vllm-project/v…. Be sure to check out their announcement blog post: mistral.ai/news/mixtral-o… Joint work with @woosuk_k @PierreStock
    8.6K
