Woosuk Kwon (@woosuk_k)

334 posts

@woosuk_k

@inferact | @vllm_project | prev: PhD @Berkeley_EECS

Joined April 2023

Pinned
Woosuk Kwon
@woosuk_k
Jan 22
Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper
484K
Woosuk Kwon
@woosuk_k
Sep 5, 2024
Developing @vllm_project taught me a tough lesson: to keep the GPU fully utilized, we need to pay close attention to everything happening on the CPU. Over the past month, the vLLM community conducted an in-depth study and made key optimizations, leading to significant
vLLM
@vllm_project
Sep 5, 2024
A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and is 5x faster for output latency on Llama 8B, and 1.8x higher throughput and 2x faster on Llama 70B for H100s. blog.vllm.ai/2024/09/05/per…
37K
Woosuk Kwon
@woosuk_k
Nov 14, 2023
We’ve just released a new blog post comparing vLLM with DeepSpeed-FastGen. While we are happy to see the open-source technology advancements from the DeepSpeed team, we’ve got different results with more extensive performance benchmarks. vLLM is actually faster than DeepSpeed in
45K
Woosuk Kwon
@woosuk_k
Jan 27, 2025
As one of the fastest-growing OSS projects, vLLM inevitably accumulated some technical debt. We noticed it and re-architected vLLM's core with careful engineering. Enjoy simpler code & higher performance with vLLM V1!
vLLM
@vllm_project
Jan 27, 2025
🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more.
31K
Woosuk Kwon
@woosuk_k
Sep 13, 2023
Exciting news! 🎉 Our PagedAttention paper is now up on arXiv! Dive in to learn why it's an indispensable technique for all major LLM serving frameworks. @zhuohan123 and I will present it at @sospconf next month. Blog post: vllm.ai Paper: arxiv.org/abs/2309.06180
26K
Woosuk Kwon
@woosuk_k
Aug 19, 2024
This Wednesday (8/21) I will be speaking about the diverse hardware support in vLLM, with a focus on AMD GPUs and Google TPUs. Sign up to learn more about vLLM! neuralmagic.com/community-offi…
14K
Woosuk Kwon
@woosuk_k
Mar 18, 2025
vLLM ❤️ @nvidia Dynamo
vLLM
@vllm_project
Mar 18, 2025
Replying to @vllm_project
We are grateful for the trust in vLLM ❤️
9.8K
Woosuk Kwon
@woosuk_k
Feb 21, 2025
Let's make B200 go brrr 🚀 Huge thanks @nvidia for supporting us!
vLLM
@vllm_project
Feb 21, 2025
We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!
9.5K
Woosuk Kwon
@woosuk_k
Jun 27, 2024
Gemma 2 is also available in vLLM! 🎉 github.com/vllm-project/v… Check out the update in the main branch and stay tuned for the next release, coming soon.
Google DeepMind
@GoogleDeepMind
Jun 27, 2024
We're excited to unveil Gemma 2. 🛠️ Available in both 9B and 27B parameters, it delivers the best performance for its size - unlocking more possibilities for developers to build and deploy with AI. → dpmd.ai/45Q6yba
[Model] Add Gemma 2 by WoosukKwon · Pull Request #5908 · vllm-project/vllm
From github.com
15K
Woosuk Kwon
@woosuk_k
Apr 11, 2025
Replying to @jxmnop
He’s a legend. Never seen anyone so focused, productive, and kind🔥
26K
Woosuk Kwon
@woosuk_k
Dec 18, 2023
In vLLM v0.2.6, we've introduced CUDA/HIP graphs for faster model execution and added GPTQ support (finally!). More optimizations and features are coming... so stay tuned!
Release v0.2.6 · vllm-project/vllm
From github.com
10K
Woosuk Kwon
@woosuk_k
Dec 15, 2023
vLLM + AMD MI300X = Blazingly-fast LLM serving! 🚀🚀🚀
AMD
@AMD
Dec 15, 2023
Update: Let's look at some new inference performance data on AMD Instinct MI300X community.amd.com/t5/instinct-ac…
13K
Woosuk Kwon
@woosuk_k
Mar 19, 2025
We are super excited to host an inference night with @ollama next Thursday! See you all there!!
ollama
@ollama
Mar 19, 2025
.@vllm_project and Ollama are hosting an inference night at @ycombinator San Francisco. ❤️ Let's go open source! Come meet: vLLM project leads (@simon_mo_ and @woosuk_k), Ollama maintainers, and startup founders / engineers. RSVP required 👇👇👇
10K
Woosuk Kwon
@woosuk_k
Dec 11, 2023
Check out Mistral's official inference code in vLLM! github.com/vllm-project/v…
Zhuohan Li
@zhuohan123
Dec 11, 2023
Excited to have first-hand official support for the Mixtral MoE model in vLLM from @MistralAI! Get started with Mixtral on the latest vLLM now: github.com/vllm-project/v…. Be sure to check out their announcement blog post: mistral.ai/news/mixtral-o… Joint with @woosuk_k @PierreStock
8.6K