Simon Mo
@simon_mo_
263 posts
building @inferact for @vllm_project
Joined July 2018
358 Following · 3,114 Followers
  • Pinned
    Simon Mo
    @simon_mo_
    Jan 22
    vLLM has grown to 2,000+ contributors, with a diverse community of models, hardware, and applications. I see @vllm_project on the path to becoming the world's inference engine, and @inferact accelerating AI progress. We couldn't be more excited about the road ahead.
    Woosuk Kwon
    @woosuk_k
    Jan 22
    Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper
    16K
  • Simon Mo
    @simon_mo_
    Mar 1, 2024
    vLLM v0.3.3 is released with Starcoder2 @BigCodeProject and Inferentia @awscloud support. I'm also excited about the addition of guided decoding* (JSON, regex) in the server, leveraging @OutlinesOSS. *Experimental: the schema takes some time to compile, but it will be cached.
    15K
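A minimal sketch of what a guided-decoding request to the vLLM OpenAI-compatible server could look like, assuming a server at `http://localhost:8000` and a placeholder model name `my-model` (both hypothetical). It uses the `guided_json` extra request parameter of the v0.3.x-era server; later vLLM releases may name or structure this option differently, so check the documentation for your version. Per the note in the tweet, the first request with a new schema pays the compilation cost, and later requests that reuse the same schema hit the cache.

```python
import json
import urllib.request

# Hypothetical deployment details; adjust for your own server.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "my-model"

# JSON Schema that the generated text is constrained to satisfy.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}


def build_request(prompt: str) -> dict:
    """Build a completion request body with JSON-schema guided decoding."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 64,
        # vLLM-specific extra parameter (v0.3.x era); may differ in later versions.
        "guided_json": schema,
    }


def send(payload: dict) -> dict:
    """POST the payload to the server and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_request("Describe a person as JSON:")
print(json.dumps(payload, indent=2))
# send(payload)  # requires a running vLLM server at BASE_URL
```

Reusing the same `schema` object across requests is what lets the server's compiled-schema cache pay off.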
  • Simon Mo
    @simon_mo_
    Apr 19, 2025
    Some personal lessons learned from the latest benchmark in @vllm_project: 🧵
    vLLM
    @vllm_project
    Apr 19, 2025
    After feedback about our v0.8.4 benchmark for @deepseek_ai R1, we reran it with the suggested changes: vLLM without EP, SGLang updated from v0.4.5 to v0.4.5.post1 with EP switched to DP, and TensorRT-LLM with the overlap scheduler and tuned parameters. We are seeing good results! So why was there a difference? 🧵
    13K
  • Simon Mo
    @simon_mo_
    Feb 21, 2024
    vLLM v0.3.2 is released with support for OLMo and Gemma! github.com/vllm-project/v…
    4.5K
  • Simon Mo
    @simon_mo_
    Apr 11, 2025
    Replying to @jxmnop
    In early 2024, @KaichaoYou spent almost 3 months debugging a seemingly impossible memory problem in vLLM that turned out to be a bad interaction between NCCL and CUDA graphs. The fix immediately benefited people well beyond @vllm_project users. Set NCCL_CUMEM_ENABLE=0, people.
    [Core][Optimization] remove vllm-nccl by youkaichao · Pull Request #5091 · vllm-project/vllm
    From github.com
    8.1K
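A minimal sketch of applying this setting from a Python entry point, assuming the script itself (or a library it calls) creates NCCL communicators, for example via `torch.distributed` with the `nccl` backend. NCCL reads its environment variables during initialization, so the variable must be set before the first communicator is created; setting it at launch (`NCCL_CUMEM_ENABLE=0 python serve.py`, where `serve.py` is a hypothetical script name) has the same effect.

```python
import os

# Must run before anything initializes NCCL (e.g. a call to
# torch.distributed.init_process_group(backend="nccl")), because NCCL
# reads its environment variables when it initializes.
os.environ["NCCL_CUMEM_ENABLE"] = "0"

# Code that creates NCCL communicators should come after this assignment.

print("NCCL_CUMEM_ENABLE =", os.environ["NCCL_CUMEM_ENABLE"])
```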
  • Simon Mo
    @simon_mo_
    Jan 15, 2024
    We are hosting The Second vLLM Meetup in downtown SF on Jan 31st (Wed). Come chat with vLLM maintainers about LLMs in production and inference optimizations! Thanks @IBM for hosting us.
    The Second vLLM Meetup @ The AI Alliance · Luma
    From luma.com
    9.2K
  • Simon Mo
    @simon_mo_
    Sep 5, 2024
    One thing that continuously impresses and surprises me is the power of the vLLM community. After adding so many amazing features (pipeline parallelism, state-of-the-art quantization, VLMs, etc.), the developer community got together for a performance sprint!
    vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction
    From vllm.ai
    2.3K
  • Simon Mo
    @simon_mo_
    Mar 15, 2024
    The vLLM Team is excited to announce our Third vLLM Meetup in San Carlos on April 2nd (Tuesday). We will be discussing feature updates and want to hear from you! We thank @Roblox for hosting the event! robloxandvllmmeetup2024.splashthat.com
    8.5K
  • Simon Mo
    @simon_mo_
    Aug 9, 2024
    Thank you @nvidia! The future of LLM inference is open!
    vLLM
    @vllm_project
    Aug 8, 2024
    🙏 Thank you @nvidia for sponsoring vLLM development. The DGX H200 machine is marvelous! We plan to use the machine for benchmarking and performance enhancement 🏎️.
    1.6K
  • Simon Mo
    @simon_mo_
    Dec 12, 2023
    They are literally only charging for the electricity. $0.0006/1K tokens at 100 tok/s gives $0.00006/s of compute, which is $0.216/hr. That is basically the cost of electricity: (700 W H100 + 300 W CPU) × $0.25/kWh = $0.25/hr. The cheapest H100x1 price is about $2/hr; A100x1 is about $1.5/hr.
    Susan Zhang
    @suchenzang
    Dec 12, 2023
    grateful for the VCs who subsidize this entire market and lower all barriers of access 🙌
    3.7K
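The back-of-envelope arithmetic in the electricity tweet can be checked in a few lines. All inputs (price per token, throughput, power draw, electricity rate, rental price) are that tweet's own estimates, not measurements.

```python
# Back-of-envelope check of the tweet's numbers (all inputs are the tweet's estimates).
PRICE_PER_1K_TOKENS = 0.0006   # USD per 1K tokens
THROUGHPUT_TOK_S = 100         # tokens per second
GPU_W, CPU_W = 700, 300        # H100 and CPU power draw, in watts
USD_PER_KWH = 0.25             # electricity price
H100_RENTAL_USD_HR = 2.0       # cheapest single-H100 rental price quoted

revenue_per_s = PRICE_PER_1K_TOKENS / 1000 * THROUGHPUT_TOK_S  # USD per second
revenue_per_hr = revenue_per_s * 3600                           # USD per hour
electricity_per_hr = (GPU_W + CPU_W) / 1000 * USD_PER_KWH       # kW * USD/kWh = USD/hr

print(f"revenue:     ${revenue_per_s:.5f}/s = ${revenue_per_hr:.3f}/hr")
print(f"electricity: ${electricity_per_hr:.2f}/hr")
share = revenue_per_hr / H100_RENTAL_USD_HR
print(f"revenue covers {share:.1%} of a ${H100_RENTAL_USD_HR:.2f}/hr H100 rental")
```

Revenue per hour lands just under the electricity estimate and at roughly a tenth of the quoted H100 rental price, which is the tweet's point.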
  • Simon Mo
    @simon_mo_
    Jan 28, 2025
    Our biggest milestone yet! I'm particularly excited by how the vLLM contributor community, spanning many organizations, came together to deliver a high-quality V1 engine core. We are just getting started 🚀
    vLLM
    @vllm_project
    Jan 27, 2025
    🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more.
    1.1K
  • Simon Mo
    @simon_mo_
    Jul 25, 2024
    "In the recent Meta Llama 3.1 announcement, 8 out of 10 official partners for real time inference run vLLM as the serving engine for the Llama 3.1 models." Guess the remaining two 😉
    vLLM
    @vllm_project
    Jul 25, 2024
    Two exciting updates! * vLLM is already widely adopted, and we want to ensure it has open governance and longevity. We are starting to join @LFAIDataFdn! * We are doubling down on performance. Please check out our roadmap. blog.vllm.ai/2024/07/25/lfa…
    3.3K
  • Simon Mo
    @simon_mo_
    Jul 23, 2024
    Super excited about Llama 3.1! The license now allows synthetic data generation and distillation, which will unlock incredible innovations in the open-source community.
    vLLM
    @vllm_project
    Jul 23, 2024
    🚀 Exciting news! In partnership with @AIatMeta, vLLM officially supports Llama 3.1! 🦙✨ For Llama 3.1 405B, vLLM supports FP8 quantization on single machine and pipeline parallelism for multi-node serving. Learn more in our latest blog post: blog.vllm.ai/2024/07/23/lla…
    1.3K
  • Simon Mo
    @simon_mo_
    May 21, 2024
    I had a lot of fun. So many great questions about @vllm_project from all the experts and hackers! Congrats to everyone who participated!
    Runpod
    @runpod
    May 21, 2024
    Replying to @runpod
    709
