Leandro von Werra
2,196 posts
Leandro von Werra
@lvwerra
Head of research @huggingface
Bern, Switzerland
lvwerra.com
Joined March 2019
453
Following
12K
Followers
  • Pinned
    Leandro von Werra
    @lvwerra
    Feb 19, 2025
    The Ultra-Scale Playbook: Training LLMs on GPU Clusters. Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap and bottlenecks with theory, interactive plots and 4000+ scaling experiments and audio! huggingface.co/spaces/nanotro…
    46K
  • Leandro von Werra
    @lvwerra
    Feb 12, 2025
    73k GitHub stars for a PDF and a Readme
    232K
  • Leandro von Werra
    @lvwerra
    Dec 19, 2024
    Jupyter Agents - LLMs running data analysis directly in a notebook! The agent can load data, execute code, plot results, and follow your guidance and ideas! A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what's possible soon!
    200K
  • Leandro von Werra
    @lvwerra
    May 31, 2022
    Evaluation is one of the most important aspects of ML, but today’s evaluation landscape is scattered and undocumented, which makes evaluation unnecessarily hard. For that reason we are excited to release 🤗 Evaluate! github.com/huggingface/ev… Let’s take a tour:
  • Leandro von Werra
    @lvwerra
    Mar 11, 2025
    Introducing: ⚡️OlympicCoder⚡️ Beats Claude 3.7 and is close to o1-mini/R1 on olympiad-level coding with just 7B parameters! Let that sink in! Read more about its training dataset, the new IOI benchmark, and more in Open-R1 progress report #3.
    159K
  • Leandro von Werra
    @lvwerra
    Jul 19, 2023
    Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4bit and PEFT! Learn more: huggingface.co/docs/trl/main/… Full script: github.com/lvwerra/trl/bl…
    192K
  • Leandro von Werra
    @lvwerra
    Jan 6, 2025
    Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases. Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
    75K
  • Leandro von Werra
    @lvwerra
    Jan 3, 2022
    Our book "Natural Language Processing with Transformers: Building Language Applications with Hugging Face" can now be preordered! amazon.de/Natural-Langua… This thread gives an overview of what you can expect by summarizing the content of each chapter:
  • Leandro von Werra
    @lvwerra
    Apr 6, 2023
    Excited to introduce: StackLlama🦙 An end-to-end tutorial for training Llama with RLHF on preference data such as the StackExchange questions! Blog: hf.co/blog/stackllama Demo: hf.co/spaces/trl-lib… Code: github.com/lvwerra/trl/tr… The resulting model is surprisingly fun!🧵
    157K
  • Leandro von Werra
    @lvwerra
    Feb 15, 2022
    It finally arrived! 🎉 So I guess it is a real thing now. Thanks to everybody who ordered it. Because of all of you it is the #1 release on Amazon in NLP, #3 in ML&AI, and #4 in all of computer science! ❤️ transformersbook.com
  • Leandro von Werra
    @lvwerra
    Aug 19, 2025
    Excited to release: Jupyter Agent 2. The agent can load data, execute code, and plot results inside Jupyter faster than you can scroll! 🤖 Powered by Qwen3-Coder ⚡️ Running on Cerebras ⚙️ Executed in E2B ↕️ Upload your files. All videos are in *real time*! hf.co/spaces/lvwerra…
    66K
  • Leandro von Werra
    @lvwerra
    Dec 6, 2021
    Can we create all the code for training GitHub Copilot in a (looong) tweet thread? Yes, see how to train CodeParrot🦜, a large GPT-2 model for code, from scratch in this thread! Ready - go!
  • Leandro von Werra
    @lvwerra
    Jan 23, 2022
    How do models like GPT-2 and BERT represent the position of tokens? When visualizing their positional encodings I found an interesting pattern. A short thread:
  • Leandro von Werra
    @lvwerra
    Oct 3, 2024
    solving problems using BERT that can be solved by a RegEx is another level of skill issue
    merve
    @mervenoyann
    Oct 2, 2024
    solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue
    48K
