Leandro von Werra (@lvwerra) / X

Leandro von Werra

2,196 posts

@lvwerra

Head of research @huggingface

Bern, Switzerland

Joined March 2019

Pinned
Leandro von Werra
@lvwerra
Feb 19, 2025
The Ultra-Scale Playbook: Training LLMs on GPU Clusters. Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap and bottlenecks with theory, interactive plots, 4000+ scaling experiments, and audio! huggingface.co/spaces/nanotro…
46K
Leandro von Werra
@lvwerra
Feb 12, 2025
73k GitHub stars for a PDF and a Readme
232K
Leandro von Werra
@lvwerra
Dec 19, 2024
Jupyter Agents - LLMs running data analysis directly in a notebook! The agent can load data, execute code, plot results, and follow your guidance and ideas! A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!
200K
Leandro von Werra
@lvwerra
May 31, 2022
Evaluation is one of the most important aspects of ML, but today’s evaluation landscape is scattered and undocumented, which makes evaluation unnecessarily hard. For that reason we are excited to release 🤗 Evaluate! github.com/huggingface/ev… Let’s take a tour:
Leandro von Werra
@lvwerra
Mar 11, 2025
Introducing: ⚡️OlympicCoder⚡️ Beats Claude 3.7 and is close to o1-mini/R1 on olympiad-level coding with just 7B parameters! Let that sink in! Read more about its training dataset, the new IOI benchmark, and more in Open-R1 progress report #3.
159K
Leandro von Werra
@lvwerra
Jul 19, 2023
Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4bit and PEFT! Learn more: huggingface.co/docs/trl/main/… Full script: github.com/lvwerra/trl/bl…
192K
Leandro von Werra
@lvwerra
Jan 6, 2025
Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases. Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
75K
Leandro von Werra
@lvwerra
Jan 3, 2022
Our book "Natural Language Processing with Transformers: Building Language Applications with Hugging Face" can now be preordered! amazon.de/Natural-Langua… This thread gives an overview of what you can expect by summarizing the content of each chapter:
Leandro von Werra
@lvwerra
Apr 6, 2023
Excited to introduce: StackLlama🦙 An end-to-end tutorial for training Llama with RLHF on preference data such as the StackExchange questions! Blog: hf.co/blog/stackllama Demo: hf.co/spaces/trl-lib… Code: github.com/lvwerra/trl/tr… The resulting model is surprisingly fun!🧵
157K
Leandro von Werra
@lvwerra
Feb 15, 2022
It finally arrived! 🎉 So I guess it is a real thing now. Thanks to everybody who ordered it. Because of all of you it is the #1 release on Amazon in NLP, #3 in ML&AI, and #4 in all of computer science! ❤️ transformersbook.com
Leandro von Werra
@lvwerra
Aug 19, 2025
Excited to release: Jupyter Agent 2. The agent can load data, execute code, and plot results inside Jupyter faster than you can scroll! 🤖 Powered by Qwen3-Coder ⚡️ Running on Cerebras ⚙️ Executed in E2B ↕️ Upload your files. All videos are in *real time*! hf.co/spaces/lvwerra…
66K
Leandro von Werra
@lvwerra
Dec 6, 2021
Can we create all the code for training GitHub Copilot in a (looong) tweet thread? Yes, see how to train CodeParrot🦜, a large GPT-2 model for code, from scratch in this thread! Ready - go!
Leandro von Werra
@lvwerra
Jan 23, 2022
How do models like GPT-2 and BERT represent the position of tokens? When visualizing their positional encodings I found an interesting pattern. A short thread:
Leandro von Werra
@lvwerra
Oct 3, 2024
solving problems using BERT that can be solved by a RegEx is another level of skill issue
merve
@mervenoyann
Oct 2, 2024
solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue
48K