Stas Bekman (@StasBekman) / X

2,747 posts

Toolmaker. Software creator, optimizer and harmonizer. Makes ML systems work and fly @ Snowflake.

Nanaimo, BC, Canada

stasosphere.com/machine-learni…

Joined November 2018

Stas Bekman
@StasBekman
Nov 21, 2023
This is a long-overdue section of the ML Engineering book: Understanding Training Loss Patterns github.com/stas00/ml-engi… I warn you that the "Understanding" part is overloaded here, since most of the time we don't really understand why certain types of spikes happen. Here
Stas Bekman
@StasBekman
Oct 25, 2025
PyTorch announced Monarch which is meant to simplify distributed programming—your code looks and feels like a single-machine Python program, but can scale across thousands of GPUs. You can directly use Pythonic constructs—classes, functions, loops, tasks, futures—to express
Stas Bekman
@StasBekman
Jan 3, 2025
The @PyTorch team is working on a new, super important tool: github.com/pytorch-labs/t… This repository implements techniques for per-step fault tolerance, so you can keep training when errors occur without interrupting the entire training job. Some big companies already have
GitHub - meta-pytorch/torchft: Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming...
Stas Bekman
@StasBekman
Mar 19, 2024
Holy! The Machine Learning Engineering Open Book repo has just crossed 9k stars on GitHub! That's insane, as I started writing it ~6 months ago! github.com/stas00/ml-engi… Thank you so much for your vote of confidence! It's super encouraging to continue investing into this
Stas Bekman
@StasBekman
Oct 6, 2024
A special moment for The Machine Learning Engineering Open Book: it has just hit the magical 11111 stars and 666 forks! And it has been 1 year since I started on this structured brain dump! github.com/stas00/ml-engi… A huge thank you to the readers for your vote of confidence!
Stas Bekman
@StasBekman
Dec 3, 2023
This is the first pass on the new chapter for ML Engineering: The AI Battlefield Engineering - What You Need To Know github.com/stas00/ml-engi… This is a WIP and your feedback for improvement is always welcome.
Stas Bekman
@StasBekman
Feb 12, 2024
As the Machine Learning Engineering book was getting too unstructured, I did a massive re-org, and I present to you the new layout, which is hopefully much more intuitive. github.com/stas00/ml-engi… The re-org work isn't 100% completed but it's mostly there. If you feel something is
Stas Bekman
@StasBekman
Jan 1, 2022
I finally installed filters to remove *.medium.com and towardsdatascience.com from Google Search. No more "pay to read this"! Yay! Why would one write for a for-profit company for **free** when their target audience is forced to pay to read it? Using: addons.mozilla.org/en-CA/firefox/…
Stas Bekman
@StasBekman
Nov 14, 2023
If this is useful for your work, I have just created a <1MB tiny random llama2 model including a tiny 3k tokenizer. huggingface.co/stas/tiny-rand… This is crucial for extremely fast testing/development. You can easily adapt the tiny model maker script to any other model
stas/tiny-random-llama-2 · Hugging Face
Stas Bekman
@StasBekman
Nov 8, 2025
This is a fantastic article explaining why you should be paying attention to the emergence of hybrid models and why they are likely to replace self-attention-based models (hint: much faster and lower memory footprint inference). pytorch.org/blog/hybrid-mo… This is from the vLLM folks.
Stas Bekman
@StasBekman
Sep 30, 2024
This is a pretty awesome, simple step-by-step guide showing you how to build your own PyTorch (with a subset of ops supported), which requires just basic knowledge of C/C++/Python. towardsdatascience.com/recreating-pyt… The reason to walk through it is to better understand how some of the common
Stas Bekman
@StasBekman
Oct 22, 2021
At @BigscienceW 104B GPT training we finally had a BREAKTHROUGH and the training doesn't diverge! The key change of Experiment 11 was to change --init-method-std to 0.006 from 0.02 - Thank you, BS Team! Details are in: github.com/bigscience-wor… TB is here: huggingface.co/bigscience/tr8…
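To get a feel for what the --init-method-std change in the post above means, here is a minimal sketch in plain Python. It only samples weights from a zero-mean normal distribution at the two standard deviations mentioned (0.02 and 0.006) and compares their spread; it is not the training code from the linked repository. The helper names `sample_weights` and `small_init_std` are hypothetical, and both the hidden size of 11600 and the sqrt(2 / (5 * hidden_size)) heuristic are assumptions brought in for illustration, not something the post states.

```python
import math
import random
import statistics


def sample_weights(std: float, n: int, seed: int = 0) -> list[float]:
    """Draw n weights from a zero-mean normal distribution with the given std."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


def small_init_std(hidden_size: int) -> float:
    """Hypothetical helper: one heuristic for choosing an init std
    from the hidden size, std = sqrt(2 / (5 * hidden_size))."""
    return math.sqrt(2.0 / (5.0 * hidden_size))


# The two values named in the post.
old_weights = sample_weights(std=0.02, n=100_000)
new_weights = sample_weights(std=0.006, n=100_000)

print("empirical std at 0.02: ", statistics.pstdev(old_weights))
print("empirical std at 0.006:", statistics.pstdev(new_weights))

# ASSUMPTION: hidden size of 11600, used here only for illustration.
HIDDEN_SIZE = 11600
print("heuristic std for hidden size 11600:", small_init_std(HIDDEN_SIZE))
```

Under these assumptions the heuristic lands close to 0.006, which would be consistent with the smaller value in the post, but the linked details are the place to check how the value was actually chosen.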
Stas Bekman
@StasBekman
Nov 9, 2023
The Model Parallelism chapter of the ML Engineering book is now quite complete. github.com/stas00/ml-engi… The future of training LLM/VLMs is exciting with so many great minds putting their smarts into giving the ML community amazing tools to work with. I will now stop making too many
Stas Bekman
@StasBekman
Dec 11, 2021
I have just added a brief summary of Transformers with Mixture of Experts architectures, with pointers to papers and blog posts that you can study for more details. huggingface.co/docs/transform… The diagram is from the Google blog post linked in the summary.