Stas Bekman
2,747 posts
@StasBekman
Toolmaker. Software creator, optimizer and harmonizer. Makes ML systems work and fly @ Snowflake.
Nanaimo, BC, Canada
stasosphere.com/machine-learni…
Joined November 2018
306 Following
10.2K Followers
  • Stas Bekman
    @StasBekman
    Nov 21, 2023
    This is a long overdue section of ML Engineering: Understanding Training Loss Patterns github.com/stas00/ml-engi… I warn you that the "Understanding" part is overloaded here, since most of the time we don't really understand why certain types of spikes happen. Here
    172K
  • Stas Bekman
    @StasBekman
    Oct 25, 2025
    PyTorch announced Monarch which is meant to simplify distributed programming—your code looks and feels like a single-machine Python program, but can scale across thousands of GPUs. You can directly use Pythonic constructs—classes, functions, loops, tasks, futures—to express
    94K
  • Stas Bekman
    @StasBekman
    Jan 3, 2025
    The @PyTorch team is working on a new super important tool: github.com/pytorch-labs/t… This repository implements techniques for per-step fault tolerance, so you can keep training when errors occur without interrupting the entire training job. Some big companies already have
    GitHub - meta-pytorch/torchft: Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming...
    From github.com
    46K
  • Stas Bekman
    @StasBekman
    Mar 19, 2024
    Holy! The Machine Learning Engineering Open Book repo has just crossed 9k stars on GitHub! That's insane, as I started writing it ~6 months ago! github.com/stas00/ml-engi… Thank you so much for your vote of confidence! It's super encouraging to continue investing into this
    51K
  • Stas Bekman
    @StasBekman
    Oct 6, 2024
    A special moment for The Machine Learning Engineering Open Book - it has just hit the magical 11111 stars and 666 forks! And it has been 1 year since I started on this structured brain dump! github.com/stas00/ml-engi… A huge thank you, the readers, for your vote of confidence!
    50K
  • Stas Bekman
    @StasBekman
    Dec 3, 2023
    This is the first pass on the new chapter for ML Engineering: The AI Battlefield Engineering - What You Need To Know github.com/stas00/ml-engi… This is a WIP and your feedback for improvement is always welcome.
    182K
  • Stas Bekman
    @StasBekman
    Feb 12, 2024
    As the Machine Learning Engineering book was getting too unstructured, I did a massive re-org, and I present to you the new layout, which hopefully is much more intuitive. github.com/stas00/ml-engi… The re-org work isn't 100% completed but it's mostly there. If you feel something is
    57K
  • Stas Bekman
    @StasBekman
    Jan 1, 2022
    I finally installed filters to remove *.medium.com, towardsdatascience\.com from Google Search. Pay to read this is no more! Yay! Why would one write for a for-profit company for **free** when their target audience is forced to pay to read it? Using: addons.mozilla.org/en-CA/firefox/…
  • Stas Bekman
    @StasBekman
    Nov 14, 2023
    If this is useful for your work, I have just created a <1MB tiny random llama2 model including a tiny 3k tokenizer. huggingface.co/stas/tiny-rand… This is crucial for extremely fast testing/development. You can easily adapt the tiny model maker script to any other model
    stas/tiny-random-llama-2 · Hugging Face
    From huggingface.co
    106K
  • Stas Bekman
    @StasBekman
    Nov 8, 2025
    This is a fantastic article explaining why you should be paying attention to the emergence of hybrid models and why they are likely to replace self-attention-based models (hint: much faster inference with a lower memory footprint). pytorch.org/blog/hybrid-mo… This is from the vLLM folks.
    51K
  • Stas Bekman
    @StasBekman
    Sep 30, 2024
    This is a pretty awesome, simple step-by-step guide showing you how to build your own PyTorch (with a subset of ops supported), which requires just basic knowledge of C/C++/Python. towardsdatascience.com/recreating-pyt… The reason to walk through it is to better understand how some of the common
    21K
  • Stas Bekman
    @StasBekman
    Oct 22, 2021
    At the @BigscienceW 104B GPT training we finally had a BREAKTHROUGH and the training doesn't diverge! The key change of Experiment 11 was to change --init-method-std from 0.02 to 0.006 - Thank you, BS Team! Details are in: github.com/bigscience-wor… TB is here: huggingface.co/bigscience/tr8…
  • Stas Bekman
    @StasBekman
    Nov 9, 2023
    The Model Parallelism chapter of ML Engineering is now quite complete. github.com/stas00/ml-engi… The future of training LLM/VLMs is exciting with so many great minds putting their smarts into giving the ML community amazing tools to work with. I will now stop making too many
    93K
  • Stas Bekman
    @StasBekman
    Dec 11, 2021
    I have just added a brief summary of Transformers with Mixture of Experts architectures, with pointers to papers and blog posts that you can study for more details. huggingface.co/docs/transform… The diagram is from the Google blog post linked in the summary.
