Laker Newhouse
21 posts
Laker Newhouse
@LakerNewhouse
MIT '25 researching Muon & ML
Palo Alto, CA
lakernewhouse.com
Joined January 2023
17 Following
856 Followers
  • Pinned
    Laker Newhouse
    @LakerNewhouse
    Jul 21, 2025
    [1/6] Curious about Muon, but not sure where to start? I wrote a 3-part blog series called “Understanding Muon” designed to get you up to speed—with The Matrix references, annotated source code, and thoughts on where Muon might be going.
    36K
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    [1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
    147K
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [2/9] Muon spectrally regulates gradients, but what if we also spectrally regulate weights? Then activations stay small—nearly fp8 range. Activation entries in our GPT-2 scale transformers don’t exceed ~100 vs. baseline ~1000. Check out our paper:
    arxiv.org
    Training Transformers with Enforced Lipschitz Constants
    Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and...
    6.2K
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [4/9] One of the things we’re most excited about is efficient primitives inspired by Muon and related to Kimi AI’s recent work. We introduce a family of methods to cap singular values via applying min(1, x), co-designed for Muon’s high stable rank update.
    4K
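To make the min(1, x) idea from [4/9] concrete, here is a minimal NumPy sketch that caps singular values through an explicit SVD. It is only a naive reference: the function name `cap_singular_values` and the `cap` parameter are hypothetical, and the thread describes efficient primitives co-designed for Muon, which this sketch does not attempt to reproduce.

```python
import numpy as np


def cap_singular_values(W: np.ndarray, cap: float = 1.0) -> np.ndarray:
    """Return W with each singular value sigma replaced by min(cap, sigma).

    Naive illustration only (hypothetical helper): it pays for a full SVD,
    which the efficient methods mentioned in the thread are meant to avoid.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Scaling the columns of U by the capped values is U @ diag(min(S, cap)).
    return (U * np.minimum(S, cap)) @ Vt


rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
W_capped = cap_singular_values(W)

# After capping, the largest singular value should not exceed the cap.
print(np.linalg.svd(W_capped, compute_uv=False).max())
```

Singular values already below the cap are left unchanged, so a matrix whose spectral norm is at most 1 passes through unmodified.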
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [9/9] We’re really excited to see where the community can take this. We’re publishing all our code and data; there’s lots more to test and understand that can help us achieve adversarial robustness, bounded activations, and stable training at scale. Read the paper:
    arxiv.org
    Training Transformers with Enforced Lipschitz Constants
    Neural networks are often highly sensitive to input and weight perturbations. This sensitivity has been linked to pathologies such as vulnerability to adversarial examples, divergent training, and...
    3.3K
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [3/9] Our main goal was to enforce a provable Lipschitz bound on NanoGPT while matching unconstrained val loss. But more work is needed! Our current methods bound the Lipschitz constant at 10^264.
    4.6K
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [5/9] A Lipschitz bound controls how sensitive a network is to input or weight changes. With a low bound, a small change in the input can’t wildly change the output. Thus: Lower Lipschitz bound => more robust and more predictable model
    3.2K
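For reference, the standard definition behind [5/9] can be written as follows. The choice of norms is an assumption here; the post does not say which norms the paper uses.

```latex
% f is L-Lipschitz (with respect to the chosen norms) if, for all inputs x and y,
\[
  \lVert f(x) - f(y) \rVert \;\le\; L \,\lVert x - y \rVert .
\]
% A smaller L means a perturbation of size \lVert x - y \rVert in the input
% can move the output by at most L times that amount.
```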
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [6/9] To control a Lipschitz bound we need to control weight norms. There are many ways to do this, including weight decay, and we compare their “Lipschitzness to performance” tradeoff. Finding: Muon + capping singular values pushes the tradeoff frontier.
    2.8K
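The link between weight norms and a Lipschitz bound in [6/9] can be illustrated on a toy network. The sketch below assumes a plain ReLU MLP without biases (not the transformer from the paper), and the helper names `lipschitz_upper_bound` and `mlp` are hypothetical. It uses the standard fact that, for 1-Lipschitz activations, the product of the layers' spectral norms upper-bounds the l2 Lipschitz constant.

```python
import numpy as np


def lipschitz_upper_bound(weights):
    """Product of spectral norms of the weight matrices.

    For an MLP with 1-Lipschitz activations (e.g. ReLU), this upper-bounds
    the l2 Lipschitz constant. Attention and residual connections need a
    separate analysis and are not covered by this toy example.
    """
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))


def mlp(x, weights):
    """Toy ReLU MLP: ReLU after every layer except the last."""
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)
    return weights[-1] @ x


rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 8)) for _ in range(3)]
bound = lipschitz_upper_bound(weights)

# Empirical sensitivity for one pair of inputs; it can never exceed the bound.
x, y = rng.normal(size=8), rng.normal(size=8)
ratio = np.linalg.norm(mlp(x, weights) - mlp(y, weights)) / np.linalg.norm(x - y)
print(ratio <= bound)
```

Because the bound multiplies across layers, keeping each layer's spectral norm near 1 (by weight decay, capping singular values, or similar) keeps the product from growing with depth.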
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [7/9] We’ve seen exciting related work come out recently including @Jianlin_S’s QK-clip algorithm and @_arohan_’s weight constraint thread, so we bet we missed important citations—we’d love to hear any related work we should include!
    rohan anil
    Core Automation
    @_arohan_
    Jun 3, 2025
    Doing some math to cleanse the timelinez. Why do loss blow up? A question to deepthink. So an attempt: why not clip the singular values of the update? σ > 1: clip to 1; σ <= 1: return σ. Naive implementation: Update = U S V.T; Update_clipped = U clip(S, 1) V.T. How to make it
    3.7K
  • Laker Newhouse
    @LakerNewhouse
    Jul 21, 2025
    Replying to @LakerNewhouse
    [2/6] There’s been a lot of interest in Muon recently, so I wanted to make a practitioner’s guide that's accessible to everyone in the machine learning community.
    Understanding Muon
    From lakernewhouse.com
    2.4K
  • Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    Replying to @LakerNewhouse
    [8/9] This is work done alongside fantastic friends and collaborators: @phess002 @leloykun @anzahorodnii @jxbz @phillip_isola
    3.6K
  • Laker Newhouse
    @LakerNewhouse
    Jul 21, 2025
    Replying to @LakerNewhouse
    [5/6] Chapter 3 is called “Weight Regulation.” The goal is to orient people toward some exciting recent work on Muon, including @Jianlin_S's MuonClip and our recent paper.
    Laker Newhouse
    @LakerNewhouse
    Jul 19, 2025
    [1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
    2.5K
  • Laker Newhouse
    @LakerNewhouse
    Jul 21, 2025
    Replying to @LakerNewhouse
    [4/6] Chapter 2 is called “Source Code.” This was the original motivation for the whole series: I saw people getting stuck reading through Muon’s code. So I made line-by-line annotations you can hover over to read. No more being confused.
    2.4K
  • Laker Newhouse
    @LakerNewhouse
    Jul 21, 2025
    Replying to @LakerNewhouse
    [3/6] Chapter 1 is called “Into the Matrix.” Get ready for some fun Neo references while seeing why Muon looks at the gradient as a matrix, not a vector.
    2.1K
