Sham Kakade
538 posts
Sham Kakade
@ShamKakade6
Harvard Professor. Full stack ML and AI. Co-director of the Kempner Institute for the Study of Artificial and Natural Intelligence.
shamulent.github.io
Joined December 2018
578 Following
18.8K Followers
  • Sham Kakade
    @ShamKakade6
    Oct 14, 2025
    1/8 Second Order Optimizers like SOAP and Muon have shown impressive performance on LLM optimization. But are we fully utilizing the potential of second order information? New work: we show that a full second order optimizer is much better than existing optimizers in terms of
  • Sham Kakade
    @ShamKakade6
    Oct 18, 2021
    Super excited to join Harvard with a stellar group of new hires and looking forward to many new collabs with the terrific faculty there! Def sad to be leaving my wonderful UW and MSR colleagues and friends; rest assured, I'll keep up the collabs!
    Seven join Harvard computer science faculty
    From seas.harvard.edu
  • Sham Kakade
    @ShamKakade6
    Sep 18, 2024
    1/n Introducing SOAP (ShampoO with Adam in the Preconditioner's eigenbasis): A deep learning optimization algorithm that applies Adam in Shampoo's eigenbasis. SOAP outperforms both AdamW and Shampoo in language model pretraining.
  • Sham Kakade
    @ShamKakade6
    Jul 1, 2020
    Thank you so much to the awards committee! Also a huge thanks to the past and current ICML chairs and organizers for all their great work for our community! 👍 It is an honor to receive this 😀😀 with such wonderful co-authors: @arkrause, Matthias, and Niranjan!
    Hal Daumé III
    @haldaume3
    Jul 1, 2020
    We are very pleased to announce that the #icml2020 Test of Time award goes to Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design by Niranjan Srinivas, Andreas Krause, Sham Kakade and Matthias Seeger icml.cc/Conferences/20…
  • Sham Kakade
    @ShamKakade6
    Dec 7, 2021
    Grateful to Priscilla Chan/Mark Zuckerberg (@ChanZuckerberg Initiative) for generous gift. Kempner Natural & Artificial Intelligence Institute @Harvard. Excited to work w/@blsabatini+new colleagues to provide new educational and research opportunities.
    New Harvard institute to study natural, artificial intelligence
    From news.harvard.edu
  • Sham Kakade
    @ShamKakade6
    Nov 15, 2023
    Can inductive biases explain mechanistic interpretability? Why do sinusoidal patterns emerge for NNs trained on modular addition? (e.g. @NeelNanda5) New work pins this down! w/ @depen_morwani @EdelmanBen @rosieyzh @costinoncescu! @KempnerInst
  • Sham Kakade
    @ShamKakade6
    Oct 15, 2025
    1/9 Introducing LOTION (Low-precision optimization via stochastic-noise smoothing), a principled alternative to Quantization-Aware Training (QAT) that explicitly smooths the quantized loss surface while preserving all global minima of the true quantized loss. Details below:
  • Sham Kakade
    @ShamKakade6
    Oct 18, 2025
    1/6 Introducing Seesaw: a principled batch size scheduling algo. Seesaw achieves theoretically optimal serial run time given a fixed compute budget and also matches the performance of cosine annealing at fixed batch size.
  • Sham Kakade
    @ShamKakade6
    Jul 12, 2024
    Which optimizer is opt? Our new work compares SGD, Adam, Adafactor (+ momentum), Lion, and, simply, SignSGD on LLM training wrt performance _and_ hyperparameter stability. tldr: Use anything but SGD, the rest are nearly identical:
  • Sham Kakade
    @ShamKakade6
    Mar 3, 2020
    couldn't reconcile theory/practice with dropout for over a year. New: arxiv.org/abs/2002.12915 w/ @tengyuma & C. Wei. turns out dropout sometimes has an implicit regularization effect! pretty wild. just like small vs. large batch sgd. these plots def surprised us!
  • Sham Kakade
    @ShamKakade6
    Oct 10, 2019
    What actually constitutes a good representation for reinforcement learning? Lots of sufficient conditions. But what's necessary? New paper: arxiv.org/abs/1910.03016. Surprisingly, good value (or policy) based representations just don't cut it! w/ @SimonShaoleiDu @RuosongW @lyang36
    arxiv.org
    Is a Good Representation Sufficient for Sample Efficient...
    Modern deep learning methods provide effective means to learn good representations. However, is a good representation itself sufficient for sample efficient reinforcement learning? This question...
  • Sham Kakade
    @ShamKakade6
    Jun 21, 2025
    1/6 Infinite-dim SGD in linear regression is the strawman model for studying scaling laws, critical batch sizes, and LR schedules. We revisit (and simplify) its analysis using just linear algebra, making it easier to derive and reason about. No PSD operators. No tensor calculus.
  • Sham Kakade
    @ShamKakade6
    Oct 19, 2019
    Wrapped up at the "Workshop on Theory of Deep Learning: Where next?" at IAS. math.ias.edu/wtdl. The field has moved so much! e.g. Neural Tangent Kernel (NTK) results! A few years ago, understanding DL looked hopeless. Terrific set of talks, too!
  • Sham Kakade
    @ShamKakade6
    Nov 22, 2024
    (1/n) 💡How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping point where the gains of data parallelism balance with diminishing efficiency. Doubling batch size halves the optimization steps—until we hit CBS, beyond which
