Trevor Gale
238 posts
Trevor Gale
@Tgale96
Research Scientist @ Google DeepMind | Former Stanford CS
Maine, USA
Joined April 2013
300 Following · 1,395 Followers
  • Pinned
    Trevor Gale
    @Tgale96
    Mar 28, 2024
    Hi all, a few updates on MegaBlocks 🧵
    github.com
    GitHub - databricks/megablocks
    Contribute to databricks/megablocks development by creating an account on GitHub.
    50K
  • Trevor Gale
    @Tgale96
    Dec 8, 2023
    I woke up to an interesting PR in MegaBlocks this morning... 😅
    Mistral AI
    @MistralAI
    Dec 8, 2023
    magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%https://t.co/g0m9cEUz0T%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24
    github.com
    Support new model by pierrestock · Pull Request #45 · databricks/megablocks
    124K
  • Trevor Gale
    @Tgale96
    Sep 30, 2020
    Want to run a sparse neural network at warp speed?🔥 The code from our paper is now open-source! We’ve released our sparse models, tuned code and our dataset of sparse matrices: github.com/google-researc…
  • Trevor Gale
    @Tgale96
    Jan 27, 2023
    🧵We’re excited to announce MegaBlocks, our system for efficient “dropless” MoE training! 🤖 MegaBlocks outperforms Tutel by up to 40% by reformulating MoEs as block-sparse operations, which allows us to avoid token dropping without sacrificing hardware efficiency 🚀.
    40K
  • Trevor Gale
    @Tgale96
    Mar 28, 2024
    Replying to @Tgale96
    I’m not done with MegaBlocks 😁 @apaszke @epiqueras1 @sharadvikram and I just dropped something we’ve been working on for a bit yesterday. MegaBlocks + JAX + TPU = MegaBlox 🔥
    Add MegaBlox grouped matrix multiplication kernels for TPU. by copybara-service[bot] · Pull Request...
    From github.com
    35K
  • Trevor Gale
    @Tgale96
    Mar 27, 2024
    Look how much fun we're all having together! Come MegaBlock with us! 🥰 github.com/stanford-futur…
    Julien Chaumond
    @julien_c
    Mar 27, 2024
    Open source AI is NOT a zero-sum game and some leading contributors like @Tgale96 show it 🥰⤵️
    20K
  • Trevor Gale
    @Tgale96
    Mar 27, 2024
    Replying to @jefrankle and @arthurmensch
    Don't fight guys we can all use MegaBlocks together 🥹
    13K
  • Trevor Gale
    @Tgale96
    Dec 8, 2020
    What stands between us and widespread use of sparsity in deep learning? I tried to organize some of my thoughts for this @sigarch blog post!
    The Future of Sparsity in Deep Neural Networks
    From sigarch.org
  • Trevor Gale
    @Tgale96
    Jun 23, 2020
    Excited to share something I've been working on for a while! Fast GPU kernels for sparse linear ops with @erich_elsen, Cliff Young and @matei_zaharia! With some fancy tricks, sparse ops can be faster than dense at as low as 71% sparsity 🔥 arxiv.org/abs/2006.10901
  • Trevor Gale
    @Tgale96
    May 13, 2024
    “But to us a “register” is a 16x16 tile of data.” Sounds like you guys might like TPUs 😁 Very fun post to read!
    Benjamin F Spector
    @bfspector
    May 12, 2024
    (1/7) Happy mother’s day! We think what the mothers of America really want is a Flash Attention implementation that’s just 100 lines of code and 30% faster, and we’re happy to provide. We're excited to introduce ThunderKittens (TK), a simple DSL embedded within CUDA that makes
    7.5K
  • Trevor Gale
    @Tgale96
    Oct 11, 2021
    Seems like pretty marginal quality gains from scaling parameter count by ~3x. 35 days on 3360 A100s, so maybe between $3M and $8M to train? Not sure this model makes sense to train, at least for these applications... developer.nvidia.com/blog/using-dee…
  • Trevor Gale
    @Tgale96
    May 19, 2021
    Submit your work to the all new “Sparsity in Neural Networks” workshop! We have an excellent speaker lineup and attendance is free. Hope to see you all there 😁
    Jonathan Frankle
    @jefrankle
    May 18, 2021
    NEW WORKSHOP: Sparsity in Neural Networks: Advancing Understanding and Practice (July 8-9, 2021). This workshop will bring together members of the many communities working on neural network sparsity to share their perspectives and the latest cutting-edge research (Deadline: 6/15)
  • Trevor Gale
    @Tgale96
    Feb 4, 2023
    Replying to @ml_hardware @abhi_venigalla and 2 others
    The Megatron paper did tell us to do this (5.1). Probably not the only trick we should steal from Megatron-LM 😁
    arxiv.org
    Megatron-LM: Training Multi-Billion Parameter Language Models...
    Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be...
    2.6K
  • Trevor Gale
    @Tgale96
    Dec 8, 2023
    Replying to @Tgale96
    Oh, and also a text from Mihir with this screenshot 😂
    Mihir Patel
    @mvpatel2000
    Dec 8, 2023
    "pierrestock changed the title Mixtral-8x7B support Support new model 6 hours ago"
    11K
