Gargi Ghosh
81 posts
Gargi Ghosh
@gargighosh
Researcher at FAIR (Meta AI)
Bellevue, WA
scholar.google.com/citations?user…
Joined December 2009
153 Following
1,033 Followers
  • Pinned
    Gargi Ghosh
    @gargighosh
    Dec 27, 2024
    We released new research: Byte Latent Transformer (BLT). BLT encodes bytes into dynamic patches using lightweight local models and processes them with a large latent transformer. Think of it as a transformer sandwich!
    AI at Meta
    Meta
    @AIatMeta
    Dec 27, 2024
    New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ go.fb.me/w23lmz
    71K
  • Gargi Ghosh
    @gargighosh
    Dec 30, 2024
    Efficient scaling with Memory Layers, replacing FFNs with memory units: arxiv.org/pdf/2412.09764
    67K
  • Gargi Ghosh
    @gargighosh
    Dec 31, 2024
    Last one of the year: Ewe (Explicit Working Memory), arxiv.org/pdf/2412.18069. Ewe enhances factuality in long-form text generation by integrating a working memory that receives real-time feedback from external resources.
    11K
  • Gargi Ghosh
    @gargighosh
    May 17, 2024
    Our latest work: an early-fusion model with mixed modalities in both the input and the output.
    AI at Meta
    Meta
    @AIatMeta
    May 17, 2024
    Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ go.fb.me/7rb19n
    14K
  • Gargi Ghosh
    @gargighosh
    Dec 13, 2024
    Sharing new research from my team: 1) Dynamic Byte Latent Transformer, the first byte-level model that matches current LLM performance at scale, with significant improvements in robustness and efficiency and better awareness of rare sub-words.
    ai.meta.com
    Sharing new research, models, and datasets from Meta FAIR
    Meta FAIR is releasing new research artifacts that highlight our recent innovations in developing agents, robustness and safety, and architectures that facilitate machine learning.
    2.9K
  • Gargi Ghosh
    @gargighosh
    Jun 18, 2024
    Open sourcing Chameleon! Our work from last year: an early-fusion multimodal foundation model. We are releasing multimodality in the input with text generation in the output (though the model was trained to generate both text and images).
    AI at Meta
    Meta
    @AIatMeta
    Jun 18, 2024
    Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and
    3.6K
  • Gargi Ghosh
    @gargighosh
    Dec 27, 2024
    Replying to @gargighosh
    Entropy patching dynamically adjusts patch sizes based on data complexity, allowing BLT to allocate more compute to hard predictions and use larger patches for simpler ones. This results in fewer, larger transformer steps to cover the same data.
    691
  • Gargi Ghosh
    @gargighosh
    Dec 27, 2024
    Replying to @gargighosh
    Without direct modeling of bytes, Llama 3.1, trained on 16x more data, still lags behind on some of these tasks!
    727
  • Gargi Ghosh
    @gargighosh
    Dec 27, 2024
    Replying to @gargighosh
    Joint work with @ArtidoroPagnoni @ramakanth1729 @EntilZhaPR @JohnNguyen @ben_mlr @margs_li @violet_zct @liliyu_lili @jaseweston @LukeZettlemoyer @ml_perception @universeinanegg @sriniiyer88
    747
  • Gargi Ghosh
    @gargighosh
    Dec 27, 2024
    Replying to @gargighosh
    BLT unlocks a new scaling dimension by simultaneously growing patch and model size without changing training or inference cost. Patch length scaling quickly overtakes BPE transformer scaling, and the trends look even better at larger scales!
    538
  • Gargi Ghosh
    @gargighosh
    Dec 27, 2024
    Replying to @gargighosh
    Compute-controlled (8B) BLT matches performance on standard benchmarks. It can trade minor losses for up to 50% reductions in inference FLOPs. BLT excels at modeling the long tail of data, with better robustness to noise and improved understanding and manipulation of substrings.
    461
  • Gargi Ghosh
    @gargighosh
    Dec 30, 2024
    Replying to @gargighosh
    Sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale.
    1.1K
  • Gargi Ghosh
    @gargighosh
    Aug 1, 2024
    Our latest work on multimodal MoE
    Victoria X Lin
    @VictoriaLinML
    Aug 1, 2024
    1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770). MoMa employs a mixture-of-experts (MoE) framework with modality-specific expert groups. Given any
    778
  • Gargi Ghosh
    @gargighosh
    Dec 30, 2024
    Replying to @gargighosh
    Joint work with @vinceberges, @barlas_berkeley, @d_haziza, @scottyih, @LukeZettlemoyer
    761
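
The entropy patching idea from the BLT thread above (start a new patch where the next byte is hard to predict, and let easy stretches form longer patches) can be illustrated with a rough sketch. This is not the BLT implementation. As an assumption for illustration only, a bigram count model fitted on the input itself stands in for the small byte-level language model, and the function names and `threshold` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def next_byte_entropies(data: bytes) -> list[float]:
    """Entropy in bits of a bigram model's prediction for each byte position.

    Assumption: a bigram count table stands in for a learned byte-level LM.
    Position 0 has no context, so it is assigned 0.0.
    """
    if not data:
        return []
    follow = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        follow[prev][nxt] += 1
    table = {}
    for prev, counts in follow.items():
        total = sum(counts.values())
        table[prev] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return [0.0] + [table[prev] for prev in data[:-1]]


def entropy_patches(data: bytes, threshold: float) -> list[bytes]:
    """Split data into patches, opening a new patch where entropy > threshold."""
    entropies = next_byte_entropies(data)
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:  # hard prediction: close the current patch
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


if __name__ == "__main__":
    text = b"the cat sat on the mat. the cat sat on the mat."
    for patch in entropy_patches(text, threshold=1.0):
        print(patch)
```

Raising `threshold` yields fewer, longer patches, and lowering it yields more, shorter ones, so in this sketch it plays the role of the knob that trades the number of large-model steps against patch granularity.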
