Gargi Ghosh (@gargighosh) / X

81 posts

Gargi Ghosh

@gargighosh

Researcher at FAIR (Meta AI)

Bellevue, WA

scholar.google.com/citations?user…

Joined December 2009

Pinned
Gargi Ghosh
@gargighosh
Dec 27, 2024
We released new research: Byte Latent Transformer (BLT). BLT encodes bytes into dynamic patches using lightweight local models and processes them with a large latent transformer. Think of it as a transformer sandwich!
AI at Meta
@AIatMeta
Dec 27, 2024
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ go.fb.me/w23lmz
Gargi Ghosh
@gargighosh
Dec 30, 2024
Efficient scaling with Memory Layers, replacing FFNs with memory units: arxiv.org/pdf/2412.09764
Gargi Ghosh
@gargighosh
Dec 31, 2024
Last one of the year - EWE: arxiv.org/pdf/2412.18069. Ewe (Explicit Working Memory) enhances factuality in long-form text generation by integrating a working memory that receives real-time feedback from external resources.
Gargi Ghosh
@gargighosh
May 17, 2024
Our latest work: an early-fusion model with mixed modalities in the input and output.
AI at Meta
@AIatMeta
May 17, 2024
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ go.fb.me/7rb19n
Gargi Ghosh
@gargighosh
Dec 13, 2024
Sharing new research from my team: 1) Dynamic Byte Latent Transformer, the first byte-level model that matches current LLM performance at scale, with significant improvements in robustness and efficiency, and better awareness of rare sub-words.
ai.meta.com
Sharing new research, models, and datasets from Meta FAIR
Meta FAIR is releasing new research artifacts that highlight our recent innovations in developing agents, robustness and safety, and architectures that facilitate machine learning.
Gargi Ghosh
@gargighosh
Jun 18, 2024
Open sourcing Chameleon! Our work from last year: an early-fusion multimodal foundation model. We are releasing multimodality in the input with text generation in the output (though the model was trained to generate both text and images).
AI at Meta
@AIatMeta
Jun 18, 2024
Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and
Gargi Ghosh
@gargighosh
Dec 27, 2024
Replying to @gargighosh
Entropy patching dynamically adjusts patch sizes based on data complexity, allowing BLT to allocate more compute to hard predictions and use larger patches for simpler ones. This results in fewer, larger transformer steps to cover the same data.
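A minimal sketch of the entropy-patching idea described above, assuming a simple global-threshold rule: a new patch starts wherever the predicted next-byte entropy exceeds a threshold. The function names, the threshold value, and the toy entropy list are hypothetical and only for illustration; in BLT the entropies would come from a small byte-level model, which is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of a single next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def entropy_patches(
    entropies: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Group byte positions into patches, returned as half-open (start, end) spans.

    Assumed rule (illustrative): position i opens a new patch whenever
    entropies[i] > threshold. Low-entropy runs therefore merge into long
    patches, and hard-to-predict bytes start fresh ones.
    """
    patches: List[Tuple[int, int]] = []
    start = 0
    for i, h in enumerate(entropies):
        if i > 0 and h > threshold:
            patches.append((start, i))
            start = i
    if len(entropies) > 0:
        patches.append((start, len(entropies)))
    return patches


# Toy per-byte entropies (made up): spikes mark hard predictions.
toy_entropies = [3.0, 0.2, 0.1, 2.5, 0.3, 0.2, 0.1, 2.8]
print(entropy_patches(toy_entropies, threshold=1.0))
```

With these toy values, eight bytes are covered by three patches, so the large latent transformer would take three steps instead of eight.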
Gargi Ghosh
@gargighosh
Dec 27, 2024
Replying to @gargighosh
Without direct modeling of bytes, Llama 3.1, trained on 16x more data, still lags behind on some of these tasks!
Gargi Ghosh
@gargighosh
Dec 27, 2024
Replying to @gargighosh
Joint work with @ArtidoroPagnoni @ramakanth1729 @EntilZhaPR @JohnNguyen @ben_mlr @margs_li @violet_zct @liliyu_lili @jaseweston @LukeZettlemoyer @ml_perception @universeinanegg @sriniiyer88
Gargi Ghosh
@gargighosh
Dec 27, 2024
Replying to @gargighosh
BLT unlocks a new scaling dimension by simultaneously growing patch and model size without changing training or inference cost. Patch length scaling quickly overtakes BPE transformer scaling, and the trends look even better at larger scales!
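A rough back-of-the-envelope sketch of why patch size and model size can grow together at fixed cost. It assumes the common approximation of about 2 FLOPs per parameter per forward step, counts only the large latent transformer (the lightweight local models are ignored), and uses made-up parameter counts and patch sizes; none of these numbers come from the paper.

```python
def latent_flops_per_byte(params: float, avg_patch_bytes: float) -> float:
    """Approximate forward FLOPs per byte spent in the latent transformer.

    Assumption (illustrative): one latent step costs ~2 * params FLOPs and
    covers one patch, so the per-byte cost is that divided by the average
    patch size in bytes.
    """
    return 2.0 * params / avg_patch_bytes


# Hypothetical configurations: doubling both the parameter count and the
# average patch size leaves the per-byte cost unchanged.
small = latent_flops_per_byte(params=8e9, avg_patch_bytes=4.0)
large = latent_flops_per_byte(params=16e9, avg_patch_bytes=8.0)
print(small, large)
```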
Gargi Ghosh
@gargighosh
Dec 27, 2024
Replying to @gargighosh
Compute-controlled (8B) BLT matches performance on standard benchmarks. It can trade minor losses for up to 50% reductions in inference FLOPs. BLT excels at modeling the long tail of data, with better robustness to noise and improved understanding and manipulation of substrings.
Gargi Ghosh
@gargighosh
Dec 30, 2024
Replying to @gargighosh
Sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale.
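A minimal sketch of a sparsely activated key-value memory lookup, assuming a plain top-k selection over dot-product scores followed by a softmax over the selected slots. The function name and toy data are hypothetical, and efficiency techniques a real memory layer would need at scale (such as avoiding a full scan over all keys) are deliberately left out.

```python
import math
from typing import List, Sequence


def memory_lookup(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Return a weighted sum of the values whose keys best match the query.

    Only the k highest-scoring slots contribute, so the cost of mixing
    values depends on k rather than on the total memory size.
    """
    # Dot-product score between the query and every key.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k best-matching memory slots.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the selected scores only.
    max_score = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - max_score) for i in top]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out


# Toy memory with three slots (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(memory_lookup([1.0, 0.0], keys, values, k=1))
```

With `k=1` only the best-matching slot is read; raising `k` blends in more slots while the rest of the memory stays untouched.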
Gargi Ghosh
@gargighosh
Aug 1, 2024
Our latest work on multimodal MoE
Victoria X Lin
@VictoriaLinML
Aug 1, 2024
1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770). MoMa employs a mixture-of-expert (MoE) framework with modality-specific expert groups. Given any
Gargi Ghosh
@gargighosh
Dec 30, 2024
Replying to @gargighosh
Joint work with @vinceberges, @barlas_berkeley, @d_haziza, @scottyih, @LukeZettlemoyer