Goodfire
656 posts
Goodfire
@GoodfireAI
Using interpretability to understand, learn from, and design AI.
San Francisco
goodfire.ai
Joined August 2024
29 Following · 24.2K Followers
  • Pinned
    Goodfire
    @GoodfireAI
    May 7
    Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵
    3.2M
  • Goodfire reposted
    Vmax
    @VmaxAI
    Jun 18
    Following the blog post from our collaboration with @GoodfireAI, the arxiv paper for PROPEL is now available.
    Augustine Mavor-Parker
    @MavorParker
    Jun 18
    Replying to @MavorParker
    The arxiv is now live! arxiv.org/abs/2606.18284
    2.6K
  • Goodfire
    @GoodfireAI
    Jun 17
    We're hosting a happy hour at ICML, Wednesday July 8! Come connect with members of the Goodfire team. Learn about our work in neural geometry and other recent publications. Note that space is limited, and we’re prioritizing attendees who are actively engaged in relevant AI
    14K
    Goodfire
    @GoodfireAI
    Jun 17
    Sign up for our ICML happy hour here:
    Goodfire ICML Happy Hour · Luma
    From luma.com
    1.6K
  • Goodfire reposted
    Santiago Aranguri
    @santiaranguri
    Jun 12
    Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.
    2K
  • Goodfire
    @GoodfireAI
    Jun 11
    Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
    175K
    Goodfire
    @GoodfireAI
    Jun 11
    Replying to @GoodfireAI
    If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: goodfire.ai/silico (9/9)
    Build AI models the way you write software
    From goodfire.ai
    4.4K
    Goodfire
    @GoodfireAI
    Jun 11
    Read the full blog post on predictive data debugging:
    Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train
    From goodfire.ai
    4.1K
  • Goodfire
    @GoodfireAI
    Jun 10
    Cool work applying the idea from our work on RLFR to RL task generation!
    Augustine Mavor-Parker
    @MavorParker
    Jun 10
    Training a model to generate RL tasks not too hard, not too easy costs many solver runs per task. PROPEL predicts difficulty via a probe on its activations instead, amortizing cost and speeding up generator optimization. New open-ended RL research from @vmax + @GoodfireAI.
    7.6K
  • Goodfire
    @GoodfireAI
    Jun 4
    New Goodfire research: using logits to monitor for eval awareness!
    Santiago Aranguri
    @santiaranguri
    Jun 4
    Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it. New research: we measure how close a model comes to saying it’s being tested. This detects eval awareness with 10× to 100× fewer samples than monitoring model
    11K
  • Goodfire reposted
    Salesforce Ventures
    @SalesforceVC
    Jun 2
    The idea that launched @GoodfireAI🔥 When ChatGPT launched, most people were blinded by the possibilities. Eric Ho saw the risk it posed. "I kind of saw the next few years unfold before me, where we were about to get increasingly powerful models ... massive amounts of compute,
    2K
  • Goodfire reposted
    Ekdeep Singh Lubana
    @EkdeepL
    Jun 1
    Very excited to have this paper out! We show by having more parameters, larger models see reduced interference between updates. This allows them to retain memories of rarely observed samples of a task, eventually allowing them to learn even the tail-end of the distribution. (1/3)
    Title card for a research paper. The title reads "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention." Authors listed: Jing Huang, Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Lampinen, Christopher Potts, and Ekdeep Singh Lubana. A Goodfire logo appears below the names. Author affiliations: Stanford University, Kempner Institute at Harvard University, MIT, and Anthropic.
    Christopher Potts
    @ChrisGPotts
    Jun 1
    We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.
    16K
  • Goodfire
    @GoodfireAI
    Jun 1
    New research from Goodfire and collaborators: why do larger models learn more tasks? (spoiler: it’s bottlenecked by data)
    Christopher Potts
    @ChrisGPotts
    Jun 1
    We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.
    Title card for a research paper. The title reads "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention." Authors listed: Jing Huang, Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Lampinen, Christopher Potts, and Ekdeep Singh Lubana. A Goodfire logo appears below the names. Author affiliations: Stanford University, Kempner Institute at Harvard University, MIT, and Anthropic.
    22K
  • Goodfire reposted
    Can Rager
    @can_rager
    May 21
    The "tiling" perspective explains a lot of the common problems with SAEs
    Goodfire
    @GoodfireAI
    May 21
    The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
    16K
