Goodfire (@GoodfireAI) / X

Using interpretability to understand, learn from, and design AI.

San Francisco

Joined August 2024

Pinned
Goodfire
@GoodfireAI
May 7
Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵
Goodfire reposted
Vmax
@VmaxAI
Jun 18
Following the blog post from our collaboration with @GoodfireAI, the arxiv paper for PROPEL is now available.
Augustine Mavor-Parker
@MavorParker
Jun 18
Replying to @MavorParker
The arxiv is now live! arxiv.org/abs/2606.18284
Goodfire
@GoodfireAI
Jun 17
We're hosting a happy hour at ICML, Wednesday July 8! Come connect with members of the Goodfire team. Learn about our work in neural geometry and other recent publications. Note that space is limited, and we’re prioritizing attendees who are actively engaged in relevant AI
Goodfire
@GoodfireAI
Jun 17
Sign up for our ICML happy hour here:
Goodfire ICML Happy Hour · Luma
From luma.com
Goodfire reposted
Santiago Aranguri
@santiaranguri
Jun 12
Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.
Goodfire
@GoodfireAI
Jun 11
Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
Goodfire
@GoodfireAI
Jun 11
Replying to @GoodfireAI
If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: goodfire.ai/silico (9/9)
Build AI models the way you write software
From goodfire.ai
Goodfire
@GoodfireAI
Jun 11
Read the full blog post on predictive data debugging:
Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train
From goodfire.ai
Goodfire
@GoodfireAI
Jun 10
Cool work applying the idea from our work on RLFR to RL task generation!
Augustine Mavor-Parker
@MavorParker
Jun 10
Training a model to generate RL tasks that are neither too hard nor too easy costs many solver runs per task. PROPEL predicts difficulty via a probe on its activations instead, amortizing cost and speeding up generator optimization. New open-ended RL research from @vmax + @GoodfireAI.
Goodfire
@GoodfireAI
Jun 4
New Goodfire research: using logits to monitor for eval awareness!
Santiago Aranguri
@santiaranguri
Jun 4
Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it. New research: we measure how close a model comes to saying it’s being tested. This detects eval awareness with 10× to 100× fewer samples than monitoring model
Goodfire reposted
Salesforce Ventures
@SalesforceVC
Jun 2
The idea that launched @GoodfireAI🔥 When ChatGPT launched, most people were blinded by the possibilities. Eric Ho saw the risk it posed. "I kind of saw the next few years unfold before me, where we were about to get increasingly powerful models ... massive amounts of compute,
Goodfire reposted
Ekdeep Singh Lubana
@EkdeepL
Jun 1
Very excited to have this paper out! We show that, by having more parameters, larger models see reduced interference between updates. This allows them to retain memories of rarely observed samples of a task, eventually allowing them to learn even the tail end of the distribution. (1/3)
Christopher Potts
@ChrisGPotts
Jun 1
We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.
Goodfire
@GoodfireAI
Jun 1
New research from Goodfire and collaborators: why do larger models learn more tasks? (spoiler: it’s bottlenecked by data)
Christopher Potts
@ChrisGPotts
Jun 1
We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.
Goodfire reposted
Can Rager
@can_rager
May 21
The "tiling" perspective explains a lot of the common problems with SAEs
Goodfire
@GoodfireAI
May 21
The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)