Neel Nanda (@NeelNanda5) / X

Neel Nanda

@NeelNanda5

Mechanistic Interpretability lead at DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!

London, UK

Joined June 2022

Neel Nanda
@NeelNanda5
Jan 24, 2025
My girlfriend returned from Taiwan with the most romantic gift: a TSMC exclusive notebook! Turns out this notebook is so limited edition that it's only available to TSMC employees, but she found a second-hand seller, and gave up her afternoon to go meet them. I feel so loved ❤️
Neel Nanda
@NeelNanda5
May 12, 2025
After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque. So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS
Neel Nanda
@NeelNanda5
Sep 8, 2025
I'm honoured to have made the MIT Tech Review Innovators Under 35 List for mechanistic interpretability research and work to build the field. I think technical work to deliberately build a research field is underrated and leveraged. It's great to see how far mech interp has come!
Neel Nanda
@NeelNanda5
Sep 9, 2025
I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token-level hallucination detectors and beat baselines
Oscar Balcells Obeso
@OBalcells
Sep 9, 2025
Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.
Neel Nanda
@NeelNanda5
Aug 15, 2022
I've spent the past few months exploring @OpenAI's grokking result through the lens of mechanistic interpretability. I fully reverse engineered the modular addition model, and looked at what it does when training. So what's up with grokking? A 🧵... (1/17) alignmentforum.org/posts/N6WM6hs7…
Neel Nanda
@NeelNanda5
Dec 2, 2024
I know I've really made it as a researcher when Claude unexpectedly says this: (Context: It was helping copy edit a PhD letter of recommendation for one of my mentees)
Neel Nanda
@NeelNanda5
Mar 25, 2025
The LessWrong policy against LLM spam has an incredible escape clause for AI agents that want to whistleblow - I love it!
Neel Nanda
@NeelNanda5
Oct 10, 2025
Extremely slimy behaviour from OpenAI. If I worked for OpenAI I'd be pretty embarrassed about my employer right now. If you want the world to trust you to make super intelligence, you need to hold yourself to *far* higher standards
Nathan Calvin
@_NathanCalvin
Oct 10, 2025
One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI. I held back on talking about it because I didn't want to distract from SB 53, but Newsom just signed the bill so... here's what happened: 🧵
Neel Nanda
@NeelNanda5
Dec 23, 2023
My first @GoogleDeepMind project: How do LLMs recall facts? Early MLP layers act as a lookup table, with significant superposition! They recognise entities and produce their attributes as directions. We suggest viewing fact recall as a black box making "multi-token embeddings"
Neel Nanda
@NeelNanda5
Jul 19, 2025
Speaking as a past IMO contestant, this is impressive but misleading - gold vs silver is meaningless, 1 pt below gold vs borderline gold is noise. The impressive bit is using a general reasoning model, not a specialised system, and no verified reward. Peak AI maths is unchanged
Alexander Wei
@alexwei_
Jul 19, 2025
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Neel Nanda
@NeelNanda5
Sep 30, 2024
I find the replies to this tweet wild and sad. Isn't it pretty obvious by now that the old OpenAI board was right? Healthy companies, with good CEOs, do not threaten their employees' compensation, have a long stream of executives quitting and so many scandals.
Helen Toner
@hlntnr
Sep 30, 2024
Hi Marc 👋 Seems like you've joined the confusingly large club of people who have strong opinions about me & what I think, despite having ~no idea what I actually think. Happy to talk sometime if you want to fix that, otherwise, maybe pick a different villain for your fanfic?
Neel Nanda
@NeelNanda5
Jul 31, 2024
Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit research. Announcing Gemma Scope: An open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B! We hope to enable even more ambitious work
Neel Nanda
@NeelNanda5
Mar 22, 2025
Working for Google certainly has its share of BS, but I've never had anything as bad as an employer threatening to take back years of paid compensation unless I signed a lifetime concealed non-disparagement. Not everything is an upgrade.
near
@nearcyan
Mar 22, 2025
If you work on core Google AI products and are interested in a more fun work environment with a higher talent bar, and most importantly, less bureaucracy and BS, consider joining Anthropic, OpenAI, or xAI! All three are aggressively hiring. I will match you with a recruiter, DM!
Neel Nanda
@NeelNanda5
Jan 25, 2025
Why do seemingly all the ML conferences not acknowledge the existence of the many ML researchers in industry without PhDs?