Neel Nanda
5,297 posts
Neel Nanda
@NeelNanda5
Mechanistic Interpretability lead at DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
London, UK
neelnanda.io
Joined June 2022
122
Following
41K
Followers
  • Neel Nanda
    @NeelNanda5
    Jan 24, 2025
    My girlfriend returned from Taiwan with the most romantic gift: a TSMC-exclusive notebook! Turns out this notebook is so limited edition that it's only available to TSMC employees, but she found a second-hand seller, and gave up her afternoon to go meet them. I feel so loved ❤️
    202K
  • Neel Nanda
    @NeelNanda5
    May 12, 2025
    After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque. So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS.
    340K
  • Neel Nanda
    @NeelNanda5
    Sep 8, 2025
    I'm honoured to have made the MIT Tech Review Innovators Under 35 List for mechanistic interpretability research and work to build the field. I think technical work to deliberately build a research field is underrated and leveraged. It's great to see how far mech interp has come!
    171K
  • Neel Nanda
    @NeelNanda5
    Sep 9, 2025
    I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token-level hallucination detectors and beat baselines.
    Oscar Balcells Obeso
    @OBalcells
    Sep 9, 2025
    Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.
    120K
  • Neel Nanda
    @NeelNanda5
    Aug 15, 2022
    I've spent the past few months exploring @OpenAI's grokking result through the lens of mechanistic interpretability. I fully reverse engineered the modular addition model, and looked at what it does when training. So what's up with grokking? A 🧵... (1/17) alignmentforum.org/posts/N6WM6hs7…
  • Neel Nanda
    @NeelNanda5
    Dec 2, 2024
    I know I've really made it as a researcher when Claude unexpectedly says this: (Context: It was helping copy edit a PhD letter of recommendation for one of my mentees)
    159K
  • Neel Nanda
    @NeelNanda5
    Mar 25, 2025
    The LessWrong policy against LLM spam has an incredible escape clause for AI agents that want to whistleblow - I love it!
    90K
  • Neel Nanda
    @NeelNanda5
    Oct 10, 2025
    Extremely slimy behaviour from OpenAI. If I worked for OpenAI I'd be pretty embarrassed about my employer right now. If you want the world to trust you to make superintelligence, you need to hold yourself to *far* higher standards.
    Nathan Calvin
    @_NathanCalvin
    Oct 10, 2025
    One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI. I held back on talking about it because I didn't want to distract from SB 53, but Newsom just signed the bill so... here's what happened: 🧵
    104K
  • Neel Nanda
    @NeelNanda5
    Dec 23, 2023
    My first @GoogleDeepMind project: How do LLMs recall facts? Early MLP layers act as a lookup table, with significant superposition! They recognise entities and produce their attributes as directions. We suggest viewing fact recall as a black box making "multi-token embeddings".
    129K
  • Neel Nanda
    @NeelNanda5
    Jul 19, 2025
    Speaking as a past IMO contestant, this is impressive but misleading - gold vs silver is meaningless, 1 pt below gold vs borderline gold is noise. The impressive bit is using a general reasoning model, not a specialised system, and no verified reward. Peak AI maths is unchanged.
    Alexander Wei
    @alexwei_
    Jul 19, 2025
    1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
    172K
  • Neel Nanda
    @NeelNanda5
    Sep 30, 2024
    I find the replies to this tweet wild and sad. Isn't it pretty obvious by now that the old OpenAI board was right? Healthy companies, with good CEOs, do not threaten their employees' compensation, have a long stream of executives quitting and so many scandals.
    Helen Toner
    @hlntnr
    Sep 30, 2024
    Hi Marc 👋 Seems like you've joined the confusingly large club of people who have strong opinions about me & what I think, despite having ~no idea what I actually think. Happy to talk sometime if you want to fix that, otherwise, maybe pick a different villain for your fanfic?
    182K
  • Neel Nanda
    @NeelNanda5
    Jul 31, 2024
    Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit research. Announcing Gemma Scope: An open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B! We hope to enable even more ambitious work.
    211K
  • Neel Nanda
    @NeelNanda5
    Mar 22, 2025
    Working for Google certainly has its share of BS, but I've never had anything as bad as an employer threatening to take back years of paid compensation unless I signed a lifetime concealed non-disparagement. Not everything is an upgrade.
    near
    @nearcyan
    Mar 22, 2025
    If you work on core Google AI products and are interested in a more fun work environment with a higher talent bar, and most importantly, less bureaucracy and BS, consider joining Anthropic, OpenAI, or xAI! All three are aggressively hiring. I will match you with a recruiter, DM!
    107K
  • Neel Nanda
    @NeelNanda5
    Jan 25, 2025
    Why do seemingly all the ML conferences not acknowledge the existence of the many ML researchers in industry without PhDs?
    85K
