Ekdeep Singh Lubana
981 posts
Ekdeep Singh Lubana
@EkdeepL
Member of Technical Staff @GoodfireAI; Previously: Postdoc / PhD at Center for Brain Science, Harvard and University of Michigan
San Francisco, CA
ekdeepslubana.github.io
Joined December 2017
1,312 Following · 2,918 Followers
  • Ekdeep Singh Lubana
    @EkdeepL
    Nov 10, 2024
    Paper alert—accepted as a NeurIPS *Spotlight*!🧵👇 We build on our past work relating emergence to task compositionality and analyze the *learning dynamics* of such tasks: we find there exist latent interventions that can elicit them much before input prompting works! 🤯
    111K
  • Ekdeep Singh Lubana
    @EkdeepL
    Dec 18, 2024
    Paper alert––*Awarded best paper* at NeurIPS workshop on Foundation Model Interventions! 🧵👇 We analyze the (in)abilities of SAEs by relating them to the field of disentangled rep. learning, where limitations of AE based interpretability protocols have been well established!🤯
    71K
  • Ekdeep Singh Lubana
    @EkdeepL
    Jun 6, 2025
    🚨 New paper alert! The linear representation hypothesis (LRH) argues concepts are encoded as a **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
    39K
  • Ekdeep Singh Lubana
    @EkdeepL
    Jun 28, 2025
    🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/
    71K
  • Ekdeep Singh Lubana
    @EkdeepL
    Aug 13, 2025
    Super excited to be joining @GoodfireAI! I'll be scaling up the line of work our group started at Harvard: making predictive accounts of model representations by assuming a model behaves optimally (i.e., good old rational analysis from cogsci!)
    Goodfire
    @GoodfireAI
    Aug 12, 2025
    Thrilled to welcome @EkdeepL to the team! Ekdeep is working on a new research agenda on “cognitive interpretability”, aimed at adapting and improving theories of human cognition to design tools for explaining model cognition.
    38K
  • Ekdeep Singh Lubana
    @EkdeepL
    Jul 9, 2021
    A multitude of normalization layers have been proposed recently, but are we ready to replace BatchNorm yet? In our new preprint, we address this question by developing a unified understanding of normalization layers in deep learning. arXiv link: arxiv.org/abs/2106.05956
  • Ekdeep Singh Lubana
    @EkdeepL
    Feb 16, 2025
    New paper–accepted as a *spotlight* at #ICLR2025! 🧵👇 We show that a competition dynamic between several algorithms splits a toy model’s ICL abilities into four broad phases of train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.
    31K
  • Ekdeep Singh Lubana
    @EkdeepL
    Nov 13, 2025
    New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14)
    54K
  • Ekdeep Singh Lubana
    @EkdeepL
    Feb 25, 2025
    New paper–Accepted at #ICLR2025 and also my last PhD paper! 🧑‍🎓🧵👇 We propose a novel model of how emergent learning curves show up in neural nets’ training by making a connection to the theory of graph percolation!
    23K
  • Ekdeep Singh Lubana
    @EkdeepL
    Jan 5, 2025
    New paper alert! 🧵👇 We show representations of concepts seen by a model during pretraining can be morphed to reflect novel semantics! We do this by building a task based on the conceptual role semantics "theory of meaning"--an idea I'd been wanting to pursue for SO long! 1/n
    30K
  • Ekdeep Singh Lubana
    @EkdeepL
    Dec 10, 2023
    Several papers have recently claimed the “emergence” of specific capabilities in generative models. What drives this behavior? A result from our #NeurIPS23 paper partially addresses this question! Check out arxiv.org/abs/2310.09336 A thread about just this result! (1/n)
    29K
  • Ekdeep Singh Lubana
    @EkdeepL
    May 2, 2025
    New paper---freshly accepted to ICML! Detailed thread coming soon, but pretty excited about this project. We use synthetic knowledge graphs to study why knowledge editing protocols can screw up model capabilities, finding what we call a "representation shattering" effect!
    14K
  • Ekdeep Singh Lubana
    @EkdeepL
    Nov 22, 2022
    Preprint time! 🧵 DNNs can use entirely distinct prediction mechanisms to solve a task (e.g., background vs. shape). Q1: Are such models mode-connected in the landscape? Q2: Can we change a model’s mechanisms by exploiting such connectivity? Link: arxiv.org/abs/2211.08422 1/12
  • Ekdeep Singh Lubana
    @EkdeepL
    Nov 10, 2024
    Replying to @EkdeepL
    We hypothesize the sudden turns mark *disentanglement* of concepts, and the model can arbitrarily compose after this turn. But learning dynamics show otherwise–what’s going on? Turns out capabilities are *latent* at this point, but can be elicited via mere linear interventions!
    50K
