Ekdeep Singh Lubana (@EkdeepL) / X

Member of Technical Staff @GoodfireAI; Previously: Postdoc / PhD at Center for Brain Science, Harvard and University of Michigan

San Francisco, CA

ekdeepslubana.github.io

Joined December 2017

Ekdeep Singh Lubana
@EkdeepL
Nov 10, 2024
Paper alert—accepted as a NeurIPS *Spotlight*!🧵👇 We build on our past work relating emergence to task compositionality and analyze the *learning dynamics* of such tasks: we find there exist latent interventions that can elicit them much before input prompting works! 🤯
Ekdeep Singh Lubana
@EkdeepL
Dec 18, 2024
Paper alert: *Awarded best paper* at the NeurIPS workshop on Foundation Model Interventions! 🧵👇 We analyze the (in)abilities of SAEs by relating them to the field of disentangled representation learning, where limitations of autoencoder-based interpretability protocols have been well established! 🤯
Ekdeep Singh Lubana
@EkdeepL
Jun 6, 2025
🚨 New paper alert! The linear representation hypothesis (LRH) argues concepts are encoded as a **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
Ekdeep Singh Lubana
@EkdeepL
Jun 28, 2025
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/
Ekdeep Singh Lubana
@EkdeepL
Aug 13, 2025
Super excited to be joining @GoodfireAI! I'll be scaling up the line of work our group started at Harvard: making predictive accounts of model representations by assuming a model behaves optimally (i.e., good old rational analysis from cogsci!)
Goodfire
@GoodfireAI
Aug 12, 2025
Thrilled to welcome @EkdeepL to the team! Ekdeep is working on a new research agenda on “cognitive interpretability”, aimed at adapting and improving theories of human cognition to design tools for explaining model cognition.
Ekdeep Singh Lubana
@EkdeepL
Jul 9, 2021
A multitude of normalization layers have been proposed recently, but are we ready to replace BatchNorm yet? In our new preprint, we address this question by developing a unified understanding of normalization layers in deep learning. arXiv link: arxiv.org/abs/2106.05956
Ekdeep Singh Lubana
@EkdeepL
Feb 16, 2025
New paper–accepted as a *spotlight* at #ICLR2025! 🧵👇 We show that a competition dynamic between several algorithms splits a toy model’s ICL abilities into four broad phases of train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.
Ekdeep Singh Lubana
@EkdeepL
Nov 13, 2025
New paper! Language has rich, multiscale temporal structure, but sparse autoencoders assume features are *static* directions in activations. To address this, we propose Temporal Feature Analysis: a predictive coding protocol that models dynamics in LLM activations! (1/14)
Ekdeep Singh Lubana
@EkdeepL
Feb 25, 2025
New paper–Accepted at #ICLR2025 and also my last PhD paper! 🧑‍🎓🧵👇 We propose a novel model of how emergent learning curves show up in neural nets’ training by making a connection to the theory of graph percolation!
Ekdeep Singh Lubana
@EkdeepL
Jan 5, 2025
New paper alert! 🧵👇 We show representations of concepts seen by a model during pretraining can be morphed to reflect novel semantics! We do this by building a task based on the conceptual role semantics "theory of meaning"--an idea I'd been wanting to pursue for SO long! 1/n
Ekdeep Singh Lubana
@EkdeepL
Dec 10, 2023
Several papers have recently claimed the “emergence” of specific capabilities in generative models. What drives this behavior? A result from our #NeurIPS23 paper partially addresses this question! Check out arxiv.org/abs/2310.09336. A thread about just this result! (1/n)
Ekdeep Singh Lubana
@EkdeepL
May 2, 2025
New paper, freshly accepted to ICML! Detailed thread coming soon, but pretty excited about this project. We use synthetic knowledge graphs to study why knowledge editing protocols can screw up model capabilities, finding what we call a "representation shattering" effect!
Ekdeep Singh Lubana
@EkdeepL
Nov 22, 2022
Preprint time! 🧵 DNNs can use entirely distinct prediction mechanisms to solve a task (e.g., background vs. shape). Q1: Are such models mode-connected in the landscape? Q2: Can we change a model’s mechanisms by exploiting such connectivity? Link: arxiv.org/abs/2211.08422 1/12
Ekdeep Singh Lubana
@EkdeepL
Nov 10, 2024
Replying to @EkdeepL
We hypothesize the sudden turns mark *disentanglement* of concepts, and the model can arbitrarily compose after this turn. But learning dynamics show otherwise–what’s going on? Turns out capabilities are *latent* at this point, but can be elicited via mere linear interventions!