Mozes Jacobs (@mozesjacobs) / X

Mozes Jacobs

39 posts

@mozesjacobs

PhD student @KempnerInst @Harvard. Fellow @GoodfireAI Representation Learning and Interpretability. Advised by Demba Ba.

mozesjacobs.github.io

Joined February 2025

832 Followers

Pinned
Mozes Jacobs
@mozesjacobs
Feb 10
Are ViTs secretly RNNs? #ICLR2026 Our 2-block recurrent transformer recovers 96% of DINOv2’s IN-1k accuracy & reproduces its activations 1-to-1, motivating the Block-Recurrent Hypothesis: arxiv.org/abs/2512.19941 w/ @thomas_fel_ @RichieHakim @ABrondetta Demba Ba @t_andy_keller
GIF
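The Block-Recurrent Hypothesis says a deep ViT's layer stack can be approximated by a few distinct blocks, each applied repeatedly. The toy sketch below shows only that weight-tying pattern. Everything in it is an assumption for illustration: `make_block` is a hypothetical residual-MLP stand-in for a full transformer block (no attention or normalization), and the dimensions and the 6/6 split of 12 effective layers are arbitrary. It is not the paper's model.

```python
import numpy as np

def make_block(dim, rng):
    """Hypothetical stand-in for one transformer block: a residual MLP.

    A real ViT block also has attention and layer norm; this toy block only
    shows how one parameter set can be reused across depth.
    """
    w1 = rng.standard_normal((dim, 4 * dim)) / np.sqrt(dim)
    w2 = rng.standard_normal((4 * dim, dim)) / np.sqrt(4 * dim)

    def block(x):
        return x + np.tanh(x @ w1) @ w2

    return block

def block_recurrent_forward(x, blocks, repeats):
    """Apply blocks[i] repeats[i] times, in order (weight-tied depth recurrence)."""
    for block, r in zip(blocks, repeats):
        for _ in range(r):
            x = block(x)
    return x

rng = np.random.default_rng(0)
dim, n_tokens = 16, 197                              # 196 patches + 1 CLS token (illustrative)
blocks = [make_block(dim, rng) for _ in range(2)]    # only 2 distinct parameter sets
tokens = rng.standard_normal((n_tokens, dim))

# 12 effective layers built from 2 blocks; the 6/6 split is an arbitrary assumption.
out = block_recurrent_forward(tokens, blocks, repeats=[6, 6])
print(out.shape)  # (197, 16)
```

The parameter count scales with the number of distinct blocks (2 here), while the effective depth scales with the number of repeats.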
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Traveling waves of neural activity are observed all over the brain. Can they be used to augment neural networks? I am thrilled to share our new work, "Traveling Waves Integrate Spatial Information Through Time" with @t_andy_keller! 1/14
GIF
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
For more details, check out our paper recently accepted in workshop form to the 2025 ICLR Re-Align workshop, as well as the full preprint! Paper: arxiv.org/abs/2502.06034 Code: github.com/KempnerInstitu… 13/14
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
A massive thank you to all those involved in this work: Lyle Muller, Roberto Budzinski, and Demba Ba! 14/14
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
The problem "Can One Hear the Shape of a Drum?", posed by Mark Kac, is an example of spatial integration. Strike a drum, and its vibrations encode the shape of its boundary. We can see (with fixed RNNs that simulate drums) that differently sized drumheads have different dynamics: 4/14
GIF
GIF
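A minimal sketch of a "fixed RNN that simulates a drum", assuming the standard leapfrog discretization of the 2D wave equation (the thread does not specify the update rule). The same fixed weights (a 5-point Laplacian stencil) are applied at every step, and cells outside the drumhead mask are clamped to zero. `simulate_drum`, `dominant_frequency`, the uniform "strike", and the grid sizes are all hypothetical choices for illustration.

```python
import numpy as np

def simulate_drum(mask, steps=1024, c2=0.25):
    """Simulate a drumhead as a fixed-weight recurrent update (leapfrog wave equation).

    mask: 2D 0/1 array marking the drumhead; it must have a zero border so that
          np.roll's wrap-around only ever pulls in zeros.
    c2:   squared wave speed; c2 <= 0.5 keeps this 5-point scheme stable.
    Returns the displacement of the cell nearest the drum's centroid at each step.
    """
    mask = mask.astype(float)
    u = mask.copy()        # strike: uniform initial displacement
    u_prev = mask.copy()   # equal previous state -> (approximately) zero initial velocity
    ci, cj = np.argwhere(mask).mean(axis=0).round().astype(int)
    trace = np.empty(steps)
    for t in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        # Same fixed weights at every step; cells outside the drum are clamped to 0.
        u_next = (2 * u - u_prev + c2 * lap) * mask
        u_prev, u = u, u_next
        trace[t] = u[ci, cj]
    return trace

def dominant_frequency(trace):
    """Frequency (cycles per step) of the largest non-DC peak in the spectrum."""
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    freqs = np.fft.rfftfreq(len(trace))
    return freqs[1 + np.argmax(spectrum[1:])]

small = np.zeros((12, 12)); small[2:10, 2:10] = 1   # 8 x 8 drumhead
large = np.zeros((20, 20)); large[2:18, 2:18] = 1   # 16 x 16 drumhead
print(dominant_frequency(simulate_drum(small)))      # ~0.039
print(dominant_frequency(simulate_drum(large)))      # ~0.021: bigger drum, lower pitch
```

The uniform initial displacement is chosen so the fundamental mode dominates the recorded trace, which makes the spectral peak a clean size signal.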
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
We found that we could actually predict the area of the drums analytically by looking at the frequency of oscillations of each neuron (see below). This finding led us to wonder: can we actually learn (via trainable parameters) dynamics for more complex shapes? 5/14
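The thread does not state which formula was used to predict area. For the idealized textbook case of a continuous square membrane of side L, wave speed c, and a fixed (Dirichlet) boundary, the normal-mode frequencies already tie the fundamental frequency to area. This is offered only as an illustration of the idea, not as the authors' derivation:

```latex
u_{mn}(x,y,t) = \sin\!\left(\frac{m\pi x}{L}\right)\sin\!\left(\frac{n\pi y}{L}\right)\cos(\omega_{mn} t),
\qquad
\omega_{mn} = \frac{c\pi}{L}\sqrt{m^2 + n^2}

\omega_{11} = \frac{\sqrt{2}\,\pi c}{L}
\quad\Longrightarrow\quad
A = L^2 = \frac{2\pi^2 c^2}{\omega_{11}^2}
```

Measuring the fundamental angular frequency from a single point's oscillation and inverting the last relation recovers the area of a square drum. On a discrete grid the relation holds only approximately, and best for the lowest modes.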
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
Vision is a coordinated activity involving millions of neurons in the visual cortex. How is information shared over these large distances? Evidence suggests traveling waves could carry this information across space, allowing neurons to “know” what’s happening far away. 2/14
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
Check out our @KempnerInst blog post for audio on what different shapes sound like (to our models), as well as for more details and visualizations. kempnerinstitute.harvard.edu/research/deepe… 12/14
Traveling Waves Integrate Spatial Information Through Time - Kempner Institute
From kempnerinstitute.harvard.edu
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
We then studied both our wave-biased model and a standard ConvLSTM (no wave inductive bias). Incredibly, both models learned to generate waves. The ConvLSTM's emergent waves (shown below on a Tetrominoes image) suggest a degree of optimality for wave-based solutions. 7/14
GIF
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
Here are some examples of the wave dynamics used to segment Multi-MNIST images: 11/14
GIF
GIF
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
Spatial integration means that a neuron at one location can access signals from distant points. This could mean linking information together across an image to classify objects or linking words together in a sentence to derive meaning. 3/14
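With only local connectivity, a signal needs time to cross space: if each unit talks only to its immediate neighbors, information from a distant point arrives after a number of steps equal to the distance between the two units. The toy sketch below counts those steps on a grid using a 4-neighbor update. It is a hypothetical illustration of the spatial-integration constraint, not the Neural Wave Machine.

```python
import numpy as np

def steps_to_reach(shape, src, dst):
    """Count update steps until a signal starting at `src` reaches `dst`.

    Each step, every cell takes the OR of its own state and its 4 neighbors,
    i.e. a purely local update with no long-range connections.
    """
    reached = np.zeros(shape, dtype=bool)
    reached[src] = True
    steps = 0
    while not reached[dst]:
        new = reached.copy()
        new[1:, :] |= reached[:-1, :]   # spread down
        new[:-1, :] |= reached[1:, :]   # spread up
        new[:, 1:] |= reached[:, :-1]   # spread right
        new[:, :-1] |= reached[:, 1:]   # spread left
        reached = new
        steps += 1
    return steps

print(steps_to_reach((32, 32), (0, 0), (10, 5)))  # 15 = |10 - 0| + |5 - 0|
```

A recurrent model with local weights can therefore integrate over the whole image, but only if it runs for roughly as many steps as the image is wide.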
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
We built a trainable RNN (the Neural Wave Machine/NWM) that generates traveling waves in its hidden states. We began by testing it on segmenting polygons. We find that wave-based models produce unique dynamics for each shape, resulting in distinct Fourier spectra. 6/14
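The thread says each shape yields distinct dynamics and a distinct Fourier spectrum, but it does not describe how those spectra become a segmentation. One plausible readout, sketched here purely as an assumption, takes the magnitude spectrum of each pixel's hidden-state trajectory over time and feeds it to a per-pixel linear classifier. `fourier_features`, the synthetic trajectory, and the random `readout` matrix are all hypothetical.

```python
import numpy as np

def fourier_features(hidden):
    """Per-pixel magnitude spectrum of a hidden-state trajectory.

    hidden: array of shape (T, H, W), one scalar hidden state per pixel per step.
    Returns an array of shape (H, W, T // 2 + 1).
    """
    spectrum = np.abs(np.fft.rfft(hidden, axis=0))  # (F, H, W)
    return np.moveaxis(spectrum, 0, -1)              # (H, W, F)

# Synthetic trajectory: two pixels oscillating at different frequencies.
T = 64
t = np.arange(T)
hidden = np.zeros((T, 4, 4))
hidden[:, 0, 0] = np.cos(2 * np.pi * 5 * t / T)   # 5 cycles over the window
hidden[:, 1, 1] = np.cos(2 * np.pi * 9 * t / T)   # 9 cycles over the window

feats = fourier_features(hidden)
print(feats.shape)                                  # (4, 4, 33)
print(feats[0, 0].argmax(), feats[1, 1].argmax())   # 5 9

# Hypothetical per-pixel linear readout over frequency bins (e.g. 3 segment classes).
rng = np.random.default_rng(0)
readout = rng.standard_normal((feats.shape[-1], 3))
logits = feats @ readout                            # (4, 4, 3)
```

Pixels inside regions with different dynamics end up with different spectral signatures, which is what a per-pixel classifier would need to separate them.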
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
CNNs with small receptive fields (few layers) are unable to segment these images, while deeper models, with large receptive fields, are sometimes able to solve the task but are less stable, yielding lower average performance and significantly higher variance. 9/14
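The receptive-field arithmetic behind "small # of layers" is simple for stacked stride-1 convolutions: each k x k layer widens the receptive field by (k - 1) pixels per axis. The thread gives neither kernel sizes nor image resolution, so the 3 x 3 kernels and the 64-pixel width below are assumptions, and the function names are hypothetical.

```python
import math

def receptive_field(n_layers, kernel=3, dilation=1):
    """Receptive field (pixels per axis) of n stacked stride-1 convolutions."""
    return 1 + n_layers * dilation * (kernel - 1)

def layers_for_global_view(width, kernel=3):
    """Layers needed so that even a corner pixel's output sees the opposite edge.

    Each stride-1 layer extends the reach by (kernel - 1) // 2 pixels per side
    (kernel assumed odd).
    """
    reach = (kernel - 1) // 2
    return math.ceil((width - 1) / reach)

print(receptive_field(4))            # 9
print(layers_for_global_view(64))    # 63
print(receptive_field(63))           # 127 = 2 * 63 + 1
```

Under these assumptions, a plain 3 x 3 CNN needs dozens of layers before every output pixel can see the whole image, which is why deep stacks, skip connections, bottlenecks, or recurrence over time become necessary.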
Mozes Jacobs
@mozesjacobs
Mar 10, 2025
Replying to @mozesjacobs
We also compared our model to U-Nets, which have global receptive fields via skip connections and bottlenecks. Incredibly, on Multi-MNIST, wave-based models outperformed similarly sized U-Nets, despite having fewer parameters and only local connectivity. 10/14