Mozes Jacobs
39 posts
Mozes Jacobs
@mozesjacobs
PhD student @KempnerInst @Harvard. Fellow @GoodfireAI Representation Learning and Interpretability. Advised by Demba Ba.
mozesjacobs.github.io
Joined February 2025
59 Following · 832 Followers
  • Pinned
    Mozes Jacobs
    @mozesjacobs
    Feb 10
    Are ViTs secretly RNNs? #ICLR2026 Our 2-block recurrent transformer recovers 96% of DINOv2’s IN-1k accuracy & reproduces its activations 1-to-1, motivating the Block-Recurrent Hypothesis: arxiv.org/abs/2512.19941 w/ @thomas_fel_ @RichieHakim @ABrondetta Demba Ba @t_andy_keller
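The idea behind a "2-block recurrent transformer" is that a deep stack of distinct blocks can be replaced by a small number of shared blocks, each applied several times in sequence. The sketch below illustrates only that weight-tying schedule, under our own assumptions: the toy blocks are hypothetical elementwise residual maps standing in for transformer blocks, and `make_toy_block`, `layer_to_block_schedule`, `block_recurrent_forward`, and the 7 + 5 split are illustrative names and numbers, not taken from the paper or its code.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float, shift: float) -> Block:
    """Hypothetical stand-in for a transformer block: a residual elementwise map."""
    def block(x: Vector) -> Vector:
        # Residual form x + f(x), with f(x) = scale * x + shift.
        return [xi + scale * xi + shift for xi in x]
    return block


def layer_to_block_schedule(steps_per_block: Sequence[int]) -> List[int]:
    """Index of the shared block used at each unrolled layer."""
    return [b for b, steps in enumerate(steps_per_block) for _ in range(steps)]


def block_recurrent_forward(x: Vector, blocks: Sequence[Block],
                            steps_per_block: Sequence[int]) -> Vector:
    """Unroll depth by reapplying each shared block; its weights are reused every time."""
    if len(blocks) != len(steps_per_block):
        raise ValueError("need exactly one step count per block")
    for b in layer_to_block_schedule(steps_per_block):
        x = blocks[b](x)
    return x


if __name__ == "__main__":
    blocks = [make_toy_block(0.1, 0.0), make_toy_block(-0.05, 0.01)]
    schedule = [7, 5]  # hypothetical: 12 unrolled layers from 2 shared blocks
    print(layer_to_block_schedule(schedule))  # → [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(block_recurrent_forward([1.0, 2.0], blocks, schedule))
```

However deep the unrolled network is, the toy model only ever stores two blocks' worth of parameters; that sharing is the property a block-recurrent description relies on.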
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Traveling waves of neural activity are observed all over the brain. Can they be used to augment neural networks? I am thrilled to share our new work, "Traveling Waves Integrate Spatial Information Through Time" with @t_andy_keller! 1/14
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    For more details, check out our paper recently accepted in workshop form to the 2025 ICLR Re-Align workshop, as well as the full preprint! Paper: arxiv.org/abs/2502.06034 Code: github.com/KempnerInstitu… 13/14
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    A massive thank you to all those involved in this work: Lyle Muller, Roberto Budzinski, and Demba Ba! 14/14
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    The problem "Can One Hear the Shape of a Drum?", posed by Mark Kac, is an example of spatial integration. Strike a drum, and its vibrations encode the shape of its boundary. Using fixed RNNs that simulate drums, we can see that drumheads of different sizes have different dynamics: 4/14
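To make the "fixed RNN that simulates a drum" idea concrete, here is a minimal sketch under our own assumptions: a square drumhead discretized on a grid and updated with the standard leapfrog scheme for the 2D wave equation. The update is a recurrence whose weights are a fixed local stencil, so it can be read as a hand-wired RNN. The function names, the Courant number 0.5, and the mode-shaped "strike" are illustrative choices, not the authors' setup.

```python
import math
from typing import List


def simulate_square_drum(size: int, steps: int, courant: float = 0.5) -> List[float]:
    """Simulate a size x size drumhead with a fixed, hand-wired RNN.

    Hidden state: displacement now (u) and one step ago (u_prev).
    Recurrent weights: the 5-point Laplacian stencil, identical at every cell.
    Cells beyond the grid edge count as 0, i.e. a clamped drum rim.
    Returns the displacement of the centre cell after each step.
    """
    # A broad "strike" shaped like the drum's lowest vibration mode.
    bump = [math.sin(math.pi * (k + 1) / (size + 1)) for k in range(size)]
    u = [[bi * bj for bj in bump] for bi in bump]
    u_prev = [row[:] for row in u]  # start from rest
    mid = size // 2
    trace: List[float] = []
    for _ in range(steps):
        u_next = [[0.0] * size for _ in range(size)]
        for i in range(size):
            for j in range(size):
                lap = -4.0 * u[i][j]
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= ni < size and 0 <= nj < size:
                        lap += u[ni][nj]
                # Leapfrog wave-equation step (stable for courant <= 1/sqrt(2)).
                u_next[i][j] = 2.0 * u[i][j] - u_prev[i][j] + courant ** 2 * lap
        u_prev, u = u, u_next
        trace.append(u[mid][mid])
    return trace


def estimate_period(trace: List[float]) -> float:
    """Period in steps, from the mean spacing of sign changes (two per cycle)."""
    crossings = [t for t in range(1, len(trace)) if (trace[t - 1] > 0) != (trace[t] > 0)]
    if len(crossings) < 2:
        raise ValueError("trace too short to contain a full half-cycle")
    mean_gap = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 2.0 * mean_gap


if __name__ == "__main__":
    for size in (5, 11):
        print(size, round(estimate_period(simulate_square_drum(size, 200)), 1))
```

With these settings the 5x5 drum should take roughly 17 steps per cycle and the 11x11 drum roughly 34: the larger drumhead rings at a lower frequency, so its size is visible in the dynamics of a single cell.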
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    We found that we could actually predict the area of the drums analytically by looking at the frequency of oscillations of each neuron (see below). This finding led us to wonder: can we actually learn (via trainable parameters) dynamics for more complex shapes? 5/14
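One way to see how an oscillation frequency can reveal drum size is the textbook result for an ideal square membrane with a clamped rim: its fundamental frequency is f = c / (sqrt(2) L) for wave speed c and side length L, so the area is A = L^2 = c^2 / (2 f^2). The sketch below is a hypothetical illustration, not the analysis in the paper: it estimates f from one neuron's time series with a plain DFT and inverts that relation, assuming a square drum, a known c, and a signal dominated by the fundamental. The helper names are our own.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(signal: Sequence[float], dt: float) -> float:
    """Frequency (cycles per time unit) of the largest non-DC DFT bin."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # remove the DC component
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k / (n * dt)


def square_drum_area(freq: float, wave_speed: float = 1.0) -> float:
    """Invert f = c / (sqrt(2) * L) for an ideal square membrane: A = c^2 / (2 f^2)."""
    return wave_speed ** 2 / (2.0 * freq ** 2)


if __name__ == "__main__":
    dt, n = 0.1, 400
    # Synthetic neuron trace: a square drum of area 8 (side 2*sqrt(2)) with c = 1
    # has its fundamental at 0.25 cycles per time unit.
    trace = [math.cos(2 * math.pi * 0.25 * t * dt) for t in range(n)]
    f = dominant_frequency(trace, dt)
    print(f, square_drum_area(f))  # → 0.25 8.0
```

A real struck drum mixes many modes, so the strongest bin is not guaranteed to be the fundamental, and other shapes need their own frequency-to-area relation.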
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    Vision is a coordinated activity involving millions of neurons in the visual cortex. How is information shared over these large distances? Evidence suggests traveling waves could carry this information across space, allowing neurons to “know” what’s happening far away. 2/14
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    Check out our @KempnerInst blog post for audio on what different shapes sound like (to our models), as well as for more details and visualizations. kempnerinstitute.harvard.edu/research/deepe… 12/14
    Link: Traveling Waves Integrate Spatial Information Through Time - Kempner Institute (kempnerinstitute.harvard.edu)
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    We then studied both our wave-biased model and a standard ConvLSTM (no wave inductive bias). Incredibly, both models learned to generate waves. The ConvLSTM's emergent waves (shown below on a Tetrominoes image) suggest a degree of optimality for wave-based solutions. 7/14
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    Here are some examples of the wave dynamics used to segment Multi-MNIST images: 11/14
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    Spatial integration means that a neuron at one location can access signals from distant points. This could mean linking information together across an image to classify objects or linking words together in a sentence to derive meaning. 3/14
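A simple way to see why time matters for spatial integration: in a recurrent network where each neuron only talks to its immediate neighbours, a signal moves at most one cell per step, so a neuron d cells away can only "know" about an input after d steps. The sketch below is our own illustration on a hypothetical 1D chain, not the paper's model; it tracks which input positions could have influenced each neuron.

```python
from typing import List, Set


def influence_sets(width: int, steps: int) -> List[Set[int]]:
    """For a 1D chain with nearest-neighbour recurrence, return, for each neuron,
    the set of input positions that can have affected it after `steps` updates."""
    reach = [{i} for i in range(width)]  # at t = 0 each neuron sees only its own input
    for _ in range(steps):
        new_reach = []
        for i in range(width):
            merged = set(reach[i])
            if i > 0:
                merged |= reach[i - 1]
            if i < width - 1:
                merged |= reach[i + 1]
            new_reach.append(merged)
        reach = new_reach
    return reach


def steps_until_global(width: int) -> int:
    """Smallest number of steps after which every neuron has seen every input."""
    t = 0
    while any(len(s) < width for s in influence_sets(width, t)):
        t += 1
    return t


if __name__ == "__main__":
    print(sorted(influence_sets(9, 2)[0]))  # → [0, 1, 2]
    print(steps_until_global(9))            # → 8
```

With purely local connections, integrating over the whole chain takes width - 1 steps; recurrence trades spatial reach for time.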
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    We built a trainable RNN (the Neural Wave Machine/NWM) that generates traveling waves in its hidden states. We began by testing it on segmenting polygons. We find that wave-based models produce unique dynamics for each shape, resulting in distinct Fourier spectra. 6/14
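The thread does not give the NWM's equations. As an assumption, the sketch below follows the coupled-oscillator RNN (coRNN) update, z <- z + dt * (tanh(W y + W' z + u) - gamma * y - eps * z), then y <- y + dt * z, with the dense recurrent matrices replaced by small 1D convolution kernels so each neuron is coupled only to its neighbours. The kernels are the parameters a model would train; the input weight is fixed to 1 and the bias to 0. All names and constants are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1D cross-correlation with an odd-length kernel (local coupling)."""
    r = len(kernel) // 2
    n = len(x)
    return [
        sum(kernel[k] * x[i + k - r] for k in range(len(kernel)) if 0 <= i + k - r < n)
        for i in range(n)
    ]


def oscillator_rnn_step(
    y: List[float], z: List[float], u: List[float],
    k_y: Sequence[float], k_z: Sequence[float],
    dt: float = 0.1, gamma: float = 1.0, eps: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """One coRNN-style step with convolutional (local) recurrent weights.

    y: oscillator positions (the hidden state), z: velocities, u: input drive.
    k_y, k_z: local kernels a model would learn; dt, gamma, eps are fixed here.
    """
    drive = [a + b + c for a, b, c in zip(conv1d_same(y, k_y), conv1d_same(z, k_z), u)]
    z_new = [zi + dt * (math.tanh(d) - gamma * yi - eps * zi)
             for zi, d, yi in zip(z, drive, y)]
    y_new = [yi + dt * zi for yi, zi in zip(y, z_new)]
    return y_new, z_new


def run_pulse(width: int = 21, steps: int = 30) -> List[List[float]]:
    """Drive the centre neuron once and record the hidden state at every step."""
    y, z = [0.0] * width, [0.0] * width
    k_y, k_z = [0.5, -1.0, 0.5], [0.0, 0.1, 0.0]  # hypothetical "learned" kernels
    history = []
    for t in range(steps):
        u = [0.0] * width
        if t == 0:
            u[width // 2] = 1.0  # brief input pulse at the centre
        y, z = oscillator_rnn_step(y, z, u, k_y, k_z)
        history.append(y)
    return history
```

Because each neuron is coupled only to its neighbours, the pulse spreads outward by at most one cell per step, the kind of local propagation from which traveling waves can emerge.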
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    CNNs with small receptive fields (few layers) are unable to segment these images. Deeper models, with large receptive fields, are sometimes able to solve the task, but they are less stable, yielding lower average performance and significantly higher variance. 9/14
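For a stack of stride-1 convolutions with kernel size k, the receptive field grows by k - 1 pixels per layer, r = 1 + L(k - 1). For every output pixel to see the whole image of width W, a pixel at one edge must reach the opposite edge, which requires r >= 2W - 1. A small helper, assuming square kernels, no dilation, and no pooling (the function names and example widths are hypothetical):

```python
import math


def stacked_conv_receptive_field(num_layers: int, kernel_size: int) -> int:
    """Receptive field (pixels per axis) of `num_layers` stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def layers_for_global_view(image_width: int, kernel_size: int) -> int:
    """Fewest stride-1 layers so every output pixel sees the whole image (r >= 2W - 1)."""
    if kernel_size < 2:
        raise ValueError("a 1x1 kernel never grows the receptive field")
    needed = 2 * image_width - 1
    return math.ceil((needed - 1) / (kernel_size - 1))


if __name__ == "__main__":
    print(stacked_conv_receptive_field(4, 3))  # → 9
    print(layers_for_global_view(32, 3))       # → 31
```

The linear growth is why a shallow stack of 3x3 convolutions cannot relate distant parts of an image, while covering it fully requires a deep network.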
  • Mozes Jacobs
    @mozesjacobs
    Mar 10, 2025
    Replying to @mozesjacobs
    We also compared our model to U-Nets, which have global receptive fields via skip connections and bottlenecks. Incredibly, on Multi-MNIST, wave-based models outperformed similarly sized U-Nets, despite having fewer parameters and only local connectivity. 10/14
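Downsampling is what lets a U-Net's encoder reach a large receptive field with few layers: each stride-s layer multiplies the "jump" between adjacent output positions, so later kernels cover more input pixels. The standard recurrence is r <- r + (k - 1) * j, then j <- j * s. The sketch uses a hypothetical encoder path; the layer list is an assumption, not the architecture used in the paper, and it counts only the encoder and bottleneck (decoder convolutions would enlarge the field further).

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of a chain of (kernel_size, stride) layers, per axis.

    r: receptive field so far; j: distance in input pixels between adjacent
    positions of the current feature map.
    """
    r, j = 1, 1
    for kernel_size, stride in layers:
        r += (kernel_size - 1) * j
        j *= stride
    return r


if __name__ == "__main__":
    conv, pool = (3, 1), (2, 2)
    # Hypothetical 4-level encoder: two 3x3 convs then a 2x2 max-pool per level,
    # followed by two bottleneck convs (10 convs in total).
    encoder = [conv, conv, pool] * 4 + [conv, conv]
    print(receptive_field([conv] * 10))  # plain stack, same conv count → 21
    print(receptive_field(encoder))      # → 140
```

With the same number of 3x3 convolutions, pooling grows the receptive field from 21 to 140 pixels, which is how U-Nets obtain near-global context despite local kernels.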
