Pretraining Recurrent Networks without Recurrence


Akarsh Kumar, Phillip Isola

Massachusetts Institute of Technology. Preprint 2026.



TLDR

We propose Supervised Memory Training (SMT), a replacement for backpropagation through time (BPTT) for training nonlinear RNNs. SMT trains a time-parallel encoder to produce 'optimal' memory states: compressed representations of the past that are predictive of the future. The RNN is then trained with one-step supervised learning to mimic the transitions between these optimal memory states.
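The one-step training idea can be illustrated with a toy NumPy sketch. This is not the authors' implementation. Several parts are assumptions. The encoder is a hypothetical fixed causal convolution (`causal_conv_encoder`) that stands in for the learned, predictive encoder. The RNN is a plain tanh cell. All helper names (`rnn_step`, `one_step_loss_and_grads`, `rollout`) are illustrative. The sketch shows only the structure: target states for every time step come from a time-parallel map, and the RNN is fit to each transition z_{t-1}, x_t → z_t independently, so gradients never flow through time.

```python
import numpy as np

def causal_conv_encoder(x, A):
    """Hypothetical stand-in encoder: z_t = tanh(sum_k A[k] @ x_{t-k}).

    x: (T, d_in), A: (K, d_mem, d_in) -> z: (T, d_mem).
    Each z_t depends only on x_{<=t}, and all t are computed at once
    (the loop is over the window size K, not over time).
    """
    T, d_in = x.shape
    K = A.shape[0]
    # Left-pad with zeros so the window never looks into the future.
    x_pad = np.concatenate([np.zeros((K - 1, d_in)), x], axis=0)
    pre = np.zeros((T, A.shape[1]))
    for k in range(K):
        # This slice is the input sequence shifted right by k steps (x_{t-k}).
        pre += x_pad[K - 1 - k : K - 1 - k + T] @ A[k].T
    return np.tanh(pre)

def rnn_step(params, h_prev, x_t):
    """Assumed tanh RNN cell; works on a single step or a (T, d) batch."""
    W, U, b = params
    return np.tanh(h_prev @ W.T + x_t @ U.T + b)

def one_step_loss_and_grads(params, z, x):
    """MSE between rnn_step(z_{t-1}, x_t) and target z_t, for all t in parallel."""
    # Previous target state, with z_{-1} = 0 (matching tanh(0) from the encoder).
    z_prev = np.concatenate([np.zeros((1, z.shape[1])), z[:-1]], axis=0)
    pred = rnn_step(params, z_prev, x)
    err = pred - z
    T = z.shape[0]
    loss = np.mean(np.sum(err ** 2, axis=1))
    # Backprop through a single tanh layer only: no unrolling through time.
    d_pre = (2.0 / T) * err * (1.0 - pred ** 2)
    return loss, (d_pre.T @ z_prev, d_pre.T @ x, d_pre.sum(axis=0))

def rollout(params, x):
    """At inference the trained cell is run recurrently from h = 0."""
    h = np.zeros(params[0].shape[0])
    hs = []
    for x_t in x:
        h = rnn_step(params, h, x_t)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, d_in, d_mem, K = 200, 3, 8, 4
x = rng.normal(size=(T, d_in))
A = 0.5 * rng.normal(size=(K, d_mem, d_in))
z = causal_conv_encoder(x, A)  # frozen targets in this sketch

params = (0.1 * rng.normal(size=(d_mem, d_mem)),
          0.1 * rng.normal(size=(d_mem, d_in)),
          np.zeros(d_mem))
lr = 0.1
losses = []
for step in range(500):
    loss, grads = one_step_loss_and_grads(params, z, x)
    losses.append(loss)
    params = tuple(p - lr * g for p, g in zip(params, grads))

h_rollout = rollout(params, x)
```

Each update touches one tanh layer per time step, so the cost and memory of a training step do not grow with an unrolled computation graph. The trade-off, under these assumptions, is that the cell is trained on target states but deployed on its own states, so the rollout can drift from `z` when the transitions are not matched exactly.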

Key Results

SMT achieves:

Applications

Citation

@article{kumar2026smt,
  title     = {Pretraining Recurrent Networks without Recurrence},
  author    = {Akarsh Kumar and Phillip Isola},
  year      = {2026},
  url       = {https://arxiv.org/abs/2606.06479},
  note      = {Project page: \url{https://akarshkumar.com/smt}},
}
