A nod to "Attention Is All You Need" (Vaswani et al., 2017)
Before Transformers, our models read text like humans do: word by word, left to right, holding onto context through a kind of sequential memory. It was slow and limiting.
The breakthrough was realizing we could let the