Wave Field LLM — O(n log n) attention via wave equation dynamics

I've been working on an alternative attention mechanism that treats language as a physical field system instead of using standard O(n²) self-attention.

How it works:

  • Tokens are mapped onto a continuous 1D field

  • Information propagates via damped wave equations: k(t) = exp(-α·t)·cos(ω·t + φ)

  • Each attention head has just 3 learnable physics parameters (frequency, damping, phase)

  • Convolution is computed via FFT in O(n log n) (a minimal sketch follows this list)

  • Heads self-organize into different roles (local grammar, medium context, long-range)
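
For concreteness, here is a minimal NumPy sketch of the mechanism described above. It is my own illustration, not code from the repo; the function names (wave_kernel, causal_fft_conv) and the parameter values are placeholder assumptions. It builds the damped wave kernel k(t) = exp(-α·t)·cos(ω·t + φ) and applies it as a causal convolution via FFT in O(n log n).

```python
import numpy as np

def wave_kernel(length, alpha, omega, phi):
    """Damped wave kernel k(t) = exp(-alpha*t) * cos(omega*t + phi) for t = 0..length-1."""
    t = np.arange(length)
    return np.exp(-alpha * t) * np.cos(omega * t + phi)

def causal_fft_conv(x, kernel):
    """Causal convolution of a (seq_len, d_model) signal with a 1D kernel via FFT.

    Zero-padding to 2n turns the circular FFT convolution into a linear one,
    so position i only mixes information from positions <= i.
    """
    n, d = x.shape
    L = 2 * n
    K = np.fft.rfft(kernel, L)            # (L//2 + 1,)
    X = np.fft.rfft(x, L, axis=0)         # (L//2 + 1, d)
    y = np.fft.irfft(X * K[:, None], L, axis=0)
    return y[:n]

# Toy usage: one "head" with three physics parameters (damping, frequency, phase).
seq_len, d_model = 16, 8
x = np.random.randn(seq_len, d_model)
k = wave_kernel(seq_len, alpha=0.1, omega=0.5, phi=0.0)
print(causal_fft_conv(x, k).shape)        # (16, 8)
```

In this sketch, only α, ω, and φ would be learned per head; the O(n log n) cost comes entirely from the FFTs.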

Results (WikiText-2, 6M params, character tokenizer):

Model                  PPL   Accuracy   Complexity
Standard Transformer   5.9   51.0%      O(n²)
Wave Field V3.5        6.2   50.5%      O(n log n)

At longer sequences the savings grow: 31x at 2K tokens, 107x at 8K, 367x at 32K.

Known limitations:

  • With a BPE tokenizer (8K vocab), there is a significant capacity gap versus the standard transformer

  • This is a model capacity issue at small scale, not an architecture flaw

  • Currently scaling to 100M params to see if the gap closes

What's unique:

  • Every bug during development was found through physics-based diagnostics (energy flow, conservation, causality tests), not guessing; a toy causality check is sketched after this list

  • Cross-head field coupling and wave interference for information routing

  • Not a Mamba/Hyena variant — different approach entirely
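
To make the causality-test bullet concrete, here is a small, self-contained check in the same spirit. Again, this is my own sketch rather than the repo's diagnostics, and wave_mix is a hypothetical stand-in for one wave-field head: perturb the token at position j and assert that no output before j changes.

```python
import numpy as np

def wave_mix(x, alpha=0.1, omega=0.5, phi=0.0):
    """Stand-in for one wave-field head: causal FFT convolution with a damped wave kernel."""
    n, _ = x.shape
    t = np.arange(n)
    k = np.exp(-alpha * t) * np.cos(omega * t + phi)
    L = 2 * n
    y = np.fft.irfft(np.fft.rfft(x, L, axis=0) * np.fft.rfft(k, L)[:, None], L, axis=0)
    return y[:n]

# Causality check: perturbing token j must leave every output at positions < j unchanged.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
j = 20

x_perturbed = x.copy()
x_perturbed[j] += 1.0

leak = np.abs(wave_mix(x)[:j] - wave_mix(x_perturbed)[:j]).max()
assert leak < 1e-9, f"future information leaked into the past: {leak}"
print("causality test passed, max leak =", leak)
```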

Code: https://github.com/badaramoni/wave-field-llm

Happy to answer questions about the physics, architecture decisions, or results.