Hierarchical memory research harness for long-context and cross-episode LLM evaluation.
TYPHON is currently a structured research substrate, not yet a learned state-of-the-art memory model. The repo already supports:
- benchmark registration and local benchmark-pack ingestion
- a heuristic
typhon_v0memory-selection pipeline - a local exact baseline plus a WSL-backed Gated DeltaNet baseline
- model-backed evaluation through LM Studio and OpenAI-compatible servers
- reproducible configs, runbooks, and experiment tracking
- Documentation Index
- System Overview
- Technical PM Brief
- Experiment Matrix
- Benchmark Packs Runbook
- WSL Runtime Runbook
src/typhon: package code, CLI, benchmarks, baselines, memory logic, inference, evaluationconfigs: benchmark, runtime, baseline, and live-eval configurationdata: benchmark packs, imports, and normalized local sample assetsresults: generated artifacts and evaluation outputsscripts: wrapper scripts, mostly evaluation and WSL runtime helpersdocs: architecture docs, ADRs, project state, runbooks, research notes, archivethird_party: isolated external code or cloned reposnotebooks: exploratory analysis only
uv run typhon list-benchmarks
uv run typhon list-baselines
uv run typhon validate-benchmark-pack --benchmark longbench
uv run typhon evaluate-memory-suite --backend lmstudio_local --model qwen3.5-9b-vlm --benchmark longbench_v2 --benchmark locomo --benchmark locomo_plus --benchmark memorybench --benchmark evo_memory --sample-source local --chunk-size 24 --local-window-tokens 24 --request-timeout-seconds 600
uv run typhon run-baseline --baseline gated_deltanet_fla --benchmark longbench --sample-source local --sample-limit 1- Use
uv run ...for repo commands anduv lockafter dependency changes. - Keep major architectural or process decisions in ADRs, not in ad hoc notes.
- Put new docs into the structured docs tree. Do not add new flat markdown files at the repo root or
docs/root unless they are indexes. - Treat
results/as generated output, not hand-maintained source material.
The repo has one upstream benchmark adapter live today:
LongBenchEnglish-task import via Hugging Face into manifest-backed local packs
The current implemented baselines are:
attention_baselinegated_deltanet_fla
The current primary live model path is:
lmstudio_localwithqwen3.5-9b-vlm
The canonical project state trackers are: