How ready is an AI system?
Measure it the way you measure a technology.
ARL is a universal, vendor-neutral readiness scale: it is to AI what the Technology Readiness Level scale is to technology. It is tied to no model, runtime, or vendor. A score means the same thing everywhere because each axis is anchored in math or physics that does not drift across time, languages, or political regimes.
Four required axes
No axis summarizes the others. Together they cover what the system is, what it does, what it costs, and how it holds up under attack.
Validation Depth
How thoroughly the readiness claim has been tested, from principles observed (1) to a proven, publicly disclosed track record across diverse contexts (9). Adapted from the Technology Readiness Level scale.
Validation Depth is anchored in statistics.
Convergence Class
How stochastic the system is on the certified task. Class A is deterministic-equivalent across at least 100 runs; class E is uncharacterized, the default until variance is measured.
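As a rough illustration of the two classes the text defines, the Rust sketch below assigns class A when at least 100 runs produce the same output and class E when no runs have been measured. It assumes "deterministic-equivalent" can be approximated by byte-identical outputs. It also folds the middle classes, whose thresholds are not given here, into a hypothetical `MeasuredNotA` variant. `classify` and `MIN_RUNS_FOR_A` are illustrative names, not the reference crate's API.

```rust
use std::collections::HashSet;

/// Convergence classes. Only A and E are defined in the text; `MeasuredNotA`
/// is a HYPOTHETICAL stand-in for the middle classes (B through D).
#[derive(Debug, PartialEq, Eq)]
enum Convergence {
    A,
    MeasuredNotA,
    E,
}

/// Minimum number of runs before a task can be class A (from the text: at least 100).
const MIN_RUNS_FOR_A: usize = 100;

/// Classifies repeated outputs of one system on one certified task.
/// Assumption: "deterministic-equivalent" is approximated as byte-identical outputs.
fn classify(outputs: &[&str]) -> Convergence {
    if outputs.is_empty() {
        // No variance has been measured, so the default class applies.
        return Convergence::E;
    }
    // Count how many distinct outputs the runs produced.
    let distinct: HashSet<&str> = outputs.iter().copied().collect();
    if outputs.len() >= MIN_RUNS_FOR_A && distinct.len() == 1 {
        Convergence::A
    } else {
        // Measured, but either too few runs or more than one distinct output.
        Convergence::MeasuredNotA
    }
}
```

A single differing output among 100 runs, or only 99 identical runs, is enough to keep the task out of class A under this sketch.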
Convergence Class is anchored in stochastic process theory.
Energy Profile
Amortized training energy, per-task inference energy, and total cost of operation, all in joules and reported with PUE and grid carbon intensity. Refusing to disclose these figures caps the achievable score at ARL 3.
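Once the inputs are disclosed, the energy axis is plain arithmetic. This sketch makes four assumptions. Training energy is amortized evenly over a hypothetical `expected_tasks` count. PUE multiplies IT energy to give facility energy. Joules are converted to kWh (1 kWh = 3.6 × 10⁶ J) to apply grid carbon intensity. A withheld disclosure is encoded as the ARL 3 cap. The struct, fields, and functions are illustrative names, not the reference crate's API.

```rust
/// Joules in one kilowatt-hour.
const JOULES_PER_KWH: f64 = 3.6e6;

/// Energy disclosure for one system on one task (field names are HYPOTHETICAL).
struct EnergyProfile {
    training_joules: f64,
    /// Assumption: training is amortized over the tasks expected during deployment.
    expected_tasks: f64,
    inference_joules_per_task: f64,
    /// Power usage effectiveness of the facility (total facility energy / IT energy).
    pue: f64,
    /// Grid carbon intensity in grams of CO2 per kWh.
    grid_g_co2_per_kwh: f64,
}

impl EnergyProfile {
    /// IT-side energy per task: amortized training share plus inference.
    fn it_joules_per_task(&self) -> f64 {
        self.training_joules / self.expected_tasks + self.inference_joules_per_task
    }

    /// Facility-level energy per task, scaled by PUE.
    fn facility_joules_per_task(&self) -> f64 {
        self.it_joules_per_task() * self.pue
    }

    /// Grams of CO2 per task at the stated grid intensity.
    fn g_co2_per_task(&self) -> f64 {
        self.facility_joules_per_task() * self.grid_g_co2_per_kwh / JOULES_PER_KWH
    }
}

/// Highest ARL reachable given the energy disclosure: withholding it caps the score at 3.
fn energy_cap(profile: Option<&EnergyProfile>) -> u8 {
    match profile {
        Some(_) => 9,
        None => 3,
    }
}
```

Take 1 GWh (3.6 × 10¹² J) of training amortized over 10⁹ tasks, 400 J of inference per task, a PUE of 1.5, and a 300 gCO₂/kWh grid. This gives 4,000 J of IT energy, 6,000 J at the facility, and 0.5 g of CO₂ per task.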
Energy Profile is anchored in thermodynamics.
Security Class
Measured adversarial robustness, output integrity (signed + content-addressable), input/state confidentiality, and auditability — not generic, unenumerated “AI safety.”
Security Class is anchored in information theory and cryptography.
A score is assigned to a specific system, task, and context on specific hardware. Change any of them and you score again. Hardware is documented alongside every claim for reproducibility; it is not a fifth axis.
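One way to picture "change any of them and you score again" is to treat system, task, context, and hardware together as the key a score is filed under. In the sketch below, where `Scope`, `Registry`, and every field name are hypothetical, changing any one field produces a different key. The lookup then finds nothing and the system has to be scored again. Hardware sits in the key, not among the readiness axes.

```rust
use std::collections::HashMap;

/// What a score is bound to (HYPOTHETICAL names). Hardware is recorded for
/// reproducibility as part of the scope; it is not a readiness axis.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Scope {
    system: String,
    task: String,
    context: String,
    hardware: String,
}

/// A hypothetical registry: a score is reusable only for the exact scope it was assigned to.
struct Registry {
    scores: HashMap<Scope, u8>,
}

impl Registry {
    fn new() -> Self {
        Registry { scores: HashMap::new() }
    }

    /// Records the ARL assigned to one exact scope.
    fn record(&mut self, scope: Scope, arl: u8) {
        self.scores.insert(scope, arl);
    }

    /// Returns None when any part of the scope differs, meaning the system must be scored again.
    fn lookup(&self, scope: &Scope) -> Option<u8> {
        self.scores.get(scope).copied()
    }
}
```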
The cross-axis gates
The teeth. A high readiness claim is unreachable without matching convergence and security — and silence has a price.
A claim missing any of the four axes is incomplete by definition. Terms with no single operational definition (AGI, superintelligence, consciousness, sentience, human-level) can't anchor a claim, because they can't be measured; ARL takes no position on the terms themselves. The playground enforces all of this in your browser, running the exact reference checker compiled to WebAssembly.
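The gate logic can be sketched as a single validation pass. It rejects a claim that lacks any axis or relies on an unmeasurable term. It then lowers the claim's ceiling for withheld energy data and for weak convergence or security. Only two pieces come from the text: the ARL 3 energy cap and the list of unmeasurable terms. Several other choices are assumptions made for illustration: the ARL 7 threshold for a "high" claim, the minimum convergence class B, the numeric security level, and treating the Validation Depth level as the claimed ARL. This is not the reference checker.

```rust
/// Convergence classes, ordered from best (A) to uncharacterized (E).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Convergence { A, B, C, D, E }

/// Energy axis: disclosed, or explicitly withheld.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Energy { Disclosed, Withheld }

/// A claim with the four required axes; `None` marks a missing part.
/// Field names and the numeric security level are HYPOTHETICAL.
struct Claim {
    statement: String,
    validation_depth: Option<u8>, // 1..=9
    convergence: Option<Convergence>,
    energy: Option<Energy>,
    security_level: Option<u8>,
}

/// Terms with no single operational definition (from the text).
const UNMEASURABLE: [&str; 5] = ["agi", "superintelligence", "consciousness", "sentience", "human-level"];

// HYPOTHETICAL thresholds: the text says high claims need matching convergence
// and security, but does not give these numbers.
const HIGH_ARL: u8 = 7;
const MIN_CONVERGENCE_FOR_HIGH: Convergence = Convergence::B;
const MIN_SECURITY_FOR_HIGH: u8 = 3;
/// From the text: withholding energy data caps the score at ARL 3.
const ENERGY_WITHHELD_CAP: u8 = 3;

/// Returns the highest ARL the claim can carry, or the reason it is rejected.
fn validate(c: &Claim) -> Result<u8, String> {
    // Completeness: every axis must be present.
    let (Some(depth), Some(conv), Some(energy), Some(sec)) =
        (c.validation_depth, c.convergence, c.energy, c.security_level)
    else {
        return Err("incomplete: all four axes are required".to_string());
    };

    // Vocabulary: split on anything that is not alphanumeric or '-', so "human-level" stays whole.
    for token in c
        .statement
        .split(|ch: char| !(ch.is_alphanumeric() || ch == '-'))
        .map(|t| t.to_lowercase())
    {
        if UNMEASURABLE.iter().any(|t| *t == token.as_str()) {
            return Err(format!("unmeasurable term: {token}"));
        }
    }

    // Cross-axis gates lower the ceiling; they never raise it.
    let mut ceiling = 9u8;
    if energy == Energy::Withheld {
        ceiling = ceiling.min(ENERGY_WITHHELD_CAP);
    }
    if conv > MIN_CONVERGENCE_FOR_HIGH || sec < MIN_SECURITY_FOR_HIGH {
        ceiling = ceiling.min(HIGH_ARL - 1);
    }
    Ok(depth.min(ceiling))
}
```

Under these assumptions, consider a claim at Validation Depth 8 with class A convergence, disclosed energy, and security level 4. It passes at 8. Withholding energy drops it to 3, and class E convergence holds it below the hypothetical high-claim threshold at 6.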
Documents
ARL
The four-axis readiness scoring framework.
ARL-S
The sandbox: the testing environment, tiers, telemetry, attestation, and replay.
Lexicon
The controlled vocabulary, so a stated score has one meaning.
Specification text is CC-BY-4.0. ARL is owned by no one; Transaction Science is one steward. Source is on GitHub.
Reference implementation
Four Apache-2.0 Rust crates. The standard is what they enforce.
The four-axis Claim type, the cross-axis gates, and the deterministic verifier. The same library the playground compiles to WebAssembly.
Orchestrates a session, measuring convergence, energy, and security signals, then signs the JCS-canonicalized result with Ed25519 so the score is content-addressable.
Four verbs: validate (gate a claim), lint (check vocabulary), verify (check a signed session), explain (why a score was capped where it was).
arl-core compiled to WebAssembly. The /playground page runs the exact reference checker locally — claims are never uploaded.
One workspace, no model dependency. The CLI's verify reads an arl-sandbox session bundle and checks its Ed25519 attestation against the published public key, so third parties can replay the score without trusting the issuer.