A Transaction Science Open Standard

How ready is an AI system?
Measure it the way you measure a technology.

ARL is a universal, vendor-neutral readiness scale: it is to AI what the Technology Readiness Level (TRL) scale is to technology. It is tied to no model, runtime, or vendor. A score means the same thing everywhere because each axis is anchored in math or physics that does not drift across time, languages, or political regimes.

Four required axes

None summarizes the others. They cover what the system is, what it does, what it costs, and how it holds up under attack.

1 · 1–9

Validation Depth

How thoroughly the readiness claim has been tested, from principle observed (1) to a proven, publicly disclosed track record across diverse contexts (9). Adapts the Technology Readiness Level scale.

statistics

2 · A–E

Convergence Class

How stochastic the system is on the certified task. A is deterministic-equivalent across ≥100 runs; E is uncharacterized — the default until variance is measured.

stochastic process theory
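
The Class A criterion can be illustrated with a minimal sketch. It assumes that "deterministic-equivalent" means byte-identical outputs, which the text above does not define, and the function name is hypothetical rather than part of arl-core. It only checks the precondition for A and does not attempt to assign classes B through D.

```rust
/// Minimum number of runs named in the Class A description ("≥100 runs").
const MIN_RUNS_FOR_CLASS_A: usize = 100;

/// Hypothetical helper, not part of arl-core.
/// Assumption: "deterministic-equivalent" is read as byte-identical outputs.
/// Returns true only when at least 100 runs were recorded and all agree.
pub fn meets_class_a_precondition(outputs: &[&str]) -> bool {
    if outputs.len() < MIN_RUNS_FOR_CLASS_A {
        return false; // too few runs to support a Class A claim
    }
    let first = outputs[0];
    outputs.iter().all(|output| *output == first)
}
```

A system that fails this check is not automatically Class E; it simply has not shown the evidence needed for A.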

3 · joules

Energy Profile

Amortized training energy, per-task inference energy, and total cost of operation, all reported in joules alongside PUE (power usage effectiveness) and grid carbon intensity. Refusing to disclose caps the achievable score at ARL 3.

thermodynamics
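
As a rough illustration of the arithmetic behind an energy profile, the sketch below spreads training energy over a number of tasks, adds per-task inference energy, scales by PUE (total facility energy divided by IT equipment energy), and converts to grams of CO2 using a grid intensity in gCO2 per kWh. The struct, its field names, and the choice of units for grid intensity are assumptions for illustration; the ARL specification may account for these quantities differently.

```rust
/// Joules in one kilowatt-hour (1000 W * 3600 s).
const JOULES_PER_KWH: f64 = 3.6e6;

/// Hypothetical inputs; field names are illustrative, not from the ARL spec.
pub struct EnergyInputs {
    /// IT-equipment energy spent on training, in joules.
    pub training_joules: f64,
    /// Number of tasks the training energy is amortized over (must be > 0).
    pub tasks_amortized_over: f64,
    /// IT-equipment energy for one inference task, in joules.
    pub inference_joules_per_task: f64,
    /// Power usage effectiveness: facility energy / IT energy (>= 1.0).
    pub pue: f64,
    /// Grid carbon intensity in grams of CO2 per kWh (assumed unit).
    pub grid_gco2_per_kwh: f64,
}

/// Facility-level joules attributable to one task.
pub fn facility_joules_per_task(e: &EnergyInputs) -> f64 {
    let amortized_training = e.training_joules / e.tasks_amortized_over;
    let it_joules = amortized_training + e.inference_joules_per_task;
    it_joules * e.pue
}

/// Grams of CO2 attributable to one task.
pub fn grams_co2_per_task(e: &EnergyInputs) -> f64 {
    facility_joules_per_task(e) / JOULES_PER_KWH * e.grid_gco2_per_kwh
}
```

For example, 3.6e12 J of training spread over 1e9 tasks is 3600 J per task; adding 1200 J of inference and a PUE of 1.25 gives 6000 J per task at the facility level.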

4 · S0–S4

Security Class

Measured adversarial robustness, output integrity (signed + content-addressable), input/state confidentiality, and auditability — not generic, unenumerated “AI safety.”

information theory + cryptography

A score is assigned to a specific system + task + context on specific hardware. Change any of them and you score again. Hardware is documented alongside every claim for reproducibility — it is not a fifth axis.

The cross-axis gates

The teeth. A high readiness claim is unreachable without matching convergence and security — and silence has a price.

ARL ≥ 4 requires Convergence D+ and Security S1

ARL ≥ 6 requires Convergence C+ and Security S2

ARL ≥ 8 requires Convergence B+ and Security S3

ARL = 9 requires Security S4

energy undisclosed → score capped at ARL 3

security methodology undisclosed → class capped at S0

ARL ≥ 4 requires published error bars + a failure-mode catalog

ARL ≥ 6 methodology must be published before the claim
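
The gates above can be read as a set of caps on the highest reachable readiness level. The following is a minimal sketch of that reading, not the arl-core verifier: all type, field, and function names are hypothetical, "D+" is taken to mean "D or better", a stated security class is taken to mean "at least that class", and ARL 9 is assumed to also require everything ARL 8 requires.

```rust
/// Convergence classes, declared worst to best so `<` means "weaker than".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Convergence { E, D, C, B, A }

/// Security classes, declared weakest to strongest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Security { S0, S1, S2, S3, S4 }

/// Hypothetical evidence record; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct Evidence {
    pub convergence: Convergence,
    pub security: Security,
    pub energy_disclosed: bool,
    pub security_methodology_disclosed: bool,
    pub error_bars_and_failure_catalog: bool,
    pub methodology_published_before_claim: bool,
}

/// Highest readiness level (1..=9) that the gates allow for this evidence.
pub fn max_readiness(e: &Evidence) -> u8 {
    // Undisclosed security methodology caps the class at S0.
    let sec = if e.security_methodology_disclosed { e.security } else { Security::S0 };
    let mut cap: u8 = 9;
    // ARL = 9 requires S4.
    if sec < Security::S4 { cap = cap.min(8); }
    // ARL >= 8 requires Convergence B+ and Security S3.
    if e.convergence < Convergence::B || sec < Security::S3 { cap = cap.min(7); }
    // ARL >= 6 requires Convergence C+, Security S2, and methodology published first.
    if e.convergence < Convergence::C
        || sec < Security::S2
        || !e.methodology_published_before_claim
    {
        cap = cap.min(5);
    }
    // ARL >= 4 requires Convergence D+, Security S1, error bars, and a failure-mode catalog.
    if e.convergence < Convergence::D
        || sec < Security::S1
        || !e.error_bars_and_failure_catalog
    {
        cap = cap.min(3);
    }
    // Undisclosed energy caps the score at ARL 3.
    if !e.energy_disclosed { cap = cap.min(3); }
    cap
}
```

Taking the minimum over independent caps keeps each gate readable on its own line and makes the result independent of the order in which gates are checked. A claimed level is then admissible only if it does not exceed `max_readiness`.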

A claim missing any of the four parts is incomplete by definition. Terms with no single operational definition — AGI, superintelligence, consciousness, sentience, human-level — can't anchor a claim, because they can't be measured; ARL takes no position on the terms themselves. The playground enforces all of this in your browser, running the exact reference checker compiled to WebAssembly.
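
A vocabulary check for the unmeasurable terms listed above could look roughly like the sketch below. It is an illustration only, not the arl-cli lint: the function name is hypothetical, the term list is limited to the five terms named in this section, and matching is done on whole lowercase words so that, for example, "magical" does not trigger on "agi".

```rust
/// Terms this section lists as having no single operational definition.
const UNMEASURABLE_TERMS: [&str; 5] = [
    "agi",
    "superintelligence",
    "consciousness",
    "sentience",
    "human-level",
];

/// Hypothetical lint, not the arl-cli implementation.
/// Returns the listed terms that appear as whole words in `text`,
/// in the order they are declared in UNMEASURABLE_TERMS.
pub fn unmeasurable_terms_in(text: &str) -> Vec<&'static str> {
    let lowered = text.to_lowercase();
    // Split on anything that is not alphanumeric or a hyphen,
    // so "human-level" survives as a single word.
    let words: Vec<&str> = lowered
        .split(|c: char| !(c.is_alphanumeric() || c == '-'))
        .filter(|w| !w.is_empty())
        .collect();
    UNMEASURABLE_TERMS
        .iter()
        .copied()
        .filter(|term| words.contains(term))
        .collect()
}
```

A non-empty result would mean the claim text leans on a term that cannot anchor a score and should be rephrased in measurable terms.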

Documents

ARL

The four-axis readiness scoring framework.

ARL-S

The sandbox: the testing environment, tiers, telemetry, attestation, and replay.

Lexicon

The controlled vocabulary, so a stated score has one meaning.

Specification text is licensed under CC-BY-4.0. ARL is owned by no one; Transaction Science is one steward. The source is on GitHub.

Reference implementation

Four Apache-2.0 Rust crates. The standard is what they enforce.

arl-core: the claim model

The four-axis Claim type, the cross-axis gates, and the deterministic verifier. The same library the playground compiles to WebAssembly.

arl-sandbox: the supervisor

Orchestrates a session — measures convergence, energy, security signals — and signs the result with Ed25519 / JCS so the score is content-addressable.

arl-cli: the checker

Four verbs: validate (gate a claim), lint (vocabulary), verify (signed session), explain (why a score capped where it did).

arl-wasm: the browser binding

arl-core compiled to WebAssembly. The /playground page runs the exact reference checker locally — claims are never uploaded.

One workspace, no model dependency. The CLI's verify reads an arl-sandbox session bundle and confirms the Ed25519 attestation against the published public key — third parties replay the score without trusting the issuer.
