NERFSTATUS

IS YOUR LLM
GETTING WORSE?

Scientific degradation detection. 17 research-backed probes.
One CLI command.

$ pip install nerfprobe
$ nerfprobe run gpt-4o --tier core

"General benchmarks miss silent degradation. Models get quantized, fine-tuned, or simply drift. We detect the rot."

17 Research-Backed Probes

Organized into three tiers based on signal-to-noise ratio and cost.

CORE TIER

Essential signal. Low cost.

Math

arXiv:2504.04823

Detects arithmetic accuracy drift in complex reasoning chains.

Style

arXiv:2403.06408

Measures vocabulary collapse and TTR (Type-Token Ratio).
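As a rough illustration of the metric named above, type-token ratio is the number of distinct words divided by the total number of words. The sketch below is not nerfprobe's implementation; the `type_token_ratio` helper and its simple regex tokenizer are assumptions made for the example.

```python
import re

def type_token_ratio(text: str) -> float:
    """Return distinct-word count divided by total word count (0.0 for empty text)."""
    # Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

varied = "The quick brown fox jumps over a lazy dog"
repetitive = "good good good very good good very good"

print(type_token_ratio(varied))      # 9 distinct words out of 9
print(type_token_ratio(repetitive))  # 2 distinct words out of 8
```

TTR falls as text gets longer, so scores are only comparable across responses of similar length.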

Timing

Internal

Tracks time-to-first-token (TTFT) and generation latency.
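One way to picture the two timings is to wrap any streamed response in a timer. This is a minimal sketch, not nerfprobe's code: `measure_stream` is a hypothetical helper, and `fake_stream` stands in for a real streaming API so the example runs offline.

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, chunk_count) for a streamed response."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in chunks:
        if ttft is None:
            # The first chunk has arrived: this is the time to first token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # An empty stream has no first token; fall back to the total time.
    return (ttft if ttft is not None else total, total, count)

def fake_stream() -> Iterator[str]:
    """Stand-in for a provider stream: a startup delay, then three chunks."""
    time.sleep(0.05)
    for piece in ["Hello", " ", "world"]:
        yield piece
        time.sleep(0.01)

ttft, total, n = measure_stream(fake_stream())
print(f"TTFT={ttft:.3f}s total={total:.3f}s chunks={n}")
```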

Code

EvalPlus

Syntax validity and logic checks for Python/Rust generation.
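For Python output, a syntax-validity check can be sketched with the standard library's `ast` module. The helper names below are hypothetical and this is not how nerfprobe necessarily does it. Logic checks would additionally run the generated code against test cases, ideally in a sandbox.

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python, without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

def defines_function(source: str, name: str) -> bool:
    """Return True if the source parses and defines a function with this name."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in ast.walk(tree)
    )

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"  # missing colon

print(is_valid_python(good), is_valid_python(bad))
print(defines_function(good, "add"))
```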

Fact

arXiv:2512.08213

Measures hallucination rate on verifiable world knowledge.

ADVANCED TIER

Structural integrity and complex failures.

JSON

arXiv:2402.16775

Structured output adherence and schema validation failures.
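A simplified version of this kind of check, assuming a flat object with required keys and types, might look like the following. The `REQUIRED` spec and `check_json_output` helper are illustrative only; a real probe would more likely validate against a full JSON Schema.

```python
import json
from typing import Dict, List

# Hypothetical expected shape: key name mapped to the Python type it must have.
REQUIRED: Dict[str, type] = {"name": str, "age": int}

def check_json_output(raw: str, required: Dict[str, type]) -> List[str]:
    """Return a list of adherence errors; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for key, expected in required.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            # Caveat: bool is a subclass of int, so True would pass an int check.
            errors.append(f"wrong type for {key}")
    return errors

print(check_json_output('{"name": "Ada", "age": 36}', REQUIRED))
print(check_json_output('{"name": "Ada"}', REQUIRED))
```

Counting the share of responses with a non-empty error list over a fixed prompt set gives a failure rate that can be tracked over time.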

Consistency

arXiv:2305.xxxx

Self-consistency across multiple generations of the same prompt.
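One simple way to score this, shown here as an assumption rather than the probe's actual metric, is the share of sampled answers that agree with the most common answer. `agreement_rate` is a hypothetical helper, and the normalization (trim and lowercase) is deliberately crude.

```python
from collections import Counter
from typing import Sequence

def agreement_rate(answers: Sequence[str]) -> float:
    """Return the fraction of answers that match the most frequent answer."""
    if not answers:
        return 0.0
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)

# Four samples of the same prompt; three agree after normalization.
samples = ["42", "42 ", "41", "42"]
print(agreement_rate(samples))
```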

Fingerprint

arXiv:2407.15847

Identifies specific quantization artifacts and model signatures.

Context

Standard

Needle-in-haystack retrieval and context window adherence.
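A needle-in-haystack test plants one fact at a chosen depth in filler text and asks the model to retrieve it. The sketch below only builds the prompt and scores an answer; `build_haystack` and `retrieved` are hypothetical names, and the model call itself is left out.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Place the needle among n_sentences filler sentences at a relative depth in [0, 1]."""
    depth = min(max(depth, 0.0), 1.0)
    sentences = [filler] * n_sentences
    # depth 0.0 puts the needle first, 1.0 puts it last.
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def retrieved(answer: str, expected: str) -> bool:
    """Return True if the expected value appears in the model's answer."""
    return expected.lower() in answer.lower()

filler = "The sky is blue."
needle = "The secret code is 7421."
prompt = build_haystack(filler, needle, n_sentences=10, depth=0.5)

print(prompt.count(filler), prompt.count(needle))
print(retrieved("I believe the code is 7421.", "7421"))
```

Sweeping `depth` and `n_sentences` shows whether retrieval degrades at particular positions or context lengths.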

Routing

Internal

Tool selection and classification accuracy.

Logic

LogiQA

First-order logic and syllogistic reasoning capabilities.

OPTIONAL TIER

Experimental.

Zeroprint

Experimental

Advanced zero-shot capability fingerprinting.

Multilingual

MMLU-Multi

Cross-lingual reasoning and translation fidelity.

Integrate in Minutes

# 1. Install
pip install nerfprobe

# 2. Run core probes on GPT-4o
nerfprobe run gpt-4o --tier core

# 3. Output as JSON for CI/CD
nerfprobe run claude-3-opus --probe math --format json
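
# 4. Or call it from Python
import asyncio
from nerfprobe import run_probes, OpenAIGateway

async def main():
    gateway = OpenAIGateway(api_key="...")
    # run_probes is a coroutine, so it must be awaited inside an async function
    results = await run_probes("gpt-4o", gateway, tier="core")

    # Simple programmatic access
    print(results[0].score)  # e.g. 0.98

asyncio.run(main())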

Try the interactive demo on HuggingFace Spaces.

🤗 Open Space