IS YOUR LLM GETTING WORSE?

Scientific degradation detection. 17 research-backed probes.
One CLI command.

$ pip install nerfprobe

$ nerfprobe run gpt-4o --tier core

View on PyPI | GitHub Repository

"General benchmarks miss silent degradation. Models get quantized, fine-tuned, or simply drift. We detect the rot."

17 Research-Backed Probes

Organized into three tiers based on signal-to-noise ratio and cost.

CORE TIER: Essential signal. Low cost.

Math

arXiv:2504.04823

Detects arithmetic accuracy drift in complex reasoning chains.
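A minimal sketch of how arithmetic drift could be scored. It assumes each prompt has a known numeric answer and that the last number in the response is the model's final answer. The helpers `extract_final_number` and `math_accuracy` are hypothetical; this is not nerfprobe's implementation.

```python
import re
from typing import Optional

# Integers or decimals with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number in a response (thousands commas removed), or None."""
    matches = _NUMBER.findall(response.replace(",", ""))
    return float(matches[-1]) if matches else None


def math_accuracy(responses: list[str], answers: list[float], tol: float = 1e-6) -> float:
    """Fraction of responses whose final number matches the expected answer."""
    if not answers:
        return 0.0
    correct = 0
    for response, expected in zip(responses, answers):
        value = extract_final_number(response)
        if value is not None and abs(value - expected) <= tol:
            correct += 1
    return correct / len(answers)


print(math_accuracy(["17 * 23 = 391", "The total is 1,204."], [391, 1205]))  # 0.5
```

Tracking this accuracy over time on a fixed problem set is what turns a single score into a drift signal.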

Style

arXiv:2403.06408

Measures vocabulary collapse via the type-token ratio (TTR).
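The type-token ratio is the number of distinct words divided by the total number of words; a collapsing vocabulary pushes it down. A minimal sketch, assuming simple lowercase word tokenization (the function name is illustrative, not nerfprobe's API):

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


varied = "The quick brown fox jumps over the lazy dog"
repetitive = "good good good answer good good answer"
print(round(type_token_ratio(varied), 3))      # 0.889
print(round(type_token_ratio(repetitive), 3))  # 0.286
```

TTR falls naturally as text gets longer, so comparisons between model versions should use responses of similar length.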

Timing

Internal

Tracks time-to-first-token (TTFT) and generation latency.
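A minimal sketch of measuring TTFT and total generation time over any streamed response. It assumes the provider client yields text chunks as an iterator; `fake_stream` is a hypothetical stand-in for a real streaming call.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a chunk stream; report TTFT and total time in seconds."""
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    for chunk in chunks:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first chunk arrived
        pieces.append(chunk)
    end = time.perf_counter()
    return {
        "ttft_s": None if first_token_at is None else first_token_at - start,
        "total_s": end - start,
        "text": "".join(pieces),
    }


def fake_stream(delay: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    time.sleep(delay)  # simulated queueing/prefill before the first token
    for piece in ["Hello", ",", " world"]:
        yield piece
        time.sleep(0.01)  # simulated per-chunk decode time


stats = measure_stream(fake_stream())
print(stats["text"], round(stats["ttft_s"], 3), round(stats["total_s"], 3))
```

With a real client the request is usually sent when the stream is created, so start the clock before issuing the request rather than when iteration begins.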

Code

EvalPlus

Syntax validity and logic checks for Python/Rust generation.
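For Python output, the syntax-validity half of this check can be done without executing anything by parsing the code with the standard `ast` module. A minimal sketch (the function name is hypothetical); logic checks would additionally need to run test cases in a sandbox, EvalPlus-style, and Rust would need a compiler such as `rustc`.

```python
import ast


def python_syntax_valid(source: str) -> bool:
    """Return True if the source parses as Python. The code is never executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


samples = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",  # missing colon
]
print([python_syntax_valid(s) for s in samples])  # [True, False]
```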

Fact

arXiv:2512.08213

Measures hallucination rate on verifiable world knowledge.
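A minimal sketch of a hallucination rate over short factual answers. It assumes each question has a set of accepted answer aliases and uses SQuAD-style normalization (lowercase, strip punctuation and articles, collapse whitespace) before exact matching. The helpers are hypothetical, not nerfprobe's API.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def hallucination_rate(answers: list[str], accepted: list[set[str]]) -> float:
    """Fraction of answers that match none of their accepted aliases."""
    if len(answers) != len(accepted):
        raise ValueError("answers and accepted must have the same length")
    if not answers:
        return 0.0
    wrong = sum(
        normalize(answer) not in {normalize(alias) for alias in aliases}
        for answer, aliases in zip(answers, accepted)
    )
    return wrong / len(answers)


answers = ["Paris.", "The Danube", "1066"]
accepted = [{"Paris"}, {"Rhine"}, {"1066", "1066 AD"}]
print(hallucination_rate(answers, accepted))  # 1 of 3 answers is wrong
```

Exact matching undercounts correct paraphrases, so it works best with questions whose answers are short entities, dates, or numbers.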

ADVANCED TIER: Structural integrity and complex failures.

JSON

arXiv:2402.16775

Structured output adherence and schema validation failures.
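A minimal sketch of checking structured output: first that it parses as JSON, then that required keys exist with the expected Python types. This hand-rolled key/type check stays within the standard library; a production probe would more likely validate against a full JSON Schema. The function name and schema format are hypothetical.

```python
import json


def check_json(output: str, required: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the output conforms."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected_type in required.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        # Note: bool is a subclass of int, so True passes an int check.
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return problems


schema = {"name": str, "age": int}
print(check_json('{"name": "Ada", "age": 36}', schema))    # []
print(check_json('{"name": "Ada", "age": "36"}', schema))  # ['wrong type for age: str']
print(check_json('{"name": "Ada",}', schema))              # trailing comma: invalid JSON
```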

Consistency

arXiv:2305.xxxx

Self-consistency across multiple generations of the same prompt.
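One simple way to quantify self-consistency is to sample the same prompt several times and measure how many answers agree with the majority answer. A minimal sketch, assuming answers are compared by exact match after trimming and lowercasing (the function name is hypothetical):

```python
from collections import Counter


def agreement_rate(generations: list[str]) -> float:
    """Share of generations that agree with the most common (majority) answer."""
    if not generations:
        return 0.0
    normalized = [g.strip().lower() for g in generations]
    _, majority_count = Counter(normalized).most_common(1)[0]
    return majority_count / len(normalized)


print(agreement_rate(["42", "42", " 42", "41", "42"]))  # 0.8
```

A falling agreement rate on a fixed prompt set suggests the model has become less stable, even if its single-sample accuracy looks unchanged.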

Fingerprint

arXiv:2407.15847

Identifies specific quantization artifacts and model signatures.

Context

Standard

Needle-in-haystack retrieval and context window adherence.
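A needle-in-a-haystack test hides one fact at a chosen depth inside long filler text and then asks the model to retrieve it. A minimal sketch of building such a prompt and scoring the answer; the helpers, filler, and passphrase are hypothetical.

```python
def build_haystack(needle: str, filler: str, n_sentences: int, depth: float) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieved(response: str, secret: str) -> bool:
    """Pass if the model's answer contains the secret value (case-insensitive)."""
    return secret.lower() in response.lower()


needle = "The secret passphrase is blue-harbor-42."
prompt = (
    build_haystack(needle, "The sky was clear that day.", n_sentences=200, depth=0.5)
    + "\n\nWhat is the secret passphrase?"
)
print(len(prompt.split()), retrieved("It is blue-harbor-42.", "blue-harbor-42"))
```

Sweeping `depth` and `n_sentences` gives a grid of pass/fail results across positions and context lengths.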

Routing

Internal

Tool selection and classification accuracy.
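A minimal sketch of scoring tool selection: accuracy against an expected tool per prompt, plus a count of which expected/predicted pairs get confused. The tool names and helpers are hypothetical.

```python
from collections import Counter


def routing_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of prompts where the model picked the expected tool or label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def confusions(predicted: list[str], expected: list[str]) -> Counter:
    """Count (expected, predicted) pairs for every wrong choice."""
    return Counter((e, p) for p, e in zip(predicted, expected) if p != e)


expected = ["get_weather", "search_web", "calculator", "get_weather"]
predicted = ["get_weather", "search_web", "search_web", "get_weather"]
print(routing_accuracy(predicted, expected))  # 0.75
print(confusions(predicted, expected))        # Counter({('calculator', 'search_web'): 1})
```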

Logic

LogiQA

First-order logic and syllogistic reasoning capabilities.
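LogiQA-style items are multiple choice with four options, so scoring can reduce to extracting the chosen letter. A minimal sketch, assuming options are labeled A to D and the model's final standalone letter is its answer; the helpers are hypothetical. The regex will also match a sentence-initial article "A", so real harnesses usually ask for a fixed answer format.

```python
import re
from typing import Optional

# A standalone capital letter A-D, e.g. "B", "(C)", "Answer: D".
_CHOICE = re.compile(r"\b([A-D])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the last standalone option letter A-D, or None if there is none."""
    matches = _CHOICE.findall(response)
    return matches[-1] if matches else None


def choice_accuracy(responses: list[str], gold: list[str]) -> float:
    """Fraction of responses whose extracted letter equals the gold option."""
    if not gold:
        return 0.0
    return sum(extract_choice(r) == g for r, g in zip(responses, gold)) / len(gold)


responses = ["All cats are mammals, so the answer is (B).", "Answer: D", "I am not sure."]
print(choice_accuracy(responses, ["B", "C", "A"]))  # 1 of 3 correct
```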

OPTIONAL TIER: Experimental.

Zeroprint

Experimental

Advanced zero-shot capability fingerprinting.

Multilingual

MMLU-Multi

Cross-lingual reasoning and translation fidelity.

Integrate in Minutes

# 1. Install
pip install nerfprobe

# 2. Run the core probes on GPT-4o
nerfprobe run gpt-4o --tier core

# 3. Run a single probe on another model and output JSON for CI/CD
nerfprobe run claude-3-opus --probe math --format json

import asyncio

from nerfprobe import run_probes, OpenAIGateway


async def main():
    gateway = OpenAIGateway(api_key="...")
    results = await run_probes("gpt-4o", gateway, tier="core")

    # Simple programmatic access
    print(results[0].score)  # e.g. 0.98


asyncio.run(main())
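To turn scores into a degradation alarm, a CI job can compare each probe's latest score with a stored baseline. A minimal sketch, assuming per-probe scores have already been collected into dictionaries; the layout of nerfprobe's JSON output is not shown here, so that parsing step is left out, and `find_regressions`, the probe keys, and the numbers are hypothetical.

```python
def find_regressions(
    baseline: dict[str, float], current: dict[str, float], max_drop: float = 0.05
) -> dict[str, float]:
    """Return probes whose score fell by more than `max_drop` versus the baseline."""
    regressions = {}
    for probe, old in baseline.items():
        new = current.get(probe)
        if new is None:
            continue  # probe not run this time; skip rather than guess
        drop = old - new
        if drop > max_drop:
            regressions[probe] = round(drop, 4)
    return regressions


baseline = {"math": 0.98, "style": 0.91, "json": 0.99}  # hypothetical stored scores
current = {"math": 0.90, "style": 0.90, "json": 0.99}   # hypothetical latest run
bad = find_regressions(baseline, current)
print(bad)  # {'math': 0.08}
# In CI, exit non-zero when `bad` is non-empty, e.g. raise SystemExit(1 if bad else 0).
```

A fixed tolerance is the simplest choice; probes with noisy scores may need repeated runs or a wider `max_drop` to avoid false alarms.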

Try the interactive demo on Hugging Face Spaces.

┌──────────────────────────────────────────────┐
│               nerfprobe (CLI)                │
├──────────────────────────────────────────────┤
│           nerfprobe-core (Library)           │
│  ┌───────────┬──────────────┬─────────────┐  │
│  │ Core      │ Advanced     │ Optional    │  │
│  │ 5 probes  │ 10 probes    │ 2 probes    │  │
│  └───────────┴──────────────┴─────────────┘  │
├──────────────────────────────────────────────┤
│           Gateways (7 providers)             │
└──────────────────────────────────────────────┘
