Snorkel helps build Terminal-Bench 2.0.
The Data Research Lab advancing frontier AI
Where academic rigor meets production: design and pressure-test the datasets and evaluations that make AI models and agents work in the real world.
Proud to partner with top frontier AI and research teams

Data and evaluation for real-world AI
Operationalize the full AI data loop—from dataset curation and realistic simulations to rubric design and evals. Snorkel provides end-to-end solutions that advance frontier AI and agentic systems.
Expert data services
Curate high-quality, domain-specific datasets to accelerate your AI use cases and improve performance.
Applied AI solutions
Design and co-develop specialized models, evaluation frameworks, and data pipelines for your organization.
Research-led development
Programmatic quality control
Expert-in-the-loop acceleration
AI stalls without a data development engine
Most AI teams iterate on prompts and parameters while the data and evaluation loop is ad hoc. The result: gains that don’t generalize, slow fixes, and no way to prove lift.
Your AI in production: shifting targets, edge cases, uneven quality, one-off evals, tool sprawl.
74% hallucination. Unknown coverage. Not reproducible.
Close the loop on AI data
Snorkel's AI data development platform is a unified engine to design, stress-test, evaluate, and improve the data powering your frontier models and agent behavior.
Planning
Define tasks, IO contracts, and scoring rubrics; select verifiers and preference signals to set what “good” looks like.
Execution
Run rubric-guided task and labeling pipelines with precise inputs/outputs, automated checks, and calibrated expert review.
Evaluation
Measure behavior with terminal-based coding tasks and realistic simulations; publish reproducible results and traces.
Refinement
Analyze failures and disagreement, update rubrics, and target data collection to close coverage gaps for the next cycle.
The expert-in-the-loop difference
Snorkel pairs programmatic automation with calibrated experts in the loop. Using rubrics, verifiers, and review loops, we help AI teams curate high-quality datasets 2× faster without sacrificing volume or precision.
Meta-evaluation: evaluator development, model-based and rule-based evaluation, expert correction and feedback.
1,000+ expert-level topics
High-precision data development for the challenges and tasks generalist workflows can't address.

Featured research from the lab
Benchmarks, datasets, and papers backed by our team of applied AI researchers and data scientists from Stanford, MIT, and UC Berkeley. With 100+ peer-reviewed publications in foundation models and data-centric AI, we bring results you can audit, reproduce, and build on.
