What My Project Does
Booktest is a Python tool for review-driven regression testing of ML/NLP/LLM systems. Instead of relying only on assertion-based pass/fail tests, it captures outputs as readable artifacts and focuses on reviewable diffs between runs.
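To make the idea concrete, here is a minimal sketch of the review-driven pattern in plain Python: write each test's output to a readable file, then diff it against the last accepted baseline so a human can review the change instead of getting a bare pass/fail. The names (`review_diff`, the `books` directory) are illustrative only, not booktest's actual API.

```python
import difflib
from pathlib import Path

def review_diff(test_name, output, baseline_dir="books"):
    """Compare a test's current output against its accepted baseline.

    Returns a unified diff as a list of lines; an empty list means
    nothing changed, so there is nothing to review.
    Illustrative sketch -- not booktest's real API.
    """
    baseline = Path(baseline_dir) / f"{test_name}.md"
    old = baseline.read_text().splitlines() if baseline.exists() else []
    new = output.splitlines()
    return list(difflib.unified_diff(old, new, "accepted", "current", lineterm=""))
```

The key design point is that the artifact (here a Markdown file) is the test result: accepting a change means promoting the current output to the new baseline, rather than editing assertions.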
It also supports incremental pipelines and caching, so expensive steps aren't needlessly rerun, which makes it practical for CI workflows that involve model inference.
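The caching idea can be sketched with a small disk-backed memoization decorator: a pipeline step's result is stored keyed by its inputs, so an unchanged expensive step is skipped on the next run. Again, `cached_step` and the `.cache` directory are hypothetical names for illustration, not booktest's actual caching mechanism.

```python
import functools
import hashlib
import pickle
from pathlib import Path

def cached_step(cache_dir=".cache"):
    """Persist a step's result on disk, keyed by a hash of its arguments.

    On a repeat call with the same inputs the cached result is loaded
    instead of recomputing. Illustrative sketch only.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = hashlib.sha256(
                pickle.dumps((fn.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            path = Path(cache_dir) / f"{key}.pkl"
            if path.exists():
                return pickle.loads(path.read_bytes())
            result = fn(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator
```

In a CI setting this is what lets a slow model-inference step run once and be reused across subsequent test runs until its inputs change.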
Target Audience
This is intended for developers and ML engineers working with systems where outputs don’t have a single “correct” value (e.g., NLP pipelines, LLM-based systems, ranking/search models).
It’s designed for production workflows, but can also be useful in experimental or research settings.
Comparison
Traditional testing tools like pytest or snapshot-based tests work well when outputs are deterministic and correctness is objective.
Booktest complements those tools in cases where correctness is fuzzy and regressions need to be reviewed rather than strictly asserted.
It’s not meant to replace pytest, but to handle cases where binary assertions are insufficient.
Repo:
I’m the author and I'd love to hear your thoughts and perspectives, especially around pytest/CI integration patterns. :-)