What My Project Does
I built a small experimental Python tool called doubt that helps diagnose how functions behave when parts of their inputs are missing. I ran into this problem in my day-to-day data science work: we always wanted to know how a function would behave when data is missing (usually NaN), e.g. a function that calculates the average of the values in a list. Think of any business KPI that is affected by missing data.
The tool works by:
- injecting missing values (e.g. None, NaN, pd.NA) into function inputs one at a time
- re-running the function against a baseline execution
- classifying the outcome as:
  - crash
  - silent output change
  - type change
  - no impact
The intent is not to replace unit tests, but to act as a diagnostic lens to identify where functions make implicit assumptions about data completeness and where defensive checks or validation might be needed.
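To make the inject / re-run / classify loop concrete, here is a minimal stdlib-only sketch of the idea. It is not doubt's actual implementation: the names MISSING_VALUES, classify and sensitivity_report are hypothetical, pd.NA is left out to avoid a pandas dependency, and the order in which the outcome categories are checked is my own assumption.

```python
import math

# Hypothetical set of sentinels to inject; the real tool may use a different set
# (pd.NA is omitted here to keep the sketch stdlib-only).
MISSING_VALUES = [None, math.nan]


def _is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def _same(a, b):
    # NaN != NaN in Python, so treat two NaNs as equal when comparing outputs.
    return (_is_nan(a) and _is_nan(b)) or a == b


def classify(func, values, position, missing):
    """Replace values[position] with `missing`, re-run func, compare to the baseline."""
    baseline = func(list(values))
    mutated = list(values)
    mutated[position] = missing
    try:
        result = func(mutated)
    except Exception:
        return "crash"
    if type(result) is not type(baseline):
        return "type change"
    if not _same(result, baseline):
        return "silent output change"
    return "no impact"


def sensitivity_report(func, values):
    """Map (position, repr of injected value) to an outcome label."""
    return {
        (i, repr(m)): classify(func, values, i, m)
        for i in range(len(values))
        for m in MISSING_VALUES
    }


report = sensitivity_report(sum, [1.0, 2.0, 3.0])
```

In this sketch, injecting None into a list passed to sum is classified as a crash (sum raises TypeError), while injecting NaN into a list of floats is a silent output change because the total quietly becomes NaN. With a list of ints, the same NaN injection is reported as a type change, since the result goes from int to float.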
Target Audience
This is primarily aimed at:
- developers working with data pipelines, analytics, or ETL code
- people dealing with real-world, messy data where missingness is common
- early-stage debugging and code hardening rather than production enforcement
It’s currently best suited for relatively pure or low-side-effect functions and small to medium inputs.
The project is early-stage and experimental, and not yet intended as a drop-in production dependency.
Comparison
Compared to existing approaches:
- Unit tests require you to anticipate missing-data cases in advance; doubt explores missingness sensitivity automatically.
- Property-based testing (e.g. Hypothesis) can generate missing values, but requires explicit strategy and property definitions; doubt focuses specifically on mapping missing-input impact without needing formal invariants.
- Fuzzing / mutation testing typically perturbs code or arbitrary inputs, whereas doubt is narrowly scoped to data missingness, which is a common real-world failure mode in data-heavy systems.
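For contrast, this is roughly what the hand-written unit-test route looks like for a trivial function. It is only an illustration using the standard unittest module; the test class and its cases are made up for this post, and each case exists only because someone thought of it ahead of time.

```python
import math
import unittest


def total(values):
    return sum(values)


class TotalMissingDataTests(unittest.TestCase):
    # Hypothetical hand-written cases: one per missing-data scenario we anticipated.
    def test_none_raises(self):
        # sum() cannot add None to a number, so this is a hard failure.
        with self.assertRaises(TypeError):
            total([1, None, 3])

    def test_nan_propagates(self):
        # NaN does not raise; it silently poisons the result.
        self.assertTrue(math.isnan(total([1.0, math.nan, 3.0])))
```

Any position or missing-value type not listed in a test like this goes unchecked, which is the gap the automatic injection is meant to cover.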
Example
    from doubt import doubt

    @doubt()
    def total(values):
        return sum(values)

    total.check([1, 2, 3])
Installation
The package is not on PyPI yet. Install directly from GitHub:
pip install git+
Repository:
This is an early prototype and I’m mainly looking for feedback on:
- practical usefulness
- noise / false positives
- where this fits (or doesn’t) alongside existing testing approaches