Pointblank is a data validation toolkit for Python that enables data quality assessment across multiple backends including Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, Parquet files, PySpark, and Snowflake. This document provides a high-level architectural overview of the system, covering the major components, their interactions, and the primary workflows.
The Pointblank system consists of several interconnected layers that work together to provide a complete data validation workflow:
High-Level System Components
Sources: pointblank/validate.py:1-154, pointblank/_interrogation.py:1-100, pointblank/thresholds.py:1-50, pointblank/column.py:1-50, README.md:1-100
The fundamental Pointblank workflow follows a consistent pattern across all interfaces:
Validation Lifecycle with Code References
Sources: pointblank/validate.py:1800-2100, pointblank/_interrogation.py:1-500, pointblank/thresholds.py:1-200, README.md:98-189
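The lifecycle described above (define steps, interrogate, inspect results) can be sketched in plain Python. This is a toy illustration under stated assumptions, not Pointblank's implementation: `MiniValidate`, `StepResult`, and the list-of-dicts table are hypothetical stand-ins, and only the method names `col_vals_gt` and `interrogate` are borrowed from the Pointblank API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    """Outcome of one validation step, counted in test units (rows)."""

    label: str
    n: int
    n_passed: int

    @property
    def n_failed(self) -> int:
        return self.n - self.n_passed


@dataclass
class MiniValidate:
    """Hypothetical stand-in for the define -> interrogate -> report flow."""

    data: list[dict]
    _steps: list[tuple[str, Callable[[dict], bool]]] = field(default_factory=list)
    results: list[StepResult] = field(default_factory=list)

    def col_vals_gt(self, column: str, value: float) -> "MiniValidate":
        # Defining a step only records it; nothing is evaluated yet.
        self._steps.append((f"{column} > {value}", lambda row: row[column] > value))
        return self

    def interrogate(self) -> "MiniValidate":
        # Interrogation runs every recorded step against every row.
        self.results = [
            StepResult(label, len(self.data), sum(1 for row in self.data if check(row)))
            for label, check in self._steps
        ]
        return self


table = [{"a": 1, "d": 150}, {"a": 6, "d": 120}, {"a": 9, "d": 90}]
validation = MiniValidate(table).col_vals_gt("d", 100).col_vals_gt("a", 0).interrogate()

for step in validation.results:
    print(f"{step.label}: {step.n_passed}/{step.n} passed")
```

The point of the sketch is the separation of phases: step methods return the object itself so they can be chained, and no data is touched until `interrogate()` is called.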
Pointblank provides three primary interfaces for data validation, each serving different use cases:
| Interface | Entry Point | Primary Use Case | Exit Code Support |
|---|---|---|---|
| Python API | pb.Validate(data=...) | Interactive notebooks, scripts, data pipelines | No |
| YAML Configuration | pb.yaml_interrogate(file) | Version control, team collaboration, portability | Via --exit-code flag |
| Command Line | pb validate, pb run, pb scan | CI/CD pipelines, quick checks, automation | Yes (--exit-code flag) |
Python API Pattern:
YAML Configuration Pattern:
CLI Pattern:
Sources: pointblank/validate.py:139-154, README.md:215-256, README.md:257-313
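Because the Python API row in the interface table lists no exit code support, a pipeline script that uses it has to derive a process exit code itself. The following is a minimal sketch, assuming the common convention that 0 means every step passed and 1 means at least one step had failing test units; the function name and the input list are hypothetical and not part of Pointblank.

```python
def exit_code_from_failures(failed_per_step: list[int]) -> int:
    """Return 0 when no step has failing test units, otherwise 1."""
    return 0 if all(n == 0 for n in failed_per_step) else 1


# Hypothetical per-step failure counts collected after interrogation.
code = exit_code_from_failures([0, 2, 0])
print(f"exit code: {code}")

# A script would typically finish with sys.exit(code) so that a CI job
# fails when any validation step fails.
```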
The validation engine is built around several key classes and modules:
Core Validation Classes and Their Responsibilities
Sources: pointblank/validate.py:1728-2500, pointblank/_interrogation.py:1-1500, pointblank/_agg.py:1-200, pointblank/schema.py:1-300
Pointblank achieves backend-agnostic validation through a two-layer abstraction strategy:
Backend Integration Architecture
Supported Backend Matrix:
| Backend | Detection String | Abstraction Layer | Key File References |
|---|---|---|---|
| Polars | "polars" | Narwhals | validate.py:6500-6600 |
| Pandas | "pandas" | Narwhals | validate.py:6500-6600 |
| DuckDB | "duckdb" | Ibis → Narwhals | validate.py:6900-7000 |
| PostgreSQL | "postgres" | Ibis → Narwhals | validate.py:6900-7000 |
| MySQL | "mysql" | Ibis → Narwhals | validate.py:6900-7000 |
| SQLite | "sqlite" | Ibis → Narwhals | validate.py:6900-7000 |
| PySpark | "pyspark" | Narwhals | validate.py:6500-6600 |
| Parquet | "parquet" | Ibis → Narwhals | validate.py:6900-7000 |
| Snowflake | "snowflake" | Ibis → Narwhals | validate.py:6900-7000 |
Sources: pointblank/validate.py:6400-7000, pointblank/_utils.py:1-500, pointblank/_constants.py:99-110
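The routing in the backend matrix can be expressed as a simple lookup from detection string to the chain of abstraction layers. The sketch below copies the mapping from the table; the names `ABSTRACTION_LAYERS` and `abstraction_path` are hypothetical, and the real dispatch in Pointblank may be organized differently.

```python
# Detection string -> abstraction layers, in the order they are applied.
# The entries mirror the Supported Backend Matrix; the helper is illustrative.
ABSTRACTION_LAYERS: dict[str, tuple[str, ...]] = {
    "polars": ("narwhals",),
    "pandas": ("narwhals",),
    "pyspark": ("narwhals",),
    "duckdb": ("ibis", "narwhals"),
    "postgres": ("ibis", "narwhals"),
    "mysql": ("ibis", "narwhals"),
    "sqlite": ("ibis", "narwhals"),
    "parquet": ("ibis", "narwhals"),
    "snowflake": ("ibis", "narwhals"),
}


def abstraction_path(tbl_type: str) -> tuple[str, ...]:
    """Look up which layers a table of the given type passes through."""
    try:
        return ABSTRACTION_LAYERS[tbl_type]
    except KeyError:
        raise ValueError(f"Unsupported table type: {tbl_type!r}") from None


for name in ("polars", "duckdb"):
    print(name, "->", " -> ".join(abstraction_path(name)))
```

Every path ends in the same layer, which is what allows the validation logic to be written once regardless of where the data lives.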
Pointblank provides four primary configuration systems that control validation behavior:
Configuration System Components
Sources: pointblank/thresholds.py:1-650, pointblank/column.py:1-500, pointblank/segments.py:1-300, pointblank/validate.py:173-376
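Of the configuration systems above, thresholds are the easiest to illustrate in isolation. The sketch below assumes each level is a failing fraction between 0 and 1 and that a level is reached when the observed failing fraction is greater than or equal to it. The level names follow Pointblank's warning, error, and critical convention; the `ThresholdLevels` class and its `exceeded` method are hypothetical and are not the library's `Thresholds` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdLevels:
    """Hypothetical stand-in: each level is a failing fraction in [0, 1]."""

    warning: Optional[float] = None
    error: Optional[float] = None
    critical: Optional[float] = None

    def exceeded(self, n_failed: int, n: int) -> list[str]:
        """Return the names of all levels reached by this failure rate."""
        fraction = n_failed / n if n else 0.0
        levels = {"warning": self.warning, "error": self.error, "critical": self.critical}
        return [
            name
            for name, limit in levels.items()
            if limit is not None and fraction >= limit
        ]


levels = ThresholdLevels(warning=0.05, error=0.10, critical=0.25)
print(levels.exceeded(n_failed=12, n=100))
```

With 12 failing test units out of 100, the warning and error levels are reached but the critical level is not.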
Pointblank provides multiple ways to inspect validation results and analyze data quality:
Reporting System Architecture
Report Method Summary:
| Method | Return Type | Purpose | Typical Use Case |
|---|---|---|---|
| get_tabular_report() | GT | Full validation summary | Stakeholder communication |
| get_step_report(i) | GT | Single step detail | Debugging failures |
| get_data_extracts(i) | DataFrame | Failing rows CSV | Error investigation |
| get_sundered_data(type) | DataFrame | Pass/fail data split | Data filtering |
| get_json_report() | str (JSON) | Programmatic access | API integration |
Sources: pointblank/validate.py:3500-5400, pointblank/validate.py:7500-8400, README.md:119-123
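The pass/fail split listed for get_sundered_data(type) can be pictured as partitioning rows by whether they satisfy every row-wise check. The sketch below works on a list of dicts rather than a DataFrame; the `sunder` helper and the sample checks are hypothetical and only illustrate the idea, not how Pointblank computes it.

```python
from typing import Callable

Row = dict[str, float]
Check = Callable[[Row], bool]


def sunder(rows: list[Row], checks: list[Check]) -> tuple[list[Row], list[Row]]:
    """Split rows into (passed, failed); a row passes only if all checks hold."""
    passed: list[Row] = []
    failed: list[Row] = []
    for row in rows:
        target = passed if all(check(row) for check in checks) else failed
        target.append(row)
    return passed, failed


rows = [{"c": 3, "d": 150}, {"c": 8, "d": 120}, {"c": 2, "d": 90}]
checks = [lambda r: r["d"] > 100, lambda r: r["c"] <= 5]

passed, failed = sunder(rows, checks)
print(f"{len(passed)} passing row(s), {len(failed)} failing row(s)")
```

The passing partition can feed downstream processing while the failing partition is kept for investigation.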
The codebase is organized into functional modules:
| Module | Primary Classes/Functions | Responsibilities |
|---|---|---|
| validate.py | Validate, _ValidationInfo, read_file(), write_file() | Core validation orchestration, serialization |
| _interrogation.py | interrogate_*() functions, ConjointlyValidation, SpeciallyValidation | Validation execution logic |
| thresholds.py | Thresholds, Actions, FinalActions | Quality control configuration |
| column.py | Column, ColumnSelector, helper functions | Column targeting system |
| segments.py | Segment, seg_group() | Data partitioning |
| schema.py | Schema, _get_schema_validation_info() | Schema validation |
| _utils.py | _get_tbl_type(), _process_data(), _resolve_columns() | Type detection, data processing |
| _agg.py | load_validation_method_grid(), resolve_agg_registries() | Aggregate method generation |
| _constants.py | Constants, mappings, configurations | System-wide definitions |
| _constants_translations.py | Multilingual text | Internationalization |
Sources: pointblank/validate.py:1-100, pointblank/_interrogation.py:1-50, pointblank/thresholds.py:1-50, pointblank/column.py:1-50, pointblank/_utils.py:1-50, pointblank/_agg.py:1-50
Pointblank includes several advanced capabilities for specialized use cases:
Advanced Feature Components
Key Advanced Patterns:
- DraftValidation(data=df, model="anthropic:claude-sonnet-4") generates intelligent validation plans
- col_vals_expr(expr=pl.col('a') > pl.col('b')) for arbitrary comparisons
- conjointly(lambda df: expr1, lambda df: expr2) for multi-condition validation
- validation.write_file("validation.pkl") and pb.read_file("validation.pkl") for reusable validations

Sources: pointblank/validate.py:627-2400, pointblank/_interrogation.py:1500-1900, pointblank/column.py:400-600, pointblank/_constants.py:139-144
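The difference between running several conditions as separate steps and combining them with conjointly() can be shown with a small count. The sketch below is plain Python over a list of dicts, assuming that a conjoint check passes a row only when every condition holds for that row; the helper names are hypothetical and this is not Pointblank's implementation.

```python
from typing import Callable

Row = dict[str, float]
Check = Callable[[Row], bool]


def count_independent(rows: list[Row], checks: list[Check]) -> list[int]:
    """Passing rows per check, each check evaluated on its own."""
    return [sum(1 for row in rows if check(row)) for check in checks]


def count_conjoint(rows: list[Row], checks: list[Check]) -> int:
    """Rows that satisfy every check at once."""
    return sum(1 for row in rows if all(check(row) for check in checks))


rows = [{"a": 5, "b": 2}, {"a": 1, "b": 3}, {"a": 7, "b": 9}]
checks = [lambda r: r["a"] > 2, lambda r: r["b"] < 5]

print("independent:", count_independent(rows, checks))
print("conjoint:", count_conjoint(rows, checks))
```

Each condition passes two of the three rows on its own, yet only one row satisfies both, which is the case a conjoint step is meant to surface.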