DAG-based integration test framework with durable state.
```bash
pip install dagtest
```

Define tests with the `@test` decorator, declare edges with `depends_on`, and read upstream results from `ctx.results`:

```python
from dagtest import test

@test
async def test_root(ctx):
    return {"value": 1}

@test(depends_on=["test_root"])
async def test_child(ctx):
    parent_value = ctx.results["test_root"]["value"]
    assert parent_value == 1
    return {"computed": parent_value + 1}
```
Run the suite and inspect the DAG from the CLI:

```bash
dagtest run
dagtest list
dagtest graph
```

During execution, dagtest shows real-time branch status with hierarchical notation:
```
Running root_step...
PASS root_step (3ms)
[A•*]|A Running branch_a...
[A✓*]|A PASS branch_a (2ms)
[A•*]|B Running branch_b...
[A✓*]|B PASS branch_b (1ms)
[A•*]|C Running branch_c...
[A✓*]|C PASS branch_c (1ms)
[AC•]|A Running final_step...
[AC✓]|A PASS final_step (1ms)
```
First Bracket - Shows state at each depth level:
- Letter (`A`, `B`, `C`...) = completed at this depth
- `•` = currently running
- `✓` = successfully completed (shown on completion)
- `⚠` = failed but will retry (soft failure)
- `↻` = retrying after failure
- `*` = pending/not yet started
- `✗` = hard failed (after all retries) or blocked by upstream failure

Second Bracket - Last component of the branch path, colored by depth for visual differentiation
Example breakdown:
- `[A•*]|B` → Root complete (A), this branch running (•), one pending level (*), branch identifier B
- `[AC•]|A` → Root complete (A), parent branch complete (C), this node running (•), branch identifier A
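For reference, the run above could come from a DAG shaped like this (a sketch using the decorator API from the quick start; bodies are stubs):

```python
from dagtest import test

@test
async def root_step(ctx):
    return {}

@test(depends_on=["root_step"])
async def branch_a(ctx):
    return {}

@test(depends_on=["root_step"])
async def branch_b(ctx):
    return {}

@test(depends_on=["root_step"])
async def branch_c(ctx):
    return {}

# Runs only after all three branches complete
@test(depends_on=["branch_a", "branch_b", "branch_c"])
async def final_step(ctx):
    return {}
```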
Failed nodes propagate to downstream branches, and retries are shown with the retry icon:
```
[A•**]|A Running branch_a...
[A✓**]|A PASS branch_a (1ms)
[A•**]|B Running branch_b...
[A✗✗✗]|B FAIL branch_b (2ms)
  ValueError: Simulated failure
```
The `[A✗✗✗]|B` shows:
- Root complete (A)
- This branch hard failed (✗)
- Two downstream levels blocked (✗✗)
- Branch identifier B
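A failing branch like the one above can be reproduced with a test that simply raises; this is a sketch under the same assumed API:

```python
from dagtest import test

@test(depends_on=["root_step"])
async def branch_b(ctx):
    # An uncaught exception fails this node; once retries (if any) are
    # exhausted, every downstream level is blocked (✗).
    raise ValueError("Simulated failure")
```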
Retry example:
```
[A•*]|A Running flaky_step...
[A⚠*]|A FAIL flaky_step (3ms)
  ValueError: First attempt fails
[A•*]|A Running flaky_step... (attempt 2)
[A✓*]|A PASS flaky_step (2ms)
```
- `[A⚠*]|A` shows a soft failure (will retry); downstream nodes remain pending (`*`)
- After a successful retry, the node shows `✓`
Hard failure (retries exhausted):
```
[A•*]|A Running always_fails...
[A⚠*]|A FAIL always_fails (3ms)
  ValueError: Always fails
  With multiple lines
  Of error output
[A•*]|A Running always_fails... (attempt 2)
[A✗✗]|A FAIL always_fails (2ms)
  ValueError: Always fails
```
- First failure: `[A⚠*]|A` (soft, will retry)
- Final failure: `[A✗✗]|A` (hard, retries exhausted)
- Downstream nodes blocked: `✗✗` indicates two pending levels now blocked
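A flaky step like the one traced above might look like this. Note that the `retries` parameter name is an assumption, not confirmed elsewhere in this README:

```python
from dagtest import test

attempts = {"count": 0}

@test(depends_on=["root_step"], retries=1)  # `retries` is an assumed parameter name
async def flaky_step(ctx):
    attempts["count"] += 1
    if attempts["count"] == 1:
        # Soft failure: shown as ⚠ and retried
        raise ValueError("First attempt fails")
    # Second attempt succeeds: shown as ✓
    return {"attempt": attempts["count"]}
```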
Export DAG diagrams for documentation and review:
```bash
# Terminal tree view (default)
dagtest graph

# Mermaid diagram for documentation
dagtest graph --format mermaid --output dag.mmd

# Preview before execution
dagtest run --dry-run
```

Mermaid output can be rendered in GitHub, GitLab, or documentation tools that support Mermaid.js syntax.
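For the two-test quick-start DAG, the exported Mermaid file would look roughly like this (illustrative; the exact labels and styling dagtest emits may differ):

```mermaid
graph TD
    test_root --> test_child
```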
Complementary tools that pair with dagtest rather than replace it.

| Name | Use Case |
|---|---|
| DeepEval | LLM evaluation framework integrated with pytest; evaluate LLM outputs within dagtest workflows |
| GitHub Copilot | AI-powered test generation from natural language; accelerates writing dagtest test functions |
| gogcli | Command-line interface for Google Workspace services (Gmail, Calendar, Drive, etc.); automate email verification, calendar events, and document workflows in integration tests |
| Langfuse | Open-source LLM observability; trace and evaluate LLM calls within dagtest workflows, capturing multi-step execution history |
| Playwright | Browser automation; drive browser-based UI steps within dagtest integration tests |
| Promptfoo | Lightweight prompt regression testing; validate individual prompts while dagtest orchestrates full workflow integration |
| Sweet Cookie | Extract browser cookies to reuse an existing login session; when building for local-only usage, this avoids implementing login steps in tests |
Tools directly comparable to dagtest for various integration testing scenarios.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| Playwright | Browser/E2E | Cross-browser automation with auto-wait and multi-language support | Browser UI testing; need screenshots/videos; cross-browser compatibility; parallel execution; modern web apps with SPAs |
| Cypress | Browser/E2E | Developer-centric browser testing with time-travel debugging | JavaScript-heavy apps; real-time reloading during test development; visual debugging; prefer all-in-one developer experience |
| Selenium | Browser/E2E | Industry-standard cross-browser automation with broad language support | Legacy browser support; existing Selenium infrastructure; mobile testing via Appium; enterprise compliance requirements |
| Testcontainers | Integration Testing | Docker containers for integration tests with real services | Testing requires real databases/services; need isolated environments per test run; no persistent state between tests |
| pytest-dependency | Test Dependency | Pytest plugin for test execution order and dependencies | Simple test ordering without durable state; existing pytest workflow; tests don't need to pass data between each other |
| pytest-workflow | Pipeline Testing | YAML-based testing for workflow systems (Nextflow, Snakemake, WDL) | Testing bioinformatics/scientific pipelines; prefer YAML over Python; validating CLI tool outputs and exit codes |
| Syrupy | Snapshot Testing | Zero-dependency pytest snapshot testing with idiomatic assertions | Testing complex output structures; want to detect any changes; prefer declarative assertions over manual validation |
| Pact | Contract Testing | Consumer-driven contract testing for microservices | Testing API contracts between services; microservice architecture; want to catch breaking changes before deployment |
| Postman | API Testing | GUI-based API testing with collaboration features | Manual/exploratory API testing; sharing collections with team; non-developers writing tests; quick prototyping |
| REST Assured | API Testing | Java library for RESTful API testing with fluent syntax | Java-based projects; prefer code-based API tests; CI/CD integration; comprehensive API validation with assertions |
| Percy | Visual Regression | Automated visual testing with cross-browser screenshot comparison | Detecting visual regressions; UI pixel-perfect requirements; cross-browser visual consistency; CI/CD visual validation |
| Firecrawl | AI Web Scraping | AI-powered web scraping API that converts sites to LLM-ready data | Web scraping for AI/LLM; need structured data from dynamic sites; avoiding anti-bot mechanisms; no selector maintenance |
AI-powered testing platforms with autonomous agents, self-healing, and natural language test generation.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| ACCELQ | Codeless Automation | AI-powered codeless test automation with 9x faster creation and 88% maintenance reduction | Enterprise teams preferring no-code solutions; want rapid test creation; need self-healing without writing Python |
| Applitools | Visual AI Testing | Visual AI testing with cross-browser screenshot comparison (4.3⭐ Gartner) | Visual regression testing; UI pixel-perfect validation; need visual validation alongside integration tests |
| CodiumAI/Qodo | AI Test Generation | Open-source Meta TestGen-LLM implementation for automatic test suite generation | Want automatic test generation with high coverage; prefer AI to scaffold tests; don't need execution dependencies or durable state |
| Functionize | Enterprise AI Automation | Enterprise AI test automation with QA agents and ML-based stabilization | Enterprise QA teams; need managed AI testing platform; prefer vendor-supported solution over open-source |
| KaneAI | Agent-to-Agent Testing | Natural language test generation with agent-to-agent debugging and evolution | Small teams (1-10); prefer conversational test creation; low learning curve more important than DAG structure |
| Mabl | Autonomous Testing | AI agents autonomously create, maintain, and execute tests with self-healing | Need low-code automation; want 2x faster test creation; prefer GUI over code; don't need persistent state between test runs |
| pytest-evals | LLM Evaluation | Minimalistic pytest plugin tracking LLM accuracy and behavior | Testing LLM applications; want lightweight evaluation; prefer simple pytest integration over dagtest's DAG orchestration |
| Testim | ML-Based Stabilization | Tricentis ML-based test stabilization for complex web apps, leader in ease of use | Complex web applications; need ML stabilization; existing Tricentis ecosystem; prefer ease of use over code flexibility |
| Virtuoso QA | Autonomous Testing | Fully autonomous testing with natural language authoring and 90% maintenance reduction | Enterprise teams; need self-healing at scale; prefer fully managed autonomous testing over code-based DAG workflows |
Platforms for tracing, monitoring, and evaluating LLM applications in production. For evaluation methodology guidance, see docs/llm-evaluation-guide.md.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| Arize Phoenix | Observability | Open-source tracing with deep agent evaluation and multi-step trace analysis | Evaluating complex agent workflows; need structured evaluation of tool calls and reasoning steps |
| Braintrust | Evaluation | CI/CD-focused evaluation platform with experiment tracking | Want evals tied to CI/CD; need unified PM/engineer workflow; comparing prompt variations at scale |
| Galileo | Evaluation | Pre-built evaluators with custom LLM-as-judge generation | Need 20+ ready-made evaluators; want cheaper evaluation via specialized SLMs |
| Giskard | Evaluation | Bias, drift, and regression detection with collaborative feedback | Need bias/safety testing; want pre-deployment validation; require collaborative review workflows |
| Helicone | Observability | Rapid-implementation observability focused on cost optimization | Need quick setup; primary focus on cost/latency monitoring; want generous free tier |
| Langfuse | Observability | Open-source tracing, prompt management, and evaluation with self-hosting option | Need production LLM monitoring; want trace visualization; require prompt versioning; prefer open-source with data control |
| LangSmith | Observability | LangChain's observability platform with deep framework integration | Using LangChain/LangGraph; want seamless integration; need debugging views for chain internals |
| Maxim | Observability | Unified machine and human evaluation with CI/CD integration | Need human-in-the-loop evaluation; want unified quality framework; require production monitoring |
| Opik | Observability | Comet's open-source LLM evaluation and tracing platform | Want open-source alternative; need experiment tracking integration; prefer unified ML platform |
| Promptfoo | Evaluation | Lightweight YAML-based prompt testing with CLI integration | Prefer declarative test definitions; want CI/CD prompt regression; lightweight over full platforms |
DAG-based workflow engines and task runners that can be adapted for testing workflows.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| Snakemake | Scientific Workflow | Python-based workflow system with file-based targets and parallelization | Scientific/bioinformatics pipelines; file-based dependencies; cluster computing; prefer rule-based DSL over decorators |
| doit | Task Runner | Python task automation with DAG execution and result caching | Build automation; file-based targets; incremental builds; simpler task definitions; general automation beyond testing |
| LangGraph | LLM Orchestration | Graph-based workflow for LLM agents with state management | Building LLM agent workflows; need cyclic graphs; human-in-the-loop; conversation state management; LangSmith observability |
| Apache Airflow | Data Orchestration | DAG-based platform for data pipelines with web UI and scheduling | Production data orchestration; complex scheduling/monitoring; team collaboration via UI; existing Airflow infrastructure |
| Prefect | Data Orchestration | Modern workflow orchestration with cloud-hosted observability | Production workflows; cloud-native deployment; advanced retry/caching; prefer Python-native API; real-time monitoring |
| Luigi | Data Orchestration | Spotify's batch pipeline framework with target-based dependencies | Batch data processing; Hadoop/Spark jobs; central scheduler; prefer explicit targets over implicit state |
| Robot Framework | Test Automation | Keyword-driven test framework with natural language syntax | Acceptance testing; non-technical stakeholders; keyword-driven DSL; cross-platform GUI/API/mobile testing |
| Invoke | Task Runner | Python task execution tool with @task decorator and CLI invocation | Shell-oriented automation; prefer Python functions over YAML/DSL; tasks need parameters; simpler than full workflow engines |