DAG-based integration test framework with durable state.
```bash
pip install dagtest
```

Define tests with the `@test` decorator, declare edges with `depends_on`, and read upstream results from `ctx.results`:

```python
from dagtest import test

@test
async def test_root(ctx):
    return {"value": 1}

@test(depends_on=["test_root"])
async def test_child(ctx):
    parent_value = ctx.results["test_root"]["value"]
    assert parent_value == 1
    return {"computed": parent_value + 1}
```
Run the suite and inspect the DAG from the CLI:

```bash
dagtest run
dagtest list
dagtest graph
```

During execution, dagtest shows real-time branch status with hierarchical notation:
```
Running root_step...
PASS root_step (3ms)
[A•*]|A Running branch_a...
[A✓*]|A PASS branch_a (2ms)
[A•*]|B Running branch_b...
[A✓*]|B PASS branch_b (1ms)
[A•*]|C Running branch_c...
[A✓*]|C PASS branch_c (1ms)
[AC•]|A Running final_step...
[AC✓]|A PASS final_step (1ms)
```
First Bracket - Shows state at each depth level:
- Letter (`A`, `B`, `C`...) = completed at this depth
- `•` = currently running
- `✓` = successfully completed (shown on completion)
- `⚠` = failed but will retry (soft failure)
- `↻` = retrying after failure
- `*` = pending/not yet started
- `✗` = hard failed (after all retries) or blocked by upstream failure

Second Bracket - Last component of the branch path, colored by depth for visual differentiation
Example breakdown:
- `[A•*]|B` → Root complete (A), this branch running (•), one pending level (*), branch identifier B
- `[AC•]|A` → Root complete (A), parent branch complete (C), this node running (•), branch identifier A
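For reference, the run above could come from a DAG shaped like this (a sketch using the decorator API from the quick start; bodies are stubs):

```python
from dagtest import test

@test
async def root_step(ctx):
    return {}

@test(depends_on=["root_step"])
async def branch_a(ctx):
    return {}

@test(depends_on=["root_step"])
async def branch_b(ctx):
    return {}

@test(depends_on=["root_step"])
async def branch_c(ctx):
    return {}

# Runs only after all three branches complete
@test(depends_on=["branch_a", "branch_b", "branch_c"])
async def final_step(ctx):
    return {}
```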
Failed nodes propagate to downstream branches, and retries are shown with the retry icon:
```
[A•**]|A Running branch_a...
[A✓**]|A PASS branch_a (1ms)
[A•**]|B Running branch_b...
[A✗✗✗]|B FAIL branch_b (2ms)
  ValueError: Simulated failure
```
The `[A✗✗✗]|B` shows:
- Root complete (A)
- This branch hard failed (✗)
- Two downstream levels blocked (✗✗)
- Branch identifier B
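A failing branch like the one above can be reproduced with a test that simply raises; this is a sketch under the same assumed API:

```python
from dagtest import test

@test(depends_on=["root_step"])
async def branch_b(ctx):
    # An uncaught exception fails this node; once retries (if any) are
    # exhausted, every downstream level is blocked (✗).
    raise ValueError("Simulated failure")
```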
Retry example:
```
[A•*]|A Running flaky_step...
[A⚠*]|A FAIL flaky_step (3ms)
  ValueError: First attempt fails
[A•*]|A Running flaky_step... (attempt 2)
[A✓*]|A PASS flaky_step (2ms)
```
- `[A⚠*]|A` shows a soft failure (will retry); downstream nodes remain pending (`*`)
- After a successful retry, the node shows `✓`
Hard failure (retries exhausted):
```
[A•*]|A Running always_fails...
[A⚠*]|A FAIL always_fails (3ms)
  ValueError: Always fails
  With multiple lines
  Of error output
[A•*]|A Running always_fails... (attempt 2)
[A✗✗]|A FAIL always_fails (2ms)
  ValueError: Always fails
```
- First failure: `[A⚠*]|A` (soft, will retry)
- Final failure: `[A✗✗]|A` (hard, retries exhausted)
- Downstream nodes blocked: `✗✗` indicates two pending levels now blocked
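A flaky step like the one traced above might look like this. Note that the `retries` parameter name is an assumption, not confirmed elsewhere in this README:

```python
from dagtest import test

attempts = {"count": 0}

@test(depends_on=["root_step"], retries=1)  # `retries` is an assumed parameter name
async def flaky_step(ctx):
    attempts["count"] += 1
    if attempts["count"] == 1:
        # Soft failure: shown as ⚠ and retried
        raise ValueError("First attempt fails")
    # Second attempt succeeds: shown as ✓
    return {"attempt": attempts["count"]}
```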
Export DAG diagrams for documentation and review:
```bash
# Terminal tree view (default)
dagtest graph

# Mermaid diagram for documentation
dagtest graph --format mermaid --output dag.mmd

# Preview before execution
dagtest run --dry-run
```

Mermaid output can be rendered in GitHub, GitLab, or documentation tools that support Mermaid.js syntax.
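For the two-test quick-start DAG, the exported Mermaid file would look roughly like this (illustrative; the exact labels and styling dagtest emits may differ):

```mermaid
graph TD
    test_root --> test_child
```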
Complementary tools that pair with dagtest rather than replace it.

| Name | Use Case |
|---|---|
| DeepEval | LLM evaluation framework integrated with pytest; evaluate LLM outputs within dagtest workflows |
| GitHub Copilot | AI-powered test generation from natural language; accelerates writing dagtest test functions |
| gogcli | Command-line interface for Google Workspace services (Gmail, Calendar, Drive, etc.); automate email verification, calendar events, and document workflows in integration tests |
| Langfuse | Open-source LLM observability; trace and evaluate LLM calls within dagtest workflows, capturing multi-step execution history |
| Playwright | Browser automation; drive browser-based UI steps within dagtest integration tests |
| Promptfoo | Lightweight prompt regression testing; validate individual prompts while dagtest orchestrates full workflow integration |
| Sweet Cookie | Extract browser cookies to reuse an existing login session; when building for local-only usage, this avoids implementing login steps in tests |
Tools directly comparable to dagtest for various integration testing scenarios.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| Playwright | Browser/E2E | Cross-browser automation with auto-wait and multi-language support | Browser UI testing; need screenshots/videos; cross-browser compatibility; parallel execution; modern web apps with SPAs |
| Cypress | Browser/E2E | Developer-centric browser testing with time-travel debugging | JavaScript-heavy apps; real-time reloading during test development; visual debugging; prefer all-in-one developer experience |
| Selenium | Browser/E2E | Industry-standard cross-browser automation with broad language support | Legacy browser support; existing Selenium infrastructure; mobile testing via Appium; enterprise compliance requirements |
| Testcontainers | Integration Testing | Docker containers for integration tests with real services | Testing requires real databases/services; need isolated environments per test run; no persistent state between tests |
| pytest-dependency | Test Dependency | Pytest plugin for test execution order and dependencies | Simple test ordering without durable state; existing pytest workflow; tests don't need to pass data between each other |
| pytest-workflow | Pipeline Testing | YAML-based testing for workflow systems (Nextflow, Snakemake, WDL) | Testing bioinformatics/scientific pipelines; prefer YAML over Python; validating CLI tool outputs and exit codes |
| Syrupy | Snapshot Testing | Zero-dependency pytest snapshot testing with idiomatic assertions | Testing complex output structures; want to detect any changes; prefer declarative assertions over manual validation |
| Pact | Contract Testing | Consumer-driven contract testing for microservices | Testing API contracts between services; microservice architecture; want to catch breaking changes before deployment |
| Postman | API Testing | GUI-based API testing with collaboration features | Manual/exploratory API testing; sharing collections with team; non-developers writing tests; quick prototyping |
| REST Assured | API Testing | Java library for RESTful API testing with fluent syntax | Java-based projects; prefer code-based API tests; CI/CD integration; comprehensive API validation with assertions |
| Percy | Visual Regression | Automated visual testing with cross-browser screenshot comparison | Detecting visual regressions; UI pixel-perfect requirements; cross-browser visual consistency; CI/CD visual validation |
| Firecrawl | AI Web Scraping | AI-powered web scraping API that converts sites to LLM-ready data | Web scraping for AI/LLM; need structured data from dynamic sites; avoiding anti-bot mechanisms; no selector maintenance |
AI-powered testing platforms with autonomous agents, self-healing, and natural language test generation.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| ACCELQ | Codeless Automation | AI-powered codeless test automation with 9x faster creation and 88% maintenance reduction | Enterprise teams preferring no-code solutions; want rapid test creation; need self-healing without writing Python |
| Applitools | Visual AI Testing | Visual AI testing with cross-browser screenshot comparison (4.3⭐ Gartner) | Visual regression testing; UI pixel-perfect validation; need visual validation alongside integration tests |
| CodiumAI/Qodo | AI Test Generation | Open-source Meta TestGen-LLM implementation for automatic test suite generation | Want automatic test generation with high coverage; prefer AI to scaffold tests; don't need execution dependencies or durable state |
| Functionize | Enterprise AI Automation | Enterprise AI test automation with QA agents and ML-based stabilization | Enterprise QA teams; need managed AI testing platform; prefer vendor-supported solution over open-source |
| KaneAI | Agent-to-Agent Testing | Natural language test generation with agent-to-agent debugging and evolution | Small teams (1-10); prefer conversational test creation; low learning curve more important than DAG structure |
| Mabl | Autonomous Testing | AI agents autonomously create, maintain, and execute tests with self-healing | Need low-code automation; want 2x faster test creation; prefer GUI over code; don't need persistent state between test runs |
| pytest-evals | LLM Evaluation | Minimalistic pytest plugin tracking LLM accuracy and behavior | Testing LLM applications; want lightweight evaluation; prefer simple pytest integration over dagtest's DAG orchestration |
| Testim | ML-Based Stabilization | Tricentis ML-based test stabilization for complex web apps, leader in ease of use | Complex web applications; need ML stabilization; existing Tricentis ecosystem; prefer ease of use over code flexibility |
| Virtuoso QA | Autonomous Testing | Fully autonomous testing with natural language authoring and 90% maintenance reduction | Enterprise teams; need self-healing at scale; prefer fully managed autonomous testing over code-based DAG workflows |
Platforms for tracing, monitoring, and evaluating LLM applications in production. For evaluation methodology guidance, see docs/llm-evaluation-guide.md.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| Arize Phoenix | Observability | Open-source tracing with deep agent evaluation and multi-step trace analysis | Evaluating complex agent workflows; need structured evaluation of tool calls and reasoning steps |
| Braintrust | Evaluation | CI/CD-focused evaluation platform with experiment tracking | Want evals tied to CI/CD; need unified PM/engineer workflow; comparing prompt variations at scale |
| Galileo | Evaluation | Pre-built evaluators with custom LLM-as-judge generation | Need 20+ ready-made evaluators; want cheaper evaluation via specialized SLMs |
| Giskard | Evaluation | Bias, drift, and regression detection with collaborative feedback | Need bias/safety testing; want pre-deployment validation; require collaborative review workflows |
| Helicone | Observability | Rapid-implementation observability focused on cost optimization | Need quick setup; primary focus on cost/latency monitoring; want generous free tier |
| Langfuse | Observability | Open-source tracing, prompt management, and evaluation with self-hosting option | Need production LLM monitoring; want trace visualization; require prompt versioning; prefer open-source with data control |
| LangSmith | Observability | LangChain's observability platform with deep framework integration | Using LangChain/LangGraph; want seamless integration; need debugging views for chain internals |
| Maxim | Observability | Unified machine and human evaluation with CI/CD integration | Need human-in-the-loop evaluation; want unified quality framework; require production monitoring |
| Opik | Observability | Comet's open-source LLM evaluation and tracing platform | Want open-source alternative; need experiment tracking integration; prefer unified ML platform |
| Promptfoo | Evaluation | Lightweight YAML-based prompt testing with CLI integration | Prefer declarative test definitions; want CI/CD prompt regression; lightweight over full platforms |
DAG-based workflow engines and task runners that can be adapted for testing workflows.
| Name | Category | Description | When to Use This Over dagtest |
|---|---|---|---|
| Snakemake | Scientific Workflow | Python-based workflow system with file-based targets and parallelization | Scientific/bioinformatics pipelines; file-based dependencies; cluster computing; prefer rule-based DSL over decorators |
| doit | Task Runner | Python task automation with DAG execution and result caching | Build automation; file-based targets; incremental builds; simpler task definitions; general automation beyond testing |
| LangGraph | LLM Orchestration | Graph-based workflow for LLM agents with state management | Building LLM agent workflows; need cyclic graphs; human-in-the-loop; conversation state management; LangSmith observability |
| Apache Airflow | Data Orchestration | DAG-based platform for data pipelines with web UI and scheduling | Production data orchestration; complex scheduling/monitoring; team collaboration via UI; existing Airflow infrastructure |
| Prefect | Data Orchestration | Modern workflow orchestration with cloud-hosted observability | Production workflows; cloud-native deployment; advanced retry/caching; prefer Python-native API; real-time monitoring |
| Luigi | Data Orchestration | Spotify's batch pipeline framework with target-based dependencies | Batch data processing; Hadoop/Spark jobs; central scheduler; prefer explicit targets over implicit state |
| Robot Framework | Test Automation | Keyword-driven test framework with natural language syntax | Acceptance testing; non-technical stakeholders; keyword-driven DSL; cross-platform GUI/API/mobile testing |
| Invoke | Task Runner | Python task execution tool with @task decorator and CLI invocation | Shell-oriented automation; prefer Python functions over YAML/DSL; tasks need parameters; simpler than full workflow engines |