
Model and run complex workflow automation testing with Playwright and event listeners


KyleKing/dagtest


# dagtest

DAG-based integration test framework with durable state.

## Installation

```shell
pip install dagtest
```

## Quick Start

```python
from dagtest import test

@test
async def test_root(ctx):
    return {"value": 1}

@test(depends_on=["test_root"])
async def test_child(ctx):
    parent_value = ctx.results["test_root"]["value"]
    assert parent_value == 1
    return {"computed": parent_value + 1}
```

```shell
dagtest run    # execute the test DAG
dagtest list   # list registered tests
dagtest graph  # render the dependency graph
```
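Execution follows the declared dependencies. As an illustration of the underlying idea only (not dagtest's actual scheduler), Python's standard `graphlib` can topologically sort the two tests above:

```python
from graphlib import TopologicalSorter

# Each test maps to the set of tests it depends on, mirroring the
# depends_on declarations in the Quick Start example.
deps = {
    "test_root": set(),
    "test_child": {"test_root"},
}

# static_order() yields dependencies before their dependents.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['test_root', 'test_child']
```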

## Visualization

### Branch-Based Status Display

During execution, dagtest shows real-time branch status with hierarchical notation:

```
   Running root_step...
   PASS root_step (3ms)
[A•*]|A Running branch_a...
[A✓*]|A PASS branch_a (2ms)
[A•*]|B Running branch_b...
[A✓*]|B PASS branch_b (1ms)
[A•*]|C Running branch_c...
[A✓*]|C PASS branch_c (1ms)
[AC•]|A Running final_step...
[AC✓]|A PASS final_step (1ms)
```

**Status Format:** `[status_summary]|[last_component]`

**First Bracket** - shows the state at each depth level:

  - Letter (`A`, `B`, `C`...) = completed at this depth
  - `•` = currently running
  - `✓` = successfully completed (shown on completion)
  - `⚠` = failed but will retry (soft failure)
  - retry icon = retrying after failure
  - `*` = pending/not yet started
  - `✗` = hard failed (after all retries) or blocked by upstream failure

**Second Bracket** - the last component of the branch path, colored by depth for visual differentiation

Example breakdown:

  - `[A•*]|B` → Root complete (`A`), this branch running (`•`), one pending level (`*`), branch identifier `B`
  - `[AC•]|A` → Root complete (`A`), parent branch complete (`C`), this node running (`•`), branch identifier `A`
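To make the notation concrete, here is a toy sketch that composes such a status line from per-depth states. The `GLYPHS` mapping and `status_line` function are illustrative only, not dagtest's actual renderer:

```python
# Toy sketch: compose a status string like "[A•*]|B" from the state
# at each depth along a branch path.
GLYPHS = {
    "running": "•",
    "passed": "✓",
    "soft_fail": "⚠",
    "pending": "*",
    "hard_fail": "✗",
}

def status_line(depth_states, branch_path):
    """depth_states: one entry per depth, either a completed-branch
    letter (e.g. "A") or a state key from GLYPHS.
    branch_path: branch letters; only the last component is shown."""
    summary = "".join(GLYPHS.get(state, state) for state in depth_states)
    return f"[{summary}]|{branch_path[-1]}"

print(status_line(["A", "running", "pending"], ["A", "B"]))  # [A•*]|B
print(status_line(["A", "C", "running"], ["A", "C", "A"]))   # [AC•]|A
```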

### Failure Propagation and Retries

Failed nodes propagate to downstream branches, and retries are shown with the retry icon:

```
[A•**]|A Running branch_a...
[A✓**]|A PASS branch_a (1ms)
[A•**]|B Running branch_b...
[A✗✗✗]|B FAIL branch_b (2ms)
       ValueError: Simulated failure
```

The `[A✗✗✗]|B` shows:

  - Root complete (`A`)
  - This branch hard failed (`✗`)
  - Two downstream levels blocked (`✗✗`)
  - Branch identifier `B`

Retry example:

```
[A•*]|A Running flaky_step...
[A⚠*]|A FAIL flaky_step (3ms)
       ValueError: First attempt fails
[A•*]|A Running flaky_step... (attempt 2)
[A✓*]|A PASS flaky_step (2ms)
```

  - `[A⚠*]|A` shows a soft failure (will retry); downstream nodes remain pending (`*`)
  - After the successful retry, the node shows `✓`
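The retry flow above can be sketched as a plain Python loop. This is illustrative semantics only; `run_with_retries` is a hypothetical helper, not dagtest's API:

```python
# Toy sketch of soft-failure retries: a node that raises is retried
# up to max_attempts; only the final failure is a hard failure.
def run_with_retries(fn, max_attempts=2):
    for attempt in range(1, max_attempts + 1):
        try:
            return ("PASS", fn(attempt))
        except ValueError as exc:
            status = "SOFT_FAIL" if attempt < max_attempts else "HARD_FAIL"
            print(f"{status}: {exc} (attempt {attempt})")
    return ("HARD_FAIL", None)

def flaky(attempt):
    # Mimics the flaky_step example: fails once, then succeeds.
    if attempt == 1:
        raise ValueError("First attempt fails")
    return "ok"

print(run_with_retries(flaky))  # ('PASS', 'ok') after one soft failure
```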

Hard failure (retries exhausted):

```
[A•*]|A Running always_fails...
[A⚠*]|A FAIL always_fails (3ms)
       ValueError: Always fails
         With multiple lines
         Of error output
[A•*]|A Running always_fails... (attempt 2)
[A✗✗]|A FAIL always_fails (2ms)
       ValueError: Always fails
```

  - First failure: `[A⚠*]|A` (soft, will retry)
  - Final failure: `[A✗✗]|A` (hard, retries exhausted)
  - Downstream nodes blocked: `✗✗` shows the failed node plus its blocked downstream level
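The blocking behavior amounts to marking every transitive descendant of a hard-failed node. A minimal sketch, assuming a simple edge-list representation of the DAG (node names are hypothetical):

```python
from collections import defaultdict

def blocked_after(edges, failed):
    """edges: list of (parent, child) pairs; failed: the hard-failed
    node. Returns the set of downstream nodes blocked by the failure."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    blocked, stack = set(), [failed]
    while stack:
        for child in children[stack.pop()]:
            if child not in blocked:
                blocked.add(child)
                stack.append(child)
    return blocked

edges = [("root", "branch_b"), ("branch_b", "leaf_1"), ("leaf_1", "leaf_2")]
print(sorted(blocked_after(edges, "branch_b")))  # ['leaf_1', 'leaf_2']
```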

### DAG Export Formats

Export DAG diagrams for documentation and review:

```shell
# Terminal tree view (default)
dagtest graph

# Mermaid diagram for documentation
dagtest graph --format mermaid --output dag.mmd

# Preview before execution
dagtest run --dry-run
```

Mermaid output can be rendered in GitHub, GitLab, or documentation tools that support Mermaid.js syntax.
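Conceptually, a Mermaid export boils down to emitting one `parent --> child` line per DAG edge. A minimal sketch; the exact output of `dagtest graph --format mermaid` may differ:

```python
# Toy sketch: render an edge list as a Mermaid flowchart definition.
def to_mermaid(edges):
    lines = ["graph TD"]
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)

print(to_mermaid([("test_root", "test_child")]))
# graph TD
#     test_root --> test_child
```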

## Works well with

| Name | Use Case |
| --- | --- |
| DeepEval | LLM evaluation framework integrated with pytest; evaluate LLM outputs within dagtest workflows |
| GitHub Copilot | AI-powered test generation from natural language; accelerates writing dagtest test functions |
| gogcli | Command-line interface for Google Workspace services (Gmail, Calendar, Drive, etc.); automate email verification, calendar events, and document workflows in integration tests |
| Langfuse | Open-source LLM observability; trace and evaluate LLM calls within dagtest workflows, capturing multi-step execution history |
| Playwright | Browser automation |
| Sweet Cookie | When building for local usage only, reuse extracted browser cookies rather than implementing login steps |

## Alternatives

### Integration Testing Tools

Tools directly comparable to dagtest for various integration testing scenarios.

| Name | Category | Description | When to Use This Over dagtest |
| --- | --- | --- | --- |
| Playwright | Browser/E2E | Cross-browser automation with auto-wait and multi-language support | Browser UI testing; need screenshots/videos; cross-browser compatibility; parallel execution; modern web apps with SPAs |
| Cypress | Browser/E2E | Developer-centric browser testing with time-travel debugging | JavaScript-heavy apps; real-time reloading during test development; visual debugging; prefer all-in-one developer experience |
| Selenium | Browser/E2E | Industry-standard cross-browser automation with broad language support | Legacy browser support; existing Selenium infrastructure; mobile testing via Appium; enterprise compliance requirements |
| Testcontainers | Integration Testing | Docker containers for integration tests with real services | Testing requires real databases/services; need isolated environments per test run; no persistent state between tests |
| pytest-dependency | Test Dependency | Pytest plugin for test execution order and dependencies | Simple test ordering without durable state; existing pytest workflow; tests don't need to pass data between each other |
| pytest-workflow | Pipeline Testing | YAML-based testing for workflow systems (Nextflow, Snakemake, WDL) | Testing bioinformatics/scientific pipelines; prefer YAML over Python; validating CLI tool outputs and exit codes |
| Syrupy | Snapshot Testing | Zero-dependency pytest snapshot testing with idiomatic assertions | Testing complex output structures; want to detect any changes; prefer declarative assertions over manual validation |
| Pact | Contract Testing | Consumer-driven contract testing for microservices | Testing API contracts between services; microservice architecture; want to catch breaking changes before deployment |
| Postman | API Testing | GUI-based API testing with collaboration features | Manual/exploratory API testing; sharing collections with team; non-developers writing tests; quick prototyping |
| REST Assured | API Testing | Java library for RESTful API testing with fluent syntax | Java-based projects; prefer code-based API tests; CI/CD integration; comprehensive API validation with assertions |
| Percy | Visual Regression | Automated visual testing with cross-browser screenshot comparison | Detecting visual regressions; UI pixel-perfect requirements; cross-browser visual consistency; CI/CD visual validation |
| Firecrawl | AI Web Scraping | AI-powered web scraping API that converts sites to LLM-ready data | Web scraping for AI/LLM; need structured data from dynamic sites; avoiding anti-bot mechanisms; no selector maintenance |

### AI-Assisted Testing Tools

AI-powered testing platforms with autonomous agents, self-healing, and natural language test generation.

| Name | Category | Description | When to Use This Over dagtest |
| --- | --- | --- | --- |
| ACCELQ | Codeless Automation | AI-powered codeless test automation with 9x faster creation and 88% maintenance reduction | Enterprise teams preferring no-code solutions; want rapid test creation; need self-healing without writing Python |
| Applitools | Visual AI Testing | Visual AI testing with cross-browser screenshot comparison (4.3⭐ Gartner) | Visual regression testing; UI pixel-perfect validation; need visual validation alongside integration tests |
| CodiumAI/Qodo | AI Test Generation | Open-source Meta TestGen-LLM implementation for automatic test suite generation | Want automatic test generation with high coverage; prefer AI to scaffold tests; don't need execution dependencies or durable state |
| Functionize | Enterprise AI Automation | Enterprise AI test automation with QA agents and ML-based stabilization | Enterprise QA teams; need managed AI testing platform; prefer vendor-supported solution over open-source |
| KaneAI | Agent-to-Agent Testing | Natural language test generation with agent-to-agent debugging and evolution | Small teams (1-10); prefer conversational test creation; low learning curve more important than DAG structure |
| Mabl | Autonomous Testing | AI agents autonomously create, maintain, and execute tests with self-healing | Need low-code automation; want 2x faster test creation; prefer GUI over code; don't need persistent state between test runs |
| pytest-evals | LLM Evaluation | Minimalistic pytest plugin tracking LLM accuracy and behavior | Testing LLM applications; want lightweight evaluation; prefer simple pytest integration over dagtest's DAG orchestration |
| Testim | ML-Based Stabilization | Tricentis ML-based test stabilization for complex web apps, leader in ease of use | Complex web applications; need ML stabilization; existing Tricentis ecosystem; prefer ease of use over code flexibility |
| Virtuoso QA | Autonomous Testing | Fully autonomous testing with natural language authoring and 90% maintenance reduction | Enterprise teams; need self-healing at scale; prefer fully managed autonomous testing over code-based DAG workflows |

### LLM Observability & Evaluation

Platforms for tracing, monitoring, and evaluating LLM applications in production. For evaluation methodology guidance, see docs/llm-evaluation-guide.md.

| Name | Category | Description | When to Use This Over dagtest |
| --- | --- | --- | --- |
| Arize Phoenix | Observability | Open-source tracing with deep agent evaluation and multi-step trace analysis | Evaluating complex agent workflows; need structured evaluation of tool calls and reasoning steps |
| Braintrust | Evaluation | CI/CD-focused evaluation platform with experiment tracking | Want evals tied to CI/CD; need unified PM/engineer workflow; comparing prompt variations at scale |
| Galileo | Evaluation | Pre-built evaluators with custom LLM-as-judge generation | Need 20+ ready-made evaluators; want cheaper evaluation via specialized SLMs |
| Giskard | Evaluation | Bias, drift, and regression detection with collaborative feedback | Need bias/safety testing; want pre-deployment validation; require collaborative review workflows |
| Helicone | Observability | Rapid-implementation observability focused on cost optimization | Need quick setup; primary focus on cost/latency monitoring; want generous free tier |
| Langfuse | Observability | Open-source tracing, prompt management, and evaluation with self-hosting option | Need production LLM monitoring; want trace visualization; require prompt versioning; prefer open-source with data control |
| LangSmith | Observability | LangChain's observability platform with deep framework integration | Using LangChain/LangGraph; want seamless integration; need debugging views for chain internals |
| Maxim | Observability | Unified machine and human evaluation with CI/CD integration | Need human-in-the-loop evaluation; want unified quality framework; require production monitoring |
| Opik | Observability | Comet's open-source LLM evaluation and tracing platform | Want open-source alternative; need experiment tracking integration; prefer unified ML platform |
| Promptfoo | Evaluation | Lightweight YAML-based prompt testing with CLI integration | Prefer declarative test definitions; want CI/CD prompt regression; lightweight over full platforms |

### Workflow & Task Systems

DAG-based workflow engines and task runners that can be adapted for testing workflows.

| Name | Category | Description | When to Use This Over dagtest |
| --- | --- | --- | --- |
| Snakemake | Scientific Workflow | Python-based workflow system with file-based targets and parallelization | Scientific/bioinformatics pipelines; file-based dependencies; cluster computing; prefer rule-based DSL over decorators |
| doit | Task Runner | Python task automation with DAG execution and result caching | Build automation; file-based targets; incremental builds; simpler task definitions; general automation beyond testing |
| LangGraph | LLM Orchestration | Graph-based workflow for LLM agents with state management | Building LLM agent workflows; need cyclic graphs; human-in-the-loop; conversation state management; LangSmith observability |
| Apache Airflow | Data Orchestration | DAG-based platform for data pipelines with web UI and scheduling | Production data orchestration; complex scheduling/monitoring; team collaboration via UI; existing Airflow infrastructure |
| Prefect | Data Orchestration | Modern workflow orchestration with cloud-hosted observability | Production workflows; cloud-native deployment; advanced retry/caching; prefer Python-native API; real-time monitoring |
| Luigi | Data Orchestration | Spotify's batch pipeline framework with target-based dependencies | Batch data processing; Hadoop/Spark jobs; central scheduler; prefer explicit targets over implicit state |
| Robot Framework | Test Automation | Keyword-driven test framework with natural language syntax | Acceptance testing; non-technical stakeholders; keyword-driven DSL; cross-platform GUI/API/mobile testing |
| Invoke | Task Runner | Python task execution tool with @task decorator and CLI invocation | Shell-oriented automation; prefer Python functions over YAML/DSL; tasks need parameters; simpler than full workflow engines |
