examples

Examples

Runnable configs and sample agents for ASSERT.

Start with the LangGraph travel planner. It is the flagship example because it exercises the real agent path on top of the universal target.callable integration: spec-driven test generation, inference outputs (conversations or agent actions), OTel-traced execution, and judge evidence. Phoenix/OpenInference auto-instrumentation captures the agent's OpenTelemetry spans so the judge cites tool calls, routing, and intermediate decisions in every verdict.

Any agent works. target.callable accepts any agent or multi-agent system you can invoke from a Python function — frameworks (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, DSPy, LlamaIndex, …), custom orchestration, REST clients, or thin wrappers around hosted models. The recommended integration adds the central helper (from assert_ai import auto_trace; auto_trace.enable()) so the judge can score tool use and routing, not just the final response.

First run

python -m venv .venv
./.venv/Scripts/Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e ".[otel,langgraph]"
Copy-Item .env.example .env
# Edit .env with credentials for your provider. The shipped configs use `azure/...` models;
# any LiteLLM provider (OpenAI, Anthropic, Bedrock, Vertex, Ollama, …) works — see https://docs.litellm.ai/docs/providers.

assert-ai run --config examples/travel_planner_langgraph/eval_config.yaml
assert-ai results status travel-planner-langgraph-v1 demo-1

Create your own config

Use assert-ai init to design an eval config interactively instead of writing YAML by hand. Pass --model with any LiteLLM model string and make sure the matching API key is in your .env:

assert-ai init --model azure/gpt-5.4-mini
# or seed from an existing example:
assert-ai init --model azure/gpt-5.4-mini --from examples/travel_planner_langgraph/eval_config.yaml

See the CLI reference for all options.

Which example to start with

Goal	Example	Notes
Evaluate any agent or multi-agent system (recommended)	`travel_planner_langgraph/eval_config.yaml`	Canonical example. Uses `target.callable` with `target.trace.backend: phoenix` so the judge sees tool calls and routing.
Understand framework instrumentation breadth	`phoenix_auto_trace/README.md`	Same travel-planner idea across multiple framework auto-instrumentation paths using `assert_ai.auto_trace`.
Run a simple hosted-model eval	`prompt_agents/health_assistant.yaml`	Most simple example: a single LLM target with a system prompt.
Evaluate a Prompt Agent with planned tools but no backend	`prompt_agents/health_assistant_simulated_tools.yaml`	Uses a fixed tool schema and simulated tool responses.
Evaluate a hosted target with Python tool functions	`prompt_agents/health_assistant_sandbox.yaml`	Requires Docker. Use when you want actual tool execution around a hosted model.
Evaluate a science research agent with real retrieval tools	`science_research_agent/eval_config.yaml`	Callable-agent example ported from Omni. Uses `web_search`, `fetch_url`, and `file_search`. Run `python -m pip install -e ".[examples]"`, set `TAVILY_API_KEY` for web search, then `assert-ai run --config examples/science_research_agent/eval_config.yaml`.
See runtime + eval close the loop on a real workflow	`incident_triage_agent/eval_config_baseline.yaml` + `eval_config_naive_prompt.yaml` + `eval_config_guarded.yaml` + `eval_config_guarded_gepa.yaml`	Joint AgentControlSpecification + ASSERT demo. SRE incident-triage agent run across a 4-variant matrix (baseline weak prompt → naïve DO-NOT prompt → ACS gates → ACS + GEPA-optimized prompt) over a 4-axis failure-mode taxonomy to prove the runtime+eval loop and surface the security/overrefusal trade-off. See `incident_triage_agent/README.md`.

Layout

examples/
├── travel_planner_langgraph/   flagship callable-agent example with OTel trace capture
├── science_research_agent/     callable science research agent with real retrieval tools
├── phoenix_auto_trace/         framework instrumentation gallery
├── prompt_agents/              simple hosted-model and Prompt Agent configs
├── behavior_specs/             reusable behavior examples and references in markdown files
└── agents/                     simple tool modules and tool schemas

See behavior_specs/README.md for reusable behavior examples and references in markdown files. These were developed to be shared as high quality behavior/concept specifications that can be used with ASSERT.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Examples

First run

Create your own config

Which example to start with

Layout

Name		Name	Last commit message	Last commit date
parent directory ..
agents		agents
azure_doc_qa		azure_doc_qa
behavior_specs		behavior_specs
benchmark		benchmark
change_control_agent		change_control_agent
incident_triage_agent		incident_triage_agent
phoenix_auto_trace		phoenix_auto_trace
prompt_agents		prompt_agents
science_research_agent		science_research_agent
travel_planner_langgraph		travel_planner_langgraph
travel_planner_neurosan		travel_planner_neurosan
README.md		README.md
__init__.py		__init__.py

FilesExpand file tree

examples

Directory actions

More options

Directory actions

More options

Latest commit

History

examples

Folders and files

parent directory

README.md

Examples

First run

Create your own config

Which example to start with

Layout