Patch — Autonomous Pipeline Recovery for GitLab Duo

Inspiration

Every developer has woken up to a broken CI/CD pipeline. A flaky test, a missing dependency, a timeout — small failures that block the entire team until someone manually investigates. We asked ourselves: what if GitLab could diagnose and fix these on its own, the same way a senior DevOps engineer would?

What it does

Patch is a GitLab Duo Agent that autonomously triages pipeline failures. When a pipeline breaks, Patch:

1. Gathers context: reads job logs, commit diffs, recent pipeline history, and affected source files
2. Classifies the failure into one of 8 categories (dependency error, flaky test, syntax error, infrastructure issue, etc.)
3. Takes action based on the diagnosis:
   - Retries transient failures (flaky tests, timeouts, network blips)
   - Opens a fix MR for dependency and syntax errors with the minimal targeted change
   - Reverts logic regressions (only on non-protected branches, and only when confidence is high)
   - Escalates unknown or infrastructure issues by creating a detailed GitLab issue with investigation checklists

Built-in safety gates ensure Patch only acts when confident — confidence thresholds, protected branch guards, and flakiness detection prevent bad auto-fixes or wrong reverts.
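
As a rough illustration of how these gates could combine, here is a minimal Python sketch of the action-selection logic. It is not Patch's actual implementation, which lives in prompts rather than code. The category names, the `Diagnosis` type, and both threshold values are assumptions made for the example; the write-up does not state the real thresholds or the full list of 8 categories.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    FIX_MR = "open_fix_mr"
    REVERT = "revert"
    ESCALATE = "escalate"


# Hypothetical category names and thresholds, chosen only for illustration.
TRANSIENT = {"flaky_test", "timeout", "network"}
FIXABLE = {"dependency_error", "syntax_error"}
MIN_CONFIDENCE = 0.7         # assumed floor for any automatic action
MIN_REVERT_CONFIDENCE = 0.9  # assumed stricter bar for reverts


@dataclass
class Diagnosis:
    category: str
    confidence: float       # 0.0 to 1.0
    branch_protected: bool


def choose_action(d: Diagnosis) -> Action:
    """Map a diagnosis to an action, falling back to escalation when unsure."""
    if d.confidence < MIN_CONFIDENCE:
        return Action.ESCALATE
    if d.category in TRANSIENT:
        return Action.RETRY
    if d.category in FIXABLE:
        return Action.FIX_MR
    if d.category == "logic_regression":
        # Reverts require both an unprotected branch and high confidence.
        if not d.branch_protected and d.confidence >= MIN_REVERT_CONFIDENCE:
            return Action.REVERT
        return Action.ESCALATE
    # Unknown and infrastructure failures always go to a human.
    return Action.ESCALATE
```

The point of the sketch is that escalation is the default branch: every path that is not explicitly allowed ends in a GitLab issue rather than an automatic change.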

How we built it

We built Patch using GitLab's Duo Agent and Flow configuration framework. The project includes two configurations:

- Single Agent (agent.yml): a self-contained agent with one comprehensive system prompt that handles the entire workflow in a conversational loop
- Multi-Step Flow (flow.yml): a structured 3-stage pipeline (gather → classify → execute) where each step has a focused prompt and scoped toolset for better auditability

Both configurations use GitLab's built-in AI tool catalog (28 tools total) to read logs, diffs, and files and to create commits, merge requests, and issues, so no external infrastructure is required.
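
To make the multi-step idea concrete, here is a small Python sketch of a gather → classify → execute flow in which each step only sees the tools on its allowlist. This models the concept only; it is not the flow.yml schema. The tool names (`read_job_log`, `retry_job`, `create_issue`), the `Step` type, and the stub tool behavior are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Tool = Callable[[dict], dict]


@dataclass(frozen=True)
class Step:
    name: str
    allowed_tools: FrozenSet[str]
    run: Callable[[dict, Dict[str, Tool]], dict]


def run_flow(steps: List[Step], tools: Dict[str, Tool], context: dict) -> dict:
    """Run steps in order, handing each one only its allowed tools."""
    for step in steps:
        scoped = {n: t for n, t in tools.items() if n in step.allowed_tools}
        context = {**context, **step.run(context, scoped)}
    return context


def gather(ctx: dict, tools: Dict[str, Tool]) -> dict:
    return {"log": tools["read_job_log"](ctx)["text"]}


def classify(ctx: dict, tools: Dict[str, Tool]) -> dict:
    # Receives an empty tool dict, so it can only use the gathered context.
    category = "timeout" if "timed out" in ctx["log"] else "unknown"
    return {"category": category}


def execute(ctx: dict, tools: Dict[str, Tool]) -> dict:
    tool = "retry_job" if ctx["category"] == "timeout" else "create_issue"
    return {"result": tools[tool](ctx)["status"]}


# Stub tools standing in for real catalog entries (names are made up).
TOOLS: Dict[str, Tool] = {
    "read_job_log": lambda ctx: {"text": "ERROR: job timed out after 3600s"},
    "retry_job": lambda ctx: {"status": "retried"},
    "create_issue": lambda ctx: {"status": "escalated"},
}

FLOW = [
    Step("gather", frozenset({"read_job_log"}), gather),
    Step("classify", frozenset(), classify),
    Step("execute", frozenset({"retry_job", "create_issue"}), execute),
]

result = run_flow(FLOW, TOOLS, {"job_id": 1})
```

Because each step's inputs and tool access are explicit, the intermediate context after every stage can be logged and reviewed, which is where the auditability benefit comes from.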

Challenges we ran into

- Prompt engineering for reliability: getting the agent to consistently follow the full triage workflow (gather → classify → act) without skipping steps or jumping to conclusions required careful prompt structuring and explicit behavioral rules.
- Safety gate design: balancing automation speed with safety. We needed confidence thresholds strict enough to prevent bad reverts but loose enough that the agent doesn't escalate everything. Tuning these thresholds took iteration.
- Scoping tool access: in the multi-step flow, deciding which tools each step should have access to was critical. Giving the classifier zero tools forces it to reason purely on the context gathered, preventing it from going on tangents.

Accomplishments that we're proud of

- Zero infrastructure: Patch runs entirely on GitLab's AI platform. No servers, no API keys, no databases to manage.
- Safety-first design: protected branch guards, confidence thresholds, and automatic escalation fallbacks are there to keep Patch from making a failure worse.
- Structured escalation reports: even when Patch can't fix the problem, it creates detailed issues with root cause analysis, log evidence, and investigation checklists that give developers a real head start.
- Two deployment modes: teams can choose the simple single-agent setup for quick wins or the multi-step flow for production-grade auditability.
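
For a sense of what a structured escalation report might look like, the following Python sketch renders an issue body with the three parts mentioned above: root cause analysis, log evidence, and an investigation checklist. The section headings and the function name `render_escalation_issue` are assumptions for illustration, not Patch's actual template. The checklist uses Markdown task-list syntax (`- [ ]`), which GitLab issues render as checkboxes.

```python
from typing import List

FENCE = "`" * 3  # code-fence marker, built here so the example nests cleanly


def render_escalation_issue(
    job_name: str,
    root_cause: str,
    log_lines: List[str],
    checklist: List[str],
) -> str:
    """Build a Markdown issue body for a failure Patch could not fix."""
    evidence = "\n".join(log_lines)
    tasks = "\n".join(f"- [ ] {step}" for step in checklist)
    sections = [
        f"## Pipeline failure: {job_name}",
        "### Suspected root cause",
        root_cause,
        "### Log evidence",
        f"{FENCE}\n{evidence}\n{FENCE}",
        "### Investigation checklist",
        tasks,
    ]
    return "\n\n".join(sections) + "\n"


body = render_escalation_issue(
    job_name="unit-tests",
    root_cause="Runner lost its connection to the package registry.",
    log_lines=["ERROR: connection reset by peer"],
    checklist=["Check runner network status", "Re-run the job manually"],
)
```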

What we learned

- GitLab's Duo Agent framework is more capable than we expected for building domain-specific AI workflows that integrate deeply with the DevOps lifecycle.
- Separating reasoning from action (the classify step has no tools) leads to more reliable decision-making: when the model can't "just try things," it has to reason from the evidence already gathered.
- The hardest part of autonomous agents isn't the happy path; it's designing graceful fallbacks for when the agent isn't sure.

What's next for 2am_dev

- Webhook-triggered automation: have Patch run automatically on every pipeline failure, not just on demand
- Learning from outcomes: track whether retries succeeded and fixes merged to improve classification confidence over time
- Multi-project support: monitor pipelines across an entire GitLab group and surface cross-project failure patterns
- Custom failure categories: let teams define their own classification rules for domain-specific failure types
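
Since outcome tracking is not built yet, the following is only one possible shape for it: a Python sketch that counts how often each (category, action) pair worked and reports a smoothed success rate. The `OutcomeTracker` class and its methods are hypothetical, and the add-one (Laplace) smoothing is our choice for the example so that pairs with little history start near 0.5 instead of 0 or 1.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class OutcomeTracker:
    """Counts successes per (category, action) pair."""

    def __init__(self) -> None:
        # Maps (category, action) to [successes, total attempts].
        self._counts: DefaultDict[Tuple[str, str], List[int]] = defaultdict(
            lambda: [0, 0]
        )

    def record(self, category: str, action: str, succeeded: bool) -> None:
        entry = self._counts[(category, action)]
        entry[0] += int(succeeded)
        entry[1] += 1

    def success_rate(self, category: str, action: str) -> float:
        """Add-one smoothed success rate; 0.5 when there is no history."""
        successes, total = self._counts.get((category, action), (0, 0))
        return (successes + 1) / (total + 2)


tracker = OutcomeTracker()
for ok in (True, True, True, False):
    tracker.record("flaky_test", "retry", ok)
rate = tracker.success_rate("flaky_test", "retry")
```

A rate like this could feed back into the confidence thresholds, for example by escalating sooner for categories where past retries rarely succeeded.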

Built With

duoagent
gitlab
yaml

Updates

Jacky Vo started this project — Mar 25, 2026 12:19 PM EDT
