Inspiration

Security debt is invisible until it explodes. 82% of organizations carry unresolved vulnerabilities — not because engineers don't care, but because fixing them manually takes 2–4 hours per finding. We wanted to close that gap completely.

What it does

SecuritySentry is an autonomous vulnerability remediation agent built on GitLab Duo. When a SAST scan finishes, it automatically triages findings by CVSS score, challenges uncertain results with runnable proof tests, negotiates fix scope with the developer, generates patches per CWE pattern, validates them against the test suite, and opens a documented MR, all in under 20 minutes.
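To make the triage step concrete, here is a minimal sketch of ordering findings by CVSS score and setting aside uncertain ones for the challenge step. It is an illustration, not the actual agent logic: the `Finding` fields, the `confidence` value, the `challenge_below` threshold, and the sample scores are all assumptions. The severity bands follow the standard CVSS v3.x qualitative ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cwe: str
    file: str
    cvss: float        # CVSS base score, 0.0 to 10.0
    confidence: float  # hypothetical scanner confidence, 0.0 to 1.0


def severity(cvss: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if cvss >= 9.0:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    if cvss > 0.0:
        return "low"
    return "none"


def triage(findings, challenge_below=0.7):
    """Sort findings by CVSS (highest first) and split them into two queues.

    Findings whose confidence is below `challenge_below` (an assumed
    threshold) go to the challenge queue instead of straight to fixing.
    """
    ordered = sorted(findings, key=lambda f: f.cvss, reverse=True)
    fix_now = [f for f in ordered if f.confidence >= challenge_below]
    challenge = [f for f in ordered if f.confidence < challenge_below]
    return fix_now, challenge


if __name__ == "__main__":
    # Sample data with made-up scores, for illustration only.
    sample = [
        Finding("CWE-79", "views.py", 6.1, 0.5),
        Finding("CWE-89", "users.py", 9.8, 0.9),
        Finding("CWE-22", "files.py", 7.5, 0.95),
    ]
    fix_now, challenge = triage(sample)
    for f in fix_now:
        print("fix:", f.cwe, severity(f.cvss))
    for f in challenge:
        print("challenge:", f.cwe, severity(f.cvss))
```

Keeping the two queues separate means a low-confidence finding never blocks a high-severity fix from moving forward.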

How we built it

We built SecuritySentry as a multi-agent GitLab Duo Workflow with six specialized agents — Triage, Fix, Test, MR, Report, and Green — chained together in a single orchestrator flow. Each agent has a focused prompt, a scoped toolset, and passes structured output to the next. The demo app is a realistic Flask codebase with five intentional vulnerabilities (CWE-89, 79, 22, 798, 918) that the pipeline detects and remediates end-to-end.
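The chaining idea can be sketched in plain Python, independent of GitLab Duo. This is a simplified illustration under assumptions, not the real flow definition: each agent is stubbed as a function that takes a state dict and returns an updated copy, only three of the six agents are shown, and the function names and state keys are hypothetical.

```python
import json


def triage_step(state: dict) -> dict:
    """Stub Triage agent: order findings by CVSS score, highest first."""
    findings = sorted(state["findings"], key=lambda f: f["cvss"], reverse=True)
    return {**state, "findings": findings}


def fix_step(state: dict) -> dict:
    """Stub Fix agent: emit one patch record per finding."""
    patches = [
        {"cwe": f["cwe"], "file": f["file"], "status": "patched"}
        for f in state["findings"]
    ]
    return {**state, "patches": patches}


def verify_step(state: dict) -> dict:
    """Stub Test agent: record whether every patch was applied."""
    passed = all(p["status"] == "patched" for p in state["patches"])
    return {**state, "tests_passed": passed}


# Hypothetical ordering; the real flow also has MR, Report, and Green agents.
PIPELINE = [
    ("triage", triage_step),
    ("fix", fix_step),
    ("verify", verify_step),
]


def run_pipeline(steps, state: dict) -> dict:
    """Run each step in order, passing the structured state along."""
    for name, step in steps:
        state = step(state)
        # Fail loudly if a step produced something that is not
        # JSON-serializable; json.dumps raises TypeError in that case.
        json.dumps(state)
    return state


if __name__ == "__main__":
    initial = {
        "findings": [
            {"cwe": "CWE-79", "file": "views.py", "cvss": 6.1},
            {"cwe": "CWE-89", "file": "users.py", "cvss": 9.8},
        ]
    }
    print(json.dumps(run_pipeline(PIPELINE, initial), indent=2))
```

Each step returns a new dict rather than mutating its input, so the output of any single agent can be inspected or replayed on its own while debugging.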

Challenges we ran into

Everything. This was my first time ever building an AI agent. Five days ago I had never written a Duo Workflow, never chained agents together, and had no idea how to make one agent's output become another agent's structured input. Every single layer was completely new territory.

The learning curve hit all at once: understanding how GitLab Duo Flows work, writing prompts that produce consistent JSON outputs, debugging why an agent would silently ignore its instructions, and figuring out that a single indentation error in YAML can break an entire pipeline without a clear error message. Things that probably take experienced agent developers an hour took me a full day.

I also underestimated how different "it works in my head" is from "it actually works." The False-Positive Challenger sounded simple to design but was genuinely hard to implement: getting an agent to generate runnable proof files, commit them to a branch, and post a structured MR comment in exactly the right format required more iteration than I expected.
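One of the challenges above, getting consistent JSON out of an agent, is easier to debug when the output is checked before the next agent consumes it. The following is a minimal sketch of such a check, assuming each agent returns a JSON array of findings. The `REQUIRED_FIELDS` schema and the `parse_agent_output` helper are hypothetical and are not part of GitLab Duo.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"cwe": str, "file": str, "cvss": (int, float)}


def parse_agent_output(raw: str) -> list[dict]:
    """Parse an agent's raw text output and validate its structure.

    Raises ValueError with a specific message instead of letting a
    malformed payload flow silently into the next agent.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")

    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {i}: expected a JSON object")
        for field, expected in REQUIRED_FIELDS.items():
            if field not in item:
                raise ValueError(f"finding {i}: missing field '{field}'")
            value = item[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"finding {i}: field '{field}' has the wrong type")
    return data


if __name__ == "__main__":
    good = '[{"cwe": "CWE-89", "file": "users.py", "cvss": 9.8}]'
    print(parse_agent_output(good))
    try:
        parse_agent_output('[{"cwe": "CWE-89", "file": "users.py"}]')
    except ValueError as err:
        print("rejected:", err)
```

A check like this turns "the agent silently ignored its instructions" into an error that names the finding and the field that went wrong.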

Accomplishments that we're proud of

Honestly, just finishing this makes me proud. Even just trying makes me proud.

What we learned

I learned to read error messages carefully instead of panicking. Most of the time the answer was already there: a missing indent, a wrong field name, a blueprint that never got registered. I learned that AI agents are not magic. Behind every "autonomous" behavior is a prompt someone had to write, test, rewrite, and test again. The intelligence is in the design, not the model. I learned that shipping something imperfect in five days teaches you more than planning something perfect for five months. Five days ago I didn't know any of this. Now I do.

What's next for SecuritySentry

I want SecuritySentry to analyze a developer's coding patterns over time and build a personal vulnerability profile, so that instead of waiting for SAST to catch a SQL injection, it warns the developer while they're still typing the query.

I want to add cross-MR memory so the Style Linter doesn't just learn from one session but builds a long-term model of how each developer writes code, what they accept, what they reject, and what mistakes they repeatedly make. Security guidance that gets smarter the more you use it.

I want to expand the False-Positive Challenger into a full adversarial testing layer: not just two test cases per finding, but an automated red team that continuously probes the codebase for exploit paths and reports back with confidence scores.

I want SecuritySentry to speak the language of business, not just engineering. Instead of posting "CWE-89 detected in users.py", it should calculate the estimated blast radius, the regulatory exposure (GDPR, SOC2, PCI-DSS), and the cost of a breach, so security decisions can be made at the executive level, not just the code review level.
