💡 Inspiration

"The fear of GenAI isn't that it won't work... it's that it will work, and you won't know when it fails."

We've all seen it: a company launches a shiny new AI agent. It works perfectly for a week. Then a user asks something unexpected, the model hallucinates, or worse, the agent chats politely but forgets to capture the sales lead. In a traditional system, that lead and its revenue sit unnoticed until an engineer manually reviews the logs a week later.

We asked: What if the AI could catch its own mistakes and rewrite its own code to fix them instantly?

Enter Autonomic AI. We didn't just build a chatbot; we built a self-healing swarm that turns the "Black Box" of GenAI into a transparent "Glass Box" using Google Cloud Vertex AI and Datadog.

🚀 What it does

Autonomic AI is an event-driven system where the user-facing agent is continuously judged, refined, and upgraded by a backend swarm of AI workers.

To demonstrate this, we deployed "Car Auto Concierge" (carsalesman101), a sales agent for a car dealership.

  1. The Mistake: The user asks for the price of a Model X. The agent answers politely but fails to ask for the user's email address, violating a core business rule.
  2. The Audit: The Auditor Agent catches this breach immediately based on the upgrade_config rules.
  3. The Fix: The Refiner Agent (powered by Gemini 2.5 Flash) analyzes the failure and rewrites the agent's system prompt to enforce email capture.
  4. The Validation: The Evaluator Agent runs the new prompt in a sandbox against the failed conversation using a strict rubric.
  5. The Deployment: If the new prompt passes, the system automatically pushes v1.2 to production.

The entire process happens without human intervention—but is fully observable via our Datadog Control Center.

⚙️ How we built it

We utilized an Event-Driven Architecture on Google Cloud Platform:

1. The Gateway (The Body)

  • Tech: Python / FastAPI.
  • Function: Handles user traffic. After every turn of conversation, it triggers a Google Pub/Sub event containing the chat logs, decoupling the user experience from the heavy lifting of the audit.
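Conceptually, the gateway's post-turn hook just serializes the transcript and publishes it. A minimal sketch follows; the payload fields and topic name are our assumptions, and the actual Pub/Sub publish is left commented out since it needs GCP credentials.

```python
import json
import time

def build_audit_event(agent_id: str, chat_id: str, turns: list[dict]) -> bytes:
    """Serialize one conversation's turns into a Pub/Sub message body."""
    return json.dumps({
        "agent_id": agent_id,
        "chat_id": chat_id,
        "turns": turns,
        "emitted_at": time.time(),
    }).encode("utf-8")

# Fire-and-forget publish after each turn (requires google-cloud-pubsub
# and GCP credentials; the topic name is hypothetical):
# from google.cloud import pubsub_v1
# publisher = pubsub_v1.PublisherClient()
# topic = publisher.topic_path("my-project", "autonomic-audit-events")
# publisher.publish(topic, build_audit_event("carsalesman101", chat_id, turns))
```

Because publishing is asynchronous, the user's reply is never blocked on the audit.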

2. The Swarm (The Brain - Vertex AI)

The Pub/Sub trigger activates our backend agents, which all operate on the same per-agent configuration, keyed by agent_id:

  • The Auditor: Scores the conversation against a strict rule set.
    • Rule Example: "CRITICAL FAIL if the agent DOES NOT explicitly ask for an email address."
  • The Refiner: If the Auditor returns a FAIL, the Refiner ingests the conversation and the failure reason. It generates a "Prompt Patch" intended to fix the logic.
  • The Evaluator: Before deploying, this agent runs the patched prompt against a Rubric in a sandbox environment:
    • Check 1: "Did the agent strictly follow the 'Ask for Email' protocol?"
    • Check 2: "Did the agent offer the 'Incoming' Model X if asked?"
  • Firestore: Stores the "DNA" (Prompt Configs) and version history of the agents.
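Some of the auditor_rules above are mechanical enough to check without a model call. A deterministic pre-filter sketch is shown below; the inventory contents and function names are illustrative, and the real scoring runs on Gemini.

```python
import re

INVENTORY = {"Model X", "Model Y"}  # hypothetical dealership inventory

def prefilter_audit(agent_turns: list[str]) -> list[str]:
    """Apply the mechanical auditor rules before spending model tokens."""
    failures = []
    text = " ".join(agent_turns)
    # Rule: agent must explicitly ask for an email address.
    if "email" not in text.lower():
        failures.append("CRITICAL FAIL: no email address requested")
    # Rule: no more than one apology per conversation.
    apologies = len(re.findall(r"\b(sorry|apologize)\b", text, re.IGNORECASE))
    if apologies > 1:
        failures.append("FAIL: apologized more than once")
    # Rule: only mention vehicles present in INVENTORY.
    for model in re.findall(r"Model [A-Z]", text):
        if model not in INVENTORY:
            failures.append(f"FAIL: {model} not in inventory")
    return failures
```

Running cheap checks like this first keeps Auditor token spend for the conversations that actually need judgment.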

🧬 Anatomy of an Autonomic Agent

Our system isn't hardcoded; it's configuration-driven. Here is the actual JSON configuration for carsalesman101 that powers the swarm logic:

{
  "agent_id": "carsalesman101",
  "model_id": "gemini-2.5-flash",
  "temperature": 0.2,
  "economics": {
    "budget_per_message": "0.5$",
    "input_token_count_prompt": 420
  },
  "upgrade_config": {
    "auditor_rules": [
      "CRITICAL FAIL if the user asks about a vehicle (Price, Specs, Availability) and the agent DOES NOT explicitly ask for an email address.",
      "FAIL if the agent mentions a vehicle model that is NOT listed in the INVENTORY.",
      "FAIL if the agent uses the words 'sorry' or 'apologize' more than once."
    ]
  },
  "evaluator_rubric": [
    "Did the agent strictly follow the 'Ask for Email' protocol?",
    "Did the agent offer the 'Incoming' Model X if asked?",
    "Was the response concise and high-energy?"
  ]
}
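A sketch of how the swarm might load and sanity-check this DNA before acting on it; the Firestore collection name is hypothetical, so the fetch is shown commented out.

```python
REQUIRED_KEYS = {"agent_id", "model_id", "upgrade_config", "evaluator_rubric"}

def validate_config(cfg: dict) -> dict:
    """Reject malformed agent DNA before any swarm agent consumes it."""
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    if not cfg["upgrade_config"].get("auditor_rules"):
        raise ValueError("upgrade_config.auditor_rules must be non-empty")
    return cfg

# Fetching from Firestore (requires google-cloud-firestore; the
# collection name is hypothetical):
# from google.cloud import firestore
# doc = firestore.Client().collection("agent_configs") \
#         .document("carsalesman101").get()
# cfg = validate_config(doc.to_dict())
```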

🐶 Partner Challenge: Datadog (The "Glass Box")

To win the Datadog Challenge, we moved beyond simple monitoring. We implemented an "Autonomic Observability Strategy" where Datadog drives the business logic.

We built a custom "Autonomic AI Ops Center" dashboard that serves as the command center for the swarm.

1. Visualizing the "Thought Process"

We used Datadog Log Streams (service:autonomic-*) to visualize the chain of thought between agents. We can see the exact moment the Auditor hands off a failure to the Refiner.

  • Current Active Version Widget: Tracks the live autonomic.agent.current_version metric, visualized with a conditional format (Purple for v1, Warning colors for rollbacks).
  • Optimization Rate: A custom formula comparing successful deployments against failed fixes.
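We haven't reproduced the exact widget query here; with assumed metric names like autonomic.deploy.success and autonomic.deploy.fail, such a rate could be expressed as a Datadog query formula along these lines:

```
sum:autonomic.deploy.success{*}.as_count()
  / (sum:autonomic.deploy.success{*}.as_count()
     + sum:autonomic.deploy.fail{*}.as_count()) * 100
```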

2. Automated Action (The Safety Net)

We configured a Detection Rule for "Optimization Failure".

  • Trigger: If the Refiner fails to fix the agent after 3 attempts (or if the Evaluator rejects the fix).
  • Action: A Datadog Workflow automatically opens a Datadog Case.
  • Context: The Case is populated with the Chat_ID and Failure_Reason, alerting a human engineer only when the AI cannot fix itself.

3. Economics & SLOs

We track the "Economics" of our agents in real-time to ensure the self-healing process doesn't bankrupt us.

  • Budget Breach Monitor: Alerts if autonomic.budget.breach exceeds $0.10 per interaction.
  • Latency Splits: We visualize the "User Facing Latency" (ms) vs. "Backend Latency" (Refiner/Evaluator time), ensuring the audit process never slows down the user chat.
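The breach check itself is simple arithmetic. A sketch with made-up per-token prices follows; real Vertex AI pricing differs, so the constants are placeholders only.

```python
# Hypothetical per-1K-token prices; real Vertex AI pricing differs.
PRICE_IN_PER_1K = 0.000075
PRICE_OUT_PER_1K = 0.0003

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agent interaction."""
    return (input_tokens / 1000) * PRICE_IN_PER_1K \
         + (output_tokens / 1000) * PRICE_OUT_PER_1K

def budget_breached(input_tokens: int, output_tokens: int,
                    budget: float = 0.10) -> bool:
    """True when one interaction exceeds its budget; this is what we
    would emit as the autonomic.budget.breach metric."""
    return message_cost(input_tokens, output_tokens) > budget
```

With the 420-token prompt from the config above, a normal turn sits far below the $0.10 alert threshold, so any breach signals runaway behavior rather than ordinary chat.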

🧠 Challenges we ran into

  • Infinite Loops: In early tests, the Refiner would try to fix a problem, fail, and trigger itself again, burning tokens. We used Datadog's Budget Breach monitor to catch this behavior early.
  • Prompt Drift: Ensuring the "Refiner" didn't accidentally delete safety rules while fixing logic errors. We solved this by splitting the JSON configuration into Immutable (Safety) and Mutable (Behavior) blocks.
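A sketch of that split, assuming the stored config carries separate immutable and mutable blocks (the key names here are illustrative):

```python
def apply_prompt_patch(config: dict, patch: dict) -> dict:
    """Merge a Refiner patch into the mutable block only; the immutable
    (safety) block can never be touched by the swarm."""
    forbidden = set(patch) & set(config["immutable"])
    if forbidden:
        raise ValueError(f"patch touches immutable keys: {sorted(forbidden)}")
    merged = dict(config)
    merged["mutable"] = {**config["mutable"], **patch}
    return merged

cfg = {
    "immutable": {"safety_rules": ["Never invent inventory."]},
    "mutable": {"tone": "high-energy", "lead_capture": "optional"},
}
cfg2 = apply_prompt_patch(cfg, {"lead_capture": "ALWAYS ask for email"})
```

Any patch that names an immutable key is rejected outright, so the Refiner can only ever drift within the behavior block.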

🏅 Accomplishments that we're proud of

  • Zero-Touch Deployment: Watching the Datadog dashboard timeline show a deployment of v1.2 while we were sleeping—because the agent fixed a bug itself.
  • Full Observability: We successfully mapped GenAI metrics (Token Usage, Hallucination Rate) to standard DevOps metrics (Latency, Error Rate) in a single pane of glass.

⏭️ What's next for Autonomic AI

  • Multi-Modal Auditing: Using Gemini 1.5 Pro to audit voice and video interactions.
  • Integration with Confluent: To handle high-throughput clickstream data for real-time personalization during the chat.
