💡 Inspiration

"The fear of GenAI isn't that it won't work... it's that it will work, and you won't know when it fails."

We've all seen it: a company launches a shiny new AI agent. It works perfectly for a week. Then a user asks something unexpected, the model hallucinates, or worse, the agent chats politely but forgets to capture the sales lead. In a traditional system, that lead and its revenue sit unnoticed until an engineer manually reviews the logs a week later.

We asked: What if the AI could catch its own mistakes and rewrite its own code to fix them instantly?

Enter Autonomic AI. We didn't just build a chatbot; we built a self-healing swarm that turns the "Black Box" of GenAI into a transparent "Glass Box" using Google Cloud Vertex AI and Datadog.

🚀 What it does

Autonomic AI is an event-driven system where the user-facing agent is continuously judged, refined, and upgraded by a backend swarm of AI workers.

To demonstrate this, we deployed "Car Auto Concierge" (carsalesman101), a sales agent for a car dealership.

  1. The Mistake: The user asks for the price of a Model X. The agent answers politely but fails to ask for the user's email address, violating a core business rule.
  2. The Audit: The Auditor Agent catches this breach immediately based on the upgrade_config rules.
  3. The Fix: The Refiner Agent (powered by Gemini 2.5 Flash) analyzes the failure and rewrites the agent's system prompt to enforce email capture.
  4. The Validation: The Evaluator Agent runs the new prompt in a sandbox against the failed conversation using a strict rubric.
  5. The Deployment: If the new prompt passes, the system automatically pushes v1.2 to production.

The entire process happens without human intervention—but is fully observable via our Datadog Control Center.

⚙️ How we built it

We utilized an Event-Driven Architecture on Google Cloud Platform:

1. The Gateway (The Body)

  • Tech: Python / FastAPI.
  • Function: Handles user traffic. After every turn of conversation, it triggers a Google Pub/Sub event containing the chat logs, decoupling the user experience from the heavy lifting of the audit.
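Conceptually, the gateway's post-turn hook just serializes the transcript and publishes it. A minimal sketch follows; the payload fields and topic name are our assumptions, and the actual Pub/Sub publish is left commented out since it needs GCP credentials.

```python
import json
import time

def build_audit_event(agent_id: str, chat_id: str, turns: list[dict]) -> bytes:
    """Serialize one conversation's turns into a Pub/Sub message body."""
    return json.dumps({
        "agent_id": agent_id,
        "chat_id": chat_id,
        "turns": turns,
        "emitted_at": time.time(),
    }).encode("utf-8")

# Fire-and-forget publish after each turn (requires google-cloud-pubsub
# and GCP credentials; the topic name is hypothetical):
# from google.cloud import pubsub_v1
# publisher = pubsub_v1.PublisherClient()
# topic = publisher.topic_path("my-project", "autonomic-audit-events")
# publisher.publish(topic, build_audit_event("carsalesman101", chat_id, turns))
```

Because publishing is asynchronous, the user's reply is never blocked on the audit.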

2. The Swarm (The Brain - Vertex AI)

The Pub/Sub trigger activates our backend agents, which all operate on the same per-agent configuration, keyed by agent_id:

  • The Auditor: Scores the conversation against a strict rule set.
    • Rule Example: "CRITICAL FAIL if the agent DOES NOT explicitly ask for an email address."
  • The Refiner: If the Auditor returns a FAIL, the Refiner ingests the conversation and the failure reason. It generates a "Prompt Patch" intended to fix the logic.
  • The Evaluator: Before deploying, this agent runs the patched prompt against a Rubric in a sandbox environment:
    • Check 1: "Did the agent strictly follow the 'Ask for Email' protocol?"
    • Check 2: "Did the agent offer the 'Incoming' Model X if asked?"
  • Firestore: Stores the "DNA" (Prompt Configs) and version history of the agents.
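Some of the auditor_rules above are mechanical enough to check without a model call. A deterministic pre-filter sketch is shown below; the inventory contents and function names are illustrative, and the real scoring runs on Gemini.

```python
import re

INVENTORY = {"Model X", "Model Y"}  # hypothetical dealership inventory

def prefilter_audit(agent_turns: list[str]) -> list[str]:
    """Apply the mechanical auditor rules before spending model tokens."""
    failures = []
    text = " ".join(agent_turns)
    # Rule: agent must explicitly ask for an email address.
    if "email" not in text.lower():
        failures.append("CRITICAL FAIL: no email address requested")
    # Rule: no more than one apology per conversation.
    apologies = len(re.findall(r"\b(sorry|apologize)\b", text, re.IGNORECASE))
    if apologies > 1:
        failures.append("FAIL: apologized more than once")
    # Rule: only mention vehicles present in INVENTORY.
    for model in re.findall(r"Model [A-Z]", text):
        if model not in INVENTORY:
            failures.append(f"FAIL: {model} not in inventory")
    return failures
```

Running cheap checks like this first keeps Auditor token spend for the conversations that actually need judgment.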

🧬 Anatomy of an Autonomic Agent

Our system isn't hardcoded; it's configuration-driven. Here is the actual JSON configuration for carsalesman101 that powers the swarm logic:

{
  "agent_id": "carsalesman101",
  "model_id": "gemini-2.5-flash",
  "temperature": 0.2,
  "economics": {
    "budget_per_message": "0.5$",
    "input_token_count_prompt": 420
  },
  "upgrade_config": {
    "auditor_rules": [
      "CRITICAL FAIL if the user asks about a vehicle (Price, Specs, Availability) and the agent DOES NOT explicitly ask for an email address.",
      "FAIL if the agent mentions a vehicle model that is NOT listed in the INVENTORY.",
      "FAIL if the agent uses the words 'sorry' or 'apologize' more than once."
    ]
  },
  "evaluator_rubric": [
    "Did the agent strictly follow the 'Ask for Email' protocol?",
    "Did the agent offer the 'Incoming' Model X if asked?",
    "Was the response concise and high-energy?"
  ]
}
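A sketch of how the swarm might load and sanity-check this DNA before acting on it; the Firestore collection name is hypothetical, so the fetch is shown commented out.

```python
REQUIRED_KEYS = {"agent_id", "model_id", "upgrade_config", "evaluator_rubric"}

def validate_config(cfg: dict) -> dict:
    """Reject malformed agent DNA before any swarm agent consumes it."""
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    if not cfg["upgrade_config"].get("auditor_rules"):
        raise ValueError("upgrade_config.auditor_rules must be non-empty")
    return cfg

# Fetching from Firestore (requires google-cloud-firestore; the
# collection name is hypothetical):
# from google.cloud import firestore
# doc = firestore.Client().collection("agent_configs") \
#         .document("carsalesman101").get()
# cfg = validate_config(doc.to_dict())
```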

🐶 Partner Challenge: Datadog (The "Glass Box")

To win the Datadog Challenge, we moved beyond simple monitoring. We implemented an "Autonomic Observability Strategy" where Datadog drives the business logic.

We built a custom "Autonomic AI Ops Center" dashboard that serves as the command center for the swarm.

1. Visualizing the "Thought Process"

We used Datadog Log Streams (service:autonomic-*) to visualize the chain of thought between agents. We can see the exact moment the Auditor hands off a failure to the Refiner.

  • Current Active Version Widget: Tracks the live autonomic.agent.current_version metric, visualized with a conditional format (Purple for v1, Warning colors for rollbacks).
  • Optimization Rate: A custom formula comparing successful deployments against failed fixes.
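We haven't reproduced the exact widget query here; with assumed metric names like autonomic.deploy.success and autonomic.deploy.fail, such a rate could be expressed as a Datadog query formula along these lines:

```
sum:autonomic.deploy.success{*}.as_count()
  / (sum:autonomic.deploy.success{*}.as_count()
     + sum:autonomic.deploy.fail{*}.as_count()) * 100
```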

2. Automated Action (The Safety Net)

We configured a Detection Rule for "Optimization Failure".

  • Trigger: If the Refiner fails to fix the agent after 3 attempts (or if the Evaluator rejects the fix).
  • Action: A Datadog Workflow automatically opens a Datadog Case.
  • Context: The Case is populated with the Chat_ID and Failure_Reason, alerting a human engineer only when the AI cannot fix itself.

3. Economics & SLOs

We track the "Economics" of our agents in real-time to ensure the self-healing process doesn't bankrupt us.

  • Budget Breach Monitor: Alerts if autonomic.budget.breach exceeds $0.10 per interaction.
  • Latency Splits: We visualize the "User Facing Latency" (ms) vs. "Backend Latency" (Refiner/Evaluator time), ensuring the audit process never slows down the user chat.
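The breach check itself is simple arithmetic. A sketch with made-up per-token prices follows; real Vertex AI pricing differs, so the constants are placeholders only.

```python
# Hypothetical per-1K-token prices; real Vertex AI pricing differs.
PRICE_IN_PER_1K = 0.000075
PRICE_OUT_PER_1K = 0.0003

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agent interaction."""
    return (input_tokens / 1000) * PRICE_IN_PER_1K \
         + (output_tokens / 1000) * PRICE_OUT_PER_1K

def budget_breached(input_tokens: int, output_tokens: int,
                    budget: float = 0.10) -> bool:
    """True when one interaction exceeds its budget; this is what we
    would emit as the autonomic.budget.breach metric."""
    return message_cost(input_tokens, output_tokens) > budget
```

With the 420-token prompt from the config above, a normal turn sits far below the $0.10 alert threshold, so any breach signals runaway behavior rather than ordinary chat.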

🧠 Challenges we ran into

  • Infinite Loops: In early tests, the Refiner would try to fix a problem, fail, and trigger itself again, burning tokens. We used Datadog's Budget Breach monitor to catch this behavior early.
  • Prompt Drift: Ensuring the "Refiner" didn't accidentally delete safety rules while fixing logic errors. We solved this by splitting the JSON configuration into Immutable (Safety) and Mutable (Behavior) blocks.
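A sketch of that split, assuming the stored config carries separate immutable and mutable blocks (the key names here are illustrative):

```python
def apply_prompt_patch(config: dict, patch: dict) -> dict:
    """Merge a Refiner patch into the mutable block only; the immutable
    (safety) block can never be touched by the swarm."""
    forbidden = set(patch) & set(config["immutable"])
    if forbidden:
        raise ValueError(f"patch touches immutable keys: {sorted(forbidden)}")
    merged = dict(config)
    merged["mutable"] = {**config["mutable"], **patch}
    return merged

cfg = {
    "immutable": {"safety_rules": ["Never invent inventory."]},
    "mutable": {"tone": "high-energy", "lead_capture": "optional"},
}
cfg2 = apply_prompt_patch(cfg, {"lead_capture": "ALWAYS ask for email"})
```

Any patch that names an immutable key is rejected outright, so the Refiner can only ever drift within the behavior block.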

🏅 Accomplishments that we're proud of

  • Zero-Touch Deployment: Watching the Datadog dashboard timeline show a deployment of v1.2 while we were sleeping—because the agent fixed a bug itself.
  • Full Observability: We successfully mapped GenAI metrics (Token Usage, Hallucination Rate) to standard DevOps metrics (Latency, Error Rate) in a single pane of glass.

⏭️ What's next for Autonomic AI

  • Multi-Modal Auditing: Using Gemini 1.5 Pro to audit voice and video interactions.
  • Integration with Confluent: To handle high-throughput clickstream data for real-time personalization during the chat.
