Pantheon: Incident Response for the "Always-On" Era

Inspiration

It’s Saturday night. You’re at a concert, at the gym, or just grabbing groceries when the page hits: Production is down. Your heart sinks because you’re 45 minutes away from your laptop, and you know that for a company like NextEra, every minute of downtime costs an average of $14,600. By the time you get home, VPN in, and pull logs, the company has already bled over $650,000.

We built Pantheon because the "laptop bottleneck" is a relic of the past. In a world where AI-generated "ghost code" is making systems more complex and outages more frequent, we realized the response needs to be as mobile as the engineer. We wanted to move the War Room into a Telegram chat so you can trigger a full-scale malware investigation and containment plan while you’re still standing in the checkout line.

What it does

Pantheon is an AI-driven malware analysis and incident response swarm. Instead of a single "smart" chatbot, it’s a coordinated team of specialized agents, each named after a Greek god, that handles the dirty work for you.

  • Hermes (The Gateway): Our voice-native Telegram bot. You don't need a terminal; you just send a voice note or upload a sample. Hermes routes the context to the swarm immediately.
  • The Swarm: Powered by Google Cloud ADK, five agents (Zeus, Athena, Hades, Apollo, and Ares) work in parallel. They triage the severity, detonate the sample in our custom sandbox, extract indicators of compromise (IOCs), and build a remediation plan.
  • Live Visibility (OlympusFlow): While you’re talking to the AI, a real-time dashboard, OlympusFlow, streams every tool call and process tree update. It uses "GodNodes" and "ParticleEdges" to visualize handoffs and "thoughts" as pulsing light in real time.

How we built it

We didn't just wrap a prompt; we built an autonomous engineering pipeline.

  • Advanced Orchestration: We used the Google Cloud ADK's native ParallelAgent primitives for simultaneous planning and its LoopAgent verifiers for self-correcting response plans.
  • Distributed Intelligence: We implemented the A2A (Agent-to-Agent) protocol to perform a cross-service handshake with a remote impact specialist running on a completely separate Cloud Run deployment.
  • The "Detonation Chamber": Our Hephaestus sandbox is a custom FastAPI service that spawns hardened Docker containers with zero network access and no-new-privileges flags.
  • Telemetry Backbone: Every agent event is mirrored to a Kafka topic for durable replay and downstream SIEM integration, which makes the project enterprise-ready.
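
To make the fan-out and telemetry ideas above concrete, here is a minimal sketch in plain asyncio. It is not the ADK ParallelAgent API or a real Kafka producer: `run_agent`, `run_swarm`, `emit`, and the in-memory `EVENT_LOG` are hypothetical stand-ins, and the agents do no real analysis. It only shows the shape of "run every specialist at once and mirror each event to a log."

```python
import asyncio
import json
import time

# Agent names come from the writeup; everything else here is illustrative.
AGENTS = ["Zeus", "Athena", "Hades", "Apollo", "Ares"]

# Hypothetical stand-in for the Kafka topic: one JSON string per event.
EVENT_LOG: list[str] = []


def emit(agent: str, kind: str, detail: str) -> None:
    """Mirror one agent event to the in-memory telemetry log."""
    event = {"ts": time.time(), "agent": agent, "kind": kind, "detail": detail}
    EVENT_LOG.append(json.dumps(event))


async def run_agent(name: str, sample_id: str) -> dict:
    """Placeholder agent: a real one would call a model and its tools here."""
    emit(name, "start", sample_id)
    await asyncio.sleep(0)  # yield control so the agents interleave
    emit(name, "finish", sample_id)
    return {"agent": name, "sample": sample_id, "status": "done"}


async def run_swarm(sample_id: str) -> list[dict]:
    """Fan the sample out to every agent concurrently and collect the results."""
    return await asyncio.gather(*(run_agent(name, sample_id) for name in AGENTS))
```

Calling `asyncio.run(run_swarm("sample-1"))` returns one result per agent, in the order of `AGENTS`, and leaves a start and a finish event for each agent in `EVENT_LOG`. In a real deployment the append would be replaced by a producer call, which is where durable replay would come from.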

Challenges we ran into

The biggest technical hurdle was A2A Latency. In a crisis, every second counts. Coordinating a distributed workforce across different Cloud Run services required a massive overhaul of our event-driven architecture to ensure the A2A handshake didn't feel "laggy".

We also faced the reality of Malware Isolation. Running the NextEra challenge sample, a real, malicious WSH dropper, meant we had to be incredibly disciplined with our Docker security flags (--cap-drop ALL, --read-only) to make sure the "fire" stayed in the box.
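
As a rough illustration of how those flags fit together, the sketch below assembles a `docker run` command line with the isolation options mentioned in this writeup: no network, no-new-privileges, all capabilities dropped, and a read-only root filesystem. The function name, image name, sample path, and the read-only `/sample` mount point are assumptions made for the example, not details of the actual Hephaestus service.

```python
def build_sandbox_command(image: str, sample_path: str) -> list[str]:
    """Build a docker run argv that keeps the sample isolated.

    The image name and the /sample mount point are hypothetical.
    """
    return [
        "docker", "run",
        "--rm",                                  # discard the container afterwards
        "--network", "none",                     # zero network access
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--read-only",                           # read-only root filesystem
        "--volume", f"{sample_path}:/sample:ro", # expose the sample read-only
        image,
    ]
```

The list can be passed to `subprocess.run(cmd, timeout=...)` as-is. Building an argv list rather than a shell string means a hostile file name is never interpreted by a shell.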

Accomplishments that we're proud of

We’re hyped that Pantheon didn't just "flag" the malware; it stripped it naked. It successfully identified a multi-stage WSH dropper that used Cyrillic encoding and hidden .png payloads (Mands.png, Vile.png) to evade traditional EDR.

Most importantly, we proved that a "Phone-First" response works. In our testing, we reduced the MTTR (Mean Time To Recovery) from an average of 4 hours to just 120 seconds, theoretically saving a simulated enterprise over $1.7 million in potential downtime losses.

What we learned

We learned that Self-Correction is the secret sauce. By using an ADK LoopAgent, we allowed our verifier to check the Ares response plans against the evidence produced by the sandbox. If a plan references an unsupported behavior, the system automatically triggers a revision before the human ever sees it. We also learned that the best UX for an engineer at 2 AM is a voice that says, "I've handled the containment; here's the plan."
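
The verify-then-revise idea can be sketched without any framework. The code below is a plain-Python stand-in, not the ADK LoopAgent API: the `Plan` type, the behavior labels, and the "drop unsupported steps" revision rule are all assumptions chosen to keep the example small. A real reviser would more likely ask the planning agent for a new draft.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    # Each step pairs an action with the observed behavior it responds to.
    steps: list[tuple[str, str]]


def unsupported_behaviors(plan: Plan, evidence: set[str]) -> list[str]:
    """Return behaviors the plan cites that the sandbox never observed."""
    return [behavior for _, behavior in plan.steps if behavior not in evidence]


def revise(plan: Plan, evidence: set[str]) -> Plan:
    """Hypothetical revision rule: drop steps that lack supporting evidence."""
    return Plan([step for step in plan.steps if step[1] in evidence])


def verify_loop(plan: Plan, evidence: set[str], max_revisions: int = 3) -> tuple[Plan, int]:
    """Re-check the plan after every revision; stop once it is fully supported."""
    revisions = 0
    while True:
        if not unsupported_behaviors(plan, evidence):
            return plan, revisions
        if revisions >= max_revisions:
            raise RuntimeError("plan still cites unsupported behavior")
        plan = revise(plan, evidence)
        revisions += 1
```

The cap on revisions matters: without it, a reviser that keeps reintroducing unsupported claims would loop forever instead of surfacing the problem to a human.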

What's next for Pantheon

We’re looking to make our Artemis background daemon even more proactive, moving from "Analyze this file" to "I noticed a weird spike in registry writes on Server B. I've already isolated it and am triaging now." We also want to expand our Ares agent to automatically push YARA rules and firewall blocks directly to production environments, truly completing the "Self-Healing World" loop.

The Win-Win-Win

  • Companies save millions by stopping the downtime clock before the engineer even gets home.
  • Engineers save their sanity by triaging the "fire" from their phone.
  • Users stay happy because the apps they rely on stay online.
