AgentOps Commander

AgentOps Commander dashboard showing a human-supervised incident agent workflow.

Inspiration

Most AI agents can answer questions, but real teams need agents that can safely take action. During an incident, operators need more than another chat window: they need a system that can investigate evidence, explain its plan, request approval, execute low-risk actions, and leave an audit trail.

AgentOps Commander is designed for that gap. It turns Gemini into an observable, human-supervised operations agent that can diagnose a real-world workflow issue and then improve from its own traces.

What it does

AgentOps Commander receives an operational goal, such as investigating a spike in failed orders, delayed support responses, or degraded service health. The agent breaks the goal into a plan, calls tools to inspect data and logs, summarizes the evidence, proposes next actions, and waits for human approval before executing any action that changes state.
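The approval gate described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ApprovalGate` class, the `mutatesState` flag, and the shape of the audit entries are all hypothetical names chosen for the example.

```javascript
// Hypothetical sketch: none of these names come from the AgentOps Commander repo.
class ApprovalGate {
  constructor() {
    this.pending = new Map(); // action id -> action waiting for a human decision
    this.auditLog = [];       // append-only record of requests and executions
    this.nextId = 1;
  }

  // Read-only actions run immediately; state-changing actions are parked.
  submit(action) {
    if (!action.mutatesState) {
      return this._execute(action, "auto", null);
    }
    const id = `act-${this.nextId++}`;
    this.pending.set(id, action);
    this.auditLog.push({ id, name: action.name, status: "awaiting_approval" });
    return { id, status: "awaiting_approval" };
  }

  // A human operator releases a parked action by id.
  approve(id, operator) {
    const action = this.pending.get(id);
    if (!action) throw new Error(`No pending action with id ${id}`);
    this.pending.delete(id);
    return this._execute(action, operator, id);
  }

  // A rejected action is dropped without ever running.
  reject(id, operator) {
    const action = this.pending.get(id);
    if (!action) throw new Error(`No pending action with id ${id}`);
    this.pending.delete(id);
    this.auditLog.push({ id, name: action.name, status: "rejected", decidedBy: operator });
    return { id, status: "rejected" };
  }

  _execute(action, approvedBy, id) {
    const result = action.run();
    this.auditLog.push({ id, name: action.name, status: "executed", approvedBy });
    return { id, status: "executed", result };
  }
}
```

In this sketch a log-reading tool would be submitted with `mutatesState: false` and run at once, while something like a worker restart would return `awaiting_approval` and only run after `approve(id, operator)` is called. Every path leaves an entry in `auditLog`.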

The key feature is the self-improvement loop. Every agent step is traced with Arize Phoenix. The agent can use Phoenix MCP to review its own traces, identify weak reasoning or missing tool calls, and recommend changes to prompts, tool descriptions, or evaluation criteria.
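To make the self-review step concrete, here is a small sketch of the kind of check the agent could run over its own trace once the spans have been fetched. It deliberately does not show any Phoenix or Phoenix MCP calls. The span shape, the `reviewTrace` function, and the finding types are assumptions made for illustration and do not reflect the Phoenix or OpenInference schema.

```javascript
// Assumed span shape (illustrative only):
// { name: string, kind: "plan" | "tool" | "llm", status: "ok" | "error" }
function reviewTrace(spans, requiredTools) {
  const findings = [];

  // Collect the names of every tool the agent actually called.
  const calledTools = new Set(
    spans.filter((span) => span.kind === "tool").map((span) => span.name)
  );

  // Flag tools the playbook expected but the agent never invoked.
  for (const tool of requiredTools) {
    if (!calledTools.has(tool)) {
      findings.push({
        type: "missing_tool_call",
        detail: `Expected tool "${tool}" was never called`,
      });
    }
  }

  // Flag tool calls that ran but failed.
  for (const span of spans) {
    if (span.kind === "tool" && span.status === "error") {
      findings.push({
        type: "failed_tool_call",
        detail: `Tool "${span.name}" returned an error`,
      });
    }
  }

  return findings;
}
```

Findings like these could then feed the recommendations about prompts, tool descriptions, or evaluation criteria.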

How we built it

The demo is a code-owned Node.js agent runtime with a web dashboard. The current build includes a deterministic demo mode for reliable judging and recording, plus a concrete agent-builder-config.json contract for Gemini, Google Cloud Agent Builder, Cloud Run, Arize Phoenix, OpenInference, and Phoenix MCP.

The UI shows the operational incident, the agent plan, tool calls, evidence, approval gate, execution result, trace timeline, evaluation scores, and self-review recommendations.

The repository includes an automated regression suite based on OWASP, WCAG, MDN, and web.dev guidance. It covers security headers, path traversal payloads, HTTP verb tampering, malformed requests, XSS-shaped payload escaping, offline fallback, accessibility metadata, mobile layout risks, hosted-demo readiness, submission assets, and the agent approval flow.
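Two of the checks listed above, path traversal and XSS-shaped payload escaping, can be illustrated with small helpers of the kind such a suite would exercise. This is a sketch under assumptions: `escapeHtml` and `resolveInsideRoot` are hypothetical names, and the real suite may structure these checks differently.

```javascript
const path = require("node:path");

// Escape the five characters that matter when placing text into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Map a request path to a file under rootDir, or return null if the
// path is malformed or would escape the root directory.
function resolveInsideRoot(rootDir, requestPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(requestPath);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) return null; // reject null bytes

  const root = path.resolve(rootDir);
  // Prefixing "./" keeps a leading "/" from being treated as absolute.
  const target = path.resolve(root, "./" + decoded);
  if (target !== root && !target.startsWith(root + path.sep)) return null;
  return target;
}
```

A regression test can then assert that payloads such as `/../../etc/passwd` or `%2e%2e%2fsecret` resolve to `null`, and that script-shaped input comes back with its angle brackets and quotes escaped.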

Challenges we ran into

The biggest product challenge was making the demo feel like a real operational workflow instead of a chatbot. We focused on evidence, auditability, and safety gates so the agent's action path is inspectable.

The biggest technical challenge was designing a submission demo that can run reliably during judging even when API credentials are not available. The app supports a deterministic demo mode while preserving the architecture needed for live Gemini and Phoenix integration.
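One way to structure that fallback is to pick the backend from the environment and serve fixed fixtures when no credentials are present. The sketch below is illustrative only: the `GEMINI_API_KEY` variable name, the `createAgentBackend` function, and the fixture contents are assumptions, and the live branch is left unimplemented.

```javascript
// Placeholder fixture data for the example, not real incident output.
const DEMO_FIXTURES = {
  plan: ["Query failed orders", "Inspect service logs", "Propose mitigation"],
  rootCause: "Illustrative placeholder root cause for the recorded demo",
};

// Assumption: live mode is enabled by the presence of an API key.
function selectMode(env) {
  return env.GEMINI_API_KEY ? "live" : "demo";
}

function createAgentBackend(env) {
  const mode = selectMode(env);
  if (mode === "demo") {
    return {
      mode,
      // The same goal always yields the same output, so recordings are repeatable.
      investigate(goal) {
        return { goal, ...DEMO_FIXTURES, source: "fixture" };
      },
    };
  }
  return {
    mode,
    investigate() {
      throw new Error("Live Gemini integration is not wired up in this sketch");
    },
  };
}
```

Because both backends expose the same `investigate` method, the dashboard and approval flow do not need to know which mode is active.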

Accomplishments that we're proud of

We built a complete plan-act-trace-improve loop. The agent does not just produce an answer; it identifies the root cause, proposes a mitigation, requires approval, records the execution, and exposes traces and evaluations.

What we learned

Production agents need observability as a first-class feature. Tracing, evaluation, and human supervision are not extras; they are what make agentic systems trustworthy enough for real work.

What's next

Next steps are to connect live Gemini reasoning through the Agent Builder contract, deploy the runtime to Cloud Run, stream OpenInference traces into Arize Phoenix, wire Phoenix MCP for real trace queries, and add more operational playbooks.

Built With

arize-phoenix
cloud-run
css
firestore
gemini
github
google-cloud
google-cloud-agent-builder
html
javascript
node.js
openinference
opentelemetry
phoenix-mcp
secret-manager

Updates

RayWu1037 Wu started this project — Jun 03, 2026 03:56 AM EDT
