Captain Perf | Devpost

Inspiration

Developers building GenAI applications often struggle to understand how latency and throughput affect user experience. With the rise of multi-agent systems, performance bottlenecks can come from coordination overhead, model inference delays, or network latency. We wanted to build a tool that helps developers evaluate, debug, and improve their multi-agent systems, starting with performance.

What it does

Captain Perf is a real-time benchmarking and simulation tool for AI agents built using the Agent Development Kit (ADK). It lets developers enter the endpoint of their multi-agent system, select test presets (e.g., chat, code generation, data retrieval), and measure real-world performance under load.

It reports detailed metrics such as token-level latency, time to first token (TTFT), throughput, and agent handoff timing. It also includes a unique Latency Simulation Chat that mimics how delays affect the user experience across multi-agent conversations.
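To make these metrics concrete, here is a minimal sketch of how TTFT, inter-token latency, and throughput could be derived from token arrival timestamps. The names `StreamMetrics` and `summarize_stream` are hypothetical and chosen for illustration; this is not Captain Perf's actual code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StreamMetrics:
    ttft_s: float              # time to first token, in seconds
    mean_inter_token_s: float  # average gap between consecutive tokens
    tokens_per_s: float        # output throughput over the whole response


def summarize_stream(request_sent_at: float, token_times: list[float]) -> StreamMetrics:
    """Derive latency metrics from the arrival time of each streamed token.

    Hypothetical helper: timestamps are seconds on any monotonic clock.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    ttft = token_times[0] - request_sent_at
    # Gaps between consecutive tokens (empty if only one token arrived).
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    total = token_times[-1] - request_sent_at
    return StreamMetrics(
        ttft_s=ttft,
        mean_inter_token_s=mean(gaps) if gaps else 0.0,
        tokens_per_s=len(token_times) / total if total > 0 else 0.0,
    )


# Example: request sent at t=0, four tokens arrive over 1.25 seconds.
metrics = summarize_stream(0.0, [0.5, 0.75, 1.0, 1.25])
```

Throughput here counts the full response time, including the wait for the first token; a tool could equally report decode-only throughput by excluding TTFT.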

How we built it

We built Captain Perf using:

Python ADK for agent orchestration
Google Cloud Run to host inference services
React frontend for UI/UX
Visualization via Chart.js

Our benchmarking harness sends structured prompts across multiple types of agent workflows and records timing data at each step in the chain.
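As a rough illustration of recording timing data at each step in a chain, the sketch below wraps each step in a context manager that logs its wall-clock duration. `StepTimer` and the step names are assumptions made for this example, not part of ADK or of our harness, and `time.sleep` stands in for real agent calls.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Collects (step_name, duration_seconds) pairs in execution order."""

    def __init__(self):
        self.records = []

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the step raised an exception.
            self.records.append((name, time.perf_counter() - start))

    def total(self):
        """Sum of all recorded step durations, in seconds."""
        return sum(duration for _, duration in self.records)


# Hypothetical two-step workflow; sleeps simulate agent work.
timer = StepTimer()
with timer.step("router_agent"):
    time.sleep(0.01)
with timer.step("retrieval_agent"):
    time.sleep(0.02)

for name, duration in timer.records:
    print(f"{name}: {duration * 1000:.1f} ms")
```

Using `time.perf_counter` rather than `time.time` avoids errors from system clock adjustments during a run.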

Challenges we ran into

Coordinating agent-level benchmarks and aggregating timing metrics
Designing a chat interface that accurately mimics delay patterns without confusing users
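One simple way to mimic delay patterns is to replay a reply token by token, waiting for a configured TTFT before the first token and a fixed gap before each later one. The sketch below assumes that model; `simulate_stream` and its parameters are hypothetical, and real delay patterns would likely need jitter and per-agent handoff pauses.

```python
import time


def simulate_stream(tokens, ttft_s, inter_token_s, sleep=time.sleep):
    """Yield tokens with an initial delay, then a fixed gap between tokens.

    `sleep` is injectable so the delay schedule can be inspected in tests
    without actually waiting.
    """
    for index, token in enumerate(tokens):
        sleep(ttft_s if index == 0 else inter_token_s)
        yield token


# Record the delays instead of sleeping, to show the schedule that a
# chat UI would apply before rendering each token.
delays = []
reply = list(
    simulate_stream(
        ["Hello", ",", " world"],
        ttft_s=0.8,
        inter_token_s=0.05,
        sleep=delays.append,
    )
)
```

Passing the default `sleep` makes the generator pace its output in real time, which is what a chat front end would consume.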

Accomplishments that we're proud of

Successfully measured and visualized performance across multi-agent workflows
Built a generic system that works with any ADK-based deployment
Enabled developers to not only see performance metrics but also feel them via simulation

What we learned

Latency perception is nonlinear: users tolerate small delays, but stacked agent latencies degrade the experience quickly
ADK makes building modular agent systems easy, but surfacing performance telemetry requires planning

What's next for Captain Perf

Integrate with the Agent Engine to visualize internal task graphs and bottlenecks
Add synthetic user personas to simulate real-world behavior
Build presets for SDLC, customer support, and content workflows