Inspiration
Developers building GenAI applications often struggle to understand how latency and throughput affect user experience. In multi-agent systems, performance bottlenecks can come from coordination overhead between agents, model inference delays, or network latency. We wanted to build a tool that helps developers evaluate, debug, and improve their multi-agent systems, starting with performance.
What it does
Captain Perf is a real-time benchmarking and simulation tool for AI agents built using the Agent Development Kit (ADK). It lets developers enter the endpoint of their multi-agent system, select test presets (e.g., chat, code generation, data retrieval), and measure real-world performance under load.
It reports detailed metrics such as per-token latency, time to first token (TTFT), throughput, and agent handoff timing. It also includes a unique Latency Simulation Chat that mimics how delays affect user experience across multi-agent conversations.
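The headline metrics can all be derived from when each streamed token arrives. The sketch below is a minimal illustration, assuming the client records a monotonic timestamp when it sends the request and when each token arrives; `compute_metrics` and `RequestMetrics` are hypothetical names, not Captain Perf's actual code. Throughput here is output tokens over the full request duration; another common convention measures decode throughput starting from the first token.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestMetrics:
    ttft_s: float        # request sent -> first token received
    mean_itl_s: float    # mean inter-token latency after the first token
    tokens_per_s: float  # output tokens / total request duration
    total_s: float       # request sent -> last token received


def compute_metrics(request_start: float, token_times: Sequence[float]) -> RequestMetrics:
    """Derive per-request latency metrics from streamed token arrival times.

    All timestamps are seconds from the same monotonic clock,
    e.g. time.perf_counter().
    """
    if not token_times:
        raise ValueError("no tokens were received")
    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start
    # Gaps between consecutive tokens; empty if only one token arrived.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    tps = len(token_times) / total if total > 0 else float("inf")
    return RequestMetrics(ttft, mean_itl, tps, total)


# Example: five tokens arriving after a 0.5 s wait for the first one.
m = compute_metrics(0.0, [0.5, 0.6, 0.7, 0.8, 1.0])
print(m)
```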
How we built it
We built Captain Perf using:
- Python ADK for agent orchestration
- Google Cloud Run to host inference services
- React frontend for UI/UX
- Visualization via Chart.js
Our benchmarking harness sends structured prompts across multiple types of agent workflows and records timing data at each step in the chain.
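A harness of this shape can be sketched with `asyncio`: fire prompts concurrently under a cap, time each step of the agent chain, then reduce the samples to percentiles. This is a minimal sketch under stated assumptions, not the project's implementation. The three-stage `PIPELINE` and `fake_agent` are hypothetical stand-ins that sleep instead of calling a deployed ADK endpoint, and p50/p95 is one reasonable choice of summary statistics.

```python
import asyncio
import random
import statistics
import time
from collections import defaultdict

# Hypothetical workflow: (stage name, mean simulated delay in seconds).
PIPELINE = [("router", 0.01), ("retriever", 0.03), ("writer", 0.05)]


async def fake_agent(prompt: str, mean_delay_s: float) -> None:
    """Stand-in for one agent call; ignores the prompt and just sleeps.

    A real harness would send the prompt to the deployed endpoint here.
    """
    await asyncio.sleep(random.uniform(0.5, 1.5) * mean_delay_s)


async def run_one(prompt: str, timings: dict[str, list[float]]) -> None:
    """Push one prompt through the chain, recording wall time per stage."""
    for stage, delay in PIPELINE:
        start = time.perf_counter()
        await fake_agent(prompt, delay)
        timings[stage].append(time.perf_counter() - start)


async def run_load(prompts: list[str], concurrency: int) -> dict[str, list[float]]:
    """Run all prompts with at most `concurrency` requests in flight."""
    timings: dict[str, list[float]] = defaultdict(list)
    sem = asyncio.Semaphore(concurrency)

    async def bounded(prompt: str) -> None:
        async with sem:
            await run_one(prompt, timings)

    await asyncio.gather(*(bounded(p) for p in prompts))
    return timings


def summarize(timings: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Aggregate per-stage samples into p50 / p95 latency in milliseconds."""
    summary = {}
    for stage, samples in timings.items():
        if len(samples) >= 2:
            # 19 cut points at 5% steps; index 18 is the 95th percentile.
            p95 = statistics.quantiles(samples, n=20, method="inclusive")[18]
        else:
            p95 = samples[0]
        summary[stage] = {
            "n": len(samples),
            "p50_ms": statistics.median(samples) * 1e3,
            "p95_ms": p95 * 1e3,
        }
    return summary


if __name__ == "__main__":
    results = asyncio.run(run_load([f"prompt {i}" for i in range(40)], concurrency=8))
    for stage, stats in summarize(results).items():
        print(stage, stats)
```

Timing each stage separately, rather than only the end-to-end request, is what makes per-agent handoff costs visible in the summary.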
Challenges we ran into
- Coordinating agent-level benchmarks and aggregating timing metrics
- Designing a chat interface that accurately mimics delay patterns without confusing users
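One way to make delays "felt" in a chat UI is to replay a canned response with injected latency. The sketch below assumes a simple, illustrative delay model (a fixed delay per agent handoff, then a TTFT wait, then a constant gap between tokens) and splits on words as a stand-in for real tokens. `simulated_stream` and `demo` are hypothetical names; the actual Latency Simulation Chat may model delays differently.

```python
import asyncio
from collections.abc import AsyncIterator


async def simulated_stream(
    text: str,
    handoff_delays_s: list[float],
    ttft_s: float,
    inter_token_s: float,
) -> AsyncIterator[str]:
    """Replay `text` word by word with injected latency.

    Assumed delay model (illustrative, not a measured profile): each agent
    handoff adds a fixed delay, then the final agent waits ttft_s before its
    first token and inter_token_s between subsequent tokens.
    """
    for delay in handoff_delays_s:
        await asyncio.sleep(delay)  # coordination overhead per handoff
    await asyncio.sleep(ttft_s)     # model time to first token
    for i, word in enumerate(text.split()):
        if i:
            await asyncio.sleep(inter_token_s)
        yield word


async def demo() -> list[str]:
    received = []
    async for tok in simulated_stream(
        "Hello from the last agent",
        handoff_delays_s=[0.05, 0.05],
        ttft_s=0.1,
        inter_token_s=0.02,
    ):
        print(tok, end=" ", flush=True)  # a chat UI would append to the message here
        received.append(tok)
    print()
    return received


if __name__ == "__main__":
    asyncio.run(demo())
```

Keeping handoff delays separate from TTFT lets the same response be replayed with more or fewer agents in the chain, so users can compare how each configuration feels.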
Accomplishments that we're proud of
- Successfully measured and visualized performance across multi-agent workflows
- Built a generic system that works with any ADK-based deployment
- Enabled developers to not only see performance metrics but feel them via simulation
What we learned
- Latency perception is nonlinear: users tolerate small delays, but stacked agent latencies degrade the experience quickly
- ADK makes building modular agent systems easy, but surfacing performance telemetry requires planning
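Why agent latencies stack can be shown with a simple additive model: sequential agents add their latencies plus a handoff cost, while agents that run in parallel cost only as much as the slowest one. The sketch below uses hypothetical numbers and a hypothetical `end_to_end_s` helper; it ignores queueing and does not model the perceptual nonlinearity itself, only the raw wait a user sees.

```python
from collections.abc import Sequence
from typing import Union

# A stage is one agent's latency (float) or a parallel group of agents.
Stage = Union[float, Sequence[float]]


def end_to_end_s(stages: Sequence[Stage], handoff_overhead_s: float = 0.0) -> float:
    """Estimate user-visible latency of an agent chain.

    Sequential stages add up; a parallel group costs as much as its slowest
    member. Each transition between consecutive stages adds handoff_overhead_s.
    """
    total = 0.0
    for stage in stages:
        total += max(stage) if isinstance(stage, Sequence) else stage
    total += handoff_overhead_s * max(len(stages) - 1, 0)
    return total


# Hypothetical latencies in seconds: router, two specialists, writer.
sequential = end_to_end_s([0.4, 1.2, 0.9, 1.5], handoff_overhead_s=0.1)
fanout = end_to_end_s([0.4, [1.2, 0.9], 1.5], handoff_overhead_s=0.1)
print(f"sequential: {sequential:.2f}s, fan-out: {fanout:.2f}s")
# prints "sequential: 4.30s, fan-out: 3.30s"
```

Running the two specialists in parallel removes both the faster agent's latency and one handoff from the critical path.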
What's next for Captain Perf
- Integrate with the Agent Engine to visualize internal task graphs and bottlenecks
- Add synthetic user personas to simulate real-world behavior
- Build presets for SDLC, customer support, and content workflows