FlowPilot: Automating Incident Response with AI
Inspiration
I created FlowPilot after learning about the challenges that on-call engineers face in today's always-on digital economy. Research shows that engineers in on-call rotations often suffer from interrupted sleep, increased stress, and reduced productivity. Meanwhile, the financial impact of downtime is staggering: companies lose an average of $18,000 per minute, with major platforms exceeding $300,000 per hour.
This economic reality, combined with the human cost of traditional on-call rotations, inspired me to explore how AI could transform incident response. I wanted to create a solution that could handle routine alerts automatically, benefiting both companies through reduced downtime and engineers through fewer disruptions.
What it does
FlowPilot is an AI-powered incident response agent that "stands in" for on-call engineers by:
- Automatically detecting incidents via CloudWatch alarms
- Analyzing severity and context using AWS Bedrock (Claude 3 Sonnet)
- Making intelligent decisions to either reboot services or ignore false positives
- Executing remediation actions securely through kubectl commands
- Communicating status through multiple channels:
  - Multi-language incident reports (via DeepL)
  - Visual system diagrams of affected components
  - Voice alerts for critical notifications (via Rime)
All workflows are built on Temporal for durability and wrapped in the guMCP protocol for standardized agent communication.
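The flow described above (detect, analyze, act, communicate) can be sketched in plain TypeScript. This is an illustrative outline under stated assumptions, not FlowPilot's actual code: the `Alarm`, `Decision`, and `Dependencies` shapes and the `handleAlarm` function are hypothetical names, and the Bedrock, kubectl, DeepL, and Rime calls are replaced by injected functions so the logic can run on its own.

```typescript
// Hypothetical shapes: field names are assumptions, not FlowPilot's real types.
interface Alarm {
  name: string;      // alarm name as reported by the monitoring system
  service: string;   // service the alarm refers to
  value: number;     // observed metric value
  threshold: number; // value at which the alarm fires
}

interface Decision {
  action: "reboot" | "ignore";
  critical: boolean;
  reason: string;
}

// Every external system is injected, so the flow can be exercised
// without AWS, a cluster, or a text-to-speech service.
interface Dependencies {
  analyze: (alarm: Alarm) => Promise<Decision>; // stand-in for the model call
  reboot: (service: string) => Promise<void>;   // stand-in for the remediation step
  report: (message: string) => Promise<void>;   // stand-in for written reports
  speak: (message: string) => Promise<void>;    // stand-in for voice alerts
}

async function handleAlarm(alarm: Alarm, deps: Dependencies): Promise<Decision> {
  const decision = await deps.analyze(alarm);

  // Only act when the analysis asks for it; false positives are left alone.
  if (decision.action === "reboot") {
    await deps.reboot(alarm.service);
  }

  const message = `${alarm.name}: ${decision.action} (${decision.reason})`;
  await deps.report(message);

  // Voice is reserved for critical incidents, matching the list above.
  if (decision.critical) {
    await deps.speak(message);
  }
  return decision;
}
```

In a Temporal-based design, each injected function would plausibly map to an activity and `handleAlarm` to the workflow body, which is what lets the sequence resume after a failure.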
How I built it
I architected FlowPilot as a modular system that integrates best-in-class technologies:
- Core Engine: Built on Temporal's durable workflow framework to ensure reliable execution
- AI Decision Making: Implemented AWS Bedrock with Claude 3 Sonnet for contextual analysis
- Secure Command Execution: Used Block/Goose for safe Kubernetes operations
- Global Communication: Integrated DeepL for automatic translation of incident reports
- Audio Notifications: Implemented Rime's TTS capabilities for voice alerts
- Dashboard: Created a real-time web UI to monitor incidents and workflow status
- Agent Communication: Implemented the guMCP/A2A protocol for standardized interactions
The system is primarily written in TypeScript with Node.js, leveraging Express for the API server and React for the dashboard.
Challenges I ran into
Building FlowPilot as a solo developer presented several significant challenges:
Security Constraints: Creating a system that can execute powerful commands while maintaining strict security controls required careful design and multiple safeguards.
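One common safeguard for this kind of problem is to never pass model output to a shell, and instead build a fixed argument list from validated inputs. The sketch below illustrates that idea only; it is not FlowPilot's actual safeguard set. The function names, the namespace allowlist, and the choice of `kubectl rollout restart` as the permitted operation are all assumptions made for the example.

```typescript
// Conservative name check based on the Kubernetes DNS label format
// (lowercase alphanumerics and '-', at most 63 characters). This is
// stricter than Kubernetes requires for deployment names, by design.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

function isSafeName(name: string): boolean {
  return name.length <= 63 && DNS_LABEL.test(name);
}

// Hypothetical helper: returns the argument vector for a single permitted
// operation, or throws if any input falls outside the allowlist.
function buildRestartArgs(
  namespace: string,
  deployment: string,
  allowedNamespaces: readonly string[],
): string[] {
  if (!allowedNamespaces.includes(namespace)) {
    throw new Error(`namespace not allowed: ${namespace}`);
  }
  if (!isSafeName(deployment)) {
    throw new Error(`invalid deployment name: ${deployment}`);
  }
  // The verb and flags are fixed here; callers only supply validated names.
  return ["rollout", "restart", `deployment/${deployment}`, "-n", namespace];
}
```

The returned array is meant to be handed to something like Node's `execFile("kubectl", args)`, which runs the binary directly without shell interpretation, so characters such as `;` or `|` in a name cannot start a second command even if validation were bypassed.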
Integration Complexity: Learning and connecting multiple sponsor technologies with different APIs and authentication systems was time-consuming but essential.
Debugging Asynchronous Workflows: Temporal workflows are powerful but introduce complexity when debugging across multiple asynchronous processes.
Voice Alert Coordination: Preventing overlapping voice alerts required implementing a sophisticated queuing system.
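The core idea behind preventing overlap can be shown with a small serial queue: each alert starts only after the previous one has finished. This is a minimal sketch rather than the queuing system FlowPilot uses; `AlertQueue` and the `play` callback (assumed to resolve when playback ends) are hypothetical names.

```typescript
// A minimal serial queue: tasks run one at a time, in the order enqueued.
class AlertQueue {
  // Resolves when the most recently enqueued alert has settled.
  private tail: Promise<void> = Promise.resolve();

  // `play` is assumed to return a promise that resolves when playback ends.
  enqueue(play: () => Promise<void>): Promise<void> {
    const run = this.tail.then(play);
    // Swallow failures on the internal chain so one broken alert
    // does not block the alerts queued behind it.
    this.tail = run.catch(() => {});
    // The caller still sees the original success or failure.
    return run;
  }
}
```

A production version would likely add priorities (so a critical alert can jump ahead) and deduplication of repeated alerts, which this sketch leaves out.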
Demo Environment: Creating a self-contained demo that illustrates the system's capabilities without actual production infrastructure required creative solutions.
Accomplishments that I'm proud of
Despite the challenges of building this as a solo developer, I'm proud of several achievements:
- Creating a fully functional end-to-end system that demonstrates autonomous incident response
- Successfully integrating multiple sponsor technologies into a cohesive platform
- Building a clean, intuitive dashboard that clearly communicates incident status
- Implementing durable workflows that can survive failures and continue execution
- Demonstrating a practical AI application that solves a real business problem with significant ROI
What I learned
This project provided valuable lessons in:
- Building Autonomous Systems: Creating AI agents that can safely take action in production environments
- Temporal Workflow Patterns: Structuring complex, long-running processes with proper error handling
- AI Prompt Engineering: Crafting effective prompts for AWS Bedrock to make reliable decisions
- Multi-Agent Communication: Implementing standardized protocols for agent interaction
- Voice UX Design: Creating clear, concise voice alerts that convey critical information efficiently
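Part of making model decisions reliable is refusing to act on a reply that does not match the expected shape. The sketch below shows one way to do that, assuming the prompt asks the model to answer with a JSON object containing `action` and `reason`; that reply format, the `ModelDecision` type, and the `parseDecision` function are assumptions for illustration, not FlowPilot's documented contract.

```typescript
type Action = "reboot" | "ignore";

interface ModelDecision {
  action: Action;
  reason: string;
}

// Returns a validated decision, or null when the reply cannot be trusted.
function parseDecision(raw: string): ModelDecision | null {
  // Models sometimes wrap JSON in prose, so take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch (e) {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const candidate = parsed as Record<string, unknown>;
  const rawAction = candidate["action"];
  const reason = candidate["reason"];

  // Accept only the two actions the system knows how to perform.
  const action: Action | null =
    rawAction === "reboot" ? "reboot" : rawAction === "ignore" ? "ignore" : null;
  if (action === null || typeof reason !== "string") return null;

  return { action, reason };
}
```

What to do with a `null` result is a separate policy choice; a cautious caller would take no automated action and surface the alert to a human.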
What's next for FlowPilot
I plan to expand FlowPilot in several directions:
- Additional Incident Types: Supporting more diverse alerts beyond memory leaks and API failures
- Learning Capabilities: Implementing feedback loops so the system improves from past incidents
- Enterprise Features: Adding role-based access control and audit logging for enterprise adoption
- More Integrations: Connecting with popular monitoring and alerting tools like Datadog and PagerDuty
- Open Source Core: Releasing a core version as open source to foster community contributions
The ultimate goal is to make traditional on-call rotations a thing of the past for routine incidents, allowing engineers to focus on innovation rather than firefighting, while saving companies millions in downtime costs.
Built With
- amazon-web-services
- arize
- block/goose
- claude
- deepl
- express.js
- mcp/a2a
- node.js
- rime
- temporal
- typescript

