Open-source AI agent for investigating production incidents and finding root causes. A CNCF Sandbox project by Robusta.Dev.
- Petabyte-scale data: Server-side filtering, JSON tree traversal, and tool output transformers keep large payloads out of context windows
- Deep integrations: Prometheus, Grafana, Datadog, Kubernetes, and many more—plus any REST API
- Bidirectional alert integrations: Fetch alerts from AlertManager, PagerDuty, OpsGenie, or Jira—and write findings back
- Any LLM provider: OpenAI, Anthropic, Azure, Bedrock, Gemini, and more
- Operator mode: Run investigations on a schedule as a Kubernetes operator
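As a rough illustration of the first point, one way to keep a large JSON tool response out of a context window is to walk the tree and replace deep or long branches with short placeholders. The sketch below is an assumption about the general technique, not HolmesGPT's implementation; `prune_json`, `max_depth`, and `max_items` are hypothetical names, and the payload is made up.

```python
import json
from typing import Any


def prune_json(node: Any, max_depth: int, max_items: int = 3) -> Any:
    """Return a copy of `node` with branches below `max_depth` summarized.

    Hypothetical helper for illustration only.
    """
    if isinstance(node, dict):
        if max_depth == 0:
            return f"<object with {len(node)} keys>"
        return {k: prune_json(v, max_depth - 1, max_items) for k, v in node.items()}
    if isinstance(node, list):
        if max_depth == 0:
            return f"<array with {len(node)} items>"
        # Keep only the first few items and note how many were dropped.
        kept = [prune_json(v, max_depth - 1, max_items) for v in node[:max_items]]
        if len(node) > max_items:
            kept.append(f"<{len(node) - max_items} more items>")
        return kept
    return node  # scalars pass through unchanged


# Made-up payload shaped loosely like a metrics query response.
payload = {
    "status": "success",
    "data": {"result": [{"metric": {"pod": f"api-{i}"}, "value": i} for i in range(50)]},
}
summary = prune_json(payload, max_depth=2)
print(json.dumps(summary))
```

A caller could start with a shallow summary and raise `max_depth` only for the branch that looks relevant, so the full payload never has to be sent to the model.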
HolmesGPT uses an agentic loop to query live observability data from multiple sources and identify root causes.
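The general shape of an agentic loop can be sketched as follows: a model repeatedly chooses a tool to call, sees the result, and eventually returns an answer. This is a minimal sketch under stated assumptions, not HolmesGPT's code. The tool names, their canned outputs, and `fake_model` (a stand-in for a real LLM call) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: each maps a question to a text observation.
TOOLS: Dict[str, Callable[[str], str]] = {
    "pod_status": lambda q: "pod payment-api: CrashLoopBackOff, 14 restarts",
    "pod_logs": lambda q: "FATAL: cannot connect to db:5432 (connection refused)",
}


def fake_model(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: returns ("call", tool_name) or ("answer", text)."""
    if len(observations) == 0:
        return ("call", "pod_status")
    if len(observations) == 1:
        return ("call", "pod_logs")
    return ("answer", "Likely root cause: the database is unreachable from the pod.")


def investigate(question: str, model=fake_model, max_steps: int = 5):
    """Run the loop until the model answers or the step budget runs out."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, value = model(question, observations)
        if action == "answer":
            return value, observations
        # Execute the requested tool and feed its output back to the model.
        observations.append(TOOLS[value](question))
    return "No conclusion within the step budget.", observations


answer, evidence = investigate("Why is payment-api restarting?")
print(answer)
for line in evidence:
    print(" -", line)
```

The step budget (`max_steps`) is one simple way to keep such a loop from running indefinitely when the model never settles on an answer.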
HolmesGPT integrates with popular observability and cloud platforms. The following data sources ("toolsets") are built-in. Add your own.
| Data Source | Notes |
|---|---|
| | Azure Kubernetes Service cluster and node health diagnostics |
| | Get status, history and manifests and more of apps, projects and clusters |
| AWS | RDS events, instances, slow query logs, and more (MCP) |
| Azure | Azure resources and diagnostics (MCP) |
| Azure SQL | Database health, performance, connections, and slow queries |
| Confluence | Private runbooks and documentation |
| | Retrieve logs for any resource |
| Datadog | Query logs, metrics, and traces |
| Docker | Get images, logs, events, history and more |
| | Query logs, cluster health, shard and index diagnostics |
| | Google Cloud Platform resources (MCP) |
| GitHub | Repositories, issues, and pull requests (MCP) |
| | Query and analyze dashboard configurations and panels |
| Helm | Release status, chart metadata, and values |
| | Public runbooks, community docs etc |
| Kafka | Fetch metadata, list consumers and topics or find lagging consumer groups |
| | Pod logs, K8s events, and resource status (kubectl describe) |
| | Query logs for Kubernetes resources or any query |
| | MariaDB database queries and diagnostics (MCP) |
| | Cluster health, slow queries, and performance diagnostics |
| NewRelic | Investigate alerts, query tracing data |
| | Projects, routes, builds, security context constraints, and deployment configs |
| | Investigate alerts, query metrics and generate PromQL queries |
| RabbitMQ | Partitions, memory/disk alerts, troubleshoot split-brain scenarios and more |
| Robusta | Multi-cluster monitoring, historical change data, runbooks, PromQL graphs and more |
| | Query tables and incident records |
| Sentry | Error tracking, issues, and performance monitoring (MCP) |
| Slab | Team knowledge base and runbooks on demand |
| Splunk | Log search and analysis (MCP) |
| | PostgreSQL, MySQL, ClickHouse, MariaDB, SQL Server, SQLite |
| Tempo | Fetch trace info, debug issues like high latency in application |
See the full list of built-in toolsets for additional integrations including Cilium, KubeVela, Notion, Prefect, and more.
HolmesGPT can fetch alerts/tickets to investigate from external systems, then write the analysis back to the source or Slack.
| Integration | Status | Notes |
|---|---|---|
| Slack | ✅ | Demo. Available via Robusta.dev (commercial platform) |
| Microsoft Teams | ✅ | Available via Robusta.dev (commercial platform) |
| Prometheus/AlertManager | ✅ | Robusta SaaS or HolmesGPT CLI |
| PagerDuty | ✅ | HolmesGPT CLI only |
| OpsGenie | ✅ | HolmesGPT CLI only |
| Jira | ✅ | HolmesGPT CLI only |
| GitHub | ✅ | HolmesGPT CLI only |
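The fetch, analyze, and write-back flow described above can be pictured with a small in-memory example. Everything here is hypothetical and for illustration only: `InMemorySource`, `fetch_alerts`, `write_back`, and the `analyze` callback are assumed names, not the HolmesGPT API or any vendor's client library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    id: str
    title: str


@dataclass
class InMemorySource:
    """Hypothetical stand-in for an alert or ticket system."""
    alerts: List[Alert]
    comments: Dict[str, List[str]] = field(default_factory=dict)

    def fetch_alerts(self) -> List[Alert]:
        return list(self.alerts)

    def write_back(self, alert_id: str, text: str) -> None:
        # Attach the analysis to the originating alert.
        self.comments.setdefault(alert_id, []).append(text)


def process_alerts(source: InMemorySource, analyze: Callable[[Alert], str]) -> int:
    """Fetch every alert, analyze it, and write the result back to the source."""
    handled = 0
    for alert in source.fetch_alerts():
        source.write_back(alert.id, analyze(alert))
        handled += 1
    return handled


source = InMemorySource([Alert("A-1", "KubePodCrashLooping"), Alert("A-2", "HighLatency")])
count = process_alerts(source, lambda a: f"Analysis of {a.title}: (findings go here)")
print(count, source.comments["A-1"])
```

Keeping the source behind two small methods means the same loop could, in principle, be pointed at a different system by swapping in another object with the same shape.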
Read the installation documentation to learn how to install HolmesGPT.
Read the LLM Providers documentation to learn how to set up your LLM API key.
See the walkthrough documentation for usage guides, including:
- Interactive mode for asking questions and follow-ups
- Investigating Prometheus alerts
- CI/CD troubleshooting
By design, HolmesGPT has read-only access and respects RBAC permissions. It is safe to run in production environments.
Distributed under the Apache 2.0 License. See LICENSE for more information.
Join our community to discuss the HolmesGPT roadmap and share feedback:
If you have any questions, feel free to message us on the HolmesGPT Slack channel.
Please read our CONTRIBUTING.md for guidelines and instructions.
For help, contact us on Slack or ask DeepWiki AI your questions.
Please make sure to follow the CNCF code of conduct - details here.
