Inspiration

The 2026 World Cup brings 1.5 million fans to US stadiums over six weeks. Crowd safety failures don't announce themselves; they build silently across multiple signals before any single threshold trips. The Astroworld tragedy unfolded in 11 minutes. The 2022 Qatar crowd crush happened at a gate. The Itaewon disaster in Seoul built up inside a single narrow alley.

Every one of these had early signatures: traffic backing up on approach roads, sentiment shifting negative on social feeds, density compressing at chokepoints. No single signal was catastrophic. The correlation was.

The question that drove CityShield: what if an AI agent could watch all three streams simultaneously, reason about the joint picture, and alert operators before the crowd arrives — not after?


The problem it solves

Venue operators today monitor crowd safety through siloed systems — a traffic dashboard here, a CCTV feed there, a social media alert buried in email. No system fuses these signals in real time and reasons about what they mean together.

CityShield fills that gap. It ingests crowd density (estimated by computer vision from real camera footage), traffic flow, and public sentiment into Elasticsearch. Gemini then reasons across all three streams to form a situational assessment and sequences a graded operator response.

The key insight: a crowd surge coinciding with gridlocked approach roads 45 minutes before kickoff is a different situation from either signal alone. Gemini evaluates that cross-stream correlation, not a threshold on a single number.


What it does

CityShield is a real-time situational awareness agent for large public events.

Signal ingestion:

  • Crowd density at Gate A, estimated from real footage of the gate approach using computer vision (OpenCV optical flow analysis) and written to Elasticsearch every 10 seconds; other zones use simulated sensors
  • Traffic flow and public sentiment, fed continuously from a synthetic simulation of pre-match arrival conditions
  • All signals stored in Elasticsearch with full time-series history

Agent pipeline (Google ADK + Gemini):

  • Correlation agent: Gemini reads the last 5-minute window from all three Elasticsearch streams via the Elastic MCP server and reasons about the joint signal picture
  • Response planner: given the assessment, Gemini retrieves comparable past incidents and sequences a graded response — open lanes, stage stewards, hold transit, issue PA announcement with drafted script
  • Incident writer: Gemini composes a plain-language operator briefing
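One way to hand the correlation agent a single joint picture is to collapse each stream's last five minutes into one aligned record. A silent stream should stay visible as an explicit gap rather than disappear. This sketch does the aggregation in Python for clarity, whereas CityShield reads the streams through ES|QL tools. The field names (`crowd_density_ppm2`, `traffic_speed_kmh`, `sentiment_score`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import List, Optional


@dataclass
class Reading:
    ts: datetime
    value: float


def window_mean(readings: List[Reading], now: datetime, minutes: int = 5) -> Optional[float]:
    """Mean over the trailing window, or None if the stream reported nothing."""
    cutoff = now - timedelta(minutes=minutes)
    recent = [r.value for r in readings if cutoff <= r.ts <= now]
    return mean(recent) if recent else None


def joint_snapshot(crowd, traffic, sentiment, now: datetime) -> dict:
    """Align the three streams into one record for the correlation agent.

    A silent stream is kept as an explicit None instead of being dropped,
    so the model can reason about the gap itself.
    """
    return {
        "window_minutes": 5,
        "crowd_density_ppm2": window_mean(crowd, now),
        "traffic_speed_kmh": window_mean(traffic, now),
        "sentiment_score": window_mean(sentiment, now),
    }


now = datetime(2026, 6, 14, 17, 15, tzinfo=timezone.utc)
# Illustrative values: the 7-minute-old traffic reading falls outside the window.
traffic = [Reading(now - timedelta(minutes=m), v) for m, v in [(1, 6.0), (3, 8.0), (7, 40.0)]]
sentiment = [Reading(now - timedelta(minutes=2), -0.5)]
snapshot = joint_snapshot([], traffic, sentiment, now)
print(snapshot)
```

Keeping `crowd_density_ppm2` as `None` rather than 0 matters: "no data" and "nobody here" are different facts, and the correlation agent needs to see which one it is dealing with.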

Operator console:

  • Interactive stadium bird's-eye map — click any zone to assess it
  • Three live gauges updating every 30 seconds from Elasticsearch
  • Real drone footage of stadium approach with CV density overlay
  • Full response plan with IMMEDIATE/STAGED action priority
  • Human approval gate — nothing external fires without operator confirmation
  • Signals tab: 30-minute historical charts + direct Kibana dashboard access
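The IMMEDIATE/STAGED ordering and the approval gate can be modeled together as a plan that orders its actions but releases external ones only after confirmation. This is a minimal sketch. The class names, the `external` flag, and the example actions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Priority(Enum):
    IMMEDIATE = 1
    STAGED = 2


@dataclass
class Action:
    description: str
    priority: Priority
    external: bool  # True if the action reaches people or systems outside the console


@dataclass
class ResponsePlan:
    actions: List[Action]
    approved: bool = False

    def ordered(self) -> List[Action]:
        # IMMEDIATE before STAGED; sorted() is stable, so planner order holds within a tier.
        return sorted(self.actions, key=lambda a: a.priority.value)

    def releasable(self) -> List[Action]:
        # Until an operator approves, only console-internal actions are released.
        return [a for a in self.ordered() if self.approved or not a.external]


plan = ResponsePlan([
    Action("Stage stewards at Gate A", Priority.STAGED, external=True),
    Action("Issue PA announcement (drafted script)", Priority.IMMEDIATE, external=True),
    Action("Highlight Gate A on the stadium map", Priority.IMMEDIATE, external=False),
])
print([a.description for a in plan.releasable()])
plan.approved = True  # operator confirms in the console
print([a.description for a in plan.releasable()])
```

Putting the gate in the data model, rather than only in the UI, means no code path can release an external action while `approved` is false.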

Elastic integration:

  • All signal data stored and queried from Elasticsearch
  • Agent reads through the Elastic MCP server using ES|QL tools defined in Kibana Agent Builder
  • Incidents written back to the cityshield-alerts index via Elastic Workflow tools
  • Kibana dashboards show live operational picture for the full venue
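The stream-read tools could each wrap an ES|QL query that summarizes the trailing five minutes per zone. This sketch generates one query per stream. The tool names, index names, and field names are hypothetical, and the exact queries CityShield registers in Agent Builder may differ.

```python
WINDOW = "5 minutes"


def stream_query(index: str, field: str) -> str:
    """ES|QL that summarizes one stream per zone over the trailing window."""
    return (
        f"FROM {index}\n"
        f"| WHERE @timestamp >= NOW() - {WINDOW}\n"
        # COUNT(*) exposes a stream that has gone quiet, not just its average.
        f"| STATS avg_{field} = AVG({field}), readings = COUNT(*) BY zone\n"
        "| SORT zone"
    )


# Hypothetical tool names, indices, and fields; one read tool per stream.
TOOLS = {
    "read_crowd_window": stream_query("cityshield-crowd", "density_ppm2"),
    "read_traffic_window": stream_query("cityshield-traffic", "speed_kmh"),
    "read_sentiment_window": stream_query("cityshield-sentiment", "score"),
}

for name, query in TOOLS.items():
    print(name)
    print(query, end="\n\n")
```

Returning a reading count next to each average gives the agent direct evidence when a sensor goes silent. That is the situation the key finding below depends on.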

How I built it

Google ADK orchestrates three sequential Gemini agents per assessment cycle: correlation (reads Elasticsearch via Elastic MCP → forms joint assessment), response planner (sequences actions from past incidents), and incident writer (drafts the operator packet).
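The per-cycle hand-off can be pictured as three stages threading one shared state. In ADK this is roughly what a `SequentialAgent` wrapping three LLM agents provides. The sketch leaves ADK out and replaces every Gemini call with a hypothetical placeholder function, so only the data flow between stages is meant literally.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def correlation_agent(state: State) -> State:
    # Placeholder for a Gemini call that reads the three streams via MCP tools.
    return {**state, "assessment": "ELEVATED: gridlock on approach, Gate A under-filled"}


def response_planner(state: State) -> State:
    # Placeholder for a Gemini call that also retrieves comparable past incidents.
    plan = [("IMMEDIATE", "Issue PA announcement"), ("STAGED", "Stage stewards at Gate A")]
    return {**state, "plan": plan}


def incident_writer(state: State) -> State:
    # Placeholder for a Gemini call that drafts the operator briefing.
    steps = "; ".join(f"{p}: {a}" for p, a in state["plan"])
    return {**state, "briefing": f"{state['assessment']}. Recommended: {steps}."}


PIPELINE: List[Callable[[State], State]] = [correlation_agent, response_planner, incident_writer]


def run_cycle(snapshot: dict) -> State:
    state: State = {"snapshot": snapshot}
    for stage in PIPELINE:
        state = stage(state)  # each stage sees everything the earlier stages produced
    return state


result = run_cycle({"crowd_density_ppm2": None, "traffic_speed_kmh": 7.0})
print(result["briefing"])
```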

The tools are defined in Elastic Agent Builder: ES|QL queries wrapping the crowd, traffic, and sentiment stream reads, plus Elastic Workflow tools for writing incidents back. Agent Builder exposes them through its built-in MCP server at {KIBANA_URL}/api/agent_builder/mcp, and Gemini calls them like any other MCP tools, with no hardcoded API endpoints.
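On the wire, MCP messages are JSON-RPC 2.0: `tools/list` discovers the tools and `tools/call` invokes one by name. This sketch shows only the message and header shapes. It assumes an Elastic API key for auth and that the endpoint speaks MCP's Streamable HTTP transport. The tool name `read_crowd_window` is hypothetical. A real client must also run MCP's `initialize` handshake first, which an MCP client library or the agent framework normally handles.

```python
import json
from typing import Optional

# Path from the write-up; the host stays a {KIBANA_URL} placeholder.
MCP_PATH = "/api/agent_builder/mcp"


def mcp_request(req_id: int, method: str, params: Optional[dict] = None) -> dict:
    """JSON-RPC 2.0 envelope, the message format MCP uses."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def mcp_headers(api_key: str) -> dict:
    return {
        "Authorization": f"ApiKey {api_key}",  # Elastic API key auth (assumed)
        "Content-Type": "application/json",
        # Streamable HTTP: the server may answer with JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }


list_tools = mcp_request(1, "tools/list")
call_tool = mcp_request(2, "tools/call", {"name": "read_crowd_window", "arguments": {}})
print(json.dumps(call_tool))
```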

The computer vision pipeline uses OpenCV to process drone footage of the stadium gate approaches, estimate crowd density and headcount per frame, and write readings to Elasticsearch at 10-second intervals.
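Turning a per-frame headcount into density requires the ground area the camera covers. This minimal sketch assumes a hypothetical calibrated region-of-interest area in square metres. It averages the frames sampled within one write interval into a single reading. The function names and numbers are illustrative.

```python
from statistics import mean
from typing import List


def frame_density(headcount: int, roi_area_m2: float) -> float:
    """Persons per square metre inside the camera's (assumed) calibrated region of interest."""
    if roi_area_m2 <= 0:
        raise ValueError("ROI area must be positive")
    return headcount / roi_area_m2


def interval_reading(headcounts: List[int], roi_area_m2: float) -> dict:
    """Collapse the frames sampled in one 10-second interval into a single reading."""
    densities = [frame_density(h, roi_area_m2) for h in headcounts]
    return {"headcount": round(mean(headcounts)), "density_ppm2": round(mean(densities), 2)}


print(interval_reading([300, 320, 340], roi_area_m2=150.0))
```

Averaging within the interval smooths frame-to-frame detection noise before anything reaches Elasticsearch. The trade-off is that a spike shorter than one interval gets diluted.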

FastAPI + React + Tailwind for the operator console. Cloud Run + Firebase Hosting for deployment.


Key finding

The most important thing CityShield demonstrated is how Gemini handles missing signal data. When crowd density sensors were unavailable at Gate A but traffic was gridlocked, Gemini's assessment was:

"The unusually low crowd density at Gate A is likely because people are still stuck in traffic. When vehicles clear, Gate A will see a sudden arrival surge. The current traffic gridlock is the warning — act now."

This reasoning — inferring a future crowd risk from current traffic data — is not something a threshold-based alert system produces. It requires understanding the causal relationship between traffic clearance and crowd arrival patterns. That's what makes this genuinely agentic rather than a monitoring pipeline with alerts bolted on.


Challenges

Elastic MCP architecture: The original plan assumed a Python MCP client with custom endpoints. The actual Elastic Agent Builder MCP server works completely differently — you define tools in Kibana's UI as ES|QL queries or Elastic Workflows, and they're automatically exposed via the built-in MCP endpoint. Once I understood this, the integration became cleaner and more powerful than the original design.

Multi-signal correlation vs single-signal alerting: Early versions had Gemini treating each stream independently. The key prompt engineering breakthrough was forcing joint reasoning: "a density spike that coincides with a sentiment drop and a transit surge is a different situation than any one alone." Making this explicit in the system prompt transformed the assessment quality.
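The joint-reasoning instruction can live in a fixed system prompt, with each cycle's snapshot passed in as data. In this sketch only the quoted rule comes from the write-up. The surrounding wording and the helper names are hypothetical.

```python
import json

# The rule quoted in the write-up, kept verbatim.
JOINT_RULE = (
    "A density spike that coincides with a sentiment drop and a transit surge "
    "is a different situation than any one alone."
)


def system_instruction() -> str:
    return "\n".join([
        "You assess crowd safety at a stadium from three streams: crowd density, traffic, sentiment.",
        JOINT_RULE,
        "Assess the streams jointly, never one at a time.",
        "If a stream is missing or anomalous, say so and reason about what it implies.",
        "Return: overall level, the cross-stream evidence, and what is likely to happen next.",
    ])


def user_turn(snapshot: dict) -> str:
    # Serialize the snapshot so missing values arrive as explicit nulls.
    return "Current 5-minute window:\n" + json.dumps(snapshot, indent=2, sort_keys=True)


print(system_instruction())
print(user_turn({"crowd_density_ppm2": None, "traffic_speed_kmh": 7.0}))
```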

Computer vision on constrained infrastructure: Processing video frames on Cloud Run with CPU-only OpenCV required careful frame sampling (every 10th frame, 10-second write intervals) to stay within Cloud Run's compute budget while still producing meaningful real-time density readings.
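The sampling arithmetic is easy to check in code. The source doesn't state a frame rate, so this sketch assumes hypothetical 30 fps footage. At that rate, processing every 10th frame means 3 frames per second, or 30 frames per 10-second write. The helper names are illustrative.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def sampled_frames(fps: float, duration_s: float, stride: int = 10) -> Iterator[Tuple[int, float]]:
    """Yield (frame_index, timestamp_s) for every `stride`-th frame of the clip."""
    total = int(fps * duration_s)
    for i in range(0, total, stride):
        yield i, i / fps


def group_by_interval(samples: Iterable[Tuple[int, float]], interval_s: float = 10.0) -> Dict[int, List[int]]:
    """Bucket sampled frame indices into consecutive write intervals."""
    buckets: Dict[int, List[int]] = {}
    for idx, t in samples:
        buckets.setdefault(int(t // interval_s), []).append(idx)
    return buckets


buckets = group_by_interval(sampled_frames(fps=30, duration_s=20))
print({k: len(v) for k, v in buckets.items()})
```

A fixed stride cuts per-second CV work by a factor of 10 regardless of content. That keeps the CPU cost on Cloud Run predictable.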

Honest scope: CityShield does not dispatch EMS, call 911, or reroute city traffic. These are legally gated capabilities that cannot be demonstrated in a hackathon context. The system provides operational intelligence to venue staff. This constraint is stated explicitly in the console UI (SYNTHETIC · SIMULATED labels on non-CV zones) and in the README. Naming the limits builds credibility rather than undermining it.


What's next

The architecture generalizes to any large public gathering — concerts, transit hubs, festivals, city events. The three-stream fusion pattern (physical density + traffic + sentiment) applies wherever crowd crush risk exists alongside approach infrastructure.

The Elastic MCP integration proved particularly powerful for a use case like this: the agent doesn't just query data, it uses Elastic Workflows to write incidents back, creating a closed loop where every assessment becomes a permanent, queryable record in the same system that fed the data. That memory layer is what separates a one-time alert from an operationally intelligent system.
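The closed loop has two halves: a record written to `cityshield-alerts` after each cycle, and a query a later cycle can use to pull precedent from the same index. This is a minimal sketch. The field names, the helper names, and the illustrative assessment text are hypothetical. In CityShield the write goes through Elastic Workflow tools, not this code.

```python
from datetime import datetime, timezone
from typing import List, Optional


def incident_record(zone: str, level: str, assessment: str, actions: List[str],
                    approved_by: Optional[str], now: datetime) -> dict:
    """Document written back to the cityshield-alerts index after a cycle."""
    return {
        "@timestamp": now.isoformat(),
        "zone": zone,
        "level": level,
        "assessment": assessment,
        "actions": actions,
        "approved_by": approved_by,  # None if the operator never confirmed
    }


def precedent_query(zone: str, limit: int = 5) -> str:
    """ES|QL a later cycle could use to pull recent incidents for the same zone.

    `zone` is expected to come from the fixed list of map zones, not free text.
    """
    return (
        "FROM cityshield-alerts\n"
        f'| WHERE zone == "{zone}"\n'
        "| SORT @timestamp DESC\n"
        f"| LIMIT {limit}"
    )


record = incident_record(
    zone="gate_a",
    level="ELEVATED",
    assessment="Low Gate A density with gridlocked approach roads; surge expected when traffic clears.",
    actions=["Issue PA announcement", "Stage stewards at Gate A"],
    approved_by="operator-1",
    now=datetime(2026, 6, 14, 17, 15, tzinfo=timezone.utc),
)
print(record["@timestamp"])
print(precedent_query("gate_a"))
```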
