Project Story

Inspiration

Emergency response systems today are largely reactive. Dispatchers depend on 911 calls, which often arrive late, lack context, or fail to capture the full severity of a situation. In fast-moving emergencies like physical altercations or medical collapses, every second matters — yet critical delays still occur due to fragmented information and manual coordination.

We were inspired by a simple question: What if cities could detect emergencies the moment they happen, instead of waiting for someone to call for help?

With the increasing availability of camera infrastructure and advancements in AI, we saw an opportunity to transform passive video and audio feeds into real-time public safety intelligence, specifically tailored for a city like Indianapolis.

What It Does

Our system is an AI-powered emergency coordination platform that combines computer vision and audio analysis to detect critical incidents in real time.

It can identify events such as:

- physical altercations
- individuals collapsing or lying motionless
- distress signals like “help” or “call 911”
- bystander reactions indicating urgency

Instead of producing raw model outputs, the system fuses these signals into a single incident command card, which includes:

- a clear summary of what happened
- severity and confidence levels
- recommended responders (Police, EMS)
- suggested hospital destination with ETA
- reasoning behind each decision

To ensure reliability, the system includes a human-in-the-loop review step, allowing an operator to confirm, escalate, or dismiss incidents.
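As a minimal sketch of what such a command card plus review step could look like (the field names and review actions here are illustrative, not our actual schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IncidentCard:
    """One fused incident presented to a human operator (illustrative schema)."""
    summary: str
    severity: str                      # "low" | "medium" | "high"
    confidence: float                  # 0.0 - 1.0
    responders: List[str]              # e.g. ["Police", "EMS"]
    hospital: Optional[str] = None
    eta_minutes: Optional[int] = None
    reasoning: List[str] = field(default_factory=list)
    status: str = "pending"            # pending -> confirmed | escalated | dismissed

    def review(self, action: str) -> str:
        """Human-in-the-loop step: the operator confirms, escalates, or dismisses."""
        if action not in ("confirmed", "escalated", "dismissed"):
            raise ValueError(f"unknown review action: {action}")
        self.status = action
        return self.status
```

Keeping the card as one structured object (rather than a stream of raw detections) is what makes the confirm/escalate/dismiss step tractable for an operator.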

We also integrated voice alerts using ElevenLabs, enabling the system to announce high-severity incidents in real time — making the platform more actionable and immersive.

How We Built It

We designed the system as a layered pipeline:

  1. Ingestion Layer

- Video frames from camera feeds (live or prerecorded)
- Audio input for distress detection
- Metadata such as camera ID and location

  2. AI Detection Layer

We used computer vision techniques (OpenCV + model-based reasoning) to detect:

- motion patterns (rapid movement, collapse, inactivity)
- multi-person interactions (fights, crowd behavior)

In parallel, we incorporated audio signals (distress phrases, yelling) to enhance detection confidence.
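Our actual pipeline ran on OpenCV; the core frame-differencing idea behind the motion signals can be sketched in plain Python (thresholds and labels here are illustrative assumptions, and frames are simplified to 2-D grids of grayscale values):

```python
def motion_score(prev_frame, curr_frame, threshold=25):
    """Fraction of pixels whose grayscale intensity changed by more than
    `threshold` between consecutive frames (frames: 2-D lists of 0-255 ints)."""
    changed = total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > threshold:
                changed += 1
    return changed / total if total else 0.0

def classify_motion(scores, high=0.3, low=0.02):
    """Map a window of per-frame motion scores to a coarse label:
    sustained high motion suggests a fight; an abrupt drop to near-zero
    after high motion suggests a collapse followed by inactivity."""
    if not scores:
        return "no_data"
    recent = scores[-1]
    peak = max(scores)
    if peak > high and recent < low:
        return "possible_collapse"
    if recent > high:
        return "high_activity"
    return "normal"
```

With OpenCV the same idea reduces to `cv2.absdiff` on grayscale frames plus a threshold; the window-over-scores classification stays identical.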

  3. Event Fusion Engine

Instead of generating multiple noisy alerts, we built a fusion layer that combines:

- visual signals
- audio signals
- temporal patterns
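A simplified sketch of the fusion idea, assuming each detection carries a modality, label, confidence, and timestamp (the window size and agreement bonus are illustrative parameters, not our tuned values):

```python
def fuse_signals(signals, window_seconds=10):
    """Group detections that fall within one temporal window and boost
    confidence when independent modalities agree.
    Each signal: {"modality": "vision"|"audio", "label": str,
                  "confidence": float, "t": float}  # t in seconds
    Returns one consolidated incident dict, or None if no signals."""
    if not signals:
        return None
    signals = sorted(signals, key=lambda s: s["t"])
    # Keep only signals inside the window anchored at the first event.
    window = [s for s in signals if s["t"] - signals[0]["t"] <= window_seconds]
    modalities = {s["modality"] for s in window}
    base = max(s["confidence"] for s in window)
    # Agreement bonus: two modalities corroborating each other is stronger
    # evidence than either alone (capped at 1.0).
    confidence = min(1.0, base + 0.2 * (len(modalities) - 1))
    return {
        "labels": sorted({s["label"] for s in window}),
        "confidence": round(confidence, 2),
        "modalities": sorted(modalities),
    }
```

The key property is many-to-one: several noisy detections in, one incident out.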

  4. Decision & Recommendation Engine

We implemented rule-based logic to determine:

- who should respond (Police, EMS, or both)
- severity level
- recommended medical destination
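The rule-based routing can be sketched as follows (the labels, thresholds, and hospital string are illustrative placeholders, not our production rules):

```python
def recommend_response(incident):
    """Rule-based routing over a fused incident:
    incident = {"labels": [str, ...], "confidence": float}."""
    labels = set(incident["labels"])
    responders = set()
    # Violent interactions need Police; medical events (and fights, which
    # often produce injuries) need EMS.
    if labels & {"fight", "assault"}:
        responders.add("Police")
    if labels & {"collapse", "injury", "fight"}:
        responders.add("EMS")
    # Coarse severity from fused confidence.
    if incident["confidence"] >= 0.8:
        severity = "high"
    elif incident["confidence"] >= 0.5:
        severity = "medium"
    else:
        severity = "low"
    hospital = "nearest trauma center" if "EMS" in responders else None
    return {"responders": sorted(responders),
            "severity": severity,
            "hospital": hospital}
```

Explicit rules kept every recommendation explainable, which is what lets the command card show reasoning behind each decision.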

  5. Frontend Dashboard

We built a real-time dashboard that displays:

- live incident cards
- severity indicators
- responder recommendations
- hospital suggestions
- human review controls

  6. Voice Layer (ElevenLabs)

We integrated ElevenLabs text-to-speech to generate:

- spoken incident alerts
- operator feedback (e.g., “Incident confirmed”)

This adds a real-world operational feel to the system.
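A minimal sketch of the voice layer, assuming the ElevenLabs REST text-to-speech endpoint (`voice_id`, the alert phrasing, and the card fields are placeholders; the network call is shown but not exercised):

```python
import json
import urllib.request

def build_alert_text(card):
    """Compose the spoken alert from an incident card dict."""
    return (f"{card['severity'].capitalize()} severity incident: "
            f"{card['summary']} Recommended responders: "
            f"{' and '.join(card['responders'])}.")

def speak(text, api_key, voice_id):
    """POST to the ElevenLabs text-to-speech endpoint; returns audio bytes.
    (Endpoint shape follows ElevenLabs' public REST API; not called here.)"""
    req = urllib.request.Request(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        data=json.dumps({"text": text}).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Separating text composition from synthesis also let us reuse the same alert strings for on-screen display.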

Challenges We Faced

  1. Balancing accuracy vs. speed

We quickly realized that training a perfect model was not feasible within hackathon time constraints. Instead, we focused on reliable signals and strong system design, combining motion analysis, rules, and multimodal inputs.

  2. Avoiding noisy detections

Raw detections can be messy. We solved this by building an event fusion layer that consolidates multiple signals into one clean incident.

  3. System integration

Coordinating frontend, backend, and detection pipelines was one of the hardest parts. We had to carefully design interfaces so that:

- detection outputs could plug into backend logic
- backend data could cleanly drive the UI

  4. Demo reliability

Live systems are fragile. To ensure a smooth demo, we implemented:

- seeded dummy incidents
- prerecorded fallback scenarios

This ensured the demo would run reliably even if a live component failed.

  5. Realism vs. scope

We wanted to simulate a full emergency ecosystem (police, EMS, hospitals), but had to keep the implementation lightweight. We focused on high-impact features like:

- incident command cards
- responder recommendations
- hospital suggestions

What We Learned

- System design matters more than raw model accuracy in real-world applications
- Combining multiple weak signals (vision + audio + motion) can produce strong results
- Human-in-the-loop review is critical for safety-critical AI systems
- Clear, actionable outputs (not raw predictions) make AI systems usable
- Demo reliability is just as important as technical correctness in hackathons

What’s Next

In the future, we would:

- integrate real-time camera streams
- incorporate live traffic APIs for more accurate routing
- expand hospital capability matching
- improve audio detection with more robust models
- deploy the system in collaboration with local agencies

Final Thoughts

Our project transforms passive infrastructure into proactive intelligence.

Instead of waiting for emergencies to be reported, we enable cities like Indianapolis to detect, understand, and respond to incidents in real time — improving coordination, reducing response time, and ultimately saving lives.

Built With

OpenCV, ElevenLabs
