EchoCases: "Every case deserves to be heard and closed"
Inspiration
The clearance rate for violent crime in North America is only about 50%, but for minority and underserved communities, clearance rates are often lower due to case deprioritization and limited investigative resources. As a result, many families are left without answers for years, sometimes decades.
Ordinary citizens, particularly those from marginalized communities, are frequently told that law enforcement simply doesn't have the resources to help them. Cases go cold. Files gather dust. Families never get closure.
We built EchoCases to change that.
Our mission is to democratize investigative intelligence by giving community advocates, independent researchers, and under-resourced agencies the same pattern-recognition capabilities that well-funded departments take for granted. Our goal is to make cold case analysis 10x more efficient, giving voice to the voiceless and bringing attention back to cases that have been forgotten.
What it does
EchoCases is an AI-powered cold case intelligence platform that functions as an investigative pattern-recognition system, analyzing case narratives to surface hidden connections across time and geography.
Its core features include:
- Interactive Case Mapping: A Mapbox-powered interface displays cold cases geographically, allowing investigators to visualize spatial patterns and filter by date, category, and location.
- Semantic Pattern Matching: Using sentence embeddings and cosine similarity, EchoCases identifies cases with similar narratives, MOs, and circumstances, even when they occurred in different jurisdictions or were described with different terminology.
- AI-Powered Investigation: Generates detailed investigative analysis covering behavioral patterns, victimology, geographic profiling, temporal analysis, overlooked angles, and actionable next steps.
- Crime Series Detection: HDBSCAN clustering automatically identifies potential serial patterns, grouping cases that may be linked but were never connected.
- Temporal Animation: An animated timeline shows how cases unfold over time, exposing escalation patterns and cooling-off periods that may indicate serial behavior.
- Dual Interface Modes: Switch between Tactical (hacker aesthetic) and Standard (government/professional) themes.
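The escalation and cooling-off patterns that the temporal view exposes come down to sorting a group of possibly linked cases by date and measuring the gaps between consecutive incidents. The following is a minimal sketch under that assumption, not the EchoCases implementation; the function names and the `(case_id, date)` input shape are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def cooling_off_gaps(cases: List[Tuple[str, date]]) -> List[Tuple[str, str, int]]:
    """Return (earlier_id, later_id, days_between) for chronologically adjacent cases."""
    ordered = sorted(cases, key=lambda c: c[1])  # oldest first
    return [
        (prev_id, next_id, (next_day - prev_day).days)
        for (prev_id, prev_day), (next_id, next_day) in zip(ordered, ordered[1:])
    ]


def gaps_shrinking(gaps: List[Tuple[str, str, int]]) -> bool:
    """True if every gap is no longer than the one before it (a crude escalation cue)."""
    days = [d for _, _, d in gaps]
    return all(later <= earlier for earlier, later in zip(days, days[1:]))
```

Shrinking gaps are a weak signal on their own; they would be read alongside narrative similarity and geographic evidence rather than treated as proof of a series.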
Research & Evidence
EchoCases' pattern-matching approach is grounded in behavioral and text-mining research, including:
Crime Linkage & Behavioral Similarity: Decades of research demonstrate that offenders exhibit consistent behavioral patterns across crimes:
- Studies by Bennell et al. (2002) and Porter et al. (2014) show that behavioral and modus operandi similarity can reliably link crimes, especially when paired with clustering techniques.
- Reviews in the Journal of Investigative Psychology and Offender Profiling highlight that manual linkage is time-consuming and error-prone, while structured similarity models improve accuracy.
EchoCases' approach: Semantic sentence embeddings and cosine similarity identify narrative-level behavioral similarities, even when cases are described in different language or occur in different jurisdictions. Density-based clustering (HDBSCAN) then surfaces potential crime series without requiring the number of series to be specified in advance.
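Narrative-level matching of this kind reduces to comparing embedding vectors. The sketch below assumes the narratives have already been encoded (for example with `SentenceTransformer("all-MiniLM-L6-v2").encode(narratives)` from the sentence-transformers library) into an array with one row per case; `top_matches` is a hypothetical helper, not part of EchoCases or any library. It uses plain NumPy, although `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity.

```python
import numpy as np


def top_matches(embeddings, query_pos: int, k: int = 5):
    """Return up to k (row_position, cosine_similarity) pairs for the cases most
    similar to the case at row `query_pos`, excluding that case itself."""
    emb = np.asarray(embeddings, dtype=float)
    # L2-normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    sims = unit @ unit[query_pos]
    sims[query_pos] = -np.inf  # a case is never reported as its own match
    limit = min(k, len(sims) - 1)
    order = np.argsort(-sims)[:limit]  # highest similarity first
    return [(int(i), float(sims[i])) for i in order]
```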
Text Mining Police Narratives: Police reports are unstructured, inconsistent, and vary widely in quality, especially in older cases. Research in text mining and deep learning (CIKM 2020) shows that semantic NLP models significantly outperform keyword-based methods in detecting hidden relationships between cases.
EchoCases' approach: Semantic text embeddings extract latent behavioral signals from narratives, allowing meaningful comparison at scale and enabling pattern detection that human reviewers may miss.
How we built it
EchoCases is a full-stack application built on a technology stack that supports scalable analysis and interactive visualization.
The tech stack
Backend:
- Python
- Flask (RESTful API)
- Sentence-Transformers (all-MiniLM-L6-v2 for semantic embeddings)
- scikit-learn (cosine similarity, HDBSCAN clustering)
- Pandas & NumPy (data processing)
- Google Gemini AI (advanced investigative analysis)
Frontend:
- React 18
- Mapbox GL JS (interactive mapping)
- Lucide React (iconography)
- Custom CSS with dual-theme support
ML & Data:
- Semantic sentence embeddings for narrative similarity
- Density-based clustering for crime series detection
- Haversine distance calculations for geographic analysis
- python-dateutil for flexible timestamp parsing
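Haversine distance gives the great-circle distance between two latitude/longitude points. A minimal pure-Python sketch follows; the function name and the 6,371 km mean Earth radius are assumptions rather than details from the project. The latitude range check is one cheap way to catch many swapped latitude/longitude pairs, a common data-entry error.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    for lat in (lat1, lat2):
        if not -90.0 <= lat <= 90.0:
            # A latitude outside [-90, 90] often means lat and lon were swapped.
            raise ValueError(f"latitude out of range: {lat}")
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))  # clamp rounding error
```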
Challenges we ran into
Making AI Actionable: A similarity engine alone isn't enough; investigators need actionable insights. We spent significant time refining AI analysis prompts to generate structured, professional output that follows real investigative frameworks (behavioral analysis, victimology, geographic profiling). When the backend API isn't available, EchoCases falls back to a local analysis generator that extracts patterns directly from case data.
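One way to structure a prompt-then-fallback flow like this is sketched below. Everything here is hypothetical: the section headings mirror the investigation features described earlier, `llm_call` stands in for whatever function wraps the Gemini request (it is not the Gemini SDK's API), and the local fallback is deliberately simple.

```python
SECTIONS = [
    "Behavioral Patterns",
    "Victimology",
    "Geographic Profiling",
    "Temporal Analysis",
    "Overlooked Angles",
    "Next Steps",
]


def build_prompt(case: dict, matches: list) -> str:
    """Ask for output under fixed headings so responses stay structured."""
    lines = ["You are assisting a cold case review. Use exactly these headings:"]
    lines += [f"## {section}" for section in SECTIONS]
    lines += ["", f"Case {case['id']}: {case['narrative']}", "Similar cases:"]
    lines += [f"- {m['id']} (similarity {m['score']:.2f}): {m['narrative']}" for m in matches]
    return "\n".join(lines)


def local_analysis(case: dict, matches: list) -> str:
    """Offline fallback that reports only what the case data itself shows."""
    strongest = max(matches, key=lambda m: m["score"], default=None)
    if strongest is None:
        finding = "No similar cases found."
    else:
        finding = f"Strongest link: {strongest['id']} at {strongest['score']:.0%} similarity."
    return (
        f"## Behavioral Patterns\n{finding}\n"
        f"## Next Steps\nManually review {len(matches)} linked case(s) for case {case['id']}."
    )


def analyze_case(case: dict, matches: list, llm_call) -> str:
    """Try the model first; use the local generator if the call fails."""
    try:
        return llm_call(build_prompt(case, matches))
    except Exception:  # e.g. network errors, quota limits, missing credentials
        return local_analysis(case, matches)
```

Injecting `llm_call` as a parameter keeps the fallback path testable without network access.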
Index Synchronization: A subtle but critical bug emerged when DataFrame index labels didn't align with row positions in the embedding array, causing a "self-match" problem in which cases matched themselves at 100% similarity. Solving this required careful positional indexing and explicit logic to exclude each case from its own results.
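This kind of misalignment typically appears after a DataFrame is filtered: its index labels (say 0, 2, 3) no longer correspond to row positions (0, 1, 2) in an embedding matrix computed from the filtered rows. Below is a minimal sketch of the fix with pandas and NumPy, assuming a hypothetical `case_id` column and embeddings generated from the DataFrame's rows in their current order; `find_matches` is an illustrative name, not the project's code.

```python
import numpy as np
import pandas as pd


def find_matches(df: pd.DataFrame, embeddings: np.ndarray, case_id: str, k: int = 3):
    """Return up to k (case_id, cosine_similarity) pairs most similar to `case_id`.

    Assumes row i of `embeddings` was computed from the i-th row of `df`
    in its current order (for example, after filtering).
    """
    # Filtering leaves gaps in the index labels (e.g. 0, 2, 3); resetting makes
    # labels equal to positions, so they line up with embedding rows again.
    df = df.reset_index(drop=True)
    if len(df) != len(embeddings):
        raise ValueError("DataFrame and embedding matrix are out of sync")
    hits = np.flatnonzero(df["case_id"].to_numpy() == case_id)
    if hits.size == 0:
        raise KeyError(case_id)
    pos = int(hits[0])
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[pos]
    # Explicitly drop the query's own position instead of trusting it to rank first.
    ranked = [int(i) for i in np.argsort(-sims) if i != pos][:k]
    return [(df["case_id"].iloc[i], float(sims[i])) for i in ranked]
```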
Date Format Flexibility: Real-world case data comes in countless formats. We learned to use flexible parsing (python-dateutil) rather than rigid strptime patterns to handle ISO formats, various separators, and timezone variations.
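A minimal sketch of lenient timestamp parsing with python-dateutil's `parser.parse` follows; the helper name and the convention of normalizing offset-aware values to naive UTC are assumptions for illustration. Ambiguous strings such as `03/04/2003` are read month-first by default; `parser.parse` accepts `dayfirst=True` when a source is known to use day-first dates.

```python
from datetime import datetime, timezone
from typing import Optional

from dateutil import parser


def parse_case_date(raw) -> Optional[datetime]:
    """Parse a free-form timestamp into a naive datetime, or return None."""
    text = "" if raw is None else str(raw).strip()
    if not text:
        return None
    try:
        dt = parser.parse(text)
    except (ValueError, OverflowError):
        # Recent dateutil versions raise ParserError (a ValueError subclass) on
        # unknown formats; returning None lets the caller flag the record.
        return None
    if dt.tzinfo is not None:
        # Convert offset-aware values to UTC, then drop tzinfo so they compare
        # cleanly with naive values (which are left unchanged).
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt
```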
Accomplishments that we're proud of
Creating a True Investigative Assistant: We successfully built an AI system that goes beyond simple keyword matching. By using semantic embeddings, EchoCases can identify connections that human reviewers might miss, such as cases with similar circumstances described in completely different words.
Accessible Design for Underserved Communities: We prioritized making EchoCases usable by community advocates and independent researchers, not just law enforcement. The dual-theme interface (tactical/standard) lets the tool fit both professional and community settings.
Real Pattern Detection: Our HDBSCAN clustering successfully identifies potential crime series from unlabeled data, automatically grouping cases that share semantic and geographic characteristics.
What we learned
This project was a deep dive into the intersection of NLP, geospatial analysis, and investigative methodology.
Real-world data is messy. Timestamps come in dozens of formats. Coordinates are sometimes swapped. Narratives range from one sentence to multiple paragraphs. Building robust parsing and validation was essential.
Investigators need explanations, not just scores. A 75% similarity score means nothing without context. We learned to extract and highlight shared keywords, calculate geographic and temporal distances, and surface the specific factors driving each match.
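A simple way to attach context to a score is to report the words two narratives share and how far apart the incidents occurred. The sketch below is hypothetical: the case dictionary shape, stop-word list, and function names are assumptions. Because embeddings can match narratives with no words in common, shared keywords are only a partial explanation; a geographic distance from a haversine calculation could be added to the result in the same way.

```python
import re
from datetime import date

# Illustrative stop-word list only; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "from", "in", "near", "of", "on", "the", "to", "was", "with"}


def keywords(text: str) -> set:
    """Lower-cased alphabetic tokens with stop-words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def days_apart(d1: date, d2: date) -> int:
    """Absolute number of days between two incident dates."""
    return abs((d1 - d2).days)


def explain_match(case_a: dict, case_b: dict, score: float) -> dict:
    """Pair a similarity score with evidence a reviewer can check directly."""
    return {
        "score": round(score, 2),
        "shared_keywords": sorted(keywords(case_a["narrative"]) & keywords(case_b["narrative"])),
        "days_apart": days_apart(case_a["date"], case_b["date"]),
    }
```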
Accessibility matters more than features. An unused tool helps no one. We prioritized intuitive design, responsive performance, and clear warnings about the tool's limitations over adding more ML complexity.
What's next for EchoCases
Multi-Database Integration: Connect with NIBRS, ViCAP, NamUs, and state-level cold case databases to dramatically expand the case universe.
Collaborative Features: Enable secure case sharing between agencies, advocates, and researchers while maintaining appropriate access controls.
Community Partnership: Engage directly with victim advocacy groups and minority community organizations to ensure our development aligns with their real-world needs and workflows.
Mobile Access: Develop a mobile application for field investigators and community canvassers.
Built With
- cohere
- css
- flask
- gemini
- hdbscan
- javascript
- mapbox
- numpy
- pandas
- python
- react
- scikit-learn
- sentence-transformers