Inspiration
Disease outbreaks in India often go undetected until they escalate, because reporting is delayed and information is fragmented. We wanted to build a system that gives public health officials near real-time insight by turning scattered news coverage into actionable intelligence.
What it does
The Public Health News Tracker scrapes health articles from major Indian news sources, extracts disease mentions using NLP, maps them to cities and states, and applies ML clustering to identify hotspots. A live dashboard visualizes outbreaks with severity scores, color-coded maps, and trend charts.
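The write-up does not name the clustering algorithm. As one plausible illustration (an assumption, not necessarily the team's method), the sketch below groups geocoded mention coordinates with DBSCAN, using great-circle (haversine) distance so that the radius `eps_km` is in kilometres. The function names, the 10 km radius, `min_samples=2`, and the sample points are hypothetical. scikit-learn's `DBSCAN` also supports a haversine metric on coordinates given in radians, which would replace this hand-written version in a scikit-learn stack.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dbscan(points, eps_km, min_samples):
    """Plain DBSCAN: returns a cluster id per point, or -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        # O(n^2) scan; fine for a few thousand mentions. Includes the point itself.
        return [j for j in range(len(points)) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_samples:  # core point: keep growing the cluster
                queue.extend(j_nbrs)
        cluster += 1
    return labels

# Example: three reports around Pune, two in Delhi, one isolated report in Chennai.
points = [(18.5204, 73.8567), (18.53, 73.85), (18.51, 73.87),
          (28.6139, 77.2090), (28.62, 77.21), (13.0827, 80.2707)]
labels = dbscan(points, eps_km=10, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of hotspots in advance and leaves isolated reports unclustered, which suits sparse, uneven news coverage.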
How we built it
- Data Collection: RSS feeds from Indian news outlets, stored in Delta Lake.
- NLP Extraction: a bilingual (English and Hindi) disease dictionary with confidence scoring.
- ML Clustering: a severity algorithm combining mention frequency, recency, and baseline risk.
- Dashboard: interactive visualization with filters, charts, and real-time updates.
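A minimal sketch of the collection step, assuming the outlets publish standard RSS 2.0 feeds. It flattens each `<item>` into a plain record using only Python's standard-library `xml.etree.ElementTree`. The inline `SAMPLE_FEED`, the record field names, and `parse_rss` are hypothetical; fetching the feed over HTTP and persisting the rows to Delta Lake are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; in practice this XML would be downloaded from an outlet's RSS URL.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Health News</title>
    <item>
      <title>Dengue cases rise in Pune</title>
      <link>https://example.com/dengue-pune</link>
      <pubDate>Mon, 01 Jul 2024 10:00:00 +0530</pubDate>
      <description>Civic officials report a spike in dengue cases.</description>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text: str, source: str) -> list[dict]:
    """Turn RSS 2.0 <item> elements into flat records ready for a raw-data table."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "source": source,
            # findtext returns None for a missing tag, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return records

rows = parse_rss(SAMPLE_FEED, source="example-outlet")
print(rows[0]["title"])  # Dengue cases rise in Pune
```

Keeping the raw records this flat makes it easy to land them unchanged in a raw table and do all NLP in a later, re-runnable stage.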
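A minimal sketch of dictionary-based extraction, assuming a small hand-built English/Hindi term list and a simple heuristic confidence score. The term list, the scoring rule (start at 0.5, add 0.15 per extra mention and 0.2 for a headline hit, cap at 0.95), and the names `DISEASE_TERMS` and `extract_diseases` are hypothetical, not the project's actual dictionary or weights. One practical detail: Python's `\b` is unreliable on Hindi, because Devanagari vowel signs are not `\w` characters, so word edges are checked with explicit lookarounds instead.

```python
import re
from dataclasses import dataclass

# Hypothetical bilingual dictionary: canonical disease name -> English and Hindi surface forms.
DISEASE_TERMS = {
    "dengue": ["dengue", "डेंगू"],
    "malaria": ["malaria", "मलेरिया"],
    "tuberculosis": ["tuberculosis", "tb", "तपेदिक", "टीबी"],
    "influenza": ["influenza", "flu", "इन्फ्लूएंजा", "फ्लू"],
}

# A term must not be glued to a letter, digit, or any Devanagari character on either side.
_EDGE = r"[\w\u0900-\u097F]"

def _compile(terms):
    # Longest terms first so longer forms win over their prefixes.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"(?<!{_EDGE})(?:{alternatives})(?!{_EDGE})", re.IGNORECASE)

PATTERNS = {disease: _compile(terms) for disease, terms in DISEASE_TERMS.items()}

@dataclass
class Mention:
    disease: str
    count: int
    confidence: float

def extract_diseases(title, body):
    """Count dictionary hits per disease and attach a heuristic confidence score."""
    mentions = []
    for disease, pattern in PATTERNS.items():
        in_title = len(pattern.findall(title))
        total = in_title + len(pattern.findall(body))
        if total == 0:
            continue
        # Hypothetical rule: 0.5 base, +0.15 per extra mention, +0.2 if the headline
        # mentions it, capped at 0.95.
        score = 0.5 + 0.15 * (total - 1) + (0.2 if in_title else 0.0)
        mentions.append(Mention(disease, total, round(min(score, 0.95), 2)))
    return sorted(mentions, key=lambda m: m.confidence, reverse=True)

result = extract_diseases(
    "Dengue cases rise in Pune",
    "Officials confirmed more dengue cases. पुणे में डेंगू के मामले बढ़े। One TB case was also reported.",
)
for m in result:
    print(m.disease, m.count, m.confidence)
# dengue 3 0.95
# tuberculosis 1 0.5
```

Note that "flu" does not fire inside "influenza" because the lookbehind rejects a preceding letter; short abbreviations such as "tb" are still the most likely source of false positives and are where a confidence score earns its keep.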
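The severity algorithm is described only as combining frequency, recency, and baseline risk. A minimal sketch, assuming a weighted linear blend on a 0 to 100 scale: the weights `(0.4, 0.35, 0.25)`, the 3-day half-life, the saturation count of 20 articles, the `BASELINE_RISK` values, and `severity_score` itself are hypothetical placeholders rather than the project's tuned values.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical baseline risk per disease (0-1); real values would come from surveillance data.
BASELINE_RISK = {"dengue": 0.6, "malaria": 0.5, "tuberculosis": 0.7, "influenza": 0.3}

def severity_score(mention_times, disease, now,
                   half_life_days=3.0, saturation=20, weights=(0.4, 0.35, 0.25)):
    """Blend frequency, recency and baseline risk into a 0-100 severity score."""
    if not mention_times:
        return 0.0
    w_freq, w_rec, w_base = weights
    # Frequency: mention count scaled to 0-1, saturating at `saturation` articles.
    frequency = min(len(mention_times) / saturation, 1.0)
    # Recency: exponential decay on the age of the newest mention
    # (1.0 if reported now, 0.5 after one half-life).
    age_days = max((now - max(mention_times)).total_seconds(), 0.0) / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    # Baseline: prior risk of the disease, defaulting to 0.5 when unknown.
    baseline = BASELINE_RISK.get(disease, 0.5)
    return round(100 * (w_freq * frequency + w_rec * recency + w_base * baseline), 1)

# Example: five dengue reports over the last five days, the newest one today.
now = datetime(2024, 7, 1, tzinfo=timezone.utc)
times = [now - timedelta(days=d) for d in range(5)]
print(severity_score(times, "dengue", now))  # 60.0
```

Because the weights sum to 1 and each component is in [0, 1], the score stays within 0 to 100, and each term can be shown separately on the dashboard to explain why a hotspot ranks high.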
Challenges we ran into
- Handling bilingual text extraction across diverse sources.
- Mapping unstructured location mentions to GPS coordinates.
- Designing a severity scoring system that balances accuracy and simplicity.
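Mapping free-text place names to coordinates is one of the challenges above. One simple approach, sketched here as an assumption rather than the team's method, is a gazetteer lookup: match known city names and aliases (current English names, older English spellings, and Hindi forms) in the text and return each city's state and approximate coordinates. The three-city `GAZETTEER`, the `ALIASES` table, and `geocode_mentions` are hypothetical; a production system would need a full place-name database or a geocoding API, plus disambiguation for names shared by several places.

```python
import re

# Hypothetical mini-gazetteer: canonical city -> (state, latitude, longitude), approximate.
GAZETTEER = {
    "Pune": ("Maharashtra", 18.5204, 73.8567),
    "Mumbai": ("Maharashtra", 19.0760, 72.8777),
    "Delhi": ("Delhi", 28.6139, 77.2090),
}

# Hypothetical aliases: lower-cased English spellings (including older names) and Hindi forms.
ALIASES = {
    "pune": "Pune", "poona": "Pune", "पुणे": "Pune",
    "mumbai": "Mumbai", "bombay": "Mumbai", "मुंबई": "Mumbai",
    "delhi": "Delhi", "new delhi": "Delhi", "दिल्ली": "Delhi",
}

# Same edge rule as for disease terms: no adjacent letter, digit, or Devanagari character.
_EDGE = r"[\w\u0900-\u097F]"
# Longest aliases first so "new delhi" is matched as one place rather than as "delhi".
_ALTERNATIVES = "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
_PLACE_RE = re.compile(rf"(?<!{_EDGE})(?:{_ALTERNATIVES})(?!{_EDGE})", re.IGNORECASE)

def geocode_mentions(text):
    """Return (city, state, lat, lon) for each distinct known place, in order of first mention."""
    found = {}
    for match in _PLACE_RE.finditer(text):
        city = ALIASES[match.group(0).lower()]
        if city not in found:
            state, lat, lon = GAZETTEER[city]
            found[city] = (city, state, lat, lon)
    return list(found.values())

places = geocode_mentions("Fever cases rise in Poona; मुंबई and New Delhi report clusters too.")
for place in places:
    print(place)
```

The returned coordinates can feed the hotspot clustering and map layers directly, while the state field supports state-level rollups.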
Accomplishments that we're proud of
- Built a working pipeline from raw news to a live dashboard.
- Identified real clusters, such as tuberculosis and influenza hotspots.
- Delivered updates every five minutes for near real-time awareness.
What we learned
We deepened our skills in NLP, geospatial mapping, and ML clustering, and learned how to integrate bilingual data streams into a unified system.
What's next for Health Tracker Dashboard
We plan to expand coverage to more regional languages, integrate social media signals, and work with public health agencies to pilot the system in real-world outbreak monitoring.
Built With
- Languages: Python, MATLAB, SQL
- Frameworks & libraries: PySpark, pandas, NLTK (for NLP), scikit-learn, Matplotlib/Plotly (for visualization), Streamlit/Plotly, Dash
- Platforms & cloud services: Databricks (Delta Lake, MLflow), AWS/GCP for scalable compute
- Databases: Delta Lake tables for raw and processed data storage
- APIs: RSS feeds from Indian news outlets
- Tools: geocoding APIs for GPS mapping, interactive dashboard