Inspiration
Every year, thousands of newborns need emergency intubation, a procedure in which a breathing tube must be placed into an airway roughly the width of a pinky finger. It is one of medicine's hardest skills, performed under pressure with no room for error. We wanted to ask: what if AI could act as a second set of eyes, guiding the clinician in real time so they never have to look away from the airway?
What We Built
NeoGuide is a real-time intubation guidance system designed for neonates (newborns). It watches a laryngoscope camera feed, uses Gemini's vision capabilities to identify anatomical landmarks frame by frame, and delivers hands-free spoken alerts through ElevenLabs so the clinician stays fully focused on the patient. The dashboard shows the depth zone, safety status, a live procedure log, and landmark confidence scores, all calibrated to neonatal anatomy (a newborn's trachea is only about 4 cm long, versus about 13 cm in adults).
How We Built It
NeoGuide is a React + Vite frontend with no backend. Gemini 2.5 Flash analyzes camera frames using a structured medical prompt and returns JSON containing landmark detections. ElevenLabs Turbo v2.5 converts alerts to speech. We built a two-tier voice alert system: a priority queue for normal guidance, and an interrupt path that fires instantly for critical events such as esophageal intubation, bypassing everything else.
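The two-tier alert design can be sketched in TypeScript as below. This is a minimal illustration under stated assumptions, not NeoGuide's actual code: `VoiceAlertQueue`, `Alert`, and the injected `Speaker` function are hypothetical names, and `Speaker` stands in for whatever plays the ElevenLabs audio. It is assumed to reject when its `AbortSignal` fires. Skipping a message that is already queued is also an assumption, loosely reflecting the "no repetition" lesson below.

```typescript
// A normal guidance alert; higher priority is spoken first.
interface Alert {
  message: string;
  priority: number;
}

// Hypothetical speech function (assumption): plays `text`, e.g. via a TTS
// service, and rejects if `signal` is aborted mid-playback.
type Speaker = (text: string, signal: AbortSignal) => Promise<void>;

class VoiceAlertQueue {
  private readonly speak: Speaker;
  private queue: Alert[] = [];
  private current: AbortController | null = null;
  private draining = false;

  constructor(speak: Speaker) {
    this.speak = speak;
  }

  // Tier 1: queue normal guidance by priority, skipping a message that is
  // already waiting to be spoken.
  enqueue(alert: Alert): void {
    if (this.queue.some((a) => a.message === alert.message)) return;
    this.queue.push(alert);
    this.queue.sort((a, b) => b.priority - a.priority);
    void this.drain();
  }

  // Tier 2: a critical event drops all pending guidance, cuts off whatever
  // is currently playing, and is spoken next.
  interrupt(message: string): void {
    this.queue = [{ message, priority: Number.POSITIVE_INFINITY }];
    this.current?.abort();
    void this.drain();
  }

  // Speaks queued alerts one at a time so voices never overlap.
  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    while (this.queue.length > 0) {
      const next = this.queue.shift()!;
      const controller = new AbortController();
      this.current = controller;
      try {
        await this.speak(next.message, controller.signal);
      } catch {
        // Playback was cut off by interrupt(); continue with the queue.
      } finally {
        this.current = null;
      }
    }
    this.draining = false;
  }
}
```

Routing the critical alert through the same single-speaker loop (at infinite priority) rather than a separate player is a design choice of this sketch: it guarantees two voices never talk over each other, while `abort()` still silences the current utterance immediately.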
Challenges
Getting Gemini to return reliable, consistent JSON from medical images was tricky. We built four fallback parsing strategies and a majority-vote stabilizer across three frames to suppress flip-flopping hallucinations. The hardest bug: Gemini correctly classified esophageal views as "no airway visible" (the esophagus is not an airway), and because our danger check ran only on frames that passed the image-quality guard, it was silently skipped on the very frames it was meant to catch. We had to move esophagus detection outside the image-quality guard entirely.
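The parsing, stabilization, and guard-ordering ideas above can be sketched together in TypeScript. Everything here is illustrative: the `FrameAnalysis` fields (`airwayVisible`, `esophagusDetected`, `landmarks`) are an assumed response shape, the four parsing attempts are plausible examples rather than the team's actual strategies, and `parseModelJson`, `MajorityStabilizer`, and `triageFrame` are hypothetical names. The key point is in `triageFrame`: the esophagus check runs before the image-quality guard, so a frame reported as "no airway visible" can still raise a critical alert.

```typescript
// Hypothetical response shape (assumption): field names are illustrative.
interface FrameAnalysis {
  airwayVisible: boolean;
  esophagusDetected: boolean;
  landmarks: { name: string; confidence: number }[];
}

// Tries four illustrative parsing strategies in order; null if all fail.
function parseModelJson(raw: string): FrameAnalysis | null {
  const attempts: Array<() => unknown> = [
    () => JSON.parse(raw), // 1. clean JSON
    () => JSON.parse(raw.replace(/`{3}(?:json)?/gi, "").trim()), // 2. strip markdown fences
    () => JSON.parse(sliceObject(raw)), // 3. first "{" to last "}"
    () => JSON.parse(sliceObject(raw).replace(/,\s*([}\]])/g, "$1")), // 4. drop trailing commas
  ];
  for (const attempt of attempts) {
    try {
      const value = attempt();
      if (isFrameAnalysis(value)) return value;
    } catch {
      // Fall through to the next strategy.
    }
  }
  return null;
}

// Extracts the outermost {...} span from surrounding prose.
function sliceObject(raw: string): string {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object found");
  return raw.slice(start, end + 1);
}

// Minimal runtime shape check for the assumed response.
function isFrameAnalysis(v: unknown): v is FrameAnalysis {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.airwayVisible === "boolean" &&
    typeof o.esophagusDetected === "boolean" &&
    Array.isArray(o.landmarks)
  );
}

// Reports a label only once a strict majority of the last `size` frames agree.
class MajorityStabilizer<T> {
  private readonly size: number;
  private window: T[] = [];
  private stable: T | null = null;

  constructor(size = 3) {
    this.size = size;
  }

  push(value: T): T | null {
    this.window.push(value);
    if (this.window.length > this.size) this.window.shift();
    const counts = new Map<T, number>();
    for (const v of this.window) counts.set(v, (counts.get(v) ?? 0) + 1);
    for (const [v, n] of counts) {
      if (n > this.size / 2) {
        this.stable = v;
        break;
      }
    }
    return this.stable; // Unchanged when no label has a majority.
  }
}

type FrameAction = "critical-alert" | "skip-frame" | "analyze";

function triageFrame(r: FrameAnalysis): FrameAction {
  // Danger check first: an esophageal view legitimately reports "no airway",
  // so it must not sit behind the image-quality guard below.
  if (r.esophagusDetected) return "critical-alert";
  // Image-quality guard: nothing airway-related to guide on in this frame.
  if (!r.airwayVisible) return "skip-frame";
  return "analyze";
}
```

With a window of three, `MajorityStabilizer` changes its output only when at least two of the last three frames agree, so a single-frame flip is ignored; until a majority first appears it returns `null`.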
What We Learned
Prompt engineering for structured medical output is a discipline of its own. Real-time AI pipelines need graceful degradation at every layer: bad frames, API latency, and ambiguous images. Voice UX in clinical settings is also very different from voice UX in consumer apps: it calls for slower speech, no repetition, and instant interruption for danger events, choices that can change outcomes.