Inspiration
StitchWitch was inspired by the alarming rate of surgical errors: over 4,000 incidents occur annually. Human error is unavoidable, so our team set out to bring AI into the operating room to improve surgical precision and patient safety, potentially impacting thousands of lives every year.
What it does
StitchWitch is built on the Gemini Pro AI model, which we aim to pair with Ray-Ban smart glasses to provide real-time assistance during surgery. The system analyzes the procedure as it happens, detects potential errors, and gives the surgical team immediate feedback, aiming to reduce mistakes and improve surgical outcomes.
How we built it
Frontend
- Utilized Python and the Reflex framework for frontend development
- Reflex let us skip writing HTML, CSS, and JavaScript, so the entire application lives in a single framework (see the sketch after this list)
- Reflex made route management for our web application straightforward
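As a rough illustration, here is what a minimal Reflex page with two routes looks like; the component names, state, and routes below are illustrative stand-ins, not our exact UI:

```python
# Minimal Reflex page/route sketch (illustrative components, not our exact UI).
import reflex as rx


class UIState(rx.State):
    """Hypothetical app state driving the dashboard."""
    latest_alert: str = ""


def index() -> rx.Component:
    # Reflex components compile to the frontend, so no hand-written HTML/CSS/JS.
    return rx.vstack(
        rx.heading("StitchWitch"),
        rx.text(UIState.latest_alert),
    )


def live() -> rx.Component:
    return rx.text("Live procedure feed goes here")


app = rx.App()
app.add_page(index, route="/")      # route management is a one-liner per page
app.add_page(live, route="/live")
```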
Backend
- Leveraged Reflex framework for backend development
- Used yield statements inside event handlers to push incremental state updates to the UI
- Used OpenCV to extract frames from surgical videos for real-time analysis (a combined sketch follows this list)
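A minimal sketch of how these two pieces fit together, assuming a placeholder `analyze_frame` hook in place of the real AI pipeline; the handler, state, and file names are illustrative:

```python
# Sketch of a Reflex event handler that streams video frames through analysis.
# `analyze_frame` is a hypothetical stand-in for our AI pipeline, not a real API.
import cv2
import reflex as rx


def analyze_frame(frame) -> str:
    """Placeholder for the Gemini-backed analysis described in the next section."""
    return "frame analyzed"


class SurgeryState(rx.State):
    latest_alert: str = ""

    def stream_video(self):
        cap = cv2.VideoCapture("surgery_recording.mp4")  # simulated live feed
        try:
            while cap.isOpened():
                ok, frame = cap.read()
                if not ok:
                    break
                self.latest_alert = analyze_frame(frame)
                yield  # push this state change to the UI before the next frame
        finally:
            cap.release()
```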
AI Pipeline
Our pipeline uses three core AI agents, each designed for a specific purpose, plus an optional supervisor; a condensed sketch of the handoff follows this list.
- Observer-agent: Observes live video frames to understand what is happening in the scene and describes it in natural language so that the other agents can use its output.
- Procedure-agent: A chat agent that remembers everything that has happened in the procedure so far.
- Alert-agent: Knows the standard procedure and the relevant anatomy through few-shot prompting, and alerts the user when it detects a risk.
- Supervisor-agent (optional): At the end of the procedure, this agent reviews the procedure history and compares it with the standard procedure to ensure no step was missed.
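A condensed sketch of the observer → procedure → alert handoff using the google-generativeai SDK; the model names, prompts, and helper function are simplified stand-ins rather than our production code:

```python
# Condensed sketch of the observer -> procedure -> alert handoff.
# Prompts, model names, and the few-shot examples are simplified placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

observer = genai.GenerativeModel("gemini-pro-vision")                         # Observer-agent
procedure_chat = genai.GenerativeModel("gemini-pro").start_chat(history=[])   # Procedure-agent
alert_agent = genai.GenerativeModel("gemini-pro")                             # Alert-agent

FEW_SHOT_PROMPT = (
    "You know the standard steps of this procedure and the relevant anatomy.\n"
    "Example: Observation: <safe step> -> OK\n"
    "Example: Observation: <risky step> -> ALERT: <one-line warning>\n"
    "Reply 'OK' or a short alert for the new observation."
)


def process_frame(frame: Image.Image) -> str:
    # Observer-agent: turn the frame into a natural-language description.
    observation = observer.generate_content(
        ["Describe what is happening in this surgical frame.", frame]
    ).text

    # Procedure-agent: a chat session that accumulates the running history.
    history_summary = procedure_chat.send_message(
        f"New observation: {observation}. Briefly summarize the procedure so far."
    ).text

    # Alert-agent: few-shot prompted risk check against the expected procedure.
    alert = alert_agent.generate_content(
        f"{FEW_SHOT_PROMPT}\n\nHistory: {history_summary}\nObservation: {observation}"
    ).text
    return alert
```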
Challenges we ran into
- Gemini does not currently allow fine-tuning on video data, which ruled out our idea of fine-tuning on YouTube surgery training videos and their captions.
- The Gemini vision model doesn't support multi-turn conversation, so we worked around it by pairing the vision model (to observe) with a text chat model (to maintain history).
- Gemini and similar models are aligned to refuse health-related questions for safety reasons, so we looked for ways to relax that check and found Gemini's safety controls. The official documentation for the safety settings didn't work for us out of the box, and we had to try several variations before one finally did (a rough sketch follows this list).
- Accurately simulating a live surgery stream while feeding frames through Gemini for near-instant feedback, given the backend processing latency.
- Our system is best used when integrated with Ray-Ban Meta glasses or similar lightweight, small-form-factor smart glasses that can observe the surgery without getting in the surgeons' way. Because no such hardware was available at this hackathon, we tested the AI pipeline against a simulated live stream of surgery instead.
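For reference, the sketch below shows one way to relax the filters with the SDK's safety settings; treat it as an approximation rather than our exact final configuration:

```python
# One way to relax Gemini's safety filters in the google-generativeai SDK.
# This is a sketch; our final working configuration may have differed in detail.
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

model = genai.GenerativeModel(
    "gemini-pro-vision",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

# The same mapping can also be passed per request:
# model.generate_content([prompt, frame], safety_settings={...})
```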
Accomplishments that we're proud of
- Multimodal few-shot prompting of vision-language models to answer domain-specific problems.
- Simulating a live stream with real-time video analysis using OpenCV and Gemini
- Familiarizing ourselves with Reflex, an up-and-coming Python full-stack framework.
What we learned
Through this project, we gained valuable insights into AI model deployment and computer vision applications in healthcare. We navigated complex safety protocols and discovered the extensive potential of prompt engineering. Integrating real-time video analysis with Gemini seemed impossible initially but taught us the importance of persistence and innovation at the intersection of technology and medicine. This experience deepened our understanding of how advanced technologies can enhance medical practices, emphasizing interdisciplinary collaboration and pushing conventional boundaries for meaningful innovation in healthcare.
What's next for StitchWitch
- Improve the quality of our dataset collection for fine-tuning and prompting models.
- Fine-tune vision-language models such as Gemini Vision to better understand medical procedures for more accurate observations.
- Extend compatibility to smart glasses such as the Ray-Ban Meta glasses.
- Our goal is to deploy StitchWitch as a transformative tool for surgical teams worldwide, improving patient outcomes and reducing errors.