Inspiration
StitchWitch was inspired by the alarming rate of surgical errors: over 4,000 incidents occur annually. Human error is unavoidable, so our team set out to bring AI into the operating room to improve surgical precision and patient safety, potentially impacting thousands of lives every year.
What it does
StitchWitch is built on the Gemini Pro AI model, which we aim to pair with Ray-Ban smart glasses to provide real-time assistance during surgery. The system analyzes the procedure as it happens, detects potential errors, and gives the surgical team immediate feedback, aiming to reduce mistakes and improve surgical outcomes.
How we built it
Frontend
- Utilized Python and the Reflex framework for frontend development
- Reflex let us skip writing HTML, CSS, and JavaScript, so the entire application lives in a single framework (see the sketch after this list)
- Reflex made route management for our web application straightforward
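As a rough illustration, here is what a minimal Reflex page with two routes looks like; the component names, state, and routes below are illustrative stand-ins, not our exact UI:

```python
# Minimal Reflex page/route sketch (illustrative components, not our exact UI).
import reflex as rx


class UIState(rx.State):
    """Hypothetical app state driving the dashboard."""
    latest_alert: str = ""


def index() -> rx.Component:
    # Reflex components compile to the frontend, so no hand-written HTML/CSS/JS.
    return rx.vstack(
        rx.heading("StitchWitch"),
        rx.text(UIState.latest_alert),
    )


def live() -> rx.Component:
    return rx.text("Live procedure feed goes here")


app = rx.App()
app.add_page(index, route="/")      # route management is a one-liner per page
app.add_page(live, route="/live")
```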
Backend
- Leveraged Reflex framework for backend development
- Used yield statements inside event handlers to push incremental state updates to the UI
- Used OpenCV to extract frames from surgical videos for real-time analysis (a combined sketch follows this list)
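A minimal sketch of how these two pieces fit together, assuming a placeholder `analyze_frame` hook in place of the real AI pipeline; the handler, state, and file names are illustrative:

```python
# Sketch of a Reflex event handler that streams video frames through analysis.
# `analyze_frame` is a hypothetical stand-in for our AI pipeline, not a real API.
import cv2
import reflex as rx


def analyze_frame(frame) -> str:
    """Placeholder for the Gemini-backed analysis described in the next section."""
    return "frame analyzed"


class SurgeryState(rx.State):
    latest_alert: str = ""

    def stream_video(self):
        cap = cv2.VideoCapture("surgery_recording.mp4")  # simulated live feed
        try:
            while cap.isOpened():
                ok, frame = cap.read()
                if not ok:
                    break
                self.latest_alert = analyze_frame(frame)
                yield  # push this state change to the UI before the next frame
        finally:
            cap.release()
```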
AI Pipeline
Our pipeline uses three core AI agents, each designed for a specific purpose, plus an optional supervisor; a condensed sketch of the handoff follows this list.
- Observer-agent: Observes live video frames to understand what is happening in the scene and describes it in natural language so that the other agents can use its output.
- Procedure-agent: A chat agent that remembers everything that has happened in the procedure so far.
- Alert-agent: Knows the standard procedure and the relevant anatomy through few-shot prompting, and alerts the user when it detects a risk.
- Supervisor-agent (optional): At the end of the procedure, this agent reviews the procedure history and compares it with the standard procedure to ensure no step was missed.
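A condensed sketch of the observer → procedure → alert handoff using the google-generativeai SDK; the model names, prompts, and helper function are simplified stand-ins rather than our production code:

```python
# Condensed sketch of the observer -> procedure -> alert handoff.
# Prompts, model names, and the few-shot examples are simplified placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

observer = genai.GenerativeModel("gemini-pro-vision")                         # Observer-agent
procedure_chat = genai.GenerativeModel("gemini-pro").start_chat(history=[])   # Procedure-agent
alert_agent = genai.GenerativeModel("gemini-pro")                             # Alert-agent

FEW_SHOT_PROMPT = (
    "You know the standard steps of this procedure and the relevant anatomy.\n"
    "Example: Observation: <safe step> -> OK\n"
    "Example: Observation: <risky step> -> ALERT: <one-line warning>\n"
    "Reply 'OK' or a short alert for the new observation."
)


def process_frame(frame: Image.Image) -> str:
    # Observer-agent: turn the frame into a natural-language description.
    observation = observer.generate_content(
        ["Describe what is happening in this surgical frame.", frame]
    ).text

    # Procedure-agent: a chat session that accumulates the running history.
    history_summary = procedure_chat.send_message(
        f"New observation: {observation}. Briefly summarize the procedure so far."
    ).text

    # Alert-agent: few-shot prompted risk check against the expected procedure.
    alert = alert_agent.generate_content(
        f"{FEW_SHOT_PROMPT}\n\nHistory: {history_summary}\nObservation: {observation}"
    ).text
    return alert
```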
Challenges we ran into
- Gemini does not currently allow fine-tuning on video data, which ruled out our idea of fine-tuning on YouTube surgery training videos and their captions.
- The Gemini vision model doesn't support multi-turn conversation, so we worked around it by pairing the vision model (to observe) with a text chat model (to maintain history).
- Gemini and similar models are aligned to refuse health-related questions for safety reasons, so we looked for ways to relax that check and found Gemini's safety controls. The official documentation for the safety settings didn't work for us out of the box, and we had to try several variations before one finally did (a rough sketch follows this list).
- Accurately simulating a live surgery stream while feeding frames through Gemini for near-instant feedback, given the backend processing latency.
- Our system is best used when integrated with Ray-Ban Meta glasses or similar lightweight, small-form-factor smart glasses that can observe the surgery without getting in the surgeons' way. Because no such hardware was available at this hackathon, we tested the AI pipeline against a simulated live stream of surgery instead.
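For reference, the sketch below shows one way to relax the filters with the SDK's safety settings; treat it as an approximation rather than our exact final configuration:

```python
# One way to relax Gemini's safety filters in the google-generativeai SDK.
# This is a sketch; our final working configuration may have differed in detail.
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

model = genai.GenerativeModel(
    "gemini-pro-vision",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

# The same mapping can also be passed per request:
# model.generate_content([prompt, frame], safety_settings={...})
```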
Accomplishments that we're proud of
- Multimodal few-shot prompting of vision-language models to answer domain-specific problems.
- Simulating a live stream with real-time video analysis using OpenCV and Gemini
- Familiarizing ourselves with Reflex, an up-and-coming Python full-stack framework.
What we learned
Through this project, we gained valuable insights into AI model deployment and computer vision applications in healthcare. We navigated complex safety protocols and discovered the extensive potential of prompt engineering. Integrating real-time video analysis with Gemini seemed impossible initially but taught us the importance of persistence and innovation at the intersection of technology and medicine. This experience deepened our understanding of how advanced technologies can enhance medical practices, emphasizing interdisciplinary collaboration and pushing conventional boundaries for meaningful innovation in healthcare.
What's next for StitchWitch
- Improve the quality of our dataset collection for fine-tuning and prompting models.
- Fine-tune vision-language models such as Gemini Vision to better understand medical procedures for more accurate observations.
- Extend compatibility to smart glasses such as the Ray-Ban Meta glasses.
- Our goal is to deploy StitchWitch as a transformative tool for surgical teams worldwide, improving patient outcomes and reducing errors.