Inspiration
Annotating medical imaging is critical for training, analysis, and collaboration, but static annotations lose alignment as anatomy or the camera moves. We wanted to create annotations that stay aligned with anatomical features, making video review more accurate and reliable for clinicians, researchers, and surgeons.
What it does
MTASololu processes uploaded medical videos and generates motion-tracked annotations that stay fixed relative to the anatomy. As structures move through the video, the annotations follow, providing precise reference points for analysis, training, or collaborative review.
How we built it
We extract video frames and use OpenCV to detect key features. Optical flow tracks motion between frames, and a lightweight neural network refines annotation alignment across frames. The system then outputs the video with every annotation correctly positioned throughout.
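To make the tracking step concrete, here is a minimal sketch of how pyramidal Lucas-Kanade optical flow in OpenCV can carry annotation points from frame to frame. Everything here is illustrative: the file name is a placeholder, the points are seeded with Shi-Tomasi corners as a stand-in for real clinician markup, and the neural-network refinement stage is elided.

```python
import cv2

cap = cv2.VideoCapture("procedure.mp4")  # placeholder input path
ok, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# Stand-in for annotation points: in the real pipeline these come from
# the clinician's markup, not automatic corner detection.
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                 qualityLevel=0.3, minDistance=7)

lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

while True:
    ok, frame = cap.read()
    if not ok or points is None or len(points) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Propagate each annotation point from the previous frame to this one.
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, points, None, **lk_params)

    # Keep only points the tracker could follow; in the full pipeline the
    # refinement network would correct drifting points instead of dropping them.
    points = new_points[status.flatten() == 1].reshape(-1, 1, 2)

    for x, y in points.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)
    # ...write `frame` out with cv2.VideoWriter in the real pipeline

    prev_gray = gray

cap.release()
```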
Challenges we ran into
- Processing noisy or low-contrast medical videos (see the preprocessing sketch after this list)
- Maintaining high accuracy while keeping processing time reasonable
- Supporting multiple imaging modalities (ultrasound, echocardiography, laparoscopy)
- Ensuring annotations remain consistent even with camera motion
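For the first of these, preprocessing goes a long way. The sketch below shows one illustrative approach (the parameters are placeholders, not tuned values): non-local-means denoising followed by CLAHE to recover local contrast in modalities like ultrasound before feature detection runs.

```python
import cv2

def enhance_frame(gray):
    """Illustrative preprocessing for a noisy, low-contrast grayscale frame:
    denoise first, then restore local contrast with CLAHE."""
    denoised = cv2.fastNlMeansDenoising(gray, h=10)  # h is a placeholder strength
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)
```

Running feature detection and optical flow on the enhanced frames, while drawing annotations onto the original frames, keeps the output video visually faithful.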
Accomplishments that we're proud of
- Developed a hybrid CV + ML pipeline that works across modalities
- Created annotations that stay aligned frame-to-frame without manual adjustment
- Built a scalable workflow for batch processing uploaded videos
What we learned
- Combining classical computer vision and machine learning can create robust tracking
- Preprocessing and video quality greatly affect annotation accuracy
- Careful system design is required to generalize across different medical imaging types
What's next for MTASololu
- Support real-time streaming for live procedures
- Integrate with HoloXR for collaborative annotation and review
- Expand training datasets to improve neural network tracking accuracy
- Add interactive editing tools for clinicians to adjust annotations when needed

