Inspiration

One night while brushing my teeth, I had a simple but important realization: almost 8 billion people brush their teeth every day, yet hardly anyone tracks how well they do it. The solutions that exist today often rely on expensive, vendor-locked hardware that puts them out of reach for most people.

I wanted to create something more accessible - a platform that anyone could use with just their phone. Toothsy lets users see their brushing habits in real time and even gives dental providers a way to review brushing history for better care.

What it does

Toothsy uses real-time computer vision and machine learning to track your brushing technique. It detects your toothbrush, hand movements, and mouth position, then feeds this information into a neural network that predicts where in your mouth you’re brushing. The results are displayed on an interactive 3D model of teeth that highlights the areas you’ve covered - and the spots you’ve missed!
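
At a high level, each frame is reduced to a small numeric feature vector (toothbrush box, hand keypoints, mouth box) and classified into a brushing zone. Below is a minimal sketch of that flow; the field names, feature layout, and zone list are illustrative assumptions rather than the exact ones used in Toothsy:

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical per-frame detections; the field names and layout are assumptions.
@dataclass
class FrameDetections:
    brush_box: tuple[float, float, float, float]  # toothbrush bounding box (x1, y1, x2, y2)
    hand_points: list[tuple[float, float]]        # simplified hand keypoints (x, y)
    mouth_box: tuple[float, float, float, float]  # mouth bounding box (x1, y1, x2, y2)

ZONES = ["top left", "top right", "bottom left", "bottom right", "middle"]  # assumed label set

def to_feature_vector(d: FrameDetections) -> np.ndarray:
    """Flatten one frame's detections into a numeric vector for the zone classifier."""
    flat_hand = [coord for point in d.hand_points for coord in point]
    return np.array([*d.brush_box, *flat_hand, *d.mouth_box], dtype=np.float32)

def predict_zone(classifier, d: FrameDetections) -> str:
    """Run a trained classifier on one frame and return its brushing zone label."""
    features = to_feature_vector(d).reshape(1, -1)
    return ZONES[int(classifier.predict(features)[0])]
```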

How I built it

I began by fine-tuning a YOLOv11 object detection model on a public toothbrush dataset, augmenting the data and training both the small and nano variants on cloud GPUs to improve accuracy. To capture hand orientation and gestures, I used Google’s Mediapipe, which outputs 21 hand landmarks, and then simplified the data into just a few key points, like the wrist, middle, and fingertip positions, that could be fed into a classifier. For detecting the mouth, Mediapipe originally worked well, but its accuracy dropped significantly when the user’s face was partially blocked by a toothbrush or hand. To solve this, I switched to Moondream 2B, a lightweight vision language model that could handle occlusions, identify whether the user’s mouth was open or closed, and return bounding boxes around the mouth region.
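
As a concrete example of the landmark simplification step, here is a minimal sketch using Mediapipe's Python Hands solution. The specific landmark indices kept below (wrist, a middle-of-hand knuckle, a fingertip) are my assumption for illustration:

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=1)

# Landmarks kept for the classifier; the exact subset is an assumption.
KEPT = [
    mp_hands.HandLandmark.WRIST,
    mp_hands.HandLandmark.MIDDLE_FINGER_MCP,
    mp_hands.HandLandmark.INDEX_FINGER_TIP,
]

def simplified_hand_points(frame_bgr):
    """Return normalized (x, y) for a few key hand landmarks, or None if no hand is found."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    landmarks = results.multi_hand_landmarks[0].landmark
    return [(landmarks[i].x, landmarks[i].y) for i in KEPT]
```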

Using these three systems together, I built a dataset of around 400 annotated samples, where each row included the numerical positions of the toothbrush, hand, and mouth, along with a brushing zone label like “top left,” “bottom right,” or “middle.” I trained a multi-layer perceptron to map these positional inputs to brushing zones, achieving over 90% validation accuracy.
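
For reference, a network of this kind takes only a few lines of PyTorch. The layer sizes, feature count, and zone count below are assumptions for the sketch, not the exact architecture I used:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: the real feature length and zone count come from the dataset.
NUM_FEATURES = 14   # e.g. toothbrush box + simplified hand points + mouth box
NUM_ZONES = 5       # e.g. top left, top right, bottom left, bottom right, middle

model = nn.Sequential(
    nn.Linear(NUM_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, NUM_ZONES),
)

def train(model, loader, epochs=100, lr=1e-3):
    """Standard supervised training loop: positional features -> brushing zone labels."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for features, labels in loader:  # loader yields (float32 features, int64 labels)
            optimizer.zero_grad()
            loss = criterion(model(features), labels)
            loss.backward()
            optimizer.step()
    return model
```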

On the frontend, I used Next.js and Three.js to display a custom Blender-made 3D tooth model, which is split into separate parts so each section can change color dynamically based on brushing activity. The backend was built with Python to handle inference, and the app communicates with it through simple API calls.
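
As a rough illustration of what those API calls could look like (assuming FastAPI here, which is not confirmed above, plus a hypothetical run_inference wrapper around the trained model), a single prediction endpoint might return the zone and per-zone coverage that the Three.js scene uses to recolor the tooth meshes:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

ZONES = ["top left", "top right", "bottom left", "bottom right", "middle"]  # assumed label set
coverage = {zone: 0.0 for zone in ZONES}  # fraction of each zone brushed so far

class FrameFeatures(BaseModel):
    features: list[float]  # flattened toothbrush, hand, and mouth positions for one frame

def run_inference(features: list[float]) -> str:
    # Hypothetical wrapper around the trained multi-layer perceptron.
    raise NotImplementedError

@app.post("/predict")
def predict(frame: FrameFeatures):
    zone = run_inference(frame.features)
    coverage[zone] = min(1.0, coverage[zone] + 0.05)  # accumulate coverage per classified frame
    return {"zone": zone, "coverage": coverage}       # frontend recolors the matching tooth meshes
```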

Challenges I ran into

One of the hardest problems I ran into was accurately tracking the mouth when it was partially blocked by hands or a toothbrush. Mediapipe face landmarks worked beautifully when the face was unobstructed, but its accuracy collapsed under occlusion. Switching to a vision language model solved the problem, but it came at a cost: inference speed dropped from about 25 frames per second to just 2. To balance accuracy with performance, I designed the system so Moondream inference runs every 10 frames while YOLO and Mediapipe results are still rendered in real time.
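
In other words, the heavy model is scheduled on a stride while the fast detectors run on every frame. A minimal sketch of that scheduling follows; the detector wrappers named here are hypothetical:

```python
MOONDREAM_INTERVAL = 10   # run the vision language model only every 10th frame
cached_mouth_box = None   # most recent mouth bounding box from the VLM

def process_frame(frame, frame_index):
    """Fast detectors run on every frame; the slow VLM result is cached and reused."""
    global cached_mouth_box
    brush_box = detect_toothbrush(frame)         # YOLO, real time (hypothetical wrapper)
    hand_points = simplified_hand_points(frame)  # Mediapipe, real time
    if frame_index % MOONDREAM_INTERVAL == 0:
        cached_mouth_box = detect_mouth_vlm(frame)  # Moondream, ~2 fps (hypothetical wrapper)
    if brush_box is None or hand_points is None or cached_mouth_box is None:
        return None  # not enough information to classify this frame yet
    return classify_zone(brush_box, hand_points, cached_mouth_box)  # hypothetical MLP wrapper
```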

Beyond that, I had little prior experience training or deploying my own models, so I had to learn PyTorch by trial and error as I experimented with different training strategies. Finally, stitching together three different computer vision pipelines and ensuring they all synchronized properly was a challenging but rewarding puzzle.

Accomplishments that I'm proud of

I’m most proud that I was able to actually build the app I envisioned at the start. Toothsy is a fully functional system that runs locally on my GPU with no reliance on external cloud APIs, which means it can easily be deployed on servers in the future. I’m also particularly happy with the visualization component: seeing the 3D teeth light up dynamically as you brush makes the app feel real and fun, while also providing valuable feedback. And from a technical standpoint, building my own dataset and reaching over 90% validation accuracy with the multi-layer perceptron was a major milestone.

What I learned

I learned how to fine-tune and deploy my own object detection models, how to design and train a neural network from scratch using data I collected myself, and how to combine multiple computer vision and machine learning systems into one inference pipeline. I also learned how to integrate Three.js into a frontend for interactive, real-time visualization, which gave the app its most compelling user-facing feature. Most importantly, I learned how to piece all these moving parts together into something cohesive that actually works.

What's next for Toothsy

I plan to fine-tune the toothbrush detection models further and expand the dataset for the brushing classifier to make it cleaner and more robust. I also want to extend the detection system beyond brushing to include flossing, giving a more complete picture of daily dental health. On the infrastructure side, deploying the inference pipeline on cloud GPUs would allow Toothsy to scale and reach more users. Ultimately, my vision is for Toothsy to grow into a one-stop platform for monitoring and improving dental health, making oral care smarter, more accessible, and maybe even a little fun.
