Inspiration
Vision loss is a growing problem, driven by aging populations and the rising prevalence of vision-threatening diseases in some parts of the world. People with full or partial vision loss can benefit from a better sense of what is happening in their surroundings. That is what we wanted to achieve: an app built for those who are visually impaired.
What it does
Aid Eye is a web-based, real-time scene description app powered by machine learning models. It lets users capture a scene with their webcam and describes that scene in a single sentence. The user interface provides audio feedback for every interaction (inspired by Microsoft's Ease of Access features).
How we built it
Aid Eye is built from three components: an image captioning model, a text-to-speech API, and a web framework. Image captioning is handled by a pre-trained TensorFlow model embedded in our web app so that it can analyze scenes in real time. For text-to-speech we use the Google Cloud TTS API, which handles prompting, user interaction feedback, and the spoken scene descriptions. The web app itself is built with Flask, which ties together the webcam stream, the captioning model, and the text-to-speech API calls.
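As a rough sketch of how these pieces can fit together: we don't name the specific captioning model above, so `caption_image` and the `/describe` route below are hypothetical placeholders rather than our exact code, while the TTS calls follow the standard Google Cloud Python client.

```python
# Minimal sketch of the pipeline: webcam frame in, caption plus audio out.
import base64

from flask import Flask, request, jsonify
from google.cloud import texttospeech

app = Flask(__name__)
tts_client = texttospeech.TextToSpeechClient()


def caption_image(image_bytes: bytes) -> str:
    """Placeholder: run the pre-trained TensorFlow captioning model here."""
    raise NotImplementedError


def synthesize(text: str) -> bytes:
    """Turn a caption into spoken audio with the Google Cloud TTS API."""
    response = tts_client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    return response.audio_content


@app.route("/describe", methods=["POST"])
def describe():
    # The browser posts the current webcam frame; we return the caption
    # plus base64-encoded MP3 audio for playback on the client.
    frame = request.files["frame"].read()
    caption = caption_image(frame)
    audio = synthesize(caption)
    return jsonify(
        caption=caption,
        audio=base64.b64encode(audio).decode("ascii"),
    )
```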
Challenges we ran into
Our project consists of many components that were not trivial to implement, and many of the technologies we used were new to us. The image captioning model took quite some time to set up, as did the webcam and TTS. Once the individual components worked, integrating the whole pipeline was also new territory for all of us, and we were constrained by computing power. On top of that, our model was trained primarily on outdoor images, which made testing on indoor scenes more difficult.
Accomplishments that we're proud of
We got the whole pipeline working in real time and in a demoable state. Considering our lack of experience with many of the technologies involved, we are very proud to have built an AI-powered app that could benefit many people's lives.
What we learned
We learned how image captioning works and how to bring AI research into production. We also learned how to use Google APIs and how to build more complex projects with Flask.
What's next for Aid Eye
We want to make Aid Eye even faster and lighter, eventually running on mobile so it is portable enough for outdoor scene description. More computing power would greatly reduce the time it takes to generate descriptions, and in our view, separating where the model and the frontend run will be a good first step, as sketched below.
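One way that split could look, as a minimal sketch: the Flask app would stop running the model in-process and instead forward each frame to a separate inference service. The model-server URL below is a hypothetical placeholder.

```python
# Hypothetical sketch of the model/frontend split: the web app forwards
# each frame to a dedicated inference service over HTTP.
import requests

MODEL_SERVER_URL = "http://model-server:9000/caption"  # assumed address


def caption_image(image_bytes: bytes) -> str:
    # The heavy TensorFlow model lives behind its own endpoint, so the
    # web app stays light and the model host can be given a GPU.
    resp = requests.post(MODEL_SERVER_URL, files={"frame": image_bytes})
    resp.raise_for_status()
    return resp.json()["caption"]
```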
Built With
- ai
- image-captioning
- machine-learning
- text-to-speech
- web-embedding

