Inspiration

The inspiration for this project came from a deep desire to make technology more inclusive. We wanted to create a solution that could empower visually impaired people, giving them greater independence and a sense of security in their daily lives. This project became more than just a technical challenge; it became a mission to make the world more accessible.

What it does

Oculis is designed to assist visually impaired individuals in navigating their surroundings and interacting with the world around them through the use of advanced AI technologies. By leveraging a combination of computer vision, natural language processing, and text-to-speech, the product provides real-time auditory feedback to users, helping them understand their environment and make informed decisions.

How we built it

The project was built using a combination of advanced AI models and a gesture-driven frontend. Here’s a step-by-step overview:

  1. Planning and Design: We began by researching the specific needs of visually impaired individuals, including the most common challenges they face. This research informed the design of a gesture-based UI that would be intuitive and easy to use without relying on visual feedback.

  2. Backend Development: The backend was developed using FastAPI, which allowed for rapid development of RESTful APIs to handle model inference requests. We integrated models for semantic segmentation, object detection (YOLO), and language translation (SeaLion LLM).

  3. Frontend Development: The frontend was built using Next.js, leveraging libraries for gesture detection. This made it possible to implement swipe, tap, and hold gestures that trigger functionality such as switching cameras, capturing screenshots, and replaying audio output.

  4. Integration: The semantic segmentation and YOLO models were integrated with the LLM to provide context-aware descriptions of the environment. These descriptions were then converted to speech using a TTS model, giving users real-time auditory feedback. A minimal sketch of this wiring appears after this list.

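To make steps 2 and 4 concrete, here is a minimal sketch of how such an endpoint could be wired together. It is not our production code: the Ultralytics YOLO package, the `yolov8n.pt` weights, and gTTS are stand-ins for our detection and TTS models, and the simplified `describe_scene()` helper takes the place of the SeaLion LLM call.

```python
# Minimal sketch of the backend wiring (not production code). Ultralytics YOLO
# and gTTS are stand-ins for the real detection and TTS models; describe_scene()
# is a placeholder for the SeaLion LLM call.
import io

from fastapi import FastAPI, UploadFile
from fastapi.responses import Response
from gtts import gTTS                  # stand-in TTS engine
from PIL import Image
from ultralytics import YOLO           # stand-in object detector

app = FastAPI()
detector = YOLO("yolov8n.pt")          # hypothetical weights, loaded once at startup


def describe_scene(labels: list[str]) -> str:
    """Placeholder for the LLM call that turns raw detections into a description."""
    if not labels:
        return "Nothing detected nearby."
    return "Ahead of you: " + ", ".join(labels) + "."


@app.post("/describe")
async def describe(frame: UploadFile) -> Response:
    """Accept a camera frame, detect objects, describe them, and return speech."""
    image = Image.open(io.BytesIO(await frame.read()))
    result = detector(image)[0]                                     # run detection
    labels = [result.names[int(cls)] for cls in result.boxes.cls]   # class names
    description = describe_scene(labels)                            # text summary
    audio = io.BytesIO()
    gTTS(description).write_to_fp(audio)                            # text-to-speech
    return Response(content=audio.getvalue(), media_type="audio/mpeg")
```

In this sketch a client would POST a camera frame to /describe and play back the returned MP3; in the real app, the segmentation output and the SeaLion prompt would be folded in at the same point.
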
Challenges we ran into

Some of the biggest challenges, and the lessons they taught us, included:

  1. Gesture Detection in Frontend Development: Developing an intuitive, gesture-based interface proved to be a non-trivial task, highlighting the importance of UX design for users who can’t rely on visual cues. From this, we learned how to implement swipe, tap, and hold gestures effectively using touch and pointer event handling in React.

  2. Reducing Latency in AI Models: Integrating multiple AI models, such as semantic segmentation and TTS, required careful performance work to keep latency low. We gained valuable insights into model optimization, efficient data handling, and asynchronous processing to ensure a seamless user experience (a small sketch of this idea follows this list).

  3. Human-Centered Design: We learned how crucial it is to design with empathy, focusing on users who rely on sound rather than sight and ensuring that every interaction is smooth and accessible. This sets a higher bar than a typical web app.

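As a concrete illustration of the asynchronous processing mentioned in point 2, here is a small, self-contained sketch. The two model functions are dummy placeholders rather than our real inference code; the point is that overlapping independent inference steps bounds latency by the slowest model instead of the sum of all of them.

```python
# Illustrative sketch only: the model functions below are dummies that sleep to
# mimic inference time. Dispatching them to worker threads and awaiting them
# together means the slowest model, not the sum of all models, sets the latency.
import asyncio
import time


def run_object_detection(frame: bytes) -> list[str]:
    time.sleep(0.4)                    # pretend this is YOLO inference
    return ["person", "door"]


def run_segmentation(frame: bytes) -> dict[str, float]:
    time.sleep(0.6)                    # pretend this is semantic segmentation
    return {"walkable_area": 0.7}


async def analyze_frame(frame: bytes):
    # asyncio.to_thread keeps the event loop responsive while blocking work runs,
    # and asyncio.gather overlaps the two inference calls.
    detections, segmentation = await asyncio.gather(
        asyncio.to_thread(run_object_detection, frame),
        asyncio.to_thread(run_segmentation, frame),
    )
    return detections, segmentation


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(analyze_frame(b"raw-frame-bytes")))
    print(f"took {time.perf_counter() - start:.2f}s")   # ~0.6s rather than ~1.0s
```
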
Accomplishments that we're proud of

  • Empowering the Visually Impaired: One of our proudest accomplishments is the creation of a tool that has the potential to significantly enhance the lives of visually impaired individuals. By integrating cutting-edge technologies like computer vision and natural language processing, we’ve developed a product that can help users navigate their surroundings with confidence and independence.

  • Seamless User Experience: We’re proud of how we’ve managed to design a user interface that’s both intuitive and accessible. The gesture-based controls ensure that users can easily interact with the app, regardless of their level of experience with technology.

  • Technical Innovations: From reducing the latency in real-time processing to successfully combining multiple AI models into a cohesive system, we’ve overcome numerous technical challenges. This has resulted in a robust, scalable product that performs well even in complex, real-world environments.

What we learned

  • Importance of Accessibility: Throughout the development process, we gained a deeper understanding of the unique challenges faced by visually impaired individuals. This insight has driven us to focus on accessibility at every stage of development, ensuring that our product is not only functional but truly beneficial to its users.

  • Balancing Innovation with Usability: We learned how to balance cutting-edge technology with the need for simplicity and ease of use. While it was tempting to include every possible feature, we found that focusing on core functionalities that directly address users’ needs was the key to creating a successful product.

  • Technical Problem-Solving: We faced several technical hurdles, particularly in reducing latency and detecting gestures accurately. Through trial and error, we honed our skills in optimizing model performance and fine-tuning the user interface to provide the best possible experience.

What's next for OCULIS

With the right partners in social good, we aim to turn OCULIS into a fully fledged, real-world application. Our vision is to collaborate with organizations that support visually impaired communities to refine and deploy this technology at a larger scale.

Moving forward, we plan to integrate additional features, such as enhanced object recognition, real-time navigation assistance, and support for more languages. We’re also exploring the potential of integrating haptic feedback to provide even more detailed and nuanced information to users.

Built With

FastAPI, Next.js, YOLO (object detection), semantic segmentation, SeaLion LLM, text-to-speech