WhisPath
Inspiration
WhisPath is rooted in the challenges visually impaired individuals face when navigating their environments. Despite significant advances in technology, many existing aids still fall short in dynamically informing users about real-time changes in their surroundings. WhisPath is envisioned as a transformative solution that goes beyond traditional navigation aids by integrating LLM-based artificial intelligence into a seamless, interactive navigation experience. It empowers visually impaired people to explore urban areas and assists them in day-to-day life. The project is inspired by the potential to significantly enhance the autonomy of visually impaired individuals, allowing them to navigate complex environments with confidence and safety.
What it does
WhisPath is an innovative application designed to empower visually impaired users by providing auditory guidance about their immediate surroundings. Using the camera on a user’s device, the app continuously captures visual data from the environment. This data is processed in real time by a combination of Fetch.ai autonomous agents and Google’s Gemini API. The application performs several critical functions (a sketch of their combined output follows the list):
- Obstacle Detection: Identifies physical obstacles in the user's path and provides verbal warnings and directional cues.
- Threat Assessment: Analyzes potential threats in the environment and classifies them according to severity (low, mid, high).
- Environmental Descriptions: Offers detailed descriptions of surroundings, enhancing the user's mental map of the space.
- Dynamic Path Suggestions: Provides suggestions for safe navigation based on real-time environmental data.
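To make this concrete, here is a rough Pydantic sketch of what a single frame's combined guidance payload could look like; the field names and structure are illustrative assumptions, not our exact internal schema:

```python
from enum import Enum
from typing import List, Optional
from pydantic import BaseModel

class Severity(str, Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"

class Obstacle(BaseModel):
    label: str     # e.g. "parked bicycle"
    position: str  # e.g. "two steps ahead, slightly left"

class GuidanceFrame(BaseModel):
    """One round of feedback for a single captured camera frame."""
    obstacles: List[Obstacle]               # Obstacle Detection
    threat_level: Severity                  # Threat Assessment (low / mid / high)
    scene_description: str                  # Environmental Description
    path_suggestion: Optional[str] = None   # Dynamic Path Suggestion, if any
```

Every field in this payload maps onto one of the four functions above, which is what lets a single frame produce one cohesive spoken update.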
How we built it
WhisPath introduces a novel integration of LLM-based AI technology with advanced object recognition and natural language processing. This creates a new paradigm for real-time, interactive guidance systems. Currently, WhisPath is a browser-based React-Python application that captures camera frames at a fixed interval and responds to each frame with an AI-generated audio navigation guide.
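In the app itself this capture loop runs in the React frontend; the Python sketch below approximates the same flow, using an OpenCV webcam as a stand-in for the browser camera and a hypothetical /guide endpoint:

```python
import time

import cv2        # pip install opencv-python
import requests   # pip install requests

CAPTURE_INTERVAL = 3.0                       # seconds between frames (assumed)
BACKEND_URL = "http://localhost:8000/guide"  # hypothetical WhisPath endpoint

camera = cv2.VideoCapture(0)  # stand-in for the browser camera feed
while True:
    ok, frame = camera.read()
    if not ok:
        break
    # Encode the frame as JPEG and send it to the backend for analysis.
    _, jpeg = cv2.imencode(".jpg", frame)
    resp = requests.post(
        BACKEND_URL,
        files={"frame": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
    )
    print(resp.json().get("guidance"))  # the app speaks this via TTS instead
    time.sleep(CAPTURE_INTERVAL)
```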
The application leverages Fetch.ai agents, each programmed to perform specific tasks: detecting obstacles, assessing threats, and describing the environment. These agents work in an automated fashion, allowing them to process data rapidly and independently, yet synchronize efficiently to provide cohesive feedback. The autonomy of these agents enables the system to adapt to dynamic environments, making real-time decisions without central oversight.
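As a rough illustration of this agent pattern, here is a minimal two-agent sketch with Fetch.ai's open-source uagents library; the agent names, message model, and interval are assumptions for illustration, not our production agents:

```python
from uagents import Agent, Bureau, Context, Model

class Hazard(Model):
    description: str
    severity: str  # "low" / "mid" / "high"

detector = Agent(name="obstacle_detector", seed="detector seed phrase")
notifier = Agent(name="notifier", seed="notifier seed phrase")

@detector.on_interval(period=3.0)
async def scan(ctx: Context):
    # In WhisPath this is where a camera frame would be analyzed;
    # here we emit a canned hazard for illustration.
    await ctx.send(notifier.address, Hazard(description="curb ahead", severity="mid"))

@notifier.on_message(model=Hazard)
async def announce(ctx: Context, sender: str, msg: Hazard):
    ctx.logger.info(f"[{msg.severity}] {msg.description}")  # would be spoken via TTS

bureau = Bureau()
bureau.add(detector)
bureau.add(notifier)

if __name__ == "__main__":
    bureau.run()
```

Because each agent only reacts to its own triggers (an interval or an incoming message), they can run and fail independently while still composing into one pipeline.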
Google’s Gemini API enhances the capabilities of these agents by providing state-of-the-art image recognition and natural language processing. This API processes the visual data captured by the user’s device camera, identifies relevant objects and hazards, and translates this information into natural language descriptions and warnings.
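The Gemini call itself boils down to a single multimodal request. A minimal sketch with the google-generativeai Python SDK could look like the following; the prompt wording and model name are assumptions, not our exact configuration:

```python
import os

import google.generativeai as genai  # pip install google-generativeai
import PIL.Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model choice

frame = PIL.Image.open("frame.jpg")  # a frame captured from the user's camera
prompt = (
    "You are a navigation assistant for a visually impaired pedestrian. "
    "Describe obstacles and hazards in this image, rate the overall threat "
    "as low, mid, or high, and suggest a safe path, in short spoken sentences."
)
response = model.generate_content([prompt, frame])
print(response.text)  # natural-language guidance to be spoken aloud
```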
The backend of WhisPath, built with Python FastAPI, serves as the communication hub between the Fetch.ai agents and the Gemini API. It processes incoming data from the agents, facilitates API calls, and ensures that the data flow remains smooth and secure. The frontend, developed using React, offers a user-friendly interface that delivers auditory information to the user, allowing for seamless interaction and accessibility.
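As a simplified sketch of that hub role, a single FastAPI route can accept a frame from the frontend, hand it to the analysis pipeline, and return the generated guidance; the route name, helper, and response shape here are illustrative assumptions:

```python
# pip install fastapi uvicorn python-multipart
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def analyze_frame(image_bytes: bytes) -> str:
    """Placeholder for the agent/Gemini pipeline sketched above."""
    return "Clear path ahead; bench two meters to your right."

@app.post("/guide")
async def guide(frame: UploadFile = File(...)):
    image_bytes = await frame.read()
    guidance = analyze_frame(image_bytes)  # agents + Gemini would run here
    return {"guidance": guidance}          # frontend converts this to speech
```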
This technological synergy between AI-powered agents and powerful API capabilities ensures that WhisPath provides reliable, accurate, and instantaneous navigational aid to visually impaired users.
Challenges we ran into
Integrating disparate technologies posed significant challenges, particularly in maintaining real-time data processing and ensuring seamless communication between agents. We met these challenges by adopting a modular architecture that allowed incremental development and testing, which helped the team isolate and address issues without disrupting the entire system.
Accomplishments that we're proud of
- Innovative Use of AI: Successfully deploying AI not just as a supplementary technology but as the core mechanism for real-time environmental interaction and navigation.
- Text-To-Speech module: We developed a custom TTS interface that converts the LLM-generated text into an audio guide for users (a sketch follows this list).
- Automated workflow: The application runs an automated agent workflow that triggers on hazard detection and notifies the user.
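Purely to illustrate the text-to-audio step from the TTS bullet above, here is a minimal sketch using the gTTS library; the library choice is an assumption (a browser's built-in speech synthesis would serve the same role on the frontend):

```python
from gtts import gTTS  # pip install gTTS

def speak(guidance_text: str, out_path: str = "guidance.mp3") -> str:
    """Convert LLM-generated guidance text into an audio file."""
    tts = gTTS(text=guidance_text, lang="en")
    tts.save(out_path)
    return out_path

speak("Caution: mid-level threat. A cyclist is approaching from your left.")
```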
What's next for WhisPath
We plan to add hardware and mobile application support on top of the existing WhisPath codebase to improve input photo resolution and the overall user experience for the visually impaired. We also aim to improve the accuracy of our audio guide's path suggestions and its threat identification and prevention workflows through better LLM models, parameters, and optimization of the agent services.