WhisPath
Inspiration
WhisPath is rooted in the challenges visually impaired individuals face when navigating their environments. Despite significant advances in technology, many existing aids still fall short in dynamically informing users about real-time changes in their surroundings. WhisPath is envisioned as a transformative solution that goes beyond traditional navigation aids by integrating LLM-based artificial intelligence into a seamless, interactive navigation experience. It empowers visually impaired people to explore urban areas and assists them in day-to-day life. The project is inspired by the potential to significantly enhance the autonomy of visually impaired individuals, allowing them to navigate complex environments with confidence and safety.
What it does
WhisPath is an innovative application designed to empower visually impaired users by providing auditory guidance about their immediate surroundings. Using the camera on a user’s device, the app continuously captures visual data from the environment. This data is processed in real time by a combination of Fetch.ai autonomous agents and Google’s Gemini API. The application performs several critical functions (a sketch of their combined output follows the list):
- Obstacle Detection: Identifies physical obstacles in the user's path and provides verbal warnings and directional cues.
- Threat Assessment: Analyzes potential threats in the environment and classifies them according to severity (low, mid, high).
- Environmental Descriptions: Offers detailed descriptions of surroundings, enhancing the user's mental map of the space.
- Dynamic Path Suggestions: Provides suggestions for safe navigation based on real-time environmental data.
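To make this concrete, here is a rough Pydantic sketch of what a single frame's combined guidance payload could look like; the field names and structure are illustrative assumptions, not our exact internal schema:

```python
from enum import Enum
from typing import List, Optional
from pydantic import BaseModel

class Severity(str, Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"

class Obstacle(BaseModel):
    label: str     # e.g. "parked bicycle"
    position: str  # e.g. "two steps ahead, slightly left"

class GuidanceFrame(BaseModel):
    """One round of feedback for a single captured camera frame."""
    obstacles: List[Obstacle]               # Obstacle Detection
    threat_level: Severity                  # Threat Assessment (low / mid / high)
    scene_description: str                  # Environmental Description
    path_suggestion: Optional[str] = None   # Dynamic Path Suggestion, if any
```

Every field in this payload maps onto one of the four functions above, which is what lets a single frame produce one cohesive spoken update.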
How we built it
WhisPath introduces a novel integration of LLM-based AI technology with advanced object recognition and natural language processing. This creates a new paradigm for real-time, interactive guidance systems. Currently, WhisPath is a browser-based React-Python application that captures camera frames at a fixed interval and responds to each frame with an AI-generated audio navigation guide.
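In the app itself this capture loop runs in the React frontend; the Python sketch below approximates the same flow, using an OpenCV webcam as a stand-in for the browser camera and a hypothetical /guide endpoint:

```python
import time

import cv2        # pip install opencv-python
import requests   # pip install requests

CAPTURE_INTERVAL = 3.0                       # seconds between frames (assumed)
BACKEND_URL = "http://localhost:8000/guide"  # hypothetical WhisPath endpoint

camera = cv2.VideoCapture(0)  # stand-in for the browser camera feed
while True:
    ok, frame = camera.read()
    if not ok:
        break
    # Encode the frame as JPEG and send it to the backend for analysis.
    _, jpeg = cv2.imencode(".jpg", frame)
    resp = requests.post(
        BACKEND_URL,
        files={"frame": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
    )
    print(resp.json().get("guidance"))  # the app speaks this via TTS instead
    time.sleep(CAPTURE_INTERVAL)
```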
The application leverages Fetch.ai agents, each programmed to perform specific tasks: detecting obstacles, assessing threats, and describing the environment. These agents work in an automated fashion, allowing them to process data rapidly and independently, yet synchronize efficiently to provide cohesive feedback. The autonomy of these agents enables the system to adapt to dynamic environments, making real-time decisions without central oversight.
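As a rough illustration of this agent pattern, here is a minimal two-agent sketch with Fetch.ai's open-source uagents library; the agent names, message model, and interval are assumptions for illustration, not our production agents:

```python
from uagents import Agent, Bureau, Context, Model

class Hazard(Model):
    description: str
    severity: str  # "low" / "mid" / "high"

detector = Agent(name="obstacle_detector", seed="detector seed phrase")
notifier = Agent(name="notifier", seed="notifier seed phrase")

@detector.on_interval(period=3.0)
async def scan(ctx: Context):
    # In WhisPath this is where a camera frame would be analyzed;
    # here we emit a canned hazard for illustration.
    await ctx.send(notifier.address, Hazard(description="curb ahead", severity="mid"))

@notifier.on_message(model=Hazard)
async def announce(ctx: Context, sender: str, msg: Hazard):
    ctx.logger.info(f"[{msg.severity}] {msg.description}")  # would be spoken via TTS

bureau = Bureau()
bureau.add(detector)
bureau.add(notifier)

if __name__ == "__main__":
    bureau.run()
```

Because each agent only reacts to its own triggers (an interval or an incoming message), they can run and fail independently while still composing into one pipeline.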
Google’s Gemini API enhances the capabilities of these agents by providing state-of-the-art image recognition and natural language processing. This API processes the visual data captured by the user’s device camera, identifies relevant objects and hazards, and translates this information into natural language descriptions and warnings.
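The Gemini call itself boils down to a single multimodal request. A minimal sketch with the google-generativeai Python SDK could look like the following; the prompt wording and model name are assumptions, not our exact configuration:

```python
import os

import google.generativeai as genai  # pip install google-generativeai
import PIL.Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model choice

frame = PIL.Image.open("frame.jpg")  # a frame captured from the user's camera
prompt = (
    "You are a navigation assistant for a visually impaired pedestrian. "
    "Describe obstacles and hazards in this image, rate the overall threat "
    "as low, mid, or high, and suggest a safe path, in short spoken sentences."
)
response = model.generate_content([prompt, frame])
print(response.text)  # natural-language guidance to be spoken aloud
```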
The backend of WhisPath, built with Python FastAPI, serves as the communication hub between the Fetch.ai agents and the Gemini API. It processes incoming data from the agents, facilitates API calls, and ensures that the data flow remains smooth and secure. The frontend, developed using React, offers a user-friendly interface that delivers auditory information to the user, allowing for seamless interaction and accessibility.
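As a simplified sketch of that hub role, a single FastAPI route can accept a frame from the frontend, hand it to the analysis pipeline, and return the generated guidance; the route name, helper, and response shape here are illustrative assumptions:

```python
# pip install fastapi uvicorn python-multipart
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def analyze_frame(image_bytes: bytes) -> str:
    """Placeholder for the agent/Gemini pipeline sketched above."""
    return "Clear path ahead; bench two meters to your right."

@app.post("/guide")
async def guide(frame: UploadFile = File(...)):
    image_bytes = await frame.read()
    guidance = analyze_frame(image_bytes)  # agents + Gemini would run here
    return {"guidance": guidance}          # frontend converts this to speech
```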
This technological synergy between AI-powered agents and powerful API capabilities ensures that WhisPath provides reliable, accurate, and instantaneous navigational aid to visually impaired users.
Challenges we ran into
Integrating disparate technologies posed significant challenges, particularly in maintaining real-time data processing and ensuring seamless communication between agents. We met these challenges by adopting a modular architecture that allowed incremental development and testing, which helped the team isolate and address issues without disrupting the entire system.
Accomplishments that we're proud of
- Innovative Use of AI: Successfully deploying AI not just as a supplementary technology but as the core mechanism for real-time environmental interaction and navigation.
- Text-To-Speech module: We developed a custom TTS interface that converts the LLM-generated text into an audio guide for users (a sketch follows this list).
- Automated workflow: The application runs an automated agent workflow that triggers on hazard detection and notifies the user.
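Purely to illustrate the text-to-audio step from the TTS bullet above, here is a minimal sketch using the gTTS library; the library choice is an assumption (a browser's built-in speech synthesis would serve the same role on the frontend):

```python
from gtts import gTTS  # pip install gTTS

def speak(guidance_text: str, out_path: str = "guidance.mp3") -> str:
    """Convert LLM-generated guidance text into an audio file."""
    tts = gTTS(text=guidance_text, lang="en")
    tts.save(out_path)
    return out_path

speak("Caution: mid-level threat. A cyclist is approaching from your left.")
```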
What's next for WhisPath
We plan to add hardware and mobile application support on top of the existing WhisPath codebase to improve input photo resolution and the overall user experience for the visually impaired. We also aim to improve the accuracy of our audio guide's path suggestions and its threat identification and prevention workflows through better LLM models, parameters, and optimization of the agent services.