Inspiration

We were inspired by a simple yet profound realization: 250 million deaf-blind people worldwide face barriers that technology could remove, yet existing solutions cost over $10,000 and require complex setup. Traditional accessibility tools treat vision and hearing loss separately, missing the opportunity for intelligent, unified assistance.

Our team realized that Google's Gemini 3 with its multimodal capabilities could democratize accessibility—enabling deaf-blind individuals to navigate independently, access information in real-time, and participate fully in digital society. We asked ourselves: "What if we could provide intelligent accessibility for the cost of a Raspberry Pi and free API access?"

What it does

AccessiView is an AI-powered accessibility companion that uses Gemini 3's advanced multimodal intelligence to provide real-time assistance for deaf-blind users through multiple sensory channels:

Vision Understanding:
  • Analyzes camera input to describe environments in detail
  • Recognizes text, faces, and objects with contextual awareness
  • Identifies hazards and safety concerns immediately
  • Provides spatial descriptions (left, right, distance, depth)

Intelligent Dialogue:
  • Maintains conversation context across multiple turns
  • Answers complex questions about surroundings
  • Provides step-by-step navigation guidance
  • Adapts responses to user needs and preferences

Multimodal Output:
  • Converts descriptions to haptic feedback (tactile sensations via wearable)
  • Generates audio descriptions with natural pacing
  • Produces Braille-compatible text output
  • Text-to-speech integration for audio-first users

Real-World Features:
  • Emergency hazard detection (stairs, moving objects, vehicles)
  • Location awareness and navigation assistance
  • Document and sign reading with context
  • Sound event interpretation (who's talking? traffic patterns? alarms?)
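As a rough illustration of how the vision and hazard-detection features above can be driven, the sketch below sends one camera frame to the Gemini API with a prompt asking for spatial layout, visible text, and hazards. The model name, prompt wording, and the frame.jpg path are illustrative assumptions rather than our exact production values.

```python
# Illustrative sketch: one vision request for a spatial scene description.
# Requires the google-generativeai SDK and an API key in GEMINI_API_KEY;
# the model name, prompt wording, and "frame.jpg" path are assumptions.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-3-flash")  # assumed model name

SCENE_PROMPT = (
    "Describe this scene for a deaf-blind pedestrian. Give the spatial layout "
    "(left/right, approximate distance), read any visible text or signs, and "
    "list hazards such as stairs, vehicles, or moving objects first."
)

frame = Image.open("frame.jpg")  # placeholder for a captured camera frame
response = model.generate_content([SCENE_PROMPT, frame])
print(response.text)  # routed to TTS, haptics, or a Braille display downstream
```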

How we built it

Technology Stack:

  • Raspberry Pi 4B - Edge device for real-time processing
  • Gemini 3 API - Core intelligence (vision, reasoning, dialogue)
  • Python - Primary development language
  • FastAPI - Backend for API orchestration
  • Haptic Feedback Module - Wearable vibration output
  • Text-to-Speech - Audio generation (offline pyttsx3)
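As a minimal illustration of how the FastAPI layer ties this stack together, the sketch below exposes a single endpoint that accepts a frame from the Pi and forwards it to Gemini. The /describe route, request shape, and model name are assumptions for illustration, not our exact API.

```python
# Illustrative sketch of the FastAPI orchestration layer; the /describe route,
# request shape, and model name are assumptions, not the exact production API.
import base64
import io
import os

import google.generativeai as genai
from fastapi import FastAPI
from PIL import Image
from pydantic import BaseModel

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-3-flash")  # assumed model name

app = FastAPI()


class Frame(BaseModel):
    image_b64: str  # JPEG frame from the Pi camera, base64-encoded
    question: str = "Describe the scene and flag any hazards."


@app.post("/describe")
def describe(frame: Frame) -> dict:
    image = Image.open(io.BytesIO(base64.b64decode(frame.image_b64)))
    response = model.generate_content([frame.question, image])
    return {"description": response.text}
```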

Architecture:

Raspberry Pi Camera → Gemini 3 Vision Analysis → Intelligent Response
        ↓
    ├─ Haptic Feedback (vibration patterns)
    ├─ Audio Output (TTS)
    ├─ Braille Display (text conversion)
    └─ Real-time Dialogue System
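A minimal sketch of this loop on the Pi, assuming picamera2 for capture; the GPIO pin, pulse timings, and the analyze_frame() helper (standing in for the Gemini call) are hypothetical placeholders:

```python
# Illustrative sketch of the edge loop on the Pi. picamera2, pyttsx3, and RPi.GPIO
# come from the stack above; the pin number, pulse timings, and the
# analyze_frame() helper (which would wrap the Gemini call) are assumptions.
import time

import pyttsx3
import RPi.GPIO as GPIO
from picamera2 import Picamera2
from PIL import Image

HAPTIC_PIN = 18  # assumed BCM pin driving the vibration motor

GPIO.setmode(GPIO.BCM)
GPIO.setup(HAPTIC_PIN, GPIO.OUT)
tts = pyttsx3.init()


def pulse(count: int, on_s: float = 0.15, off_s: float = 0.1) -> None:
    """Emit a simple vibration pattern: `count` short pulses."""
    for _ in range(count):
        GPIO.output(HAPTIC_PIN, GPIO.HIGH)
        time.sleep(on_s)
        GPIO.output(HAPTIC_PIN, GPIO.LOW)
        time.sleep(off_s)


camera = Picamera2()
camera.start()

while True:
    # Channel order may need conversion depending on the configured camera format.
    frame = Image.fromarray(camera.capture_array())
    description, hazard = analyze_frame(frame)  # hypothetical helper calling Gemini
    if hazard:
        pulse(3)  # urgent haptic pattern fires before any other output
    tts.say(description)  # audio channel
    tts.runAndWait()
    # The same `description` text can be sent to a Braille display as plain text.
    time.sleep(1.0)
```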

Integration Approach:

We replaced fragmented Azure services with Gemini 3's unified API:

| Previous (Azure) | New (Gemini 3) | Advantage |
| --- | --- | --- |
| Azure Computer Vision | Gemini 3 Vision | Contextual understanding, not just tags |
| Azure Speech Recognition | Gemini 3 Audio | Integrated with reasoning, better accuracy |
| Azure Language Understanding | Gemini 3 Reasoning | Multi-turn context, true comprehension |
| Multiple API calls | Single unified API | Lower latency, reduced complexity |

Key Build Decisions:

  • Single Unified API: Gemini 3's multimodal capabilities eliminated the need for 4+ separate services, reducing API calls and latency
  • Edge Processing: the Raspberry Pi handles initial image capture and feedback generation
  • Context Preservation: multi-turn dialogue maintains user context across interactions
  • Safety-First: immediate hazard detection takes priority over all other responses (see the sketch below)
  • Accessibility in Development: the codebase includes comments explaining the Gemini integration for community contributions
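One way to make the safety-first decision concrete is to ask the model for structured output so the edge code can check a hazard flag before speaking the full description. The JSON keys and the use of JSON-mode output below are assumptions for illustration, not our exact schema.

```python
# Illustrative sketch: request structured output so the hazard flag can be checked
# before the full description is spoken. The JSON keys, model name, and JSON-mode
# output are assumptions for illustration.
import json
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-3-flash")  # assumed model name

PROMPT = (
    "Return JSON with keys 'hazard' (boolean), 'hazard_reason' (string), and "
    "'description' (string), written for a deaf-blind pedestrian."
)

frame = Image.open("frame.jpg")  # placeholder frame
response = model.generate_content(
    [PROMPT, frame],
    generation_config={"response_mime_type": "application/json"},
)
result = json.loads(response.text)

if result["hazard"]:
    print("ALERT:", result["hazard_reason"])  # would trigger the urgent haptic pattern
print(result["description"])  # routed to TTS / Braille output
```

Parsing a small, predictable structure keeps the hazard check cheap enough to run on every frame before any other output is generated.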

Challenges we ran into

Challenge 1: Real-Time Performance
Problem: The Debian-based Pi had limited resources, and the multi-service Azure setup was too slow.
Solution: Migrated to Gemini 3's single API, using the Flash model (10-50 ms responses) for immediate feedback and the Pro model for complex queries (see the routing sketch below).
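A hedged sketch of that routing decision; the model names and the word-count heuristic are assumptions, and a real router would weigh more signals.

```python
# Illustrative sketch of the Flash/Pro routing decision. Model names and the
# word-count heuristic are assumptions for illustration only.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

FAST_MODEL = genai.GenerativeModel("gemini-3-flash")  # low-latency scene checks
DEEP_MODEL = genai.GenerativeModel("gemini-3-pro")    # complex, multi-step queries


def pick_model(user_query: str) -> genai.GenerativeModel:
    """Send short, navigation-style queries to the fast model, the rest to Pro."""
    return FAST_MODEL if len(user_query.split()) <= 12 else DEEP_MODEL


query = "What is directly ahead of me?"
model = pick_model(query)
print(model.generate_content(query).text)
```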

Challenge 2: Accurate Scene Description
Problem: Generic image tags weren't useful for navigation; we needed contextual, actionable descriptions.
Solution: Gemini 3's advanced reasoning enabled intelligent interpretation of scenes with spatial awareness and hazard prioritization.

Challenge 3: Context Awareness
Problem: Fragmented Azure services couldn't maintain conversation context across interactions.
Solution: Gemini 3's multi-turn dialogue natively preserves context, enabling nuanced interactions (see the sketch below).
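A minimal sketch of how a persistent chat session keeps that context, assuming the google-generativeai SDK; the model name and frame.jpg path are placeholders.

```python
# Illustrative sketch: a persistent chat session carries context across turns,
# so the follow-up can refer back to the first description. Model name and
# "frame.jpg" are assumptions.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-3-flash")  # assumed model name

chat = model.start_chat(history=[])

frame = Image.open("frame.jpg")  # placeholder frame
first = chat.send_message(["Describe what is around me.", frame])
print(first.text)

# No state has to be re-sent manually; the session history provides the context.
follow_up = chat.send_message("Which of those objects is closest to me, and on which side?")
print(follow_up.text)
```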

Challenge 4: Accessibility Throughout
Problem: Our accessibility tool itself needed to be accessible during development.
Solution: Structured the code with clear comments on the Gemini integration and wrote comprehensive documentation for community involvement.

Challenge 5: Cost at Scale
Problem: Multiple premium Azure services made the per-user cost prohibitive for low-income regions.
Solution: Gemini 3's free tier covered the entire hackathon; in production, costs scale at $0.075-1.50 per 1M tokens (vs. $100-500/month on Azure).

Accomplishments we're proud of

  • Gemini 3 Integration: seamlessly replaced 4+ Azure services with one intelligent, multimodal API
  • Real-Time Safety: hazard detection responds in <100 ms, critical for user safety
  • Multimodal Output: users choose their preferred interaction mode (haptic, audio, text, Braille)
  • Raspberry Pi Deployment: the full-featured system runs on $35 hardware plus the free API tier
  • Team Collaboration: our 4-person team successfully pivoted from Azure to Gemini 3 early in development
  • Accessibility Practices: the project demonstrates accessibility standards internally (captions, documentation, inclusive design)

What we learned

Multimodal AI Changes Everything: Trying to integrate 4 separate services taught us that unified multimodal APIs (like Gemini 3) are superior for complex problems like accessibility

Context Matters More Than Data: We learned that maintaining conversation context is more valuable than raw data; Gemini 3's ability to understand implications and relationships transformed our approach

Edge + Cloud Hybrid: The Raspberry Pi handles feedback generation and safety checks, while the cloud (Gemini 3) handles reasoning, giving us the best of both worlds

Accessibility Isn't an Afterthought: Building for deaf-blind users from day one shaped better product decisions than retrofitting accessibility later would have

Free Tier Is Powerful: Gemini 3's free tier covering our entire hackathon demo showed us that accessibility tech doesn't need to be expensive

Documentation Saves Weeks: Clear explanations of Gemini integration helped new team members onboard quickly and avoided implementation mistakes

What's next for I.R.I.S

Immediate (Next 3 months):
  • User testing with deaf-blind community partners
  • Refinement of haptic feedback patterns for intuitive communication
  • Integration with Braille displays and screen readers
  • Field testing in real environments (streets, malls, public transit)

Short-term (6-12 months):
  • Mobile app version using the Gemini 3 API on smartphones
  • Language localization (Urdu, Arabic, Spanish, Mandarin)
  • Partnerships with accessibility organizations for distribution
  • Open-source community version for NGOs and schools

Long-term (1-2 years):
  • Wearable integration (smart glasses, haptic gloves)
  • Offline-capable model for regions with unreliable connectivity
  • Integration with smart city infrastructure for navigation
  • Licensing model for institutions, schools, and rehabilitation centers
  • $1M+ impact: reach 50,000+ users across Pakistan and South Asia

Vision: AccessiView using Gemini 3 will become the gold standard for affordable, intelligent accessibility—proving that cutting-edge AI can democratize technology and transform lives.

Built With

  • debian
  • dotenv
  • fastapi
  • flask
  • gemini-3-api
  • google-ai-sdk
  • numpy
  • pillow-(pil)
  • python
  • pyttsx3-(text-to-speech)
  • raspberry-pi-4b
  • raspberry-pi-os
  • rpi.gpio-(haptic-feedback)