Rover: Intelligence, democratized.

Inspiration

For visually impaired people, accessibility tools are often split across multiple platforms and can feel confusing or disjointed. Even something as basic as receiving an email on your phone and then creating a related calendar event on your computer can be challenging. We wanted to simplify this space and bring web autonomy to the digital lives of the visually impaired. Our app supports:

Natural phone calls to businesses
iMessage integration without official APIs
Voice-controlled web browsing
Enterprise-grade credential management
Real-time video processing from emerging hardware

We built an AI platform that integrates these capabilities into a single system.

What it does

A multi-modal AI assistant that handles tasks across communication, productivity, and automation:

Communication:

Custom iMessage integration: sends/receives iMessages from your phone number via a custom Python bridge
Dual voice agents: two AI agents that make and receive natural phone calls (user agent for incoming, outreach agent for outgoing calls to businesses)
Text messaging: full iMessage/SMS support through our custom provider

Productivity:

Google Calendar, Gmail, Slack, Notion, Microsoft Teams integration via Composio
Voice-controlled browser: complete web automation controlled by voice commands
RAG-powered context: vector database with retrieval-augmented generation for personalized, context-aware responses

Security & Automation:

Enterprise-grade secrets manager: LastPass-level security with SHA-256 hashing, AES-256-GCM encryption at rest, audit logging, and rate limiting
Local LLM domain verification: on-device LLM verifies domain integrity before credential release
Browser automation: Stagehand-powered web automation for ordering, reservations, and research

Hardware Integration:

Beta glasses support: custom scalable video extraction pipeline (no existing libraries) for real-time video processing from beta hardware

How we built it

Architecture:

Orchestrator: Central brain on AWS EC2 running Claude 4.5 Sonnet with tool routing
AWS ECS: Containerized services for scalable deployment
Model Context Protocol (MCP): Standardized tool integration framework
Composio Integration: Unified API for 250+ productivity apps

Custom iMessage Provider:

Python scraper monitoring macOS Finder for conversation file changes
Real-time detection of new messages in chat.db
Bridge service forwarding messages to EC2 orchestrator
iCloud sync integration to send replies from your phone number
No official API required
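
The polling step of a bridge like this can be sketched as follows. This is a minimal sketch under stated assumptions, not the team's implementation: it uses a heavily simplified version of the `message` and `handle` tables, and `fetch_new_messages`, `demo_connection`, and the row-ID watermark are hypothetical names introduced for illustration.

```python
import sqlite3

# Simplified query: the real chat.db has many more columns, and its layout
# can differ between macOS versions (assumption, not taken from the source).
QUERY = """
SELECT m.ROWID, h.id, m.text
FROM message AS m
LEFT JOIN handle AS h ON h.ROWID = m.handle_id
WHERE m.ROWID > ? AND m.is_from_me = 0
ORDER BY m.ROWID
"""

def fetch_new_messages(conn, last_rowid):
    """Return (new_watermark, messages) for incoming rows after last_rowid."""
    rows = conn.execute(QUERY, (last_rowid,)).fetchall()
    messages = [{"rowid": r[0], "sender": r[1], "text": r[2]} for r in rows]
    new_watermark = rows[-1][0] if rows else last_rowid
    return new_watermark, messages

def demo_connection():
    """Build an in-memory stand-in for chat.db so the sketch runs anywhere."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE handle (id TEXT);
        CREATE TABLE message (text TEXT, handle_id INTEGER, is_from_me INTEGER);
        INSERT INTO handle (id) VALUES ('+15550100');
        INSERT INTO message (text, handle_id, is_from_me) VALUES
            ('Are you free at 3?', 1, 0),
            ('Yes, see you then', 1, 1),
            ('Great', 1, 0);
    """)
    return conn
```

A bridge would call `fetch_new_messages` in a loop, forward each returned message to the orchestrator, and persist the watermark so messages are not forwarded twice after a restart. On a Mac the database normally lives at `~/Library/Messages/chat.db`, and opening it read-only avoids interfering with the Messages app.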

Voice Agents (VAPI):

Two specialized agents:
User Agent: Receives calls, transcribes, routes to orchestrator
Outreach Agent: Makes outbound calls, handles conversations, reports results
Natural language understanding for complex phone interactions
Webhook integration with orchestrator for real-time coordination
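
One way to keep two agents from stepping on each other is a small per-call state machine on the orchestrator side. The sketch below is an assumption about how such coordination could look: the event fields (`call_id`, `agent`, `type`), the state names, and `CallCoordinator` are all hypothetical and do not reflect VAPI's actual webhook payloads.

```python
from dataclasses import dataclass, field

# Allowed call-state transitions; anything else is rejected as a conflict.
TRANSITIONS = {
    ("idle", "call_started"): "active",
    ("active", "transcript"): "active",
    ("active", "call_ended"): "done",
}

@dataclass
class CallCoordinator:
    """Tracks one state per call so the two agents never share a call."""
    states: dict = field(default_factory=dict)   # call_id -> state
    owners: dict = field(default_factory=dict)   # call_id -> agent name

    def handle(self, event):
        call_id, agent, kind = event["call_id"], event["agent"], event["type"]
        # The first agent to report on a call becomes its owner.
        owner = self.owners.setdefault(call_id, agent)
        if owner != agent:
            return {"ok": False, "reason": "call owned by " + owner}
        state = self.states.get(call_id, "idle")
        next_state = TRANSITIONS.get((state, kind))
        if next_state is None:
            return {"ok": False, "reason": f"{kind} not allowed in {state}"}
        self.states[call_id] = next_state
        return {"ok": True, "state": next_state}
```

Each incoming webhook would be translated into one such event before it reaches the orchestrator, so out-of-order or duplicated events are rejected instead of corrupting shared state.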

Voice-Controlled Browser:

Stagehand AI for vision-based browser automation
Voice command → intent → browser action pipeline
Form filling, navigation, extraction, and purchases
Integration with secrets manager for authentication
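
The voice command → intent → browser action pipeline can be illustrated with a deliberately simple rule-based parser. This is a stand-in for illustration only: the project most likely relies on a language model for intent classification, and `BrowserAction`, `RULES`, and `parse_command` are hypothetical names that are not part of Stagehand.

```python
import re
from dataclasses import dataclass

@dataclass
class BrowserAction:
    kind: str          # "navigate", "fill", "click", or "unknown"
    target: str = ""   # URL, field name, or button label
    value: str = ""    # text to type, or the unparsed command

# Ordered (pattern, builder) pairs; the first match wins.
RULES = [
    (re.compile(r"^(?:go to|open) (?P<url>\S+)$"),
     lambda m: BrowserAction("navigate", target=m["url"])),
    (re.compile(r"^type (?P<value>.+) into (?P<field>.+)$"),
     lambda m: BrowserAction("fill", target=m["field"], value=m["value"])),
    (re.compile(r"^(?:click|press) (?P<label>.+)$"),
     lambda m: BrowserAction("click", target=m["label"])),
]

def parse_command(transcript):
    """Map a transcribed voice command to a structured browser action."""
    text = transcript.strip().lower().rstrip(".")
    for pattern, build in RULES:
        match = pattern.match(text)
        if match:
            return build(match)
    # Unrecognized commands are surfaced so the caller can ask for clarification.
    return BrowserAction("unknown", value=text)
```

Returning an explicit `unknown` action gives the error-recovery path something concrete to act on, for example asking the user to rephrase.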

Enterprise Secrets Manager:

Encryption: SHA-256 hashing + AES-256-GCM encryption at rest
Storage: PostgreSQL with Prisma ORM
Security: Audit logging, rate limiting, domain verification
Local LLM Verification: On-device model verifies domain integrity before credential release
Import Support: 1Password, LastPass, Bitwarden, Chrome CSV formats
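
The checks that run before a credential is released (rate limiting, domain verification, audit logging) can be sketched with the standard library alone. This is a minimal sketch under assumptions: the exact-host comparison is a deterministic stand-in, not the local LLM verification described above, and `ReleaseGate`, its parameters, and the decision strings are hypothetical. Encryption itself is omitted because AES-256-GCM requires a third-party library in Python.

```python
import time
from collections import deque
from urllib.parse import urlsplit

class ReleaseGate:
    """Decides whether a stored credential may be released to a page."""

    def __init__(self, max_requests=3, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.recent = deque()    # timestamps of recent requests
        self.audit_log = []      # (timestamp, host, decision) tuples

    def check(self, stored_domain, page_url, now=None):
        now = time.monotonic() if now is None else now
        host = (urlsplit(page_url).hostname or "").lower()
        # Drop timestamps that have aged out of the sliding window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if len(self.recent) >= self.max_requests:
            decision = "rate_limited"
        elif host == stored_domain or host.endswith("." + stored_domain):
            decision = "released"
        else:
            decision = "domain_mismatch"
        # Every request counts toward the limit and is audited, even refusals.
        self.recent.append(now)
        self.audit_log.append((now, host, decision))
        return decision
```

Matching on the parsed hostname rather than on a substring of the URL is what rejects look-alike hosts such as `example.com.evil.test`.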

Beta Glasses Video Extraction:

Custom video processing pipeline (no existing libraries)
Real-time frame extraction and processing
Scalable architecture for continuous video streams
Integration with orchestrator for AI-powered analysis
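
For continuous streams, one common pattern is a bounded buffer that drops the oldest frames when analysis falls behind. The sketch below assumes that pattern; it is not taken from the team's pipeline, and `FrameBuffer`, `push`, and `sample` are hypothetical names.

```python
from collections import deque

class FrameBuffer:
    """Bounded buffer that keeps only the newest frames."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, frame):
        if len(self.frames) == self.frames.maxlen:
            self.dropped += 1   # the deque evicts the oldest frame on append
        self.frames.append(frame)

    def sample(self, every_nth):
        """Drain the buffer, returning every n-th frame for analysis."""
        picked = [f for i, f in enumerate(self.frames) if i % every_nth == 0]
        self.frames.clear()
        return picked
```

Sampling every n-th frame keeps the number of images sent for AI analysis bounded regardless of the camera's frame rate, and the `dropped` counter gives a simple signal that the consumer is too slow.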

Vector Database & RAG:

Vector embeddings for context retrieval
Retrieval-augmented generation for personalized responses
Long-term memory across conversations
Semantic search over user history and preferences
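
The retrieval step can be illustrated with cosine similarity over a small in-memory list. This is a minimal sketch: a real deployment would use an embedding model and a vector database, and the toy vectors, `top_k`, and `build_prompt` here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, memories, k=2):
    """memories is a list of (text, vector); returns the k most similar texts."""
    scored = sorted(
        memories,
        key=lambda m: cosine_similarity(query_vec, m[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]

def build_prompt(question, query_vec, memories, k=2):
    """Prepend the retrieved memories to the question as context."""
    context = "\n".join("- " + t for t in top_k(query_vec, memories, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Only the retrieved snippets are placed in the prompt, which is what keeps the context window small while still letting responses draw on long-term history.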

Tech Stack:

Backend: Node.js, TypeScript, Express.js
AI/ML: Claude 4.5 Sonnet, OpenAI (Stagehand), Local LLM (domain verification)
Database: PostgreSQL, Vector DB (embeddings)
Infrastructure: AWS EC2, ECS, Docker
Integrations: VAPI (voice), Composio (productivity), Stagehand (browser)
Security: SHA-256, AES-256-GCM, Prisma ORM
Validation: Zod schemas for type-safe API contracts
Languages: TypeScript, Python (iMessage bridge, video processing)
Protocols: Model Context Protocol (MCP), REST APIs, Webhooks

Challenges we ran into

1) Custom iMessage Integration: No official API. Built a Python scraper monitoring macOS Finder for chat.db changes, then bridged to iCloud for sending. Reverse-engineered the message format and sync mechanism.
2) Beta Glasses Video Extraction: No libraries available. Built a custom pipeline for real-time video extraction, frame processing, and scalable streaming architecture from scratch.
3) Dual Voice Agent Coordination: Coordinating two agents (incoming/outgoing) with the orchestrator required careful state management and webhook design to prevent conflicts and ensure proper routing.
4) Enterprise-Grade Security: Implementing LastPass-level security (SHA-256, AES-256-GCM, audit trails) while maintaining performance required cryptographic expertise and careful key management.
5) Local LLM Domain Verification: Running an on-device LLM for real-time domain verification before credential release required model optimization and latency reduction.
6) Voice-Controlled Browser: Converting voice commands to precise browser actions required intent classification, context understanding, and error recovery.
7) Vector DB & RAG Integration: Building a RAG system with vector embeddings while maintaining conversation context and ensuring relevant retrieval required careful embedding strategy and similarity search optimization.
8) ECS Deployment: Containerizing all services (orchestrator, voice agents, browser automation, secrets manager) for ECS deployment required Docker optimization, service discovery, and environment variable management.

Accomplishments that we're proud of

1) Custom iMessage Provider: Built a working iMessage integration without official APIs, using Python file monitoring and iCloud sync.
2) Dual Voice Agents: Two AI agents that make and receive natural phone calls, handling complex conversations and multi-step tasks.
3) Beta Glasses Integration: First scalable video extraction pipeline for beta hardware with no existing libraries.
4) Voice-Controlled Browser: Complete web automation controlled entirely by voice commands, from navigation to purchases.
5) Enterprise-Grade Secrets Manager: LastPass-level security with SHA-256 hashing, AES-256-GCM encryption, audit logging, and local LLM domain verification.
6) RAG System: Vector database with retrieval-augmented generation for personalized, context-aware AI responses.
7) Unified Architecture: Single orchestrator managing voice, text, browser automation, credentials, and productivity tools through MCP.
8) Production-Ready Infrastructure: ECS-ready design with Docker containers, environment-based configuration, and scalable service architecture.
9) Type-Safe APIs: Zod schemas throughout for runtime validation and type safety.
10) Zero-API Integrations: Built integrations (iMessage, beta glasses) where no official APIs exist.

What we learned

1) Reverse Engineering: Building the iMessage provider required reverse-engineering macOS file structures and iCloud sync mechanisms.
2) Custom Video Processing: Creating a video extraction pipeline from scratch taught us about frame buffering, real-time processing, and scalable streaming architectures.
3) Voice AI Coordination: Managing multiple voice agents with shared state and preventing conflicts required careful webhook design and state machines.
4) Cryptographic Security: Implementing enterprise-grade cryptography (SHA-256 hashing, AES-256-GCM encryption) required a deep understanding of key management, IV generation, and secure storage patterns.
5) Local LLM Optimization: Running LLMs on-device for real-time verification required model quantization, inference optimization, and latency reduction techniques.
6) Vector Embeddings: Building effective RAG systems required understanding embedding models, similarity search algorithms, and context window management.
7) Model Context Protocol: MCP provided a standardized way to expose tools to LLMs, making our architecture more maintainable and extensible.
8) ECS Deployment: Containerizing complex multi-service applications for cloud deployment taught us about service discovery, health checks, and environment variable management.
9) Type Safety at Scale: Using Zod for runtime validation provided type safety benefits even in JavaScript/TypeScript projects, catching errors before they reach production.
10) Integration Platform Benefits: Using Composio reduced boilerplate and handled OAuth complexity, allowing us to focus on core functionality rather than API management.

What's next for Rover

In the future, we want to launch Rover as a fully functioning mobile app on iOS and Android.

Built With

amazon-web-services
claude
composio
css
diarization
elevenlabs
embedding
gemini
javascript
llama
mcp
openai
pinecone
python
rag
shell
stagehand
supabase
swift
typescript
vapi
vector
vlm
voice

Submitted to

TAMUhack 2026
- Winner Best Devpost

Created by

I built the Listener iOS app, integrated it with the orchestrator and Meta Glasses, and produced the demo video end to end.

Mann Bellani
DMirwani21 Mirwani
Asvath Madhan
Arjun Babla

Updates

DMirwani21 Mirwani started this project — Jan 24, 2026 06:14 PM EST
