Inspiration
For visually impaired people, accessibility tools are often split across multiple platforms and can feel confusing or disjointed. Even something as basic as receiving an email on your phone and then creating a related calendar event on your computer can be challenging. We wanted to simplify this space and bring web autonomy to the digital lives of the visually impaired. Our app offers:
- Natural phone calls to businesses
- iMessage integration without official APIs
- Voice-controlled web browsing
- Enterprise-grade credential management
- Real-time video processing from emerging hardware
We built an AI platform that integrates these capabilities into a single system.
What it does
A multi-modal AI assistant that handles tasks across communication, productivity, and automation:
Communication:
- Custom iMessage integration: sends/receives iMessages from your phone number via a custom Python bridge
- Dual voice agents: two AI agents that make and receive natural phone calls (user agent for incoming, outreach agent for outgoing calls to businesses)
- Text messaging: full iMessage/SMS support through our custom provider
Productivity:
- Google Calendar, Gmail, Slack, Notion, Microsoft Teams integration via Composio
- Voice-controlled browser: complete web automation controlled by voice commands
- RAG-powered context: vector database with retrieval-augmented generation for personalized, context-aware responses
Security & Automation:
- Enterprise-grade secrets manager: LastPass-level security with SHA-256 hashing, AES-256-GCM encryption at rest, audit logging, and rate limiting
- Local LLM domain verification: on-device LLM verifies domain integrity before credential release
- Browser automation: Stagehand-powered web automation for ordering, reservations, and research
Hardware Integration:
- Beta glasses support: custom scalable video extraction pipeline (no existing libraries) for real-time video processing from beta hardware
How we built it
Architecture:
- Orchestrator: Central brain on AWS EC2 running Claude 4.5 Sonnet with tool routing
- AWS ECS: Containerized services for scalable deployment
- Model Context Protocol (MCP): Standardized tool integration framework
- Composio Integration: Unified API for 250+ productivity apps
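The tool-routing idea can be illustrated with a minimal sketch, assuming a simple in-process registry keyed by tool name. The names `ToolRegistry`, `register`, `dispatch`, and the sample `send_imessage` tool are hypothetical; the actual orchestrator exposes its tools through MCP, which this dictionary only approximates.

```python
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Dict[str, Any]]


class ToolRegistry:
    """Maps tool names to handlers so a model's tool call can be dispatched."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = handler

    def dispatch(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._tools.get(name)
        if handler is None:
            # Return a structured error instead of raising so the caller can recover.
            return {"ok": False, "error": f"unknown tool: {name}"}
        return {"ok": True, "result": handler(arguments)}


def send_imessage(args: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical tool: a real handler would forward to the iMessage bridge.
    return {"queued_for": args["to"], "length": len(args["body"])}


registry = ToolRegistry()
registry.register("send_imessage", send_imessage)
```

Returning an error payload for unknown tools, rather than raising, lets the model see the failure and pick a different tool on its next turn.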
Custom iMessage Provider:
- Python scraper monitoring macOS Finder for conversation file changes
- Real-time detection of new messages in chat.db
- Bridge service forwarding messages to EC2 orchestrator
- iCloud sync integration to send replies from your phone number
- No official API required
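The new-message detection step can be sketched as follows. This is an illustration under assumptions, not the project's bridge code: it assumes chat.db is a SQLite file with a `message` table exposing `ROWID`, `text`, and `is_from_me` columns, and the function name `fetch_new_messages` is hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Assumed, simplified schema: a `message` table with ROWID, text, is_from_me.
QUERY = """
    SELECT ROWID, text, is_from_me
    FROM message
    WHERE ROWID > ?
    ORDER BY ROWID ASC
"""


def fetch_new_messages(
    conn: sqlite3.Connection, last_rowid: int
) -> Tuple[List[Tuple[int, str]], int]:
    """Return incoming messages newer than last_rowid and the new high-water mark."""
    incoming: List[Tuple[int, str]] = []
    newest = last_rowid
    for rowid, text, is_from_me in conn.execute(QUERY, (last_rowid,)):
        newest = max(newest, rowid)
        # Skip our own outgoing messages and rows without a plain-text body.
        if is_from_me or text is None:
            continue
        incoming.append((rowid, text))
    return incoming, newest
```

A bridge loop would call this whenever the file changes, forward the returned messages to the orchestrator, and persist the high-water mark. Opening the database read-only, for example with `sqlite3.connect("file:<path>?mode=ro", uri=True)`, avoids interfering with the Messages app.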
Voice Agents (VAPI):
- Two specialized agents:
  - User Agent: Receives calls, transcribes, routes to orchestrator
  - Outreach Agent: Makes outbound calls, handles conversations, reports results
- Natural language understanding for complex phone interactions
- Webhook integration with orchestrator for real-time coordination
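One way to keep the two agents from stepping on each other is a small per-call state machine on the orchestrator side. The sketch below is an assumption about how that could look: the event fields (`call_id`, `agent`, `status`), the state names, and the `CallRouter` class are all hypothetical and do not reflect VAPI's actual webhook payload.

```python
from dataclasses import dataclass, field
from typing import Dict

# Allowed transitions for one call; anything else is rejected.
TRANSITIONS = {
    "ringing": {"in_progress", "ended"},
    "in_progress": {"ended"},
    "ended": set(),
}


@dataclass
class CallRouter:
    """Tracks which agent owns each call and validates status changes."""

    owners: Dict[str, str] = field(default_factory=dict)  # call_id -> agent
    states: Dict[str, str] = field(default_factory=dict)  # call_id -> state

    def handle(self, event: Dict[str, str]) -> bool:
        call_id, agent, status = event["call_id"], event["agent"], event["status"]
        if agent not in ("user", "outreach"):
            return False
        if call_id not in self.owners:
            # The first event for a call must open it.
            if status != "ringing":
                return False
            self.owners[call_id] = agent
            self.states[call_id] = status
            return True
        if self.owners[call_id] != agent:
            # The other agent may not touch a call it does not own.
            return False
        if status not in TRANSITIONS[self.states[call_id]]:
            return False
        self.states[call_id] = status
        return True
```

Rejected events return `False` so the webhook handler can log them and respond without mutating shared state.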
Voice-Controlled Browser:
- Stagehand AI for vision-based browser automation
- Voice command → intent → browser action pipeline
- Form filling, navigation, extraction, and purchases
- Integration with secrets manager for authentication
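The voice command → intent → browser action pipeline can be illustrated with a deliberately simple stand-in. This sketch uses ordered regular-expression rules in place of model-based intent classification, and the intent names, patterns, and `parse_command` function are hypothetical.

```python
import re
from typing import Dict, Optional

# Ordered (pattern, intent) pairs; the first match wins.
INTENT_PATTERNS = [
    (re.compile(r"^(?:go to|open)\s+(?P<target>.+)$", re.I), "navigate"),
    (re.compile(r"^(?:search for|look up)\s+(?P<target>.+)$", re.I), "search"),
    (re.compile(r"^(?:click|press)\s+(?P<target>.+)$", re.I), "click"),
]


def parse_command(transcript: str) -> Optional[Dict[str, str]]:
    """Turn a transcribed voice command into a structured browser action."""
    text = transcript.strip().rstrip(".!?")
    for pattern, intent in INTENT_PATTERNS:
        match = pattern.match(text)
        if match:
            return {"intent": intent, "target": match.group("target").strip()}
    # Unknown commands return None so the caller can ask the user to rephrase.
    return None
```

The structured result is what a browser automation layer would consume; returning `None` for unrecognized input gives the error-recovery path a clear signal.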
Enterprise Secrets Manager:
- Encryption: SHA-256 hashing + AES-256-GCM encryption at rest
- Storage: PostgreSQL with Prisma ORM
- Security: Audit logging, rate limiting, domain verification
- Local LLM Verification: On-device model verifies domain integrity before credential release
- Import Support: 1Password, LastPass, Bitwarden, Chrome CSV formats
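The release path (rate limiting, domain check, audit entry) can be sketched as below. This is a simplified illustration under assumptions: an exact HTTPS host comparison stands in for the local LLM verification, encryption is not shown, and the `CredentialGate` class, its parameters, and the audit outcome labels are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional
from urllib.parse import urlsplit


class CredentialGate:
    """Releases a secret only for an exact host match, within a rate limit."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.recent: List[float] = []
        self.audit_log: List[Dict[str, str]] = []

    def _record(self, host: str, outcome: str) -> None:
        self.audit_log.append({"host": host, "outcome": outcome})

    def release(self, page_url: str, stored_domain: str, secret: str) -> Optional[str]:
        parts = urlsplit(page_url)
        host = (parts.hostname or "").lower()
        now = self.clock()
        # Sliding window: forget requests older than the window.
        self.recent = [t for t in self.recent if now - t < self.window_seconds]
        if len(self.recent) >= self.max_requests:
            self._record(host, "rate_limited")
            return None
        self.recent.append(now)
        # Require HTTPS and an exact host match; lookalike hosts are refused.
        if parts.scheme != "https" or host != stored_domain.lower():
            self._record(host, "denied")
            return None
        self._record(host, "released")
        return secret
```

Every attempt, including refused ones, lands in the audit log, and the injectable `clock` makes the rate limiter easy to test without sleeping.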
Beta Glasses Video Extraction:
- Custom video processing pipeline (no existing libraries)
- Real-time frame extraction and processing
- Scalable architecture for continuous video streams
- Integration with orchestrator for AI-powered analysis
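For a continuous stream, one common approach is to sample frames at a fixed stride and hold them in a bounded buffer so a slow consumer never causes unbounded memory growth. The sketch below assumes that approach; the `FrameSampler` class and its parameters are hypothetical and say nothing about the glasses' actual transport or frame format.

```python
from collections import deque
from typing import Deque, List, Tuple

Frame = Tuple[int, bytes]  # (frame index, encoded image bytes)


class FrameSampler:
    """Keeps every `stride`-th frame in a bounded buffer, dropping the oldest."""

    def __init__(self, stride: int, capacity: int) -> None:
        if stride < 1 or capacity < 1:
            raise ValueError("stride and capacity must be positive")
        self.stride = stride
        self.buffer: Deque[Frame] = deque(maxlen=capacity)
        self.seen = 0

    def push(self, data: bytes) -> None:
        index = self.seen
        self.seen += 1
        if index % self.stride == 0:
            # deque(maxlen=...) evicts the oldest frame when the buffer is full.
            self.buffer.append((index, data))

    def drain(self) -> List[Frame]:
        """Hand the buffered frames to the consumer and clear the buffer."""
        frames = list(self.buffer)
        self.buffer.clear()
        return frames
```

Dropping the oldest frames favors freshness over completeness, which suits real-time analysis where a stale frame is of little use.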
Vector Database & RAG:
- Vector embeddings for context retrieval
- Retrieval-augmented generation for personalized responses
- Long-term memory across conversations
- Semantic search over user history and preferences
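The retrieval step behind RAG can be shown with a small pure-Python sketch: score stored embeddings against a query embedding by cosine similarity and keep the top matches. The function names and the toy two-dimensional vectors are illustrative assumptions; a hosted vector database would perform this search at scale.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(key, score) for score, key in scored[:k]]
```

The returned snippets would then be placed into the model's prompt as context before generation.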
Tech Stack:
- Backend: Node.js, TypeScript, Express.js
- AI/ML: Claude 4.5 Sonnet, OpenAI (Stagehand), Local LLM (domain verification)
- Database: PostgreSQL, Vector DB (embeddings)
- Infrastructure: AWS EC2, ECS, Docker
- Integrations: VAPI (voice), Composio (productivity), Stagehand (browser)
- Security: SHA-256, AES-256-GCM, Prisma ORM
- Validation: Zod schemas for type-safe API contracts
- Languages: TypeScript, Python (iMessage bridge, video processing)
- Protocols: Model Context Protocol (MCP), REST APIs, Webhooks
Challenges we ran into
1) Custom iMessage Integration: No official API. Built a Python scraper monitoring macOS Finder for chat.db changes, then bridged to iCloud for sending. Reverse-engineered the message format and sync mechanism.
2) Beta Glasses Video Extraction: No libraries available. Built a custom pipeline for real-time video extraction, frame processing, and scalable streaming architecture from scratch.
3) Dual Voice Agent Coordination: Coordinating two agents (incoming/outgoing) with the orchestrator required careful state management and webhook design to prevent conflicts and ensure proper routing.
4) Enterprise-Grade Security: Implementing LastPass-level security (SHA-256, AES-256-GCM, audit trails) while maintaining performance required cryptographic expertise and careful key management.
5) Local LLM Domain Verification: Running an on-device LLM for real-time domain verification before credential release required model optimization and latency reduction.
6) Voice-Controlled Browser: Converting voice commands to precise browser actions required intent classification, context understanding, and error recovery.
7) Vector DB & RAG Integration: Building a RAG system with vector embeddings while maintaining conversation context and ensuring relevant retrieval required a careful embedding strategy and similarity search optimization.
8) ECS Deployment: Containerizing all services (orchestrator, voice agents, browser automation, secrets manager) for ECS deployment required Docker optimization, service discovery, and environment variable management.
Accomplishments that we're proud of
1) Custom iMessage Provider: Built a working iMessage integration without official APIs, using Python file monitoring and iCloud sync.
2) Dual Voice Agents: Two AI agents that make and receive natural phone calls, handling complex conversations and multi-step tasks.
3) Beta Glasses Integration: First scalable video extraction pipeline for beta hardware with no existing libraries.
4) Voice-Controlled Browser: Complete web automation controlled entirely by voice commands, from navigation to purchases.
5) Enterprise-Grade Secrets Manager: LastPass-level security with SHA-256 hashing, AES-256-GCM encryption, audit logging, and local LLM domain verification.
6) RAG System: Vector database with retrieval-augmented generation for personalized, context-aware AI responses.
7) Unified Architecture: Single orchestrator managing voice, text, browser automation, credentials, and productivity tools through MCP.
8) Production-Ready Infrastructure: ECS-ready design with Docker containers, environment-based configuration, and scalable service architecture.
9) Type-Safe APIs: Zod schemas throughout for runtime validation and type safety.
10) Zero-API Integrations: Built integrations (iMessage, beta glasses) where no official APIs exist.
What we learned
1) Reverse Engineering: Building the iMessage provider required reverse-engineering macOS file structures and iCloud sync mechanisms.
2) Custom Video Processing: Creating a video extraction pipeline from scratch taught us about frame buffering, real-time processing, and scalable streaming architectures.
3) Voice AI Coordination: Managing multiple voice agents with shared state and preventing conflicts required careful webhook design and state machines.
4) Cryptographic Security: Implementing enterprise-grade encryption (SHA-256, AES-256-GCM) required deep understanding of key management, IV generation, and secure storage patterns.
5) Local LLM Optimization: Running LLMs on-device for real-time verification required model quantization, inference optimization, and latency reduction techniques.
6) Vector Embeddings: Building effective RAG systems required understanding embedding models, similarity search algorithms, and context window management.
7) Model Context Protocol: MCP provided a standardized way to expose tools to LLMs, making our architecture more maintainable and extensible.
8) ECS Deployment: Containerizing complex multi-service applications for cloud deployment taught us about service discovery, health checks, and environment variable management.
9) Type Safety at Scale: Using Zod for runtime validation provided type safety benefits even in JavaScript/TypeScript projects, catching errors before they reach production.
10) Integration Platform Benefits: Using Composio reduced boilerplate and handled OAuth complexity, allowing us to focus on core functionality rather than API management.
What's next for Rover
In the future, we want to launch Rover as a fully functioning mobile app on iOS and Android.
Built With
- amazon-web-services
- claude
- composio
- css
- diarization
- elevenlabs
- embedding
- gemini
- javascript
- llama
- mcp
- openai
- pinecone
- python
- rag
- shell
- stagehand
- supabase
- swift
- typescript
- vapi
- vector
- vlm
- voice