Reality Copilot - Your Copilot in Real Reality
💡 Motivation
We have GitHub Copilot revolutionizing how we write code. We have Windows Copilot transforming how we interact with our desktop. So why can't we have Reality Copilot - an AI assistant that understands and enhances our physical world in mixed reality?
"Reality Copilot, your copilot in real reality."
What it does
Reality Copilot transforms Meta Quest 3 into an intelligent spatial computing assistant by integrating cutting-edge AI services with real-time camera feeds. Capture your environment, segment objects, generate 3D models, and compose context-aware emails - all in mixed reality.
Key Features
- 🤖 FastVLM: Real-time image understanding with natural language (self-hosted; see the endpoint sketch after this list) [RealityCopilot-Backend/FastVLM]
- ✂️ SAM3: Advanced object segmentation with text prompts (self-hosted) [RealityCopilot-Backend/SAM3]
- 🎨 SAM3D: Convert 2D masks to full 3D models instantly (self-hosted) [RealityCopilot-Backend/SAM3D]
- 🌐 Gemini AI: Commercial cloud AI for contextual assistance and content generation
- 📧 Smart Email: Context-aware composition with AI-generated content
- 📹 Hardware Recording: H.264/H.265 video capture using Quest's encoder with dual audio capture (microphone + speaker output)
- 🎙️ Voice-First Interaction: Hands-free operation with the TEN-framework VAD native library (.so) - just speak naturally, no buttons needed
- 🔍 Google Search: Search the web directly from MR
- 🎵 Spotify: Listen to music and podcasts in mixed reality
- 📺 YouTube: Watch videos in your spatial environment
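The self-hosted models are served over HTTP from RealityCopilot-Backend. As a rough illustration of what such an endpoint can look like, here is a minimal FastAPI sketch of an image-description service; the `/describe` route, the BLIP stand-in model, and the response shape are assumptions for illustration, not the actual FastVLM server code.

```python
# Minimal sketch of a self-hosted vision endpoint (illustrative only).
# A generic Hugging Face image-to-text pipeline stands in for FastVLM.
import io

from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image
from transformers import pipeline

app = FastAPI()

# Placeholder captioner; the real server would load the FastVLM checkpoint.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

@app.post("/describe")
async def describe(image: UploadFile = File(...), prompt: str = Form("")):
    """Accept a camera frame from the headset and return a text description."""
    frame = Image.open(io.BytesIO(await image.read())).convert("RGB")
    result = captioner(frame)  # -> [{"generated_text": "..."}]
    # A real VLM server would condition on `prompt`; it is echoed here for shape.
    return {"prompt": prompt, "description": result[0]["generated_text"]}
```

Run with `uvicorn server:app --host 0.0.0.0 --port 8000`; the headset then POSTs JPEG frames to the endpoint.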
Tech Stack
Unity C# → Quest Camera API, TEN VAD (.so), UI/Scene Management
Python → FastVLM, SAM3, SAM3D inference servers
Java/C++ → Hardware video encoding, audio capture, native optimizations
JavaScript → Three.js 3D viewer, WebView content
Technical Highlights
- Multi-language integration: Seamless Unity C# ↔ Python ↔ Java/C++ ↔ JavaScript
- On-device VAD: Native library (.so) for local voice detection - no server needed
- Hardware acceleration: Native MediaCodec for efficient recording
- Zero-copy transfers: frame pipeline optimized to avoid redundant buffer copies
- Android-optimized: UnityWebRequest for APK asset loading
- OAuth2 security: Gmail integration with token refresh (see the sketch after this list)
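The Gmail integration itself lives in Unity C#, but the refresh flow it follows is the standard Google OAuth2 one. A minimal Python sketch using the google-auth library, with all credential values as placeholders:

```python
# Standard OAuth2 refresh flow behind the Gmail integration (sketch).
# Every credential value below is a placeholder.
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

creds = Credentials(
    token=None,  # no valid access token yet, so refresh() will fetch one
    refresh_token="STORED_REFRESH_TOKEN",
    token_uri="https://oauth2.googleapis.com/token",
    client_id="YOUR_CLIENT_ID.apps.googleusercontent.com",
    client_secret="YOUR_CLIENT_SECRET",
    scopes=["https://www.googleapis.com/auth/gmail.send"],
)

# Exchange the long-lived refresh token for a short-lived access token.
creds.refresh(Request())
print("access token expires at:", creds.expiry)
```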
How it works
- Capture - Grab camera feed with precise timestamps
- Process - Send to AI services (FastVLM/SAM3/SAM3D); a client-side sketch follows this list
- Visualize - Display results in 3D space
- Interact - Email, save, or manipulate content
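In the app this loop runs in Unity C#; the sketch below shows the same HTTP flow in Python. The backend address, the route names (`/describe`, `/segment`, `/generate3d`), and the `mask_id` field are illustrative assumptions rather than the real API.

```python
# Shape of the capture -> process -> visualize loop (illustrative sketch).
import time

import requests

BACKEND = "http://192.168.1.50:8000"  # placeholder self-hosted server address

def process_frame(jpeg_bytes: bytes, prompt: str) -> dict:
    """Push one timestamped camera frame through the AI services."""
    captured_at = time.time()
    files = {"image": ("frame.jpg", jpeg_bytes, "image/jpeg")}

    # 1) Describe the scene (FastVLM).
    desc = requests.post(f"{BACKEND}/describe", files=files,
                         data={"prompt": prompt}, timeout=10).json()

    # 2) Segment the object named in the prompt (SAM3).
    seg = requests.post(f"{BACKEND}/segment", files=files,
                        data={"text": prompt}, timeout=10).json()

    # 3) Lift the 2D mask to a 3D mesh (SAM3D); the caller renders it in MR.
    mesh = requests.post(f"{BACKEND}/generate3d", files=files,
                         data={"mask_id": seg["mask_id"]}, timeout=30).json()

    return {"timestamp": captured_at, "description": desc, "mesh": mesh}
```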
Innovation
First MR app to unify commercial cloud AI (Gemini) with state-of-the-art open-source AI (SAM3, SAM3D, and FastVLM on a self-hosted server) in a single spatial interface. Features hardware-optimized performance, hands-free voice interaction, and integrated web services (Google Search, Spotify, YouTube).
Performance
- 72Hz frame rate maintained
- <2s AI inference
- <2GB memory footprint
- 1080p@30fps recording
- <50ms voice detection latency
Built for Meta Quest 3/3S | Unity 6000.2.6f2 | Python 3.11 | Android SDK
📚 References
This project integrates several cutting-edge open-source projects:
- FastVLM - Apple's Fast Vision Language Model: https://github.com/apple/ml-fastvlm
- SAM3 - Meta's Segment Anything Model 3: https://github.com/facebookresearch/sam3
- SAM 3D Objects - Meta's 3D generation: https://github.com/facebookresearch/sam-3d-objects
- Embodiment - Google's embodied AI: https://github.com/google/embardiment
- TEN VAD - Voice Activity Detection: https://github.com/TEN-framework/ten-vad
- Unity Passthrough Camera API - Meta Quest samples: https://github.com/oculus-samples/Unity-PassthroughCameraApiSamples
- RobotVisionUnityPluginQuest - Hardware recording: https://github.com/XR-Robotics/RobotVisionUnityPluginQuest
Reality Copilot, your copilot in real reality.
Made with ❤️ for the future of spatial computing

