Reality Copilot - Your Copilot in Real Reality

💡 Motivation

We have GitHub Copilot revolutionizing how we write code. We have Windows Copilot transforming how we interact with our desktop. So why can't we have Reality Copilot - an AI assistant that understands and enhances our physical world in mixed reality?

"Reality Copilot, your copilot in real reality."

What it does

Reality Copilot transforms Meta Quest 3 into an intelligent spatial computing assistant by integrating cutting-edge AI services with real-time camera feeds. Capture your environment, segment objects, generate 3D models, and compose context-aware emails - all in mixed reality.

Key Features

  • 🤖 FastVLM: Real-time image understanding with natural language (self-hosted) [RealityCopilot-Backend/FastVLM]
  • ✂️ SAM3: Advanced object segmentation with text prompts (self-hosted) [RealityCopilot-Backend/SAM3]
  • 🎨 SAM3D: Convert 2D masks to full 3D models instantly (self-hosted) [RealityCopilot-Backend/SAM3D]
  • 🌐 Gemini AI: Commercial cloud AI for contextual assistance and content generation
  • 📧 Smart Email: Context-aware composition with AI-generated content
  • 📹 Hardware Recording: H.264/H.265 video capture using Quest's encoder with dual audio capture (microphone + speaker output)
  • 🎙️ Voice-First Interaction: Hands-free operation with the TEN-framework VAD native library (.so) - just speak naturally, no buttons needed
  • 🔍 Google Search: Search the web directly from MR
  • 🎵 Spotify: Listen to music and podcasts in mixed reality
  • 📺 YouTube: Watch videos in your spatial environment
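
TEN VAD itself ships as a native library with a learned model, so its internals are not shown here. As a rough, conceptual illustration of what voice activity detection does with a PCM frame, here is a minimal energy-based sketch (the frame size and threshold are arbitrary assumptions, not TEN's algorithm):

```python
import math
import struct

def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Crude VAD: flag a frame as speech when its energy exceeds a threshold.
    Real VADs (like TEN's) use learned models, not a fixed energy gate."""
    return frame_rms(frame) > threshold

# Synthetic frames: 160 samples of silence vs. a loud square-ish tone
silence = struct.pack("<160h", *([0] * 160))
tone = struct.pack("<160h", *([8000, -8000] * 80))
```

A learned VAD additionally handles noisy rooms and soft speech, which is why a plain energy gate like this is only a teaching sketch.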

Tech Stack

Unity C#   →  Quest Camera API, TEN VAD (.so), UI/Scene Management
Python     →  FastVLM, SAM3, SAM3D inference servers
Java/C++   →  Hardware video encoding, audio capture, native optimizations
JavaScript →  Three.js 3D viewer, WebView content

Technical Highlights

  • Multi-language integration: Seamless Unity C# ↔ Python ↔ Java/C++ ↔ JavaScript
  • On-device VAD: Native library (.so) for local voice detection - no server needed
  • Hardware acceleration: Native MediaCodec for efficient recording
  • Zero-copy transfers: Optimized frame pipeline
  • Android-optimized: UnityWebRequest for APK asset loading
  • OAuth2 security: Gmail integration with token refresh
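
The Gmail token-refresh logic can be sketched independently of Google's SDK. The helper below caches an access token and refreshes it shortly before expiry; the `refresh` callable stands in for the real token-endpoint request (hypothetical here), and the 60-second skew is an assumption:

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TokenStore:
    """Caches an OAuth2 access token, refreshing it shortly before expiry.

    `refresh` is a stand-in for the real token endpoint call; it returns
    (access_token, lifetime_seconds).
    """
    refresh: Callable[[], tuple]
    skew: float = 60.0  # refresh this many seconds before actual expiry
    _token: str = field(default="", init=False)
    _expires_at: float = field(default=0.0, init=False)

    def get(self) -> str:
        # Refresh lazily: only when the cached token is absent or near expiry
        if time.time() >= self._expires_at - self.skew:
            self._token, lifetime = self.refresh()
            self._expires_at = time.time() + lifetime
        return self._token
```

In practice the refresh callable would exchange a stored refresh token for a new access token; this sketch only shows the caching discipline around it.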

How it works

  1. Capture - Grab camera feed with precise timestamps
  2. Process - Send to AI services (FastVLM/SAM3/SAM3D)
  3. Visualize - Display results in 3D space
  4. Interact - Email, save, or manipulate content
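
The first three steps above can be sketched as a small pipeline. The names and shapes below are illustrative assumptions, not the app's actual API; the inference and visualization stages are injected so any of the AI services can slot in:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Capture:
    """One camera frame plus the timestamp it was grabbed at (step 1)."""
    image: bytes
    timestamp_ms: int

def run_pipeline(capture: Capture,
                 infer: Callable[[bytes], dict],
                 visualize: Callable[[dict], None]) -> dict:
    """Capture -> Process -> Visualize; Interact is left to the caller."""
    result = infer(capture.image)            # step 2: FastVLM / SAM3 / SAM3D
    result["timestamp_ms"] = capture.timestamp_ms
    visualize(result)                        # step 3: render into 3D space
    return result                            # step 4: caller emails/saves it
```

Keeping the timestamp attached to every result is what lets asynchronous AI responses be matched back to the frame they came from.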

Innovation

First MR app to unify commercial cloud AI (Gemini) with state-of-the-art open-source AI (SAM3, SAM3D, and FastVLM on a self-hosted server) in a single spatial interface. Features hardware-optimized performance, hands-free voice interaction, and integrated web services (Google Search, Spotify, YouTube).

Performance

  • 72Hz frame rate maintained
  • <2s AI inference
  • <2GB memory footprint
  • 1080p@30fps recording
  • <50ms voice detection latency

Built for Meta Quest 3/3S | Unity 6000.2.6f2 | Python 3.11 | Android SDK

Reality Copilot, your copilot in real reality.

Made with ❤️ for the future of spatial computing

Updates



REQUIREMENTS

Built for Meta Quest 3/3S | Unity 6000.0.38f1 | Python 3.8+ | Android SDK

WORKFLOW EXAMPLE

User: "Capture" → Camera captures view → Image shown in overlay

User: "SAM3D Local" → Loads 3D model → Displays in space

User: "Email Model" → AI analyzes content → Generates description → Opens email composer

User: Send email → Gmail API sends → Confirmation displayed
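
The voice commands above imply a dispatcher that routes recognized transcripts to actions. A minimal sketch (exact-phrase matching and the handler names are simplifying assumptions; the real app presumably tolerates fuzzier speech):

```python
from typing import Callable

class CommandRouter:
    """Maps recognized voice phrases (case-insensitive) to handler callbacks."""

    def __init__(self) -> None:
        self._handlers: dict = {}

    def on(self, phrase: str, handler: Callable[[], str]) -> None:
        self._handlers[phrase.lower()] = handler

    def dispatch(self, transcript: str) -> str:
        handler = self._handlers.get(transcript.strip().lower())
        return handler() if handler else "unrecognized command"

# Register the commands from the workflow above with stub actions
router = CommandRouter()
router.on("capture", lambda: "camera captured")
router.on("sam3d local", lambda: "3d model loaded")
router.on("email model", lambda: "email composer opened")
```

A production version would sit downstream of the VAD plus a speech-to-text step, feeding each final transcript into `dispatch`.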

AI SERVICES (Python Server)

  • FastVLM Server - POST /fastvlm/process
  • SAM3 Segmentation - POST /sam3/segment
  • SAM3D Generation - POST /sam3d/generate
  • Health Check - GET /status

Note: VAD runs locally as a native library (.so), not as a service
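
A thin client for these endpoints could look like the sketch below. The paths come from the list above, but the request/response shapes (a multipart "image" field, a "prompt" form field, a JSON reply) are assumptions about the servers, not documented behavior:

```python
class AIClient:
    """Minimal client for the self-hosted inference endpoints listed above."""

    def __init__(self, base_url: str, session=None):
        if session is None:
            # Real use: an HTTP session (e.g. requests.Session); injectable
            # here so the class can be exercised without a network.
            import requests
            session = requests.Session()
        self.base_url = base_url.rstrip("/")
        self.session = session

    def segment(self, image_bytes: bytes, prompt: str) -> dict:
        """POST /sam3/segment with an image and a text prompt (assumed shape)."""
        resp = self.session.post(
            f"{self.base_url}/sam3/segment",
            files={"image": image_bytes},
            data={"prompt": prompt},
        )
        resp.raise_for_status()
        return resp.json()

    def healthy(self) -> bool:
        """GET /status - treat any 200 as 'all services up'."""
        return self.session.get(f"{self.base_url}/status").status_code == 200
```

Injecting the session also makes the client trivial to unit-test with a stub transport.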

SUPPORTED FORMATS

  • Images: PNG, JPEG, WebP
  • 3D Models: GLB, GLTF
  • Video: H.264, H.265 (hardware-accelerated)
  • Audio: WAV, MP3, OGG
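
Several of these formats can be told apart by their magic bytes alone, which is handy when a payload arrives without an extension. A simplistic sniffing sketch covering a subset of the list (MP3/OGG/raw video streams need more involved checks and are omitted):

```python
def sniff_format(data: bytes) -> str:
    """Identify a payload by magic bytes for some of the formats listed above."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data.startswith(b"glTF"):  # binary glTF (.glb) container magic
        return "glb"
    return "unknown"
```

Note that WebP and WAV share the RIFF container, so both checks look at bytes 8-12 to disambiguate.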

KEY ACHIEVEMENTS

  1. First Reality Copilot for spatial computing
  2. Multi-modal AI integration in VR
  3. Voice-first hands-free interaction with VAD
  4. Industry-first dual audio capture (mic + speaker)
  5. Hardware-optimized video recording
  6. Cross-platform asset loading for Android
  7. Secure OAuth2 Gmail integration
  8. Real-time 3D model generation from segmentation

FUTURE PLANS

  • Multi-user collaboration with voice chat
  • Cloud synchronization
  • Persistent AR anchors
  • Custom model training
  • Real-time translation (text + speech)
  • Gesture recognition (voice + gesture commands)
  • Embodied AI integration with robotics
