Reality Copilot - Your Copilot in Real Reality

💡 Motivation

We have GitHub Copilot revolutionizing how we write code. We have Windows Copilot transforming how we interact with our desktop. So why can't we have Reality Copilot - an AI assistant that understands and enhances our physical world in mixed reality?

"Reality Copilot, your copilot in real reality."

What it does

Reality Copilot transforms Meta Quest 3 into an intelligent spatial computing assistant by integrating cutting-edge AI services with real-time camera feeds. Capture your environment, segment objects, generate 3D models, and compose context-aware emails - all in mixed reality.

Key Features

  • 🤖 FastVLM: Real-time image understanding with natural language (self-hosted) [RealityCopilot-Backend/FastVLM]
  • ✂️ SAM3: Advanced object segmentation with text prompts (self-hosted) [RealityCopilot-Backend/SAM3]
  • 🎨 SAM3D: Convert 2D masks to full 3D models instantly (self-hosted) [RealityCopilot-Backend/SAM3D]
  • 🌐 Gemini AI: Commercial cloud AI for contextual assistance and content generation
  • 📧 Smart Email: Context-aware composition with AI-generated content
  • 📹 Hardware Recording: H.264/H.265 video capture using Quest's encoder with dual audio capture (microphone + speaker output)
  • 🎙️ Voice-First Interaction: Hands-free operation with the TEN-framework VAD native library (.so) - just speak naturally, no buttons needed
  • 🔍 Google Search: Search the web directly from MR
  • 🎵 Spotify: Listen to music and podcasts in mixed reality
  • 📺 YouTube: Watch videos in your spatial environment
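
TEN VAD itself ships as a native library with a learned model, so its internals are not shown here. As a rough, conceptual illustration of what voice activity detection does with a PCM frame, here is a minimal energy-based sketch (the frame size and threshold are arbitrary assumptions, not TEN's algorithm):

```python
import math
import struct

def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Crude VAD: flag a frame as speech when its energy exceeds a threshold.
    Real VADs (like TEN's) use learned models, not a fixed energy gate."""
    return frame_rms(frame) > threshold

# Synthetic frames: 160 samples of silence vs. a loud square-ish tone
silence = struct.pack("<160h", *([0] * 160))
tone = struct.pack("<160h", *([8000, -8000] * 80))
```

A learned VAD additionally handles noisy rooms and soft speech, which is why a plain energy gate like this is only a teaching sketch.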

Tech Stack

Unity C#   →  Quest Camera API, TEN VAD (.so), UI/Scene Management
Python     →  FastVLM, SAM3, SAM3D inference servers
Java/C++   →  Hardware video encoding, audio capture, native optimizations
JavaScript →  Three.js 3D viewer, WebView content

Technical Highlights

  • Multi-language integration: Seamless Unity C# ↔ Python ↔ Java/C++ ↔ JavaScript
  • On-device VAD: Native library (.so) for local voice detection - no server needed
  • Hardware acceleration: Native MediaCodec for efficient recording
  • Zero-copy transfers: Optimized frame pipeline
  • Android-optimized: UnityWebRequest for APK asset loading
  • OAuth2 security: Gmail integration with token refresh
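
The Gmail token-refresh logic can be sketched independently of Google's SDK. The helper below caches an access token and refreshes it shortly before expiry; the `refresh` callable stands in for the real token-endpoint request (hypothetical here), and the 60-second skew is an assumption:

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TokenStore:
    """Caches an OAuth2 access token, refreshing it shortly before expiry.

    `refresh` is a stand-in for the real token endpoint call; it returns
    (access_token, lifetime_seconds).
    """
    refresh: Callable[[], tuple]
    skew: float = 60.0  # refresh this many seconds before actual expiry
    _token: str = field(default="", init=False)
    _expires_at: float = field(default=0.0, init=False)

    def get(self) -> str:
        # Refresh lazily: only when the cached token is absent or near expiry
        if time.time() >= self._expires_at - self.skew:
            self._token, lifetime = self.refresh()
            self._expires_at = time.time() + lifetime
        return self._token
```

In practice the refresh callable would exchange a stored refresh token for a new access token; this sketch only shows the caching discipline around it.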

How it works

  1. Capture - Grab camera feed with precise timestamps
  2. Process - Send to AI services (FastVLM/SAM3/SAM3D)
  3. Visualize - Display results in 3D space
  4. Interact - Email, save, or manipulate content
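
The first three steps above can be sketched as a small pipeline. The names and shapes below are illustrative assumptions, not the app's actual API; the inference and visualization stages are injected so any of the AI services can slot in:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Capture:
    """One camera frame plus the timestamp it was grabbed at (step 1)."""
    image: bytes
    timestamp_ms: int

def run_pipeline(capture: Capture,
                 infer: Callable[[bytes], dict],
                 visualize: Callable[[dict], None]) -> dict:
    """Capture -> Process -> Visualize; Interact is left to the caller."""
    result = infer(capture.image)            # step 2: FastVLM / SAM3 / SAM3D
    result["timestamp_ms"] = capture.timestamp_ms
    visualize(result)                        # step 3: render into 3D space
    return result                            # step 4: caller emails/saves it
```

Keeping the timestamp attached to every result is what lets asynchronous AI responses be matched back to the frame they came from.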

Innovation

First MR app to unify commercial cloud AI (Gemini) with state-of-the-art open-source AI (SAM3, SAM3D, and FastVLM on a self-hosted server) in a single spatial interface. Features hardware-optimized performance, hands-free voice interaction, and integrated web services (Google Search, Spotify, YouTube).

Performance

  • 72Hz frame rate maintained
  • <2s AI inference
  • <2GB memory footprint
  • 1080p@30fps recording
  • <50ms voice detection latency

Built for Meta Quest 3/3S | Unity 6000.2.6f2 | Python 3.11 | Android SDK

Reality Copilot, your copilot in real reality.

Made with ❤️ for the future of spatial computing

Updates



REQUIREMENTS

Built for Meta Quest 3/3S | Unity 6000.0.38f1 | Python 3.8+ | Android SDK

WORKFLOW EXAMPLE

User: "Capture" → Camera captures view → Image shown in overlay

User: "SAM3D Local" → Loads 3D model → Displays in space

User: "Email Model" → AI analyzes content → Generates description → Opens email composer

User: Send email → Gmail API sends → Confirmation displayed
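
The voice commands above imply a dispatcher that routes recognized transcripts to actions. A minimal sketch (exact-phrase matching and the handler names are simplifying assumptions; the real app presumably tolerates fuzzier speech):

```python
from typing import Callable

class CommandRouter:
    """Maps recognized voice phrases (case-insensitive) to handler callbacks."""

    def __init__(self) -> None:
        self._handlers: dict = {}

    def on(self, phrase: str, handler: Callable[[], str]) -> None:
        self._handlers[phrase.lower()] = handler

    def dispatch(self, transcript: str) -> str:
        handler = self._handlers.get(transcript.strip().lower())
        return handler() if handler else "unrecognized command"

# Register the commands from the workflow above with stub actions
router = CommandRouter()
router.on("capture", lambda: "camera captured")
router.on("sam3d local", lambda: "3d model loaded")
router.on("email model", lambda: "email composer opened")
```

A production version would sit downstream of the VAD plus a speech-to-text step, feeding each final transcript into `dispatch`.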

AI SERVICES (Python Server)

  • FastVLM Server - POST /fastvlm/process
  • SAM3 Segmentation - POST /sam3/segment
  • SAM3D Generation - POST /sam3d/generate
  • Health Check - GET /status

Note: VAD runs locally as a native library (.so), not as a service
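
A thin client for these endpoints could look like the sketch below. The paths come from the list above, but the request/response shapes (a multipart "image" field, a "prompt" form field, a JSON reply) are assumptions about the servers, not documented behavior:

```python
class AIClient:
    """Minimal client for the self-hosted inference endpoints listed above."""

    def __init__(self, base_url: str, session=None):
        if session is None:
            # Real use: an HTTP session (e.g. requests.Session); injectable
            # here so the class can be exercised without a network.
            import requests
            session = requests.Session()
        self.base_url = base_url.rstrip("/")
        self.session = session

    def segment(self, image_bytes: bytes, prompt: str) -> dict:
        """POST /sam3/segment with an image and a text prompt (assumed shape)."""
        resp = self.session.post(
            f"{self.base_url}/sam3/segment",
            files={"image": image_bytes},
            data={"prompt": prompt},
        )
        resp.raise_for_status()
        return resp.json()

    def healthy(self) -> bool:
        """GET /status - treat any 200 as 'all services up'."""
        return self.session.get(f"{self.base_url}/status").status_code == 200
```

Injecting the session also makes the client trivial to unit-test with a stub transport.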

SUPPORTED FORMATS

  • Images: PNG, JPEG, WebP
  • 3D Models: GLB, GLTF
  • Video: H.264, H.265 (hardware-accelerated)
  • Audio: WAV, MP3, OGG
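
Several of these formats can be told apart by their magic bytes alone, which is handy when a payload arrives without an extension. A simplistic sniffing sketch covering a subset of the list (MP3/OGG/raw video streams need more involved checks and are omitted):

```python
def sniff_format(data: bytes) -> str:
    """Identify a payload by magic bytes for some of the formats listed above."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data.startswith(b"glTF"):  # binary glTF (.glb) container magic
        return "glb"
    return "unknown"
```

Note that WebP and WAV share the RIFF container, so both checks look at bytes 8-12 to disambiguate.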

KEY ACHIEVEMENTS

  1. First Reality Copilot for spatial computing
  2. Multi-modal AI integration in VR
  3. Voice-first hands-free interaction with VAD
  4. Industry-first dual audio capture (mic + speaker)
  5. Hardware-optimized video recording
  6. Cross-platform asset loading for Android
  7. Secure OAuth2 Gmail integration
  8. Real-time 3D model generation from segmentation

FUTURE PLANS

  • Multi-user collaboration with voice chat
  • Cloud synchronization
  • Persistent AR anchors
  • Custom model training
  • Real-time translation (text + speech)
  • Gesture recognition (voice + gesture commands)
  • Embodied AI integration with robotics
