About StreamXR

Dynamic LOD for WebXR

Inspiration

Video streaming services like Netflix revolutionized how we consume 2D content - no massive downloads, instant playback on any device, quality that adapts to your connection. But for 3D and XR content? We're still stuck in the dark ages, forcing users to download gigabyte-sized files before experiencing anything.

We were inspired by platforms like Miris and the fundamental question: What if 3D content could stream like video? Just like Netflix made HD video accessible anywhere, StreamXR aims to make immersive spatial experiences instantly accessible on any WebXR device - from iPhones to Quest 3 to Vision Pro - without app installs or massive downloads.

What it does

StreamXR is a real-time 3D streaming platform that intelligently delivers optimized spatial content to any WebXR device:

  • Adaptive Bitrate Streaming: Like Netflix for 3D - automatically adjusts model quality based on your connection speed (>500 KB/s = high quality, otherwise low quality; sketched after this list)
  • Foveated Streaming: Sends high-quality assets where you're actually looking (<30° from view center), low-quality elsewhere - massive bandwidth savings without visible quality loss
  • Dynamic LOD Generation: Drop in a high-quality .glb file and the server automatically generates medium (50% triangles) and low (25% triangles) variants using mesh decimation
  • True Cross-Platform: Same codebase runs on iOS/Android AR, Meta Quest 3 VR, and Apple Vision Pro
  • Real-time Multiuser: Multiple users share the same 3D space with synchronized avatars and positions
  • Interactive Shared Objects: Spawn cubes, spheres, and cones, then grab and manipulate them in real time - changes sync across all connected clients with ownership conflict prevention
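
As a taste of the adaptive bitrate rule, here is a minimal sketch; the function names and the measurement approach are illustrative, not our exact code:

```javascript
// Sketch: the adaptive bitrate rule (illustrative names, not our exact code).
// Throughput is estimated from recent transfers: bytes sent / elapsed time.
function measureBandwidthKBps(bytesSent, elapsedMs) {
  return (bytesSent / 1024) / (elapsedMs / 1000);
}

function chooseLod(bandwidthKBps) {
  // Above 500 KB/s the client gets the full-quality model; otherwise low LOD.
  return bandwidthKBps > 500 ? 'high' : 'low';
}

console.log(chooseLod(measureBandwidthKBps(2_000_000, 1500))); // -> 'high'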

How we built it

Tech Stack:

  • Backend: Node.js + Express + WebSocket (ws) for real-time communication (skeleton sketched after this list)
  • Frontend: Three.js + WebXRManager for 3D rendering and XR sessions
  • Streaming: Binary GLB streaming over WebSocket with 16KB chunked transfers
  • LOD Generation: Custom mesh decimation using meshoptimizer and gltf-pipeline
  • Monitoring: Prometheus metrics + Grafana dashboards for real-time analytics
  • Deployment: Docker Compose on Lima VM + Cloudflare Tunnel for global HTTPS access
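
A minimal skeleton of that stack, assuming the standard express and ws APIs (port and handler names are illustrative):

```javascript
// Sketch: Express serves the Three.js client, ws shares the same HTTP
// server for real-time messages. Port and handler names are illustrative.
const express = require('express');
const http = require('http');
const { WebSocketServer } = require('ws');

const app = express();
app.use(express.static('public')); // Three.js + WebXR client bundle

const server = http.createServer(app);
const wss = new WebSocketServer({ server });

wss.on('connection', (ws) => {
  ws.on('message', (data, isBinary) => {
    // Text frames carry JSON control messages; binary frames carry GLB chunks.
    if (!isBinary) handleControlMessage(ws, JSON.parse(data.toString()));
  });
});

function handleControlMessage(ws, msg) {
  // head-tracking updates, asset requests, room joins, object grabs...
}

server.listen(8080);
```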

Architecture:

  1. Client connects via WebSocket and sends head-tracking data (position, rotation, quaternion) at 10Hz
  2. Server calculates view frustum and selects appropriate LOD for each object based on:
    • Bandwidth (measured from transfer speeds)
    • Viewing angle (foveated - high LOD within 30° of gaze)
  3. Binary GLB assets stream in 16KB chunks, reassembled client-side (server side sketched after this list)
  4. Room manager synchronizes multiple users with avatar positions broadcast at 20Hz
  5. Object ownership system prevents manipulation conflicts with 5-second auto-release timeout
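
Step 3 from the server's perspective, as a hedged sketch - the JSON header framing shown here is illustrative; only the 16KB chunk size comes from our pipeline:

```javascript
// Sketch: streaming one GLB to a client in 16 KB chunks (step 3 above).
const fs = require('fs');

const CHUNK_SIZE = 16 * 1024;

function streamGlb(ws, assetId, filePath) {
  const buf = fs.readFileSync(filePath);
  const totalChunks = Math.ceil(buf.length / CHUNK_SIZE);

  // Tell the client what is coming so it can preallocate a buffer.
  ws.send(JSON.stringify({ type: 'asset-start', assetId, byteLength: buf.length, totalChunks }));

  for (let i = 0; i < totalChunks; i++) {
    ws.send(buf.subarray(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE)); // binary frame
  }
  ws.send(JSON.stringify({ type: 'asset-end', assetId }));
}
```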

Key Innovation - Foveated Streaming Without Eye Tracking:

$$\text{angle} = \arccos\left(\frac{\vec{viewDir} \cdot \vec{objectDir}}{|\vec{viewDir}| \cdot |\vec{objectDir}|}\right)$$

If $\text{angle} < 30°$, serve high LOD; otherwise serve low LOD. This works on any device with head tracking - no special eye tracking hardware required.
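
In Three.js terms the check is a few lines; Vector3.angleTo() computes exactly the arccos above, while the surrounding wrapper is illustrative:

```javascript
// Sketch: the foveation test in Three.js terms. Vector3.angleTo() computes
// exactly the arccos formula above; lodForObject is an illustrative wrapper.
import * as THREE from 'three';

const FOVEA_RAD = THREE.MathUtils.degToRad(30); // high-LOD cone half-angle

function lodForObject(headPosition, headQuaternion, objectPosition) {
  // Forward vector of the head pose (-Z in WebXR/Three.js convention).
  const viewDir = new THREE.Vector3(0, 0, -1).applyQuaternion(headQuaternion);
  const objectDir = objectPosition.clone().sub(headPosition).normalize();
  return viewDir.angleTo(objectDir) < FOVEA_RAD ? 'high' : 'low';
}
```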

Challenges we ran into

  1. WebSocket Binary Streaming: Getting reliable chunked binary transfers for GLB files without corruption required careful buffer management and proper ArrayBuffer reassembly on the client (a reassembly sketch follows this list)

  2. LOD Generation Pipeline: Building a mesh decimation system that preserves visual quality while reducing triangle count by 50-75% involved trial and error with different decimation algorithms

  3. Cross-Platform WebXR Differences: Quest 3, Vision Pro, and phone AR all have different WebXR capabilities - hand tracking vs controllers, immersive-ar vs immersive-vr, different session features

  4. Object Manipulation Sync: Implementing grab-and-move for shared objects with ownership, conflict prevention, and smooth interpolation across network latency was complex

  5. Vision Pro Hand Tracking: Implementing pinch-to-grab gestures using WebXR hand input required understanding joint positions, select events, and smooth quaternion interpolation for natural feel
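
For challenge 1, the fix boiled down to preallocating a buffer and copying chunks into place before parsing. A simplified client-side sketch (framing matches the server sketch above and is likewise illustrative; the URL is a placeholder):

```javascript
// Sketch: client-side reassembly of 16 KB chunks into one ArrayBuffer before
// parsing. Framing matches the streaming sketch above and is illustrative.
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const scene = new THREE.Scene();
const loader = new GLTFLoader();
const socket = new WebSocket('wss://example.invalid/stream'); // placeholder URL
socket.binaryType = 'arraybuffer'; // binary frames arrive as ArrayBuffer

let incoming = null; // { view, offset } for the asset currently in flight

socket.onmessage = (event) => {
  if (typeof event.data === 'string') {
    const msg = JSON.parse(event.data);
    if (msg.type === 'asset-start') {
      // Preallocate once; copying chunks in place avoids corrupt concatenation.
      incoming = { view: new Uint8Array(msg.byteLength), offset: 0 };
    } else if (msg.type === 'asset-end') {
      loader.parse(incoming.view.buffer, '', (gltf) => scene.add(gltf.scene));
      incoming = null;
    }
  } else {
    const chunk = new Uint8Array(event.data);
    incoming.view.set(chunk, incoming.offset);
    incoming.offset += chunk.byteLength;
  }
};
```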

Accomplishments that we're proud of

  • 7 Implementation Phases Completed: From WebSocket foundation to dynamic LOD generation - the full streaming pipeline works
  • Foveated Streaming Actually Works: In our testing, the quality difference between gaze center and periphery is imperceptible - dramatic bandwidth savings
  • Sub-2ms Asset Streaming: Server-side asset delivery is nearly instant
  • Hand Tracking on Vision Pro: Natural pinch-to-grab gestures with smooth object following and visual feedback (proximity highlighting, grab indication)
  • True Cross-Platform: Same JavaScript codebase runs on Quest 3 VR, Vision Pro AR, iPhone AR, and desktop browser
  • Production Deployment: Live at https://streamxr.brad-dougherty.com with real-time monitoring via a Grafana dashboard
  • All Open Web Standards: No proprietary SDKs - pure WebXR, WebSocket, Three.js

What we learned

  • WebRTC is overkill for this use case: Plain WebSocket with binary transfers is simpler and sufficient for asset streaming - WebRTC's complexity (STUN/TURN, ICE candidates) wasn't necessary
  • Head tracking is "good enough": Foveated rendering doesn't require expensive eye tracking hardware - head direction provides 80% of the benefit
  • LOD pre-generation is crucial: Runtime mesh decimation is too slow; pre-generating or caching LOD variants on server startup is essential (sketched below this list)
  • Three.js + WebXR is production-ready: The web platform has matured enough to build serious XR applications without native SDKs
  • Network > GPU: Most XR performance bottlenecks are bandwidth and latency, not rendering - optimize the network first
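
The pre-generation takeaway, sketched against the meshoptimizer JS bindings (the simplify call reflects our reading of that API; GLB extraction and repacking via gltf-pipeline is omitted):

```javascript
// Sketch: pre-generating simplified index buffers at startup with the
// meshoptimizer JS bindings; GLB extraction/repacking (gltf-pipeline) is
// omitted. Targets mirror the medium (50%) / low (25%) variants above.
const { MeshoptSimplifier } = require('meshoptimizer');

async function buildLods(indices, positions) {
  // indices: Uint32Array of triangle indices; positions: Float32Array (xyz).
  await MeshoptSimplifier.ready; // wait for the WASM module to load

  const lods = {};
  for (const [name, ratio] of [['medium', 0.5], ['low', 0.25]]) {
    const target = Math.floor((indices.length * ratio) / 3) * 3; // whole tris
    const [simplified] = MeshoptSimplifier.simplify(
      indices, positions, 3, // 3 floats per vertex position
      target, 0.01           // target index count and permitted error
    );
    lods[name] = simplified;
  }
  return lods;
}
```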

What's next for StreamXR

Short-term:

  • Spatial audio streaming with positional sound effects
  • User-uploaded 3D assets with automatic LOD generation
  • Improved hand mesh visualization (full hand models instead of spheres)

Medium-term:

  • SFU (Selective Forwarding Unit) architecture for 100+ concurrent users
  • AI-assisted LOD generation using learned mesh simplification
  • Persistent rooms with database storage
  • CDN integration for global low-latency delivery

Long-term Vision:

  • Radiance field / 3D Gaussian Splat streaming for photorealistic volumetric content
  • Full eye tracking integration on Vision Pro for true foveated rendering
  • Become the "Netflix of spatial content" - a massive library of instantly streamable 3D experiences

Built With

  • cloudflare-tunnel
  • docker
  • express
  • grafana
  • javascript
  • node.js
  • prometheus
  • three.js
  • websocket
  • webxr
