Inspiration
Arguments are all too common in our everyday relationships! What if there were a fun way to defuse them so that nobody gets hurt? Introducing... Snarkify!
What it does
Snarkify is an augmented reality application built for Snap's Spectacles platform that combines real-time AI conversation, 3D object generation, and face tracking. The project demonstrates how to integrate multiple AI services within an AR environment, creating an immersive experience where users talk to an AI assistant by voice and watch 3D objects materialize in their physical space.
Technologies used
- Real-time AI Conversation: Connects to Google's Gemini Live API for streaming audio conversations
- 3D Object Generation: Uses Snap's Snap3D service to generate interactive 3D models from text prompts
- Voice Recognition: Implements automatic speech recognition (ASR) for hands-free interaction
- Spatial UI: Features a floating orb interface that can be positioned in 3D space or screen space
- Multimodal Input: Supports both audio and camera input for rich AI interactions
How We Built It
Architecture & Technology Stack
The project is built on several key technologies:
- Lens Studio 5.12.1: Snap's AR development platform
- TypeScript: Primary programming language for logic and components
- Remote Service Gateway: Snap's cloud service integration system
- Spectacles Interaction Kit (SIK): UI and interaction framework
Key Components
GeminiAssistant.ts - AI Brain
- Establishes WebSocket connection to Gemini Live API
- Handles real-time audio streaming (16kHz input, 24kHz output)
- Processes function calls for 3D generation
- Manages conversation flow with custom system instructions
- Supports both audio and text-only modes
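For flavor, here is a minimal sketch of what that session setup looks like. In the actual Lens this traffic is routed through Snap's Remote Service Gateway rather than a raw WebSocket, and the endpoint, model id, and tool name below are assumptions based on Google's Live API documentation, not a copy of our component:

```typescript
// Minimal sketch: opening a Gemini Live session over a WebSocket.
// Endpoint, model id, and message shapes are assumptions based on
// Google's Live API docs; the real component goes through Snap's
// Remote Service Gateway instead of connecting directly.
const GEMINI_LIVE_URL =
  "wss://generativelanguage.googleapis.com/ws/" +
  "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

function openGeminiSession(apiKey: string): WebSocket {
  const ws = new WebSocket(`${GEMINI_LIVE_URL}?key=${apiKey}`);

  ws.onopen = () => {
    // The first message configures the session: model, audio output,
    // system instructions, and the one tool the model may call.
    ws.send(JSON.stringify({
      setup: {
        model: "models/gemini-2.0-flash-exp", // assumed model id
        generationConfig: { responseModalities: ["AUDIO"] },
        systemInstruction: {
          parts: [{ text: "Listen to the conversation and suggest playful 3D props." }],
        },
        tools: [{
          functionDeclarations: [{
            name: "generate_3d_object", // hypothetical tool name
            description: "Generate a 3D object from a short text prompt",
            parameters: {
              type: "OBJECT",
              properties: { prompt: { type: "STRING" } },
              required: ["prompt"],
            },
          }],
        }],
      },
    }));
  };
  return ws;
}
```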
AIAssistantUIBridge.ts - Integration Hub
- Connects the AI assistant to the user interface
- Coordinates between voice input, AI processing, and 3D generation
- Auto-starts the AI session on app launch
- Manages the flow between different system components
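A simplified sketch of that wiring, with stand-in interfaces for the real components:

```typescript
// Sketch of the bridge's wiring. These interfaces are simplified
// stand-ins for the real GeminiAssistant, SphereController, and
// Snap3DInteractableFactory scripts.
interface Assistant {
  onTextResponse(cb: (text: string) => void): void;
  onFunctionCall(cb: (name: string, args: { prompt: string }) => void): void;
  startSession(): void;
}
interface OrbUI { showCaption(text: string): void; }
interface ObjectFactory { createFromPrompt(prompt: string): Promise<void>; }

function wireBridge(ai: Assistant, orb: OrbUI, factory: ObjectFactory): void {
  // Route AI text responses to the floating orb's caption display.
  ai.onTextResponse((text) => orb.showCaption(text));

  // Route the model's function calls into the 3D generation pipeline.
  ai.onFunctionCall((name, args) => {
    if (name === "generate_3d_object") {
      factory.createFromPrompt(args.prompt);
    }
  });

  // Auto-start the session on app launch.
  ai.startSession();
}
```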
SphereController.ts - Spatial UI
- Creates a floating orb interface that follows the user
- Supports both world-space and screen-space positioning
- Displays AI responses and user speech captions
- Handles hand tracking and spatial interactions
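The world-space follow behavior boils down to easing the orb toward a point a fixed distance in front of the camera each frame. A self-contained sketch (the distance and smoothing values are illustrative; the real component uses Lens Studio's Transform and vec3 APIs):

```typescript
// Each frame, ease the orb toward a point in front of the camera so it
// feels anchored in space rather than glued to the user's head.
type Vec3 = { x: number; y: number; z: number };

const FOLLOW_DISTANCE = 60; // cm in front of the camera (illustrative)
const SMOOTHING = 0.1;      // fraction of the remaining gap closed per frame

function lerp(a: Vec3, b: Vec3, t: number): Vec3 {
  return {
    x: a.x + (b.x - a.x) * t,
    y: a.y + (b.y - a.y) * t,
    z: a.z + (b.z - a.z) * t,
  };
}

function updateOrb(orbPos: Vec3, camPos: Vec3, camForward: Vec3): Vec3 {
  const target: Vec3 = {
    x: camPos.x + camForward.x * FOLLOW_DISTANCE,
    y: camPos.y + camForward.y * FOLLOW_DISTANCE,
    z: camPos.z + camForward.z * FOLLOW_DISTANCE,
  };
  return lerp(orbPos, target, SMOOTHING);
}
```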
Snap3DInteractableFactory.ts - 3D Generation
- Interfaces with Snap's 3D generation service
- Creates interactive 3D objects that users can manipulate
- Manages the generation pipeline from text prompt to 3D model
- Supports mesh refinement and vertex coloring options
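Conceptually the pipeline is a single async round trip. In the sketch below, `submitPrompt` is a hypothetical wrapper around the Snap3D gateway call, and the option names mirror the toggles above rather than the service's exact API:

```typescript
// Hypothetical wrapper around the Snap3D service call; option and
// field names are illustrative, not the gateway's exact API.
interface Snap3DOptions {
  refineMesh: boolean;     // run the extra mesh-refinement pass
  useVertexColor: boolean; // bake colors into the vertices
}

declare function submitPrompt(
  prompt: string,
  options: Snap3DOptions
): Promise<{ status: "success" | "failed"; meshUrl?: string }>;

async function createInteractable(prompt: string): Promise<string> {
  const result = await submitPrompt(prompt, {
    refineMesh: true,
    useVertexColor: true,
  });
  if (result.status !== "success" || !result.meshUrl) {
    throw new Error(`Snap3D generation failed for prompt: ${prompt}`);
  }
  // The real factory then downloads the mesh, attaches it to a scene
  // object, and adds manipulation components so users can grab it.
  return result.meshUrl;
}
```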
ASRQueryController.ts - Voice Input
- Implements speech-to-text functionality
- Provides visual feedback during voice recording
- Handles voice query processing with configurable accuracy modes
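In outline the flow looks like this; `AsrSession` and `AccuracyMode` are placeholder names for Lens Studio's actual ASR surface, so check the platform docs for the real signatures:

```typescript
// Placeholder interface for the platform's speech-to-text module.
type AccuracyMode = "fast" | "balanced" | "high";

interface AsrSession {
  start(mode: AccuracyMode): void;
  stop(): void;
  onResult(cb: (transcript: string, isFinal: boolean) => void): void;
}

function runVoiceQuery(asr: AsrSession, onQuery: (text: string) => void): void {
  asr.onResult((transcript, isFinal) => {
    // Interim transcripts could drive a "listening..." caption here.
    if (isFinal) {
      asr.stop();
      onQuery(transcript); // hand the finished query to the AI assistant
    }
  });
  asr.start("high");
}
```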
Challenges We Ran Into
Technical Integration Challenges
- Real-time Audio Processing: Synchronizing 16kHz microphone input with 24kHz audio output while maintaining low latency for natural conversation flow (see the resampling sketch at the end of this section)
- WebSocket State Management: Managing complex WebSocket connections with proper error handling, reconnection logic, and state synchronization between multiple AI services
- Spatial UI Positioning: Creating a responsive UI that transitions smoothly between hand-tracked, world-space, and screen-space modes while maintaining user context
- Cross-Service Communication: Coordinating between Gemini Live's function calling system and Snap3D's asynchronous generation pipeline
Platform-Specific Constraints
- Spectacles Hardware Limitations: Working within the computational and memory constraints of AR glasses
- Network Dependency: Ensuring graceful degradation when internet connectivity is poor or intermittent
- Audio Feedback Prevention: Preventing audio loops in development while maintaining natural conversation flow
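To make the first of those challenges concrete: Gemini Live consumes 16kHz audio but emits 24kHz audio, so a rate conversion has to happen somewhere in the pipeline. A minimal linear-interpolation resampler looks like the sketch below; the shipping pipeline does this incrementally on small chunks to keep latency low:

```typescript
// Convert an audio buffer between sample rates (e.g. 24 kHz -> 48 kHz)
// by linearly interpolating between the two nearest source samples.
function resample(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const srcPos = (i * (input.length - 1)) / Math.max(outLength - 1, 1);
    const lo = Math.floor(srcPos);
    const hi = Math.min(lo + 1, input.length - 1);
    const frac = srcPos - lo;
    output[i] = input[lo] * (1 - frac) + input[hi] * frac;
  }
  return output;
}

// e.g. preparing a 24 kHz Gemini chunk for a 48 kHz output device:
// const deviceChunk = resample(geminiChunk, 24000, 48000);
```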
Accomplishments that we're proud of
Innovation in AR-AI Integration
- Seamless Multimodal Experience: Successfully created a natural conversation flow where users can speak to AI and see their ideas materialize as 3D objects in real space
- Real-time Function Calling: Implemented sophisticated function calling between Gemini Live and Snap3D, allowing the AI to generate 3D content based on conversation context
- Adaptive UI System: Built a spatial interface that intelligently switches between different interaction modes based on user context and device capabilities
Technical Achievements
- Custom System Instructions: Developed creative AI prompts that make the assistant listen to conversations and generate contextual 3D objects (like generating a clown wig when someone is called a clown)
- Robust Audio Pipeline: Implemented a complete audio processing system with Base64 encoding, PCM16 conversion, and dynamic audio output management (a conversion sketch follows this section)
- Interactive 3D Objects: Created a complete pipeline from text prompt to manipulable 3D objects that users can move, scale, and interact with in AR space
User Experience Design
- Intuitive Spatial Interaction: Designed natural hand-based interactions that feel native to AR environments
- Visual Feedback Systems: Implemented comprehensive visual indicators for AI processing states, voice recording, and 3D generation progress
- Graceful Error Handling: Built resilient systems that provide clear feedback when services fail or connectivity issues occur
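As promised above, here is the shape of the Float32 -> PCM16 -> Base64 conversion the audio pipeline performs before streaming microphone frames to Gemini Live. The Base64 step is hand-rolled for portability; treat this as a sketch of the idea, not our exact shipping code:

```typescript
// Clamp each float sample to [-1, 1] and scale to signed 16-bit PCM
// (little-endian, which is what Gemini Live expects).
function floatToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard Base64 over the PCM bytes, written out by hand so it runs in
// runtimes without btoa or Buffer.
function pcm16ToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64[b0 >> 2] + B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : "=";
    out += i + 2 < bytes.length ? B64[b2 & 63] : "=";
  }
  return out;
}
```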
What We Learned
AR Development Insights
- Spatial Computing Paradigms: Gained deep understanding of how UI/UX principles translate to 3D space and the importance of maintaining spatial context
- Performance Optimization: Learned to balance rich AI functionality with the constraints of mobile AR hardware
- Cross-Platform Considerations: Understanding how to develop for both Lens Studio preview and actual Spectacles hardware
AI Integration Patterns
- Real-time AI Conversations: Mastered the complexities of maintaining natural conversation flow with streaming AI models
- Function Calling Architecture: Developed patterns for reliable AI-to-service communication with proper error handling and status reporting (sketched after this list)
- Multimodal AI Design: Learned to coordinate multiple input modalities (voice, camera, text) for rich AI interactions
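The function-calling pattern we converged on is worth spelling out: every tool call from the model gets an explicit success-or-failure response, so the AI can narrate what happened instead of stalling the conversation. A sketch, with message shapes simplified from Gemini Live's toolCall/toolResponse format:

```typescript
// Every tool call gets an explicit response, success or failure, so the
// model always knows how its request ended.
interface ToolCall {
  id: string;
  name: string;
  args: { prompt: string };
}

async function handleToolCall(
  call: ToolCall,
  generate: (prompt: string) => Promise<string>, // e.g. the Snap3D pipeline
  reply: (response: object) => void              // e.g. ws.send(JSON.stringify(...))
): Promise<void> {
  try {
    const meshUrl = await generate(call.args.prompt);
    reply({
      toolResponse: {
        functionResponses: [
          { id: call.id, name: call.name, response: { status: "ok", meshUrl } },
        ],
      },
    });
  } catch {
    // Report the failure so the AI can tell the user something went
    // wrong rather than leaving the conversation hanging.
    reply({
      toolResponse: {
        functionResponses: [
          { id: call.id, name: call.name, response: { status: "error" } },
        ],
      },
    });
  }
}
```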
What's next?
Enhanced AI Capabilities
- Multi-AI Support: Integrate additional AI models (OpenAI, Claude, etc.) with seamless switching
- Persistent Conversations: Add conversation history and context retention across sessions
- AI Vision Integration: Enable the AI to see and comment on the user's environment through camera feed
- Emotional Intelligence: Add sentiment analysis and emotional responses to conversations
Advanced 3D Features
- Physics Integration: Add realistic physics simulation to generated 3D objects
- Animation Generation: Allow AI to create animated 3D content, not just static models
- Gesture Recognition: Add hand gesture controls for manipulating objects and controlling the AI
- Eye Tracking Integration: Use gaze for more natural UI interactions and AI attention
Built With
- figma
- gemini
- lensstudio
- snapchat
- spectacles
- typescript
