The AR Pose Reference Generator

A Hack the North 2025 Project Story


Inspiration

As digital artists, we constantly struggle with finding the perfect reference poses for our drawings. Traditional reference photos are static, limited in variety, and often don't match the exact perspective or pose we envision in our minds. We wanted to break free from these constraints and create something that could bring any pose idea directly into our field of vision.

The inspiration came from the realization that artists need dynamic, customizable references that can adapt to their creative vision in real-time. Instead of searching through hundreds of static reference images, what if we could simply speak our vision into existence and interact with it directly in augmented reality?

This led us to envision a system where artists could:

  • Generate 3D models from voice descriptions
  • Automatically rig them for pose manipulation
  • Control them with intuitive hand gestures in AR
  • Create the perfect reference pose for any drawing project

What it does

The AR Pose Reference Generator transforms artistic vision into interactive 3D reality. Users can speak any description—from "a human in a dynamic fighting pose" to "a spider crawling on a wall"—and watch as their words materialize into a fully rigged, poseable 3D model in augmented reality.

The system enables users to:

  • Voice-to-3D Generation: Convert any text prompt into a 3D mesh using AI
  • Automatic Rigging: Transform static models into poseable characters with bone structures
  • AR Interaction: Manipulate models using hand gestures and spatial controls
  • Real-time Posing: Create dynamic reference poses for drawing and artistic work

Essentially, it brings anything from your mind into the real world as an interactive, poseable 3D reference that you can position exactly as needed for your artistic vision.

How we built it

We built a four-component pipeline that ties together several technologies; an illustrative code sketch follows each component's list below:

Component 1: Mesh Generation (MeshGeneration/)

  • Lens Studio (v5.10.0+) for AR development
  • Snap Spectacles (2024 model) as the hardware platform
  • Snap3D API via Remote Service Gateway for text-to-3D generation
  • TypeScript scripting for real-time processing
  • Gemini Live and OpenAI Realtime for voice interaction
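
To make the flow concrete, here is a minimal sketch of the prompt-to-mesh step. The Snap3DClient interface, its submitTextTo3D method, and the result shape are hypothetical stand-ins for the actual Remote Service Gateway calls in Lens Studio, not the real API.

```typescript
// Hypothetical sketch of the voice-prompt → 3D-mesh request step.
// Snap3DClient and submitTextTo3D are illustrative stand-ins, NOT the
// actual Remote Service Gateway API exposed in Lens Studio.

interface MeshResult {
  assetId: string;   // unique id reused later as the backend UUID key
  glbUrl: string;    // download URL for the generated mesh
}

interface Snap3DClient {
  submitTextTo3D(prompt: string): Promise<MeshResult>;
}

async function generateFromVoice(
  client: Snap3DClient,
  transcript: string,
  onStatus: (msg: string) => void
): Promise<MeshResult> {
  onStatus(`Generating a model for: "${transcript}" …`);
  const result = await client.submitTextTo3D(transcript);
  onStatus("Model ready – handing off to the backend for rigging.");
  return result;
}
```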

Component 2: Backend Storage (Backend/)

  • Node.js with Express.js for API server
  • AWS S3 for scalable file storage
  • AWS DynamoDB for metadata tracking
  • UUID system for unique asset identification
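
The upload path can be condensed to a single Express route, sketched below with AWS SDK v3; the bucket, table, and route names (pose-ref-assets, PoseRefAssets, /assets) are illustrative placeholders rather than the exact schema in our Backend/ code.

```typescript
import express from "express";
import { randomUUID } from "crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

// Bucket, table, and route names below are illustrative placeholders.
const BUCKET = "pose-ref-assets";
const TABLE = "PoseRefAssets";

const s3 = new S3Client({});
const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const app = express();

// Accept a raw GLB/FBX payload, store it in S3, and track metadata in DynamoDB.
app.post("/assets", express.raw({ type: "*/*", limit: "50mb" }), async (req, res) => {
  const assetId = randomUUID();
  const key = `meshes/${assetId}.glb`;

  await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: key, Body: req.body }));
  await db.send(new PutCommand({
    TableName: TABLE,
    Item: { assetId, s3Key: key, status: "uploaded", createdAt: Date.now() },
  }));

  res.status(201).json({ assetId });
});

app.listen(3000);
```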

Component 3: 3D Model Rigging (Rigging/)

  • Python 3.8+ with trimesh library for 3D processing
  • Meshy AI API for automatic bone structure generation
  • Node.js wrapper for seamless integration
  • FBX/GLB format conversion and optimization
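
At its core, the Node.js wrapper is a submit-and-poll loop, roughly as below. The endpoint path, payload fields, and response shape are assumptions for illustration; Meshy AI's actual rigging API may differ.

```typescript
// Hypothetical submit-and-poll wrapper for an auto-rigging job.
// Endpoint path, payload, and response fields are illustrative,
// not Meshy AI's documented API.
const MESHY_API = "https://api.meshy.ai";          // base URL assumption
const API_KEY = process.env.MESHY_API_KEY ?? "";

async function rigModel(meshUrl: string): Promise<string> {
  const headers = { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" };

  // 1. Submit the static mesh for automatic bone-structure generation.
  const submit = await fetch(`${MESHY_API}/rigging-tasks`, {
    method: "POST",
    headers,
    body: JSON.stringify({ model_url: meshUrl, output_format: "glb" }),
  });
  const { id } = (await submit.json()) as { id: string };

  // 2. Poll until the rigged model is ready, then return its download URL.
  while (true) {
    await new Promise((r) => setTimeout(r, 5000));
    const poll = await fetch(`${MESHY_API}/rigging-tasks/${id}`, { headers });
    const task = (await poll.json()) as { status: string; rigged_model_url?: string };
    if (task.status === "SUCCEEDED" && task.rigged_model_url) return task.rigged_model_url;
    if (task.status === "FAILED") throw new Error(`Rigging task ${id} failed`);
  }
}
```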

Component 4: AR Rendering (as/)

  • Spectacles Interaction Kit for hand tracking
  • FBX/GLB 3D model rendering
  • Joint-based interaction system for pose control
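
Conceptually, posing comes down to mapping a pinch-drag gesture onto rotations of the nearest joint. The sketch below uses made-up types as stand-ins, not the Spectacles Interaction Kit API.

```typescript
// Hypothetical joint-manipulation sketch: map a pinch-drag gesture to
// rotations on the selected bone. Types are illustrative stand-ins.

interface Joint {
  name: string;
  pitch: number;   // rotation about the x axis, radians
  yaw: number;     // rotation about the y axis, radians
}

const SENSITIVITY = 2.0; // radians of rotation per metre of hand travel (tuning assumption)

// Called every frame while the user pinches near a joint:
// horizontal hand motion turns the joint, vertical motion tilts it.
function applyPinchDrag(joint: Joint, deltaX: number, deltaY: number): void {
  joint.yaw += deltaX * SENSITIVITY;
  joint.pitch += deltaY * SENSITIVITY;
}
```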

Integration Flow

Voice Input → Snap3D Generation → AWS Storage → Meshy AI Rigging → AR Import → Hand Control

The entire pipeline operates asynchronously, allowing users to generate multiple models while previous ones are being processed and rigged.
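
In code, that hand-off is just an async chain, roughly as sketched below; each stage function stands in for the component described above, and because each prompt is its own promise chain, several prompts can be in flight at once.

```typescript
// Hypothetical end-to-end hand-off, mirroring the flow above. Each stage
// function is a stand-in for the component sketched in its own section.
type Stage<I, O> = (input: I) => Promise<O>;

interface Pipeline {
  generateMesh: Stage<string, string>;   // prompt     → mesh URL
  storeAsset: Stage<string, string>;     // mesh URL   → asset id
  rigAsset: Stage<string, string>;       // asset id   → rigged model URL
  importIntoAR: Stage<string, void>;     // rigged URL → spawned in the AR scene
}

// Kick off one prompt without blocking the AR session.
async function runPrompt(p: Pipeline, prompt: string): Promise<void> {
  const meshUrl = await p.generateMesh(prompt);
  const assetId = await p.storeAsset(meshUrl);
  const riggedUrl = await p.rigAsset(assetId);
  await p.importIntoAR(riggedUrl);
}
```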

Challenges we ran into

Integration Complexity

The biggest challenge was integrating code written across different development environments and API versions. Each component used a different stack (TypeScript in Lens Studio, Node.js for the backend, Python for rigging), each with its own version requirements and compatibility constraints.

Documentation Gaps

Snap Spectacles documentation was sparse, especially for newer features like Remote Service Gateway and the 2024 hardware model. We spent significant time reverse-engineering API behaviors and debugging integration issues with limited official guidance.

API Synchronization

Coordinating between multiple external APIs (Snap3D, Meshy AI, AWS services) required careful error handling and retry logic. Each service had different rate limits, response formats, and authentication mechanisms that needed to be harmonized.
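
Much of that harmonizing boiled down to wrapping each call in a shared retry helper with exponential backoff, along the lines of the generic sketch below (delays and attempt counts are illustrative defaults, not values from our repo).

```typescript
// Generic retry-with-exponential-backoff helper; delays and attempt counts
// are illustrative defaults, tuned per service in practice.
async function withRetry<T>(
  call: () => Promise<T>,
  attempts = 4,
  baseDelayMs = 500
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (err) {
      if (i === attempts - 1) throw err;       // out of retries: surface the error
      const delay = baseDelayMs * 2 ** i;      // 500 ms, 1 s, 2 s, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}

// Example: wrap a rate-limited call so a throttled response just means "wait and try again".
// await withRetry(() => fetch("https://api.example.com/job", { method: "POST" }));
```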

Real-time Processing

Managing the asynchronous nature of 3D generation and rigging while maintaining a responsive AR experience proved challenging. Users needed immediate feedback while background processes handled heavy computational tasks.

Cross-Platform Compatibility

Ensuring the system worked consistently across different development environments (macOS, Windows, various Python/Node versions) required extensive testing and environment-specific configurations.

Accomplishments that we're proud of

We developed every stage of the pipeline and built out all of its core functionality, even though the stages are not yet fully connected end to end. This represents a significant achievement because:

  • End-to-End Coverage: Every component from voice input to AR interaction is functional
  • AI Integration Mastery: Successfully integrated multiple AI services (Snap3D, Meshy AI, Gemini, OpenAI) into a cohesive workflow
  • Cloud Architecture: Built a robust backend system with AWS integration for scalable file processing
  • AR Innovation: Created an intuitive hand-gesture control system for 3D model manipulation
  • Modular Design: Each component can operate independently, making the system maintainable and extensible

The fact that we can generate 3D models from voice, automatically rig them, and control them in AR represents a breakthrough in creative technology that we're extremely proud to have achieved.

What we learned

Project Organization

We learned to modularize our coding and organize our time well. Breaking the project into distinct, independent components allowed team members to work in parallel and made debugging much more manageable.

API Integration Strategies

  • How to handle asynchronous processing across multiple services
  • The importance of proper error handling and retry mechanisms
  • Best practices for managing API rate limits and authentication

AR Development

  • The complexities of developing for cutting-edge hardware like Snap Spectacles
  • How to optimize 3D models for real-time AR rendering
  • The challenges of hand tracking and gesture recognition

Cloud Architecture

  • Designing scalable file storage and processing systems
  • Implementing secure upload/download mechanisms
  • Database design for tracking complex processing workflows

Team Collaboration

  • Effective communication across different technical domains
  • Time management for hackathon constraints
  • Balancing feature development with integration testing

What's next for The AR Pose Reference Generator

Immediate Improvements

  • Complete Pipeline Integration: Fully connect all components for seamless end-to-end operation
  • Enhanced UI/UX: Improve the AR interaction experience with better visual feedback
  • Performance Optimization: Reduce processing times and improve real-time responsiveness

Advanced Features

  • Animation Support: Add pre-built animations and pose libraries
  • Multi-Model Scenes: Allow multiple models in the same AR space
  • Export Capabilities: Save and share generated poses and models
  • Collaborative Features: Multiple users working with the same model simultaneously

Platform Expansion

  • Mobile AR Support: Extend to iOS/Android AR platforms
  • Web Integration: Create web-based tools for model management
  • Professional Tools: Develop specialized features for professional artists and animators

AI Enhancement

  • Improved Generation: Better text-to-3D understanding and model quality
  • Smart Rigging: More sophisticated bone structure generation for complex models
  • Pose Suggestions: AI-powered pose recommendations based on artistic intent

The foundation we've built opens up endless possibilities for revolutionizing how artists create and use reference materials. We're excited to continue developing this technology and see how it can empower creative professionals worldwide.


Built with passion and determination at Hack the North 2025
Transforming artistic vision into interactive reality
