The AR Pose Reference Generator

A Hack the North 2025 Project Story


Inspiration

As digital artists, we constantly struggle with finding the perfect reference poses for our drawings. Traditional reference photos are static, limited in variety, and often don't match the exact perspective or pose we envision in our minds. We wanted to break free from these constraints and create something that could bring any pose idea directly into our field of vision.

The inspiration came from the realization that artists need dynamic, customizable references that can adapt to their creative vision in real-time. Instead of searching through hundreds of static reference images, what if we could simply speak our vision into existence and interact with it directly in augmented reality?

This led us to envision a system where artists could:

  • Generate 3D models from voice descriptions
  • Automatically rig them for pose manipulation
  • Control them with intuitive hand gestures in AR
  • Create the perfect reference pose for any drawing project

What it does

The AR Pose Reference Generator transforms artistic vision into interactive 3D reality. Users can speak any description—from "a human in a dynamic fighting pose" to "a spider crawling on a wall"—and watch as their words materialize into a fully rigged, poseable 3D model in augmented reality.

The system enables users to:

  • Voice-to-3D Generation: Convert any text prompt into a 3D mesh using AI
  • Automatic Rigging: Transform static models into poseable characters with bone structures
  • AR Interaction: Manipulate models using hand gestures and spatial controls
  • Real-time Posing: Create dynamic reference poses for drawing and artistic work

Essentially, it brings anything from your mind into the real world as an interactive, poseable 3D reference that you can position exactly as needed for your artistic vision.

How we built it

We built a four-component pipeline that ties together several technologies; an illustrative code sketch follows each component's list below:

Component 1: Mesh Generation (MeshGeneration/)

  • Lens Studio (v5.10.0+) for AR development
  • Snap Spectacles (2024 model) as the hardware platform
  • Snap3D API via Remote Service Gateway for text-to-3D generation
  • TypeScript scripting for real-time processing
  • Gemini Live and OpenAI Realtime for voice interaction
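
To make the flow concrete, here is a minimal sketch of the prompt-to-mesh step. The Snap3DClient interface, its submitTextTo3D method, and the result shape are hypothetical stand-ins for the actual Remote Service Gateway calls in Lens Studio, not the real API.

```typescript
// Hypothetical sketch of the voice-prompt → 3D-mesh request step.
// Snap3DClient and submitTextTo3D are illustrative stand-ins, NOT the
// actual Remote Service Gateway API exposed in Lens Studio.

interface MeshResult {
  assetId: string;   // unique id reused later as the backend UUID key
  glbUrl: string;    // download URL for the generated mesh
}

interface Snap3DClient {
  submitTextTo3D(prompt: string): Promise<MeshResult>;
}

async function generateFromVoice(
  client: Snap3DClient,
  transcript: string,
  onStatus: (msg: string) => void
): Promise<MeshResult> {
  onStatus(`Generating a model for: "${transcript}" …`);
  const result = await client.submitTextTo3D(transcript);
  onStatus("Model ready – handing off to the backend for rigging.");
  return result;
}
```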

Component 2: Backend Storage (Backend/)

  • Node.js with Express.js for API server
  • AWS S3 for scalable file storage
  • AWS DynamoDB for metadata tracking
  • UUID system for unique asset identification
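
The upload path can be condensed to a single Express route, sketched below with AWS SDK v3; the bucket, table, and route names (pose-ref-assets, PoseRefAssets, /assets) are illustrative placeholders rather than the exact schema in our Backend/ code.

```typescript
import express from "express";
import { randomUUID } from "crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

// Bucket, table, and route names below are illustrative placeholders.
const BUCKET = "pose-ref-assets";
const TABLE = "PoseRefAssets";

const s3 = new S3Client({});
const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const app = express();

// Accept a raw GLB/FBX payload, store it in S3, and track metadata in DynamoDB.
app.post("/assets", express.raw({ type: "*/*", limit: "50mb" }), async (req, res) => {
  const assetId = randomUUID();
  const key = `meshes/${assetId}.glb`;

  await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: key, Body: req.body }));
  await db.send(new PutCommand({
    TableName: TABLE,
    Item: { assetId, s3Key: key, status: "uploaded", createdAt: Date.now() },
  }));

  res.status(201).json({ assetId });
});

app.listen(3000);
```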

Component 3: 3D Model Rigging (Rigging/)

  • Python 3.8+ with trimesh library for 3D processing
  • Meshy AI API for automatic bone structure generation
  • Node.js wrapper for seamless integration
  • FBX/GLB format conversion and optimization
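
At its core, the Node.js wrapper is a submit-and-poll loop, roughly as below. The endpoint path, payload fields, and response shape are assumptions for illustration; Meshy AI's actual rigging API may differ.

```typescript
// Hypothetical submit-and-poll wrapper for an auto-rigging job.
// Endpoint path, payload, and response fields are illustrative,
// not Meshy AI's documented API.
const MESHY_API = "https://api.meshy.ai";          // base URL assumption
const API_KEY = process.env.MESHY_API_KEY ?? "";

async function rigModel(meshUrl: string): Promise<string> {
  const headers = { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" };

  // 1. Submit the static mesh for automatic bone-structure generation.
  const submit = await fetch(`${MESHY_API}/rigging-tasks`, {
    method: "POST",
    headers,
    body: JSON.stringify({ model_url: meshUrl, output_format: "glb" }),
  });
  const { id } = (await submit.json()) as { id: string };

  // 2. Poll until the rigged model is ready, then return its download URL.
  while (true) {
    await new Promise((r) => setTimeout(r, 5000));
    const poll = await fetch(`${MESHY_API}/rigging-tasks/${id}`, { headers });
    const task = (await poll.json()) as { status: string; rigged_model_url?: string };
    if (task.status === "SUCCEEDED" && task.rigged_model_url) return task.rigged_model_url;
    if (task.status === "FAILED") throw new Error(`Rigging task ${id} failed`);
  }
}
```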

Component 4: AR Rendering (as/)

  • Spectacles Interaction Kit for hand tracking
  • FBX/GLB 3D model rendering
  • Joint-based interaction system for pose control
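
Conceptually, posing comes down to mapping a pinch-drag gesture onto rotations of the nearest joint. The sketch below uses made-up types as stand-ins, not the Spectacles Interaction Kit API.

```typescript
// Hypothetical joint-manipulation sketch: map a pinch-drag gesture to
// rotations on the selected bone. Types are illustrative stand-ins.

interface Joint {
  name: string;
  pitch: number;   // rotation about the x axis, radians
  yaw: number;     // rotation about the y axis, radians
}

const SENSITIVITY = 2.0; // radians of rotation per metre of hand travel (tuning assumption)

// Called every frame while the user pinches near a joint:
// horizontal hand motion turns the joint, vertical motion tilts it.
function applyPinchDrag(joint: Joint, deltaX: number, deltaY: number): void {
  joint.yaw += deltaX * SENSITIVITY;
  joint.pitch += deltaY * SENSITIVITY;
}
```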

Integration Flow

Voice Input → Snap3D Generation → AWS Storage → Meshy AI Rigging → AR Import → Hand Control

The entire pipeline operates asynchronously, allowing users to generate multiple models while previous ones are being processed and rigged.
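
In code, that hand-off is just an async chain, roughly as sketched below; each stage function stands in for the component described above, and because each prompt is its own promise chain, several prompts can be in flight at once.

```typescript
// Hypothetical end-to-end hand-off, mirroring the flow above. Each stage
// function is a stand-in for the component sketched in its own section.
type Stage<I, O> = (input: I) => Promise<O>;

interface Pipeline {
  generateMesh: Stage<string, string>;   // prompt     → mesh URL
  storeAsset: Stage<string, string>;     // mesh URL   → asset id
  rigAsset: Stage<string, string>;       // asset id   → rigged model URL
  importIntoAR: Stage<string, void>;     // rigged URL → spawned in the AR scene
}

// Kick off one prompt without blocking the AR session.
async function runPrompt(p: Pipeline, prompt: string): Promise<void> {
  const meshUrl = await p.generateMesh(prompt);
  const assetId = await p.storeAsset(meshUrl);
  const riggedUrl = await p.rigAsset(assetId);
  await p.importIntoAR(riggedUrl);
}
```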

Challenges we ran into

Integration Complexity

The biggest challenge was integrating code written across different development environments and API versions. Each component used a different stack (TypeScript in Lens Studio, Node.js for the backend, Python for rigging), each with its own version requirements and compatibility constraints.

Documentation Gaps

Snap Spectacles documentation was sparse, especially for newer features like Remote Service Gateway and the 2024 hardware model. We spent significant time reverse-engineering API behaviors and debugging integration issues with limited official guidance.

API Synchronization

Coordinating between multiple external APIs (Snap3D, Meshy AI, AWS services) required careful error handling and retry logic. Each service had different rate limits, response formats, and authentication mechanisms that needed to be harmonized.
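
Much of that harmonizing boiled down to wrapping each call in a shared retry helper with exponential backoff, along the lines of the generic sketch below (delays and attempt counts are illustrative defaults, not values from our repo).

```typescript
// Generic retry-with-exponential-backoff helper; delays and attempt counts
// are illustrative defaults, tuned per service in practice.
async function withRetry<T>(
  call: () => Promise<T>,
  attempts = 4,
  baseDelayMs = 500
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (err) {
      if (i === attempts - 1) throw err;       // out of retries: surface the error
      const delay = baseDelayMs * 2 ** i;      // 500 ms, 1 s, 2 s, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}

// Example: wrap a rate-limited call so a throttled response just means "wait and try again".
// await withRetry(() => fetch("https://api.example.com/job", { method: "POST" }));
```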

Real-time Processing

Managing the asynchronous nature of 3D generation and rigging while maintaining a responsive AR experience proved challenging. Users needed immediate feedback while background processes handled heavy computational tasks.

Cross-Platform Compatibility

Ensuring the system worked consistently across different development environments (macOS, Windows, various Python/Node versions) required extensive testing and environment-specific configurations.

Accomplishments that we're proud of

We developed every stage of the pipeline and built out all of its core functionality, even though the stages are not yet fully connected end to end. This represents a significant achievement because:

  • End-to-End Coverage: Every component from voice input to AR interaction is functional
  • AI Integration Mastery: Successfully integrated multiple AI services (Snap3D, Meshy AI, Gemini, OpenAI) into a cohesive workflow
  • Cloud Architecture: Built a robust backend system with AWS integration for scalable file processing
  • AR Innovation: Created an intuitive hand-gesture control system for 3D model manipulation
  • Modular Design: Each component can operate independently, making the system maintainable and extensible

The fact that we can generate 3D models from voice, automatically rig them, and control them in AR represents a breakthrough in creative technology that we're extremely proud to have achieved.

What we learned

Project Organization

We learned to modularize our coding and organize our time well. Breaking the project into distinct, independent components allowed team members to work in parallel and made debugging much more manageable.

API Integration Strategies

  • How to handle asynchronous processing across multiple services
  • The importance of proper error handling and retry mechanisms
  • Best practices for managing API rate limits and authentication

AR Development

  • The complexities of developing for cutting-edge hardware like Snap Spectacles
  • How to optimize 3D models for real-time AR rendering
  • The challenges of hand tracking and gesture recognition

Cloud Architecture

  • Designing scalable file storage and processing systems
  • Implementing secure upload/download mechanisms
  • Database design for tracking complex processing workflows

Team Collaboration

  • Effective communication across different technical domains
  • Time management for hackathon constraints
  • Balancing feature development with integration testing

What's next for The AR Pose Reference Generator

Immediate Improvements

  • Complete Pipeline Integration: Fully connect all components for seamless end-to-end operation
  • Enhanced UI/UX: Improve the AR interaction experience with better visual feedback
  • Performance Optimization: Reduce processing times and improve real-time responsiveness

Advanced Features

  • Animation Support: Add pre-built animations and pose libraries
  • Multi-Model Scenes: Allow multiple models in the same AR space
  • Export Capabilities: Save and share generated poses and models
  • Collaborative Features: Multiple users working with the same model simultaneously

Platform Expansion

  • Mobile AR Support: Extend to iOS/Android AR platforms
  • Web Integration: Create web-based tools for model management
  • Professional Tools: Develop specialized features for professional artists and animators

AI Enhancement

  • Improved Generation: Better text-to-3D understanding and model quality
  • Smart Rigging: More sophisticated bone structure generation for complex models
  • Pose Suggestions: AI-powered pose recommendations based on artistic intent

The foundation we've built opens up endless possibilities for revolutionizing how artists create and use reference materials. We're excited to continue developing this technology and see how it can empower creative professionals worldwide.


Built with passion and determination at Hack the North 2025
Transforming artistic vision into interactive reality
