SignHero 🤟
Inspiration
Over 48 million Americans are deaf or hard of hearing, yet ASL literacy remains low among the hearing population. We wanted to make learning ASL fingerspelling fun, accessible, and engaging—the same way Guitar Hero made learning rhythm patterns addictive.
The breakthrough idea: What if we could gamify sign language learning with real-time AI detection and rhythm-based gameplay?
What It Does
SignHero is a rhythm game that teaches ASL fingerspelling through webcam-powered gameplay:
| Mode | Description |
|---|---|
| 🎸 Song Game | Sign along to beatmaps synced with music—notes scroll down a Guitar Hero-style highway |
| 📚 Training Mode | Step-by-step practice with visual hand pose hints |
| ⏱️ Testing Mode | Timed challenges to measure proficiency |
| 🕹️ Whack-A-Sign | Arcade-style reflex game for quick recognition |
The AI watches your webcam, detects which ASL letter you're signing in real-time (~30-50ms latency), and scores your accuracy with combo multipliers, streak celebrations, and visual effects.
How We Built It
🧠 Machine Learning Pipeline
We built a custom model with a MobileNetV2 CNN backbone. The model has two main parts: feature extraction and classification. The extractor starts at 32 channels and uses 5 specialized attention layers, and a custom classifier head with standard ReLU activations maps a single frame onto 26 letter classes.
We trained on heavily augmented data (varied angles, different signers) so the model stays consistent across people and viewpoints and to reduce bias. There is a lot more documentation on the architecture, training, and preprocessing here: https://github.com/MsMarion/ASL-Fun-Training/tree/main/Base%20test/Sign-Language-Recognition/documentation
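A minimal PyTorch sketch of that shape, assuming torchvision's stock `mobilenet_v2` as the backbone; the attention layers and exact head sizes here are illustrative placeholders, not the real architecture (see the documentation linked above for the details):

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class SignClassifier(nn.Module):
    """MobileNetV2 backbone with a custom 26-way classifier head (sketch)."""

    def __init__(self, num_classes: int = 26):
        super().__init__()
        self.features = mobilenet_v2(weights=None).features  # feature extractor
        # NOTE: the 5 specialized attention layers are omitted in this sketch
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(1280, 256),         # 1280 = MobileNetV2 final channel count
            nn.ReLU(),                    # standard ReLU activation
            nn.Dropout(0.2),
            nn.Linear(256, num_classes),  # logits for A-Z
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))
```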
Webcam Frame → MediaPipe (hand landmarks) → Feature Extraction → MobileNetV2 CNN → Letter Prediction
- MediaPipe Hands detects 21 hand landmarks from webcam frames
- Landmarks are drawn as a feature mask on a black background (sketch after this list)
- MobileNetV2 (trained on ASL alphabet data) classifies the pose into A-Z
- Both the original and mirrored images are processed; the higher-confidence prediction wins
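A hedged sketch of the landmark-mask step, using MediaPipe's Python solutions API and OpenCV (the helper name `landmarks_to_mask` and the 224×224 size are our assumptions):

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=1)

def landmarks_to_mask(frame_bgr: np.ndarray, size: int = 224):
    """Detect 21 hand landmarks and draw them on a black canvas (the feature mask)."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None                                   # no hand found this frame
    mask = np.zeros((size, size, 3), dtype=np.uint8)  # black background
    mp_draw.draw_landmarks(
        mask,
        results.multi_hand_landmarks[0],
        mp_hands.HAND_CONNECTIONS,                    # joints plus connecting bones
    )
    return mask                                       # this image feeds the CNN
```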
🎮 Full-Stack Architecture
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React 19, TypeScript, Tailwind CSS 4 |
| Animation | Framer Motion (smooth transitions, particle effects) |
| API | tRPC + React Query for type-safe data fetching |
| Database | MongoDB via Prisma ORM |
| ML Server | FastAPI (Python) serving the PyTorch model (sketch below) |
| Hand Tracking | MediaPipe for real-time landmark detection |
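As a concrete example of how the ML server layer can look, here is a minimal FastAPI sketch; the endpoint path, model file name, and preprocessing are assumptions for illustration, not the project's exact code:

```python
import io

import torch
from fastapi import FastAPI, File
from PIL import Image
from torchvision import transforms

app = FastAPI()
model = torch.jit.load("sign_model.pt").eval()  # assumed: exported TorchScript model
to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

@app.post("/predict")
async def predict(image: bytes = File(...)):
    """Score one frame and return the best letter with its confidence."""
    x = to_tensor(Image.open(io.BytesIO(image)).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    conf, idx = probs.max(dim=0)
    return {"letter": LETTERS[int(idx)], "confidence": float(conf)}
```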
🎨 Visual Design
We built a synthwave aesthetic with:
- Neon grids, palm trees, animated sun
- Screen flash & shake on hits/misses
- Guitar Hero-style streak glow effects
- Particle bursts and floating score text
Challenges We Faced
⚡ Latency Optimization
Real-time gameplay requires sub-100ms detection. We achieved ~30-50ms by:
- Using a binary WebSocket protocol for frame transmission (see the sketch after this list)
- Running MediaPipe + MobileNetV2 on separate threads
- Processing both original and mirrored hand poses for better accuracy
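A sketch of that WebSocket path, assuming JPEG-encoded frames on the wire and a synchronous `classify(frame)` function wrapping the pipeline above (both names are ours):

```python
import asyncio

import cv2
import numpy as np
from fastapi import FastAPI, WebSocket

app = FastAPI()

def classify(frame: np.ndarray) -> dict:
    # placeholder: MediaPipe landmarks -> feature mask -> MobileNetV2 (see above)
    return {"letter": "A", "confidence": 0.0}

@app.websocket("/ws")
async def stream(ws: WebSocket):
    await ws.accept()
    loop = asyncio.get_running_loop()
    while True:
        data = await ws.receive_bytes()  # raw JPEG bytes: no base64/JSON overhead
        frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
        # run the heavy CV work in a worker thread, off the event loop
        result = await loop.run_in_executor(None, classify, frame)
        await ws.send_json(result)
```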
🎯 Sign Recognition Accuracy
Hand signs are subtle—slight angle changes affect predictions. We improved accuracy by:
- Training on hand landmark feature masks rather than raw images
- Using mirrored predictions (max of both) to handle left and right hands (sketch after this list)
- Implementing confidence thresholds to filter noise
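A sketch of the mirrored max-confidence trick combined with a threshold; the 0.6 cutoff is illustrative, not our tuned value:

```python
import torch

CONF_THRESHOLD = 0.6  # illustrative; tune on validation data

@torch.no_grad()
def predict_letter(model: torch.nn.Module, x: torch.Tensor):
    """x: (1, 3, H, W) mask tensor. Returns a letter, or None if too uncertain."""
    batch = torch.cat([x, torch.flip(x, dims=[3])])  # original + horizontal mirror
    probs = torch.softmax(model(batch), dim=1)       # shape (2, 26)
    conf, idx = probs.max(dim=1)                     # best class for each view
    best = conf.argmax()                             # keep the more confident view
    if conf[best] < CONF_THRESHOLD:
        return None                                  # noise: drop the prediction
    return chr(ord("A") + int(idx[best]))
```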
🎵 Timing Synchronization
Syncing game timing with AI predictions was tricky:
Note timing window: [noteTime - 2.0s, noteTime + 0.8s]
Perfect window: [noteTime - 0.3s, noteTime + 0.3s]
We buffer predictions and match them against note windows in real-time.
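Conceptually, the matcher keeps recent (letter, timestamp) predictions and checks each note against its windows. A simplified Python sketch of that judgment (data shapes and names are our assumptions):

```python
from dataclasses import dataclass

HIT_WINDOW = (-2.0, 0.8)     # seconds relative to noteTime
PERFECT_WINDOW = (-0.3, 0.3)

@dataclass
class Prediction:
    letter: str
    time: float  # song-clock timestamp

def judge(note_letter: str, note_time: float, buffer: list[Prediction]) -> str:
    """Return 'perfect', 'hit', or 'miss' for one note against buffered predictions."""
    best = None
    for p in buffer:
        dt = p.time - note_time
        if p.letter == note_letter and HIT_WINDOW[0] <= dt <= HIT_WINDOW[1]:
            if best is None or abs(dt) < abs(best):
                best = dt  # closest matching prediction wins
    if best is None:
        return "miss"
    return "perfect" if PERFECT_WINDOW[0] <= best <= PERFECT_WINDOW[1] else "hit"
```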
What We Learned
- MediaPipe is incredibly fast for hand tracking (~5-10ms)
- MobileNetV2 provides a great balance of accuracy vs. speed for real-time inference
- Building rhythm games requires careful attention to input latency
- Framer Motion makes complex animations surprisingly approachable
What's Next
- 🌐 Community beatmap creation and sharing
- 📱 Mobile app with on-device ML (Core ML / TensorFlow Lite)
- 🏆 Multiplayer competitive modes
- 📊 Learning analytics and progress tracking
- 🤲 Support for ASL words and phrases beyond fingerspelling (dataset generated and training started)
Built With
next.js react typescript tailwindcss framer-motion trpc prisma mongodb pytorch mediapipe fastapi python aws-s3


