Inspiration
We were inspired by the need to make American Sign Language (ASL) more accessible to learners everywhere. Traditional ASL learning often requires in-person instruction or static resources, creating barriers for many people who want to learn this beautiful language. We wanted to create an interactive, AI-powered platform that could provide real-time feedback and make ASL spelling practice engaging and accessible from anywhere.
What it does
SpellWithASL is an interactive web application that teaches users to spell words using American Sign Language through real-time AI gesture recognition. Users can:
- Practice ASL spelling with real-time hand gesture recognition using their webcam
- Learn with instant feedback as the AI recognizes each letter and provides confidence scores
- Progress automatically through a curated vocabulary of 30+ practice words
- Track their learning with comprehensive statistics on letters and words completed
- Collect training data to continuously improve the AI model's accuracy
- Enjoy a seamless experience with celebration animations and smooth word transitions
The system detects hand landmarks in real time and feeds them to a trained neural network that identifies ASL letters, giving users immediate feedback on their signing accuracy.
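As a rough sketch of that flow (the model path, letter list, and function name below are illustrative, not the project's actual identifiers): flatten the 21 landmarks into a feature vector and ask the trained classifier for a letter and a confidence score.

```python
# Minimal sketch of the landmark -> letter flow; file and variable names
# are illustrative, not the project's actual code.
import numpy as np
from tensorflow import keras

LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # 26 classes
model = keras.models.load_model("asl_landmarks.h5")        # hypothetical path

def predict_letter(landmarks: np.ndarray) -> tuple[str, float]:
    """landmarks: (21, 3) array of (x, y, z) points from MediaPipe Hands."""
    features = landmarks.reshape(1, -1)            # flatten to a (1, 63) vector
    probs = model.predict(features, verbose=0)[0]  # softmax over 26 letters
    best = int(np.argmax(probs))
    return LETTERS[best], float(probs[best])       # letter + confidence score
```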
How we built it
Frontend Architecture:
- Built with Next.js and React using TypeScript for type safety
- Integrated MediaPipe Hands for real-time hand landmark detection (its landmark output is sketched after this list)
- Implemented custom camera handling with retry logic and error recovery
- Designed a clean, minimal UI with consistent color palette and responsive design
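The browser side uses MediaPipe's JavaScript API, but the same 21-landmark output is available from its Python API, which is a natural fit for offline data collection; a hedged sketch (webcam index and options are assumptions):

```python
# Sketch of hand landmark extraction with MediaPipe's Python API; the browser
# uses the equivalent JavaScript API, but the 21-point output is the same.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # default webcam
with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            points = results.multi_hand_landmarks[0].landmark
            coords = [(p.x, p.y, p.z) for p in points]  # 21 (x, y, z) triples
            print(len(coords))  # -> 21
cap.release()
```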
Backend & AI:
- Developed a FastAPI service for real-time ASL letter prediction (a minimal endpoint sketch follows this list)
- Created a TensorFlow/Keras neural network trained on hand landmark coordinates
- Implemented landmarks-only architecture (no images) for privacy and performance
- Used scikit-learn for data preprocessing and model validation
- Applied class balancing and data augmentation to handle training data imbalance
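A minimal sketch of what that prediction endpoint could look like (the route name, request schema, and `model_utils` helper are assumptions for illustration):

```python
# Hedged FastAPI sketch: accept 21 (x, y, z) landmarks, return letter + confidence.
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

from model_utils import predict_letter  # hypothetical wrapper around the Keras model

app = FastAPI()

class LandmarkRequest(BaseModel):
    landmarks: list[list[float]]  # 21 [x, y, z] points from MediaPipe Hands

@app.post("/predict")
def predict(req: LandmarkRequest) -> dict:
    letter, confidence = predict_letter(np.asarray(req.landmarks))
    return {"letter": letter, "confidence": confidence}
```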
Key Technical Decisions:
- Landmarks-only approach: Chose hand coordinate data over images for better privacy, performance, and real-time processing
- Microservices architecture: Separated AI inference, backend API, and frontend for scalability
- Robust state management: Implemented timing controls and validation to prevent auto-completion bugs (a simplified debouncing sketch follows this list)
- Progressive enhancement: Added automatic word progression with celebration feedback
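As a simplified illustration of those timing controls (the thresholds and class name are invented for the sketch), a letter counts only after it has been held steadily, with a cooldown so one completion cannot immediately trigger the next:

```python
# Simplified debouncing sketch; all thresholds are illustrative.
import time

CONFIDENCE_MIN = 0.85   # ignore low-confidence predictions
HOLD_FRAMES = 10        # frames the same letter must persist
COOLDOWN_SECONDS = 1.0  # pause after accepting a letter

class LetterDebouncer:
    def __init__(self) -> None:
        self.current: str | None = None
        self.streak = 0
        self.accepted_at = 0.0

    def update(self, letter: str, confidence: float) -> str | None:
        """Feed one prediction per frame; returns a letter only once it is held steadily."""
        if time.time() - self.accepted_at < COOLDOWN_SECONDS:
            return None                      # still in post-accept cooldown
        if confidence < CONFIDENCE_MIN:
            self.current, self.streak = None, 0
            return None
        self.streak = self.streak + 1 if letter == self.current else 1
        self.current = letter
        if self.streak >= HOLD_FRAMES:
            self.accepted_at = time.time()   # start cooldown, reset state
            self.current, self.streak = None, 0
            return letter
        return None
```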
Challenges we ran into
1. Architecture Evolution: Started with an image-based approach before realizing landmarks-only was superior for performance, privacy, and real-time processing, which required major refactoring.
2. Camera Stability: Encountered black screen issues and camera initialization failures. Solved with comprehensive retry logic, timeout handling, and manual restart functionality.
3. State Management Complexity: Faced auto-completion bugs where completing one letter would accidentally trigger the next letter. Fixed with robust validation, state cleanup, and timing controls.
4. Model Training Challenges: Dealt with class imbalance in the training data and had to implement proper hand pose normalization and data augmentation (the normalization idea is sketched after this list).
5. Real-time Performance: Balanced prediction accuracy with real-time responsiveness, implementing throttling and confidence thresholds.
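One common way to normalize hand pose (the project's exact preprocessing may differ) is to translate the wrist to the origin and divide by the hand's span, so the same sign produces the same features at any distance from the camera:

```python
# Wrist-relative, scale-invariant normalization sketch; one standard approach,
# not necessarily the project's exact preprocessing.
import numpy as np

def normalize_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (21, 3) array; index 0 is the wrist in MediaPipe Hands."""
    centered = landmarks - landmarks[0]               # translate wrist to origin
    scale = np.max(np.linalg.norm(centered, axis=1))  # hand span in this frame
    return centered / (scale + 1e-8)                  # scale-invariant pose
```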
Accomplishments that we're proud of
🎯 Technical Achievements:
- Built a fully functional real-time ASL recognition system that works in web browsers
- Achieved seamless user experience with automatic word progression and celebration animations
- Implemented privacy-first architecture using landmarks instead of images
- Created robust error handling with automatic camera recovery and manual restart options
🎨 User Experience:
- Designed a beautiful, minimal interface with consistent design language
- Built engaging learning flow with progress tracking and visual feedback
- Implemented accessibility features and responsive design
🤖 AI/ML Success:
- Trained a working neural network with good accuracy across 26 letters
- Implemented proper data preprocessing with hand pose normalization
- Created a comprehensive training pipeline with model evaluation and testing (a rough sketch follows this list)
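A rough sketch of such a pipeline (the layer sizes and random stand-in data are assumptions, not the project's actual architecture or dataset):

```python
# Hedged sketch of a landmarks-only classifier plus held-out evaluation;
# the data and layer sizes are stand-ins, not the project's actual setup.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 63)).astype("float32")  # stand-in landmark vectors
y = rng.integers(0, 26, size=2000)                 # stand-in letter labels

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

model = keras.Sequential([
    keras.layers.Input(shape=(63,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(26, activation="softmax"),  # one output per letter
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(x_test, y_test, verbose=0))   # [loss, accuracy] held out
```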
What we learned
Technical Insights:
- Computer vision in the browser is powerful but requires careful optimization for real-time performance
- State management in real-time applications needs robust validation and cleanup to prevent race conditions
- Landmarks-based ML can be more effective than image-based approaches for gesture recognition
- User experience design is crucial for educational applications: smooth interactions keep users engaged
AI/ML Learnings:
- Data quality over quantity: Proper normalization and preprocessing matter more than just having lots of data
- Class balancing is essential when training on human-generated gesture data (see the class-weight sketch after this list)
- Real-time inference requires different considerations than batch processing
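For the class-balancing point above, a small sketch using scikit-learn's balanced weights fed into Keras (the labels here are random stand-ins for the real, imbalanced gesture data):

```python
# Class-balancing sketch: compute per-class weights with scikit-learn and
# pass them to Keras so rarer letters count more per sample.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.random.default_rng(0).integers(0, 26, size=1000)  # stand-in labels

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}

# A Keras model would then be trained with:
# model.fit(x_train, y_train, epochs=30, class_weight=class_weight)
```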
Product Development:
- Progressive enhancement: Starting simple and adding features incrementally leads to more stable systems
- User feedback loops: Real-time validation and celebration animations significantly improve learning engagement
What's next for SpellWithASL
🚀 Immediate Improvements:
- Expand the vocabulary beyond spelling to a fuller ASL dictionary
- Add word difficulty levels and adaptive learning paths
- Support advanced gesture recognition for full ASL phrases and sentences
- Build mobile apps for iOS and Android
SpellWithASL represents our vision of making sign language education accessible, engaging, and effective through the power of AI and modern web technologies.
Built With
- fastapi
- keras
- mediapipe
- next.js
- node.js
- python
- react.js
- scikit-learn
- tailwindcss
- tensorflow
- typescript