Inspiration
We were inspired to make learning American Sign Language (ASL) truly accessible while raising awareness of ASL as a vibrant language used by millions. Drawing inspiration from engaging platforms like Wordle and MonkeyType, we asked: what if learning ASL could be just as addictive and rewarding? We were fascinated by the potential of computer vision and machine learning to create intuitive human-computer interactions that bridge physical gestures and the digital world. This project emerged from our belief that the best learning experiences don't feel like learning at all, so we set out to transform ASL education into an enjoyable experience that fits seamlessly into your daily routine.
What it does
Signify uses computer vision to detect hand poses in real time through your camera. It captures hand landmarks with MediaPipe and feeds the resulting landmark vectors into an LSTM model, trained on a custom dataset, to classify signs from the ASL lexicon. The classified signs then power interactive games such as Wordle-style quizzes and speed-typing challenges, making ASL learning engaging and accessible.
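As a concrete illustration, here is a minimal sketch of the kind of LSTM classifier described above, written in PyTorch. The hidden size, layer count, and number of output classes are illustrative assumptions; only the 32-frame, 126-value input shape comes from our pipeline.

```python
import torch
import torch.nn as nn

class SignLSTM(nn.Module):
    """Classifies a sequence of hand-landmark vectors into an ASL sign."""
    def __init__(self, input_size=126, hidden_size=128, num_classes=26):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 32 frames, 126 landmark values per frame)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])  # logits over the sign vocabulary

model = SignLSTM()
logits = model(torch.randn(1, 32, 126))   # one buffered 32-frame sequence
probs = torch.softmax(logits, dim=-1)     # per-sign confidence scores
```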
How we built it
We built Signify with a React Native frontend that streams live camera frames to a Python backend over a WebSocket. The backend uses an asynchronous Producer-Consumer architecture for real-time processing. MediaPipe Holistic extracts 21 landmarks per hand (x, y, z for both hands, 126 values per frame), which are buffered into sequences of 32 frames and fed into a custom-trained LSTM model for ASL classification. We implemented temporal smoothing with a sliding-window average and intelligent response throttling to balance network efficiency with real-time responsiveness. The data pipeline processes videos from the MSASL and Kaggle Alphabet datasets through automated download, trimming, and landmark-extraction stages. We integrated Firebase for user data storage and OpenAI's GPT API for dynamic word generation in the game modes. The system supports instant state resets when users advance to new letters, ensuring the pipeline always processes the most relevant data.
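Here is a rough sketch of how a Producer-Consumer loop like the one above can be wired up with FastAPI, asyncio, and MediaPipe Holistic. The endpoint path, queue size, smoothing-window length, class count, and the run_model stub are assumptions for illustration, not our exact backend code (which also offloads the CPU-heavy steps so they don't block the event loop).

```python
import asyncio
from collections import deque

import cv2
import mediapipe as mp
import numpy as np
from fastapi import FastAPI, WebSocket

app = FastAPI()
holistic = mp.solutions.holistic.Holistic()
NUM_CLASSES = 26  # illustrative

def extract_landmarks(frame_bgr: np.ndarray) -> np.ndarray:
    """Flatten both hands' 21 (x, y, z) landmarks into a 126-value vector."""
    results = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    values = []
    for hand in (results.left_hand_landmarks, results.right_hand_landmarks):
        if hand:
            values.extend(c for lm in hand.landmark for c in (lm.x, lm.y, lm.z))
        else:
            values.extend([0.0] * 63)  # hand not visible -> zero padding
    return np.asarray(values, dtype=np.float32)

def run_model(sequence: np.ndarray) -> np.ndarray:
    # Stand-in for LSTM inference over a (32, 126) sequence of landmark vectors.
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES, dtype=np.float32)

@app.websocket("/ws")
async def stream(ws: WebSocket):
    await ws.accept()
    frames: asyncio.Queue = asyncio.Queue(maxsize=4)

    async def producer():
        while True:  # receive JPEG-encoded frames from the app
            data = await ws.receive_bytes()
            frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
            await frames.put(frame)

    async def consumer():
        window = deque(maxlen=32)   # 32-frame input sequence for the LSTM
        recent = deque(maxlen=5)    # sliding window of predictions for smoothing
        while True:
            window.append(extract_landmarks(await frames.get()))
            if len(window) == 32:
                recent.append(run_model(np.stack(window)))
                smoothed = np.mean(list(recent), axis=0)  # temporal smoothing
                await ws.send_json({"sign": int(smoothed.argmax()),
                                    "confidence": float(smoothed.max())})

    await asyncio.gather(producer(), consumer())
```

The bounded queue gives natural backpressure in this sketch: if inference falls behind, the producer waits instead of letting latency pile up.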
Challenges we ran into
The biggest challenge was managing the sheer complexity of the system and coordinating its many moving parts. Building an asynchronous Producer-Consumer pipeline required careful orchestration of concurrent tasks, ensuring the inference pipeline could be instantly cancelled and reset when users advanced to new letters. Implementing intelligent feedback throttling was particularly tricky: we needed to balance real-time responsiveness with network efficiency by only sending updates when confidence scores changed significantly or enough time had elapsed. Managing async I/O between the WebSocket server, MediaPipe processing, and LSTM inference required deep understanding of Python's event loop to avoid blocking operations. We also debated extensively between LSTM and MLP architectures, weighing accuracy against inference speed for real-time performance. Dataset selection and ensuring compatibility between model outputs and frontend expectations added additional layers of complexity to an already intricate system.
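To illustrate the throttling rule, a gate like the sketch below only lets a response through when the confidence has moved by a meaningful amount or a minimum interval has passed since the last update; the threshold values here are placeholders, not our tuned settings.

```python
import time

CONF_DELTA = 0.10     # minimum confidence change worth reporting (assumed)
MIN_INTERVAL = 0.5    # seconds after which an update is sent regardless (assumed)

class FeedbackThrottle:
    """Decides whether a new prediction is worth sending to the client."""
    def __init__(self):
        self.last_confidence = 0.0
        self.last_sent = 0.0

    def should_send(self, confidence: float) -> bool:
        now = time.monotonic()
        changed = abs(confidence - self.last_confidence) >= CONF_DELTA
        stale = (now - self.last_sent) >= MIN_INTERVAL
        if changed or stale:
            self.last_confidence, self.last_sent = confidence, now
            return True
        return False
```

A gate like this sits just before the WebSocket send, so every frame is still processed but only meaningful changes hit the network.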
Accomplishments that we're proud of
We built a complex real-time machine learning system entirely from scratch. Every component, from the custom LSTM model to the asynchronous pipeline architecture, was developed in-house, resulting in a highly modular codebase. Achieving true real-time performance was a major win: camera frames are processed and predictions update instantly between frontend and backend with minimal latency. We're particularly proud of our compute optimizations that keep the model lightweight and fast without sacrificing accuracy, backed by strong validation scores from training.
Beyond the technical achievements, we're excited about the concept itself: making ASL genuinely accessible through engaging gameplay. The design is simple, polished, and direct, letting users jump right in without friction. We added a competitive leaderboard system that makes learning more fun and motivating. Our learning-tracking feature identifies the letters each user struggles with, helping them focus on the signs they find difficult and learn from their mistakes. We also created comprehensive image guides for every ASL sign, ensuring users always have visual references to support their learning journey.
What we learned
We gained deep experience in ML deployment, particularly working with LSTMs and time-series data represented as vectors. Understanding how to buffer sequential landmark data and feed it into recurrent networks for real-time classification was a significant learning curve. We learned to work with Google Cloud APIs for scalable infrastructure. Building WebSocket-based real-time data transfer taught us about low-latency optimizations and the tradeoffs between responsiveness and network efficiency. We designed an in-house feedback throttling system that minimizes the data sent between frontend and backend, only transmitting updates when confidence scores change significantly or sufficient time has elapsed. We also learned to navigate large, complex codebases and trace the flow of data through the many moving parts of a real-time system. Debugging asynchronous pipelines taught us how to think about state management, task cancellation, and coordination across concurrent processes. This project pushed us to understand not just how to build machine learning models, but how to integrate them into production systems that respond instantly to user input.
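As a small example of the task-cancellation pattern we had to learn, the sketch below shows the cancel-await-clear flow for resetting an in-flight inference task when the user moves on to a new letter; the class and attribute names are hypothetical.

```python
import asyncio
from typing import Optional

class InferenceSession:
    """Holds the in-flight inference task and its buffered frames."""
    def __init__(self):
        self.task: Optional[asyncio.Task] = None
        self.buffer: list = []  # pending landmark frames for the current letter

    async def start(self, coro):
        await self.reset()                    # drop any stale work first
        self.task = asyncio.create_task(coro)

    async def reset(self):
        if self.task and not self.task.done():
            self.task.cancel()
            try:
                await self.task               # wait for the cancellation to land
            except asyncio.CancelledError:
                pass                          # expected when resetting
        self.buffer.clear()
```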
What's next for Signify
We plan to significantly expand Signify's capabilities by incorporating larger and more diverse datasets to improve recognition accuracy and vocabulary coverage. We'll leverage generative AI to dynamically create varied difficulty levels and personalized learning experiences. Beyond hand gestures, we're excited to integrate MediaPipe's full-body pose detection to capture the signer's whole body. This will unlock access to much larger sign language databases that include body movements, facial expressions, and spatial positioning, enabling us to recognize sentiment and more nuanced communication. By understanding complete body language, Signify can evolve from teaching individual signs to facilitating rich, expressive conversations that capture the full depth of ASL communication.
Built With
- expo.io
- fastapi
- javascript
- mediapipe
- msasl
- onnx
- python
- pytorch
- react
- react-native