Voxio - derived from vox (Latin for voice) and visio (Latin for vision). Imagine a world where everyone can communicate! Our hardware hack features glasses equipped with a camera and speaker, driven by a Raspberry Pi. Designed to enhance communication for the hard-of-hearing and culturally Deaf communities, Voxio provides real-time American Sign Language (ASL) transcription. With its help, users can recognize and interpret a range of ASL signs, making communication seamless and inclusive.

💡Inspiration

When coming up with this project idea, we knew it would be a challenge for all of us. At the same time, we recognized its potential impact on communication and accessibility for the Deaf community. We aimed to bridge gaps and foster inclusivity, ultimately contributing to both local and international efforts to support and empower Deaf individuals.

🛠️How it works

Voxio’s premise is straightforward. First, the camera module captures a sign. This sign could come from the user or from the person they are communicating with. Next, the Raspberry Pi sends the camera’s output to a Flask server hosted on Google Cloud, where the model runs. Using machine learning, MediaPipe, and OpenCV, the model recognizes the sign and returns English text.
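To make the flow concrete, here is a minimal sketch of the Pi-side loop, assuming each frame is uploaded as a JPEG and the server replies with a JSON label; the endpoint name, server URL, and response shape are placeholders rather than our exact code.

```python
# Rough sketch of the capture-and-send loop on the Pi.
# SERVER_URL, the /predict endpoint, and the JSON "label" field are assumptions.
import cv2
import requests

SERVER_URL = "http://<cloud-server>/predict"  # Flask server hosted on Google Cloud

def stream_frames():
    cap = cv2.VideoCapture(0)  # Pi camera exposed as a standard video device
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue
            # JPEG-encode the frame so the upload stays small over the network
            _, buf = cv2.imencode(".jpg", frame)
            resp = requests.post(
                SERVER_URL,
                files={"frame": ("frame.jpg", buf.tobytes(), "image/jpeg")},
                timeout=5,
            )
            label = resp.json().get("label", "")
            if label:
                print("Recognized sign:", label)  # later spoken through the speaker
    finally:
        cap.release()

if __name__ == "__main__":
    stream_frames()
```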

✌️Creating the model

We researched and tried many different pre-made ASL models online, but none of them worked to our liking. Instead, we trained our very own model by recording ourselves performing each sign while slightly moving our hands. Each frame in the resulting short videos was then used to train the model. Although this was time-consuming and the dataset contained only four distinct hands, it worked far more effectively, and we also had fun learning each sign.
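Roughly, the pipeline looked like the sketch below: MediaPipe extracts hand landmarks from each recorded frame, and those landmark vectors become training samples. The one-video-per-sign file layout and the scikit-learn classifier shown here are illustrative assumptions, not necessarily the exact setup we ran.

```python
# Sketch: turn recorded sign videos into landmark vectors and fit a classifier.
import os
import cv2
import numpy as np
import mediapipe as mp
from sklearn.ensemble import RandomForestClassifier

mp_hands = mp.solutions.hands

def landmarks_from_frame(hands, frame):
    """Return a flat (x, y, z) landmark vector for the first detected hand, or None."""
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    hand = results.multi_hand_landmarks[0]
    return np.array([[p.x, p.y, p.z] for p in hand.landmark]).flatten()

def build_dataset(video_dir):
    """Assumes one video per sign, named <label>.mp4; every frame becomes a sample."""
    X, y = [], []
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        for name in os.listdir(video_dir):
            label = os.path.splitext(name)[0]
            cap = cv2.VideoCapture(os.path.join(video_dir, name))
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                vec = landmarks_from_frame(hands, frame)
                if vec is not None:
                    X.append(vec)
                    y.append(label)
            cap.release()
    return np.array(X), np.array(y)

X, y = build_dataset("recordings")
model = RandomForestClassifier(n_estimators=200).fit(X, y)
```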

🚧Challenges we faced

We faced quite a few challenges, mainly with the hardware we used.

  • Some of the 3D printed parts didn’t fit together properly
  • Wiring was getting messy, especially since soldering wasn’t allowed in the building
  • Difficulty interfacing with the Raspberry Pi without a monitor for the first six hours

🏆Achievements we made

Despite these setbacks, we’re still proud of our accomplishments. We managed to set up a working hardware system, train our own AI model, and establish communication between the Pi and the cloud server, all within the span of 24 hours (not including sleep and waiting for a monitor to arrive). We also created a functional demo website capable of showcasing Voxio.

🚀What’s next for Voxio

In the future, we plan to improve Voxio by adding speech-to-text transcription and an OLED display, so that hard-of-hearing users can also read spoken words. Later, we plan to refine the ergonomics with more polished 3D-printed designs and to improve the software with better training data for the Vision Transformer.
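As a rough sketch of that speech-to-text path, assuming a SpeechRecognition microphone pipeline and an SSD1306 OLED driven by luma.oled (both are assumptions about hardware we have not built yet):

```python
# Minimal sketch of the planned speech-to-text + OLED display loop.
import speech_recognition as sr
from luma.core.interface.serial import i2c
from luma.core.render import canvas
from luma.oled.device import ssd1306

display = ssd1306(i2c(port=1, address=0x3C))  # typical I2C address for an SSD1306
recognizer = sr.Recognizer()

def show(text):
    """Draw recognized speech on the OLED so a hard-of-hearing user can read it."""
    with canvas(display) as draw:
        draw.text((0, 0), text, fill="white")

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    while True:
        audio = recognizer.listen(source, phrase_time_limit=5)
        try:
            show(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # nothing intelligible; keep listening
```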
