Inspiration
Sci-fi movies, MIT AlterEgo, speech restoration devices.
What it does
It reads your lips and converts the movements into text. It then takes that text and synthesises your voice using ElevenLabs voice cloning. The goal is voice restoration for people who have had laryngectomies or other speech impairments that render speech painful or impossible.
How we built it
We feed the output of the AutoAVSR visual speech recognition model into Claude with domain-specific prompt engineering (viseme-to-phoneme correction grounded in linguistics and conversational context), and use the conversation history to intelligently correct the user's output across multiple stages (first phonetically, then grammatically and contextually). Finally, we use ElevenLabs for real-time voice synthesis.
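To illustrate the viseme-to-phoneme ambiguity the correction stage has to resolve: several phonemes look identical on the lips (e.g. the bilabials p/b/m), so a lip reader alone cannot distinguish "pat" from "bat" from "mat". Here is a minimal grapheme-level sketch of that idea; the class groupings are simplified assumptions for illustration, not the actual mapping used in our pipeline, which operates on phonemes and is resolved by Claude using context rather than a lookup table.

```python
# Approximate viseme classes (grapheme-level, illustrative only):
# letters in the same class produce near-identical lip shapes.
VISEME_CLASS = {
    'p': 'A', 'b': 'A', 'm': 'A',   # bilabials
    'f': 'B', 'v': 'B',             # labiodentals
    't': 'C', 'd': 'C', 's': 'C', 'z': 'C', 'n': 'C',  # alveolars
    'k': 'D', 'g': 'D',             # velars
}

def viseme_key(word: str) -> str:
    """Collapse a word to its viseme sequence; unlisted letters map to themselves."""
    return ''.join(VISEME_CLASS.get(ch, ch) for ch in word.lower())

def confusable(w1: str, w2: str) -> bool:
    """True if two words are visually indistinguishable under this simplified mapping."""
    return viseme_key(w1) == viseme_key(w2)
```

Under this mapping, `confusable("pat", "bat")` is true while `confusable("pat", "cat")` is false, which is exactly the kind of ambiguity that conversational context is needed to break.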
Challenges we ran into
Non-native English speakers' mouth movements can be harder for the model to interpret, because the original training dataset consisted mostly of native speakers.
Accomplishments that we're proud of
A fully working, real-time pipeline running end to end on a live, deployed website with strong real-world performance, built within a day.
A novel method to achieve real-time performance: we removed the deep learning model's default language model and used Claude in its place. Claude's strong reasoning capabilities compensate for the loss of the language model, achieving a lower word error rate (higher accuracy) while being faster.
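Word error rate, the metric behind the accuracy claim above, is the word-level edit distance between the reference transcript and the hypothesis, divided by the reference length. A minimal sketch (any standard WER implementation, e.g. the jiwer library, computes the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substituted word in a six-word reference gives a WER of 1/6 ≈ 0.167; lower is better.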
What we learned
Lip reading still has its challenges and isn't perfect in the real world. However, in limited domains and under the right conditions, it can be genuinely useful.
What's next for Voice Keeper
Public release of the demo!