Inspiration
At the opening ceremony, we immediately saw how much potential the Deepgram API gives developers to use speech recognition in innovative ways within their projects. Since the technology was new to us, we decided to integrate it into the most widely used web browser in the world: Google Chrome. Our aim is to give people with motor disabilities an alternative to the usual web browsing experience that is easy to use yet still efficient. We combined the Deepgram API and the Chrome API in a Chrome extension called Stëmm (Luxembourgish for 'voice'). Just install the extension, grant it microphone permission, and you can control Chrome hands-free.
What it does
The extension lets you control Chrome using only your voice. Once it is activated, you can open your favourite websites, add bookmarks and search Google with a microphone and a few simple commands (like "Chrome, open Netflix"). In these 24 hours we only incorporated the most useful and intuitive commands, but the implementation can easily be extended to provide full functionality.
How we built it
The heart of our speech recognition is the Deepgram API. When the Chrome extension window is opened, we prompt the user to give access to their microphone. For the rest of the session, the microphone listens in the background and the generated transcript is sent to the back-end. The extension then uses a language processing algorithm to identify the commands in the transcribed text and, by integrating these commands with the Chrome developer tools, executes them to control the browser.
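To make the command-identification step more concrete, here is a minimal TypeScript sketch of how a transcript could be turned into a structured command. It is not Stëmm's actual code: the `Command` type, the `parseCommand` function, the `KNOWN_SITES` table and the wake-word-then-verb grammar are all assumptions made for illustration, based only on the example "Chrome, open Netflix".

```typescript
// Illustrative sketch only: every name below is hypothetical, not from Stëmm's source.
type Command =
  | { kind: "open"; url: string }
  | { kind: "search"; query: string }
  | { kind: "bookmark" };

// Assumed wake word, taken from the example command "Chrome, open Netflix".
const WAKE_WORD = "chrome";

// Assumed table of spoken site names and the URLs they stand for.
const KNOWN_SITES: { [name: string]: string } = {
  netflix: "https://www.netflix.com",
  youtube: "https://www.youtube.com",
};

function parseCommand(transcript: string): Command | null {
  // Normalise: lower-case, turn punctuation into spaces, split into words.
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);

  // Ignore everything said before the wake word; no wake word means no command.
  const wake = words.indexOf(WAKE_WORD);
  if (wake === -1) return null;

  // The first word after the wake word is the verb, the rest is its argument.
  const [verb, ...rest] = words.slice(wake + 1);
  const argument = rest.join(" ");

  switch (verb) {
    case "open":
      // Only open sites we know; the own-property check skips inherited keys.
      return Object.prototype.hasOwnProperty.call(KNOWN_SITES, argument)
        ? { kind: "open", url: KNOWN_SITES[argument] }
        : null;
    case "search":
      return argument ? { kind: "search", query: argument } : null;
    case "bookmark":
      return { kind: "bookmark" };
    default:
      return null;
  }
}
```

In an extension, a parsed command could then be passed to the matching browser call, for example `chrome.tabs.create({ url })` for `open` or `chrome.bookmarks.create` for `bookmark`. Returning `null` for anything that does not match keeps ordinary conversation picked up by the microphone from triggering actions.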
Challenges we ran into
- Chrome's over-corrective content security system
- Chrome's nebulous error messages
- Interpreting the transcribed voice to pick out only the useful voice commands
- Navigating the often unclear permissions issues to allow the extension to do everything it needs
Accomplishments that we're proud of
- Successfully using the Deepgram API to incorporate speech recognition into a software project for the first time
- Making our project compatible with Chrome (gaining access to all the useful permissions and having control over the browser)
- Working efficiently as a team to create a project that requires skills from different domains of Computer Science
What we learned
We improved our understanding of Chrome's API and learned to manipulate the browser and use more advanced features such as live-streaming microphone input. We also learned a lot about Git and how to fix merge conflicts. Most importantly, we learned about the difficulties of working with human speech and of writing efficient programs under those circumstances.
What's next for Stëmm
The extension is still largely limited to features built into the browser. We are looking forward to extending it to control websites directly through their own site-specific features (e.g. pausing and playing videos on YouTube or Netflix) and making it a fully functional alternative to the normal browser.