Stëmm | Devpost

Inspiration

At the opening ceremony, we immediately saw the potential the Deepgram API offers developers who want to use speech recognition in their projects. As newcomers to this technology, we decided to integrate it into the most used web browser in the world: Google Chrome. Our aim is to build a platform that gives people with motor disabilities an alternative to the usual web browsing experience, one that is easy to use and still efficient. We combined the Deepgram API and the Chrome API in a Chrome extension called Stëmm (Luxembourgish for 'voice'). Just install the extension, give it microphone permission, and you can control Chrome hands-free.

What it does

The extension lets you control Chrome using only your voice. Once it is activated, you can open your favourite websites, add bookmarks and search Google using a microphone and a few simple commands (like "Chrome, open Netflix"). In the 24 hours available we incorporated only the most useful and intuitive commands, but the implementation can easily be extended to provide full functionality.

How we built it

The heart of the speech recognition is the Deepgram API. When the Chrome extension window is opened, we prompt the user to give access to their microphone. For the rest of the session, the microphone listens in the background and the generated transcript is sent to the back-end. The extension then uses a language processing algorithm to identify the commands in the transcribed text and, by integrating these commands with the Chrome developer tools, executes them to control the browser.
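The command-identification step described above can be sketched roughly as follows. This is only a minimal illustration, not our actual algorithm: the names `parseCommand`, `Command`, `WAKE_WORD` and `KNOWN_SITES`, the site table, and the fallback URL guess are all assumptions made for the example.

```typescript
// Hypothetical command shapes for illustration; the real extension may differ.
type Command =
  | { kind: "open"; url: string }
  | { kind: "search"; url: string }
  | { kind: "bookmark" };

// Assumed wake word, taken from the "Chrome, open Netflix" example.
const WAKE_WORD = "chrome";

// Assumed lookup table of spoken site names (illustrative only).
const KNOWN_SITES = new Map<string, string>([
  ["netflix", "https://www.netflix.com"],
  ["youtube", "https://www.youtube.com"],
]);

function parseCommand(transcript: string): Command | null {
  // Normalise: lower-case, replace punctuation with spaces, split into words.
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);

  // Ignore everything before the wake word so ordinary speech is not executed.
  const start = words.indexOf(WAKE_WORD);
  if (start === -1 || start + 1 >= words.length) return null;

  const verb = words[start + 1];
  const rest = words.slice(start + 2);

  switch (verb) {
    case "open": {
      if (rest.length === 0) return null;
      const site = rest.join("");
      // Fall back to a guessed address when the site is not in the table.
      const url = KNOWN_SITES.get(site) ?? `https://www.${site}.com`;
      return { kind: "open", url };
    }
    case "search": {
      if (rest.length === 0) return null;
      const query = encodeURIComponent(rest.join(" "));
      return { kind: "search", url: `https://www.google.com/search?q=${query}` };
    }
    case "bookmark":
      return { kind: "bookmark" };
    default:
      return null; // wake word heard, but no known command followed
  }
}
```

In an extension, the returned object could then be handed to a browser call, for example `chrome.tabs.create({ url })` for the `open` and `search` cases. Keeping the parsing as a pure function makes it easy to test without a browser.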

Challenges we ran into

Chrome's overly strict content security system
Chrome's nebulous error messages
Interpreting the transcribed voice to pick out only the useful voice commands
Navigating the often unclear permission issues to allow the extension to do everything it needed

Accomplishments that we're proud of

Successfully using the Deepgram API to incorporate speech recognition in a software project for the first time
Making our project compatible with Chrome (gaining access to all the useful permissions and having control over the browser)
Working efficiently as a team to create a project that requires skills from different domains of Computer Science

What we learned

We improved our understanding of Chrome's API and learned to manipulate the browser and use more advanced features such as livestreaming microphone input. We also learned a lot about Git and how to fix merge conflicts. Most importantly, we learned about the difficulties of working with human speech and making efficient programs under those circumstances.
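One difficulty with livestreamed speech is that a transcription service typically sends interim guesses before a final result, so a command should only be acted on once the text has settled. The sketch below shows one way to buffer finalised fragments. The message shape (`is_final`, `channel.alternatives[0].transcript`) is an assumption modelled on common streaming transcription responses and should be checked against the provider's documentation; `TranscriptBuffer` is a hypothetical helper, not part of our extension.

```typescript
// Assumed shape of one streaming transcription message (illustrative only).
interface TranscriptMessage {
  is_final?: boolean;
  channel?: { alternatives?: { transcript?: string }[] };
}

// Collects finalised transcript fragments and ignores interim or empty ones.
class TranscriptBuffer {
  private parts: string[] = [];

  // Accepts one raw JSON frame as received from the streaming connection.
  push(raw: string): void {
    let msg: TranscriptMessage | null;
    try {
      msg = JSON.parse(raw) as TranscriptMessage | null;
    } catch {
      return; // skip malformed frames rather than crash the listener
    }
    const text = msg?.channel?.alternatives?.[0]?.transcript?.trim() ?? "";
    if (msg?.is_final === true && text.length > 0) {
      this.parts.push(text);
    }
  }

  // Returns the accumulated text and clears the buffer.
  flush(): string {
    const text = this.parts.join(" ");
    this.parts = [];
    return text;
  }
}
```

Each frame from the connection would be passed to `push`, and `flush` would be called when the speaker pauses, giving the command parser a stable piece of text to work on.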

What's next for Stëmm

The extension is still mostly limited to features built into the browser. We are looking forward to extending it to control websites directly through their own features (e.g. pausing and playing videos on YouTube or Netflix) and making it a fully functional alternative to normal browsing.

Submitted to

Hack Cambridge Atlas

Created by

I worked on troubleshooting the permission issues and integrating the Deepgram API with the browser.

Siddharth Srivastava
I developed and integrated the algorithm that processes the transcript text received from the Deepgram API.

Benedek Der
I worked on the front-end and on the interface between the Chrome developer tools and our extension.

Bianca Sandu
I worked on interfacing between the extension and Chrome browser controls and on debugging permission problems caused by the browser's restrictive rules.

Julius Weisser