Inspiration
When quarantine was introduced, many individuals, including us, were starved of social interaction. Transitioning from text-only exchanges back to fully fledged conversations after the pandemic was difficult, let alone public speeches and presentations. Although enough time has passed to rebuild our ability to talk to others and have fruitful conversations, speeches in front of large groups still stand as challenges we'd rather not face. Knowing that the prevalence of social anxiety is rising drastically in our society, we decided to tackle this issue, which we and many others face, with our app, Verbalyst.
What it does
Verbalyst lets users rehearse and record their speeches with both audio and camera as many times as they like. It analyzes each recording to determine how many times the user stuttered over the course of their speech. This encourages practice and familiarity with speaking, aiding people of all ages, since speaking skills are universal: they're needed in industry, in classrooms, in friend groups, you name it. Users can track their progress through the STATS page and see their growth right in front of their eyes, and for individuals who are already confident in their speaking abilities, Verbalyst offers a fun challenge in the form of a tongue-twister generator. Users can record their tongue twisters and catch themselves saying filler words and other stutters, which helps eliminate poor habits.
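This writeup doesn't spell out the detection logic, but a minimal sketch of transcript-based counting conveys the idea. The filler list, function name, and sample sentence below are illustrative assumptions, not Verbalyst's exact heuristics:

```python
import re

# Assumed filler vocabulary; the real app's list may differ.
FILLERS = {"um", "uh", "like", "so"}

def count_stutters(transcript: str) -> dict:
    """Count filler words and immediate word repetitions in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for word in words if word in FILLERS)
    # Treat an immediate repetition such as "I I think" as a stutter.
    repetitions = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return {"fillers": fillers, "repetitions": repetitions}

print(count_stutters("Um, so I I think the the answer is, like, this."))
# {'fillers': 3, 'repetitions': 2}
```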
How we built it
We used TailwindCSS plus vanilla HTML and CSS to create all of the website pages. We used JavaScript for our recording software and Flask to connect our front end to our back end. The back end was written in Python: we used the Vertex AI API to generate an AI analysis of the speech the user submitted, and the AssemblyAI API to run speech-to-text on the user's uploaded MP4 file.
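To make that pipeline concrete, here is a minimal sketch of how a Flask endpoint could tie the two APIs together, assuming the current AssemblyAI and Vertex AI Python SDKs. The route name, file handling, model choice, and prompt are our assumptions rather than the exact hackathon code:

```python
from flask import Flask, request, jsonify
import assemblyai as aai
import vertexai
from vertexai.generative_models import GenerativeModel

app = Flask(__name__)
aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"  # in practice, load from config
vertexai.init(project="your-gcp-project", location="us-central1")

@app.route("/analyze", methods=["POST"])  # hypothetical route name
def analyze():
    # Save the uploaded MP4 recording, then hand it to AssemblyAI.
    recording = request.files["recording"]
    path = "upload.mp4"
    recording.save(path)

    transcript = aai.Transcriber().transcribe(path)

    # Ask a Vertex AI model to analyze the transcript for stutters and fillers.
    model = GenerativeModel("gemini-1.0-pro")  # model choice is an assumption
    analysis = model.generate_content(
        "Count the stutters and filler words in this speech transcript:\n"
        + transcript.text
    )
    return jsonify({"transcript": transcript.text, "analysis": analysis.text})

if __name__ == "__main__":
    app.run(debug=True)
```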
Challenges we ran into
Starting 26 hours behind (at 8 pm EST) due to scheduling issues and personal conflicts posed a great challenge. Despite this major setback, we adapted swiftly, optimizing our time to make up for the lost hours.
Although the end product was less complete than we anticipated, we did get both of our AI components working individually through the console. Creating a cohesive web application with proper functionality within our severe time crunch was what proved to be the problem. Despite this, we still tried our best to implement all of the components we envisioned yesterday, and we believe we did the best we could with the time we had.
Accomplishments that we're proud of
Given the aforementioned challenges, we're proud that we have an aesthetically pleasing homepage with a functioning recording feature that lets the user view their recording live and download it by pressing the kebab-menu button. The recording feature also lets the user switch tabs and still see themselves via the Picture-in-Picture button, so they can read their script in another tab if required. We're also proud that we got the Google Vertex AI component to recognize stutters in a speech, and that we got the AssemblyAI speech-to-text portion working on its own.
What we learned
The entire project was essentially a learning journey for us; we learned a great deal to give it the functionality it has now. For starters, creating the recording feature and being able to receive and store user audio input so it could be fed to the AssemblyAI speech-to-text model was completely new to us. Although we were unable to wire the user audio input and the AssemblyAI model together through Flask, we still learned how to use AssemblyAI to convert premade audio into text. Even the CSS framework we used for this project (TailwindCSS) was new, so it's safe to say we put a lot of time and effort into learning new things.
What's next for Verbalyst
We had quite an ambitious plan for our project, but our 26-hour delay kept us from reaching it, so we actually have a plan for what to do even after the hackathon ends. We will finish the Flask integration we were close to completing so that our website works as intended. There were also limitations to the speech analysis we could do: we were able to detect repetition and stutters purely from the speech-to-text transcript, but we could not find an AI that identifies lisps, speaking too fast or too slow, or other speech impediments. In the future we will try to find an API, or hopefully even build a model, that can do this; a sketch of one possible pace check appears below. Finally, some portions of our UI/UX are inconsistent, so we plan to fix that too.
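One hedged idea for the pace check: AssemblyAI transcripts already carry word-level timestamps, so a words-per-minute estimate may not need a separate model at all. The file name and thresholds below are assumptions:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"

# Word-level timestamps (in milliseconds) come back with the transcript.
transcript = aai.Transcriber().transcribe("speech.mp4")  # hypothetical recording
words = transcript.words

duration_minutes = (words[-1].end - words[0].start) / 60_000
wpm = len(words) / duration_minutes
print(f"Estimated pace: {wpm:.0f} words per minute")

# Roughly 120-160 wpm is conversational; far outside that range could be
# flagged as "too fast" or "too slow" (the thresholds here are assumptions).
if wpm > 160:
    print("Consider slowing down.")
elif wpm < 120:
    print("Consider speeding up.")
```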
Built With
- assembly-ai
- css
- flask
- google-vertex-ai
- html
- javascript
- python
- tailwind