Inspiration
Um, we like, wanted to uhhh stop talking like this all the time.
For some, poor speaking patterns are an annoying habit worth fixing. For others, such as those with speech impediments, those whose native language is not English, or children just beginning to advance their vocabulary, bad speech is a constant hindrance in daily communication. SpeechTree exists to tackle the root cause of faulty speech.
What it does
Live Coaching: SpeechTree provides real-time coaching through immediate haptic feedback on the phone every time a filler word is used. That instant feedback slowly rewires the habit.
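That feedback loop can be sketched in a few lines. The filler list, tokenizer, and `triggerHaptic` callback below are illustrative assumptions (on a real device the buzz would come from a native haptics API such as expo-haptics), not SpeechTree's actual implementation:

```typescript
// Sketch: count filler words in an incoming transcript chunk and fire one
// haptic pulse per hit. FILLERS and triggerHaptic are illustrative.
const FILLERS = new Set(["um", "uh", "uhh", "like"]);

function countFillers(chunk: string): number {
  // Lowercase and tokenize the transcript chunk into word-like tokens.
  const tokens = chunk.toLowerCase().match(/[a-z']+/g) ?? [];
  return tokens.filter((t) => FILLERS.has(t)).length;
}

function coach(chunk: string, triggerHaptic: () => void): number {
  const hits = countFillers(chunk);
  for (let i = 0; i < hits; i++) triggerHaptic(); // one buzz per filler
  return hits;
}
```

Note that a naive list like this flags every "like", including legitimate uses; a production version would need context to tell them apart.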
Analytics: After a listening session, SpeechTree delivers a full breakdown of the user's speaking patterns along with concrete guidance for improvement.
History: After each analysis, we store aggregate metrics (quality scores, filler rate, WPM estimate, etc.) and display progress over time, making improvement clear when it happens.
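The aggregate metrics above can be derived from a transcript and session length roughly as follows; the field names and filler list are illustrative assumptions, not the app's real schema:

```typescript
// Sketch of the kind of per-session metrics stored after analysis.
interface SessionMetrics {
  wordCount: number;
  fillerCount: number;
  wpm: number;        // words-per-minute estimate
  fillerRate: number; // fillers per 100 words
}

function analyzeSession(transcript: string, durationSec: number): SessionMetrics {
  const tokens = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
  const fillers = new Set(["um", "uh", "uhh", "like"]); // illustrative list
  const fillerCount = tokens.filter((t) => fillers.has(t)).length;
  const wordCount = tokens.length;
  return {
    wordCount,
    fillerCount,
    wpm: durationSec > 0 ? (wordCount / durationSec) * 60 : 0,
    fillerRate: wordCount > 0 ? (fillerCount / wordCount) * 100 : 0,
  };
}
```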
How we built it
We primarily leveraged React Native for its cross-platform capabilities, allowing a single codebase to deploy on both iOS and Android via Expo Go and Expo dev builds. This enabled rapid iteration on the app's core features. For the more intricate real-time speech processing and buzzing functionality, we integrated Swift for native iOS performance, using on-device audio processing to detect filler words. The project also incorporated K2 Think V2 to provide live speech analytics and coaching, vocabulary expansion recommendations, and intelligent tracking of user progress, enabled by speech-to-text from AssemblyAI. Finally, we used a Firebase database as the backend to store user data and power the progress-tracking features.
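To make the Firebase-backed progress tracking concrete, here is a minimal sketch of rolling stored session records into a trend; the record shape is an illustrative assumption, not our actual database schema:

```typescript
// Illustrative shape of a stored session document (not the real schema).
interface SessionRecord {
  timestamp: number;  // Unix ms, as a backend might store it
  fillerRate: number; // fillers per 100 words
  wpm: number;
}

// Roll sessions up into a simple trend: a negative delta means the user's
// filler rate dropped between their earliest and latest sessions.
function fillerTrend(sessions: SessionRecord[]): number | null {
  if (sessions.length < 2) return null; // need at least two data points
  const sorted = [...sessions].sort((a, b) => a.timestamp - b.timestamp);
  return sorted[sorted.length - 1].fillerRate - sorted[0].fillerRate;
}
```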
Challenges we ran into
The biggest hurdle was building the Swift layer of the architecture around the limitations of React Native's Expo Go. Initially this hindered our ability to stream audio live and process speech-to-text as we had envisioned, so getting past it was a major milestone for us. We also tinkered heavily with the K2 Think V2 LLM to make its speech analyses accurate and consistent, with feedback precise enough to genuinely elevate users' speech. At first the model produced inconsistent recommendations and even hallucinated in its evaluations, but by grounding it in richer user-profile context (feeding it more data on the user's goals, age, and language proficiency) we got far more reliable results.
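The fix described above, grounding the model in user-profile context, can be sketched as a prompt builder. The field names and wording are illustrative assumptions, not the exact prompt we send to K2 Think V2:

```typescript
// Illustrative user profile; fields mirror the context we fed the model.
interface UserProfile {
  age: number;
  goals: string;
  languageProficiency: string; // e.g. "native", "intermediate ESL"
}

// Build an analysis prompt anchored in the user's profile so the model's
// evaluations stay consistent across sessions. Wording is a sketch only.
function buildAnalysisPrompt(profile: UserProfile, transcript: string): string {
  return [
    `You are a speech coach. The speaker is ${profile.age} years old,`,
    `with ${profile.languageProficiency} English proficiency.`,
    `Their goal: ${profile.goals}.`,
    `Evaluate the transcript below for filler words, pacing, and clarity,`,
    `and give feedback appropriate to this speaker.`,
    ``,
    `Transcript: ${transcript}`,
  ].join("\n");
}
```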
Accomplishments that we're proud of
We are proudest of the phone-vibration feature: during Live Coaching, the phone buzzes whenever a filler word (such as "um," "uh," or "like") is used. We see important use cases for this feature, particularly for users practicing public speaking with fluency and confidence. The alerts make users aware of exactly when they lean on fillers and help them correct that reliance.
What we learned
We learned more about iOS development and the use of React Native, as well as tools such as Swift and AssemblyAI, which all played vital roles in our project.
What's next for SpeechTree
Our vision is to automate SpeechTree's onboarding: instead of the user self-reporting their proficiency level, the app would listen for a set period (a day, for example), learn to recognize the user's speech patterns and voice, and use that context to provide personalized feedback beyond the training phase. We also plan to extend SpeechTree to languages beyond English and grow the platform into a unified literacy program across languages. Lastly, we aim to deepen the K2 Think V2 model's evaluation to detect and analyze speaking style, including pauses, volume control, and flow, to help users improve in both formal and informal settings.
Built With
- assemblyai
- expo.io
- firebase
- javascript
- k2-think-v2
- react-native
- swift
- typescript