Inspiration
Game of Thrones actress Emilia Clarke shared a vulnerable moment when she couldn't recall her own name:
“Nonsense words tumbled out of my mouth and I went into a blind panic. I’d never experienced fear like that—a sense of doom closing in. I am an actor; I need to remember my lines. Now I couldn’t recall my name.” (The New Yorker)
Figures like Clarke and Bruce Willis (Die Hard) have gone public about their struggle with aphasia, a language disorder that makes it very difficult to communicate and find the right words.
They aren't the only ones: Over 1 million people in the United States struggle with aphasia, and nearly 180,000 Americans acquire it each year (National Aphasia Association). The effects of aphasia on one's life are often devastating, preventing people from conveying even the simplest ideas in everyday conversations or talking to loved ones. See a clip of a man with aphasia: [https://www.youtube.com/watch?v=JWC-cVQmEmY&ab_channel=tactustherapy]
Recognizing this overlooked challenge, we developed an app using cutting-edge speech-to-text technology to assist those with aphasia and others facing communication barriers. As children of immigrants, we are inspired by our elderly relatives who grapple with English. We aim to bridge these gaps, making communication accessible for everyone.
What it does
Our application, at its core, is a bridge for seamless communication. Upon typing or pressing the record button, users can begin communicating their fragmented sentences or phrases. The app processes this input in real time, leveraging generative AI and contextual cues (the conversation history) to continually suggest coherent, full English sentences, which users can select and play out loud. The generated audio mimics their voice and speaking style, making the conversation a seamless experience and empowering their voices to be heard.
How we built it
Our team's approach was strategic. Dividing tasks based on individual specialties ensured that every facet of the project was handled with care. Here's a brief overview:
Design
- Mockup via Figma: Focusing on a user-friendly interface, ensuring ease of use for the elderly and those unfamiliar with technology
Backend
- Flask + IBM Watson: Used the Speech-to-Text API for speech transcription, wiring it into a Flask endpoint (sketched below)
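A minimal sketch of how a Flask route can wrap Watson's recognize call. The endpoint path, environment-variable names, and audio format are our assumptions, not necessarily what the app uses:

```python
# Hypothetical transcription endpoint: the React recorder posts raw audio,
# Watson returns candidate transcripts. Env-var names are placeholders.
import os

from flask import Flask, request, jsonify
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

app = Flask(__name__)

authenticator = IAMAuthenticator(os.environ["WATSON_API_KEY"])
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url(os.environ["WATSON_URL"])

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # The request body carries the recorded audio blob from the browser.
    result = stt.recognize(
        audio=request.data,
        content_type="audio/webm",  # assumed recording format
        model="en-US_BroadbandModel",
    ).get_result()
    # Join the top alternative from each recognized segment.
    transcript = " ".join(
        res["alternatives"][0]["transcript"] for res in result["results"]
    )
    return jsonify({"transcript": transcript})
```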
Frontend
- ReactJS with Material UI
- ElevenLabs: Leveraged the Text-to-Speech API so users can play their selected sentence out loud (see the first sketch after this list)
- OpenAI: Implemented the GPT completions API to generate sentence suggestions (see the second sketch after this list)
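For the ElevenLabs step, a sketch of the text-to-speech call against the standard REST endpoint; the voice ID, model ID, and function name are illustrative placeholders:

```python
# Hypothetical TTS helper: turn a selected sentence into audio bytes
# that the frontend can play back.
import os
import requests

ELEVEN_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def speak(text: str, voice_id: str) -> bytes:
    """Return MP3 audio for the selected sentence."""
    response = requests.post(
        ELEVEN_URL.format(voice_id=voice_id),
        headers={"xi-api-key": os.environ["ELEVEN_API_KEY"]},
        json={"text": text, "model_id": "eleven_monolingual_v1"},
        timeout=30,
    )
    response.raise_for_status()
    return response.content  # raw MP3 bytes to stream back to the client
```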
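And for the suggestion step, a sketch using the OpenAI chat completions endpoint, feeding the conversation history in as context. The system prompt, model choice, and function name are our own assumptions:

```python
# Hypothetical suggestion helper: complete a fragmented utterance into
# full sentences, using prior turns as context.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "The user communicates in fragmented words or phrases. "
    "Using the conversation so far, rewrite their fragment as three "
    "complete, natural English sentences, one per line."
)

def suggest_sentences(fragment: str, history: list[dict]) -> list[str]:
    """history is a list of {'role': ..., 'content': ...} turns."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            *history,
            {"role": "user", "content": fragment},
        ],
    )
    text = completion.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Passing the running history like this is what lets the suggestions stay coherent with the conversation rather than completing each fragment in isolation.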
Challenges we ran into
One of the major challenges we encountered was integrating the recording button with the IBM Watson API. We initially spent several hours trying to integrate React with our API calls in Node, but we were unable to do so. A last-minute pivot to Flask for the backend let us connect our front end to the Watson API running behind it.
Accomplishments that we're proud of
We are all college sophomores and relatively new to hackathons—two of us have never attended one before. Yet we managed to ideate, mock up, and wireframe an app with a relatively intricate UI and seamless behind-the-scenes features, and implement it with three APIs and a backend, all within 24 hours.
We’re proud because we believe we built an app that offers real value to a very large audience, including ourselves. We've created a platform that lets everyone, regardless of their linguistic abilities, feel heard and understood, like our own family members: a person with aphasia, a grandparent trying to speak with their grandchild, someone recovering from a stroke, and many more.
What we learned
Collaborative Synergy: We didn’t know each other before this hackathon, but we learned to quickly pinpoint each other’s strengths and contribute accordingly. It reinforced the idea that a team with diverse skills can achieve more than the sum of its parts.
Technical Adaptation: We learned to use Git to collaborate on the code synchronously. The need to pivot from NodeJS to Flask with ReactJS underlined the importance of adaptability in tech projects.
What's next for Tran
The obvious next step is to let people with speech disorders and language barriers get their hands on the app and provide feedback.
We also plan to make the app smoother, polish the UI, and build in more context—for example, implementing chat “memory” or letting users specify their condition or native language—which would further refine the suggestions.
Built With
- css
- elevenlabs
- flask
- ibm-watson
- openai
- python
- reactjs