Inspiration
- The project was inspired by the tedious task of constantly writing notes during meetings and the need to ensure that no important points are missed.
What it does
- The extension listens to the room and transcribes what is being said.
- It sends the recognized text to our API for processing.
- The API uses ChatGPT to generate summarized notes.
- The API also receives the current contents of the Google Doc, which the Chrome extension reads through the Google Docs API.
- The summarized notes are then combined back into the document.
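The last step, combining new notes with what is already in the document, can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names `normalize` and `merge_notes` are hypothetical, and it assumes notes are plain-text bullet lines and that a note counts as a duplicate when it matches an existing line after lowercasing and whitespace cleanup.

```python
def normalize(line: str) -> str:
    """Lowercase, drop leading bullet markers, and collapse whitespace."""
    return " ".join(line.lower().lstrip("-*• ").split())


def merge_notes(document: str, new_notes: list[str]) -> str:
    """Append notes that are not already in the document (hypothetical helper).

    Existing lines are kept untouched and in order; new notes are added
    as "- " bullets at the end.
    """
    existing = {normalize(line) for line in document.splitlines() if line.strip()}
    merged = document.rstrip("\n").splitlines() if document.strip() else []
    for note in new_notes:
        key = normalize(note)
        if key and key not in existing:
            merged.append("- " + note.strip().lstrip("-*• ").strip())
            existing.add(key)
    return "\n".join(merged) + "\n"


doc = "Meeting notes\n- Ship on Friday\n"
notes = ["ship on  friday", "Alice owns the demo"]
result = merge_notes(doc, notes)
```

Here `result` keeps the original two lines and gains only the note about Alice, since the first note duplicates an existing bullet.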
How we built it
- We used React for the Chrome extension.
- We created a Flask backend that communicates with OpenAI and performs text processing to identify the parts that need to be updated.
- On the frontend, we used Google APIs to view what already exists in the document and to update the document with any new information that is spoken.
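One way to picture the "identify the parts that need to be updated" step is a line-level diff between the current document and the proposed notes, followed by a Google Docs `documents.batchUpdate` body that appends the new lines. The sketch below is an assumption about the approach, not the team's implementation: `added_lines` and `build_append_body` are hypothetical names, the diff uses Python's standard `difflib`, and the example is written in Python even though the project issues its Google Docs calls from the frontend.

```python
import difflib


def added_lines(current: str, proposed: str) -> list[str]:
    """Return lines that appear in `proposed` but not in `current`, in order."""
    old = current.splitlines()
    new = proposed.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute new text on the proposed side.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def build_append_body(lines: list[str]) -> dict:
    """Build a batchUpdate body that appends each line to the end of the body.

    An empty segmentId targets the main document body.
    """
    return {
        "requests": [
            {
                "insertText": {
                    "text": line + "\n",
                    "endOfSegmentLocation": {"segmentId": ""},
                }
            }
            for line in lines
        ]
    }


current_doc = "Agenda\n- Budget approved"
proposed_doc = "Agenda\n- Budget approved\n- Launch moved to May"
body = build_append_body(added_lines(current_doc, proposed_doc))
```

Appending at the end of the segment avoids computing character indices, at the cost of not being able to place a note next to related existing text.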
Challenges we ran into
- Initially, we attempted to use Google's speech recognition API to get transcripts and differentiate voices. This required setting up a WebSocket server so the frontend could stream audio to the backend, which would then call the API. Although we managed to establish the initial WebSocket connection, we struggled to call the API correctly from the backend, and the results were poor.
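Streaming audio over a socket usually means cutting the recording into small fixed-duration frames before sending them. The sketch below shows only that chunking step, under assumptions the write-up does not state: 16 kHz, 16-bit mono PCM audio and 100 ms frames. The function name `pcm_frames` is hypothetical, and no network or speech API call is made.

```python
from typing import Iterator


def pcm_frames(
    audio: bytes,
    sample_rate: int = 16000,   # assumed: 16 kHz mono
    frame_ms: int = 100,        # assumed: 100 ms per frame
    bytes_per_sample: int = 2,  # assumed: 16-bit samples
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration frames; the last one may be shorter."""
    frame_bytes = (sample_rate * frame_ms // 1000) * bytes_per_sample
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


# 8000 bytes of silence stands in for captured microphone audio.
frames = list(pcm_frames(bytes(8000)))
frame_sizes = [len(frame) for frame in frames]
```

Each yielded frame would be one message on the socket; smaller frames lower latency but increase per-message overhead.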
Accomplishments that we're proud of
- Successfully developed a working extension that provides a basic example of how speech recognition can be used in meetings.
- Implemented a good number of features, including the ability to undo changes that the voice recognition made to the document.
- Created a user-friendly view of all the changes to ensure the document remains clean and organized.
What we learned
- For half of the team, it was their first hackathon, so they learned a lot about creating applications in a short time span.
- Even those who had participated in hackathons before learned a lot about the abilities and limitations of Chrome extensions, how to work with audio, how to handle NLP tasks, and how to integrate it all.
What's next for Minute Masters
- Get Google's speech recognition working so the system can differentiate voices and provide better, speaker-specific notes.
- Improve the system's speed to reduce the delay between speaking and note updating, so the notes feel closer to real time.