Inspiration:
Remote and hybrid work have increased the frequency and complexity of meetings, and key discussions often get lost without proper documentation. Teams therefore need intelligent tools to capture, summarize, and analyze meetings efficiently.
What it does:
Our project builds an AI-powered assistant that makes meetings more productive and actionable through a Chrome extension and a web application with five features: summarization, transcripts, translation, sentiment analysis, and an actionable to-do list.
How we built it:
We built our project in two interconnected phases: a robust web application and a lightweight Chrome browser extension. This dual-phase approach enabled us to provide both real-time analysis and a seamless user experience across platforms.
The web application forms the backbone of our system. We used React to build an intuitive frontend interface and FastAPI for the backend services. Real-time communication between the frontend and backend is handled through WebSockets, allowing updates to be streamed every three seconds with only a 100ms delay. Audio data is processed in real time using several machine learning models: Whisper for multilingual and noise-tolerant transcription, T5/BART for context-aware summarization, GPT-3.5 combined with rule-based NLP for extracting action items, and VADER or BERT for sentiment analysis. We also incorporated pyannote-audio to enable speaker diarization, allowing us to attribute spoken content to individual participants. All processed data is stored in a MongoDB database, and users can download comprehensive reports with summaries, insights, and action items.
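To illustrate how these pieces fit together, here is a minimal sketch of the streaming loop, assuming a hypothetical transcribe_chunk helper as a stand-in for our Whisper pipeline; the FastAPI, WebSocket, and VADER calls reflect those libraries' real APIs, and the endpoint path is illustrative.

```python
# Minimal sketch: stream transcription + sentiment to connected clients every ~3 seconds.
import asyncio
from fastapi import FastAPI, WebSocket
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

app = FastAPI()
sentiment = SentimentIntensityAnalyzer()

async def transcribe_chunk() -> str:
    """Hypothetical stand-in for Whisper transcription of the latest audio chunk."""
    return "Let's finalize the report by Friday."

@app.websocket("/ws/meeting")
async def meeting_stream(ws: WebSocket):
    await ws.accept()
    while True:
        text = await transcribe_chunk()            # Whisper output for the last ~3 s of audio
        scores = sentiment.polarity_scores(text)   # VADER sentiment on the new text
        await ws.send_json({
            "transcript": text,
            "sentiment": scores["compound"],
            # summary and action items would be refreshed here via T5/BART and GPT-3.5
        })
        await asyncio.sleep(3)                     # stream an update every three seconds
```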
In the second phase, we developed a Chrome extension using React and Electron. The extension directly overlays on top of web-based meetings such as Zoom, Google Meet, or YouTube, providing real-time insights without the need for a full desktop app. This approach offers lightweight performance, cross-platform compatibility, and seamless integration into existing workflows. By tapping into the same backend via WebSocket, the extension receives continuous data streams, including transcriptions, sentiment insights, and task recommendations. It enables smarter, context-aware overlays directly in the user’s browser, making it highly accessible and efficient.
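For reference, here is a rough sketch of the shape of each streamed update that both the web app and the extension consume; the field names are our own illustration rather than a fixed schema.

```python
# Rough sketch of one WebSocket update payload (pydantic v2); field names are illustrative.
from pydantic import BaseModel

class MeetingUpdate(BaseModel):
    speaker: str             # from pyannote-audio speaker diarization
    transcript: str          # Whisper transcription of the latest chunk
    summary: str             # rolling T5/BART summary
    sentiment: float         # VADER/BERT compound score, roughly -1.0 to 1.0
    action_items: list[str]  # GPT-3.5 + rule-based NLP extraction

# Example of a single streamed update:
update = MeetingUpdate(
    speaker="Speaker 1",
    transcript="Let's finalize the report by Friday.",
    summary="The team discussed the report deadline.",
    sentiment=0.4,
    action_items=["Finalize the report by Friday"],
)
print(update.model_dump_json())
```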
Challenges we ran into:
- Accessing live audio from browser tabs
- Extracting action items reliably from summaries and transcripts
- Latency in the AI models (e.g., transcription and summarization)
Accomplishments that we're proud of:
We developed, for the first time, both a Chrome browser extension and a web application powered by AI, combining an LLM, sentiment analysis models, and WebSockets.
What we learned:
We learned for the first time how to use WebSockets and build Chrome browser extensions, and how to combine them with AI models.
What's next for TAGE:
- Fine-tuning a RAG-based LLM with domain-specific embeddings for domain-aware meeting insights.
- Extending the application into a multimodal AI agent that derives insights not only from audio but also from videos, images, and live streams using vision-language models.
- Building monitoring for application health and AI agent interactions using cloud/AI tooling.
Built With
- bart
- bert
- data visualization
- electron
- full-stack web development
- gcp
- java
- javascript
- llms
- mongodb
- multilingual language translation
- natural-language-processing
- node.js
- openai whisper
- pyannote-audio
- python
- react
- react-native
- sentiment analysis
- vadersentiment
- websockets