Inspiration
Ever had an incredibly busy day and found you can't remember all the people you've met, or all the tasks you've got planned? We're human! Our memory is imperfect, but it doesn't have to hold us back: so we built I-IMO!
I-IMO is a conversational AI agent designed with the busiest people in mind. It scans both visual input (webcam) and aural input (Omi DevKit 2) throughout your day to organize and summarize profiles for everyone you meet, while maintaining task lists and tracking important events so you never forget anything important from your meetings.
N.B.: We originally wanted to build this app for smart glasses, but due to supply constraints we couldn't get a pair. The plan is to eventually adapt the project to run off the built-in cameras of devices like the Omi Glass.
What is I-IMO?
I-IMO is a personal conversational intelligence assistant that acts as an always-on second-in-command:
- Captures conversations through real-time audio transcription
- Recognizes faces via webcam to identify meeting participants
- Builds personal profiles automatically from conversation context
- Extracts action items and creates task lists from discussions
- Maintains conversation history with semantic search (see the sketch below)
- Never lets you forget important details about the people you meet
- Generates context for upcoming calendar events, surfacing relevant predictions and prep notes.
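To make the semantic-search point concrete, here's a minimal sketch (in Python) of how a transcribed snippet could be embedded into ChromaDB and queried later. The collection name, embedding model, and metadata fields are illustrative assumptions, not our exact configuration:

```python
# Minimal sketch: embed transcribed snippets and search them semantically later.
# The collection name, embedding model, and metadata fields are illustrative.
import chromadb
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",                     # in practice, read from an env variable
    model_name="text-embedding-3-small",  # assumed embedding model
)

client = chromadb.PersistentClient(path="./memory_db")
conversations = client.get_or_create_collection(
    name="conversations", embedding_function=openai_ef
)

# Store a transcribed chunk, tagged with who the webcam saw at the time.
conversations.add(
    ids=["meeting-042-chunk-07"],
    documents=["Alice agreed to send the Q3 budget draft by Friday."],
    metadatas=[{"speaker": "Alice", "timestamp": "2025-11-09T14:32:00Z"}],
)

# Later: semantic search over everything you've heard.
results = conversations.query(query_texts=["who owns the budget draft?"], n_results=3)
print(results["documents"][0])
```

This kind of recall is what powers the "never forget important details" behaviour above.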
Our Tech Stack
- React + TypeScript + Vite,
- FastAPI + a custom YOLO11n model for face detection (sketched after this list),
- Tailwind + shadcn/ui,
- Convex backend,
- Express.js backend for NLP,
- Groq-hosted Whisper for audio transcription,
- DigitalOcean Gradient AI for GPU-intensive tasks,
- ChromaDB for vectorised queries and semantic search,
- OpenAI text embeddings.
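For the face-detection piece, here is a rough sketch of what a FastAPI + YOLO11n endpoint looks like; the weights filename, route, and response shape are illustrative assumptions rather than our exact production code:

```python
# Rough sketch of the face-detection service: FastAPI receives a webcam frame
# and runs a YOLO11n model over it. Weights file, route, and response shape
# are illustrative assumptions.
import io

from fastapi import FastAPI, UploadFile
from PIL import Image
from ultralytics import YOLO

app = FastAPI()
model = YOLO("yolo11n-face.pt")  # assumed: custom face-detection weights


@app.post("/detect-faces")
async def detect_faces(frame: UploadFile):
    image = Image.open(io.BytesIO(await frame.read()))
    result = model(image)[0]  # single-frame inference
    faces = [
        {"box_xyxy": box.xyxy[0].tolist(), "confidence": float(box.conf[0])}
        for box in result.boxes
    ]
    return {"faces": faces}
```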
Architecture
Challenges We Ran Into
- Connection issues 😡,
- Streaming raw audio data from the Omi with low latency,
- Keeping face classification accurate and consistent over time (tying a stable ID and name tag to each detected face),
- Deduplicating misspelled names and mistranscriptions (see the sketch after this list),
- The nature of audio streaming: 5-second chunks clipping certain transcribed words,
- Syncing the vectorised ChromaDB with the traditional Convex backend,
- Minimising the non-deterministic nature of profile summaries and task predictions.
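As an example of the deduplication challenge, even a simple fuzzy-matching pass can collapse obvious mistranscriptions onto an existing profile. The profile store and similarity cutoff below are made up for illustration:

```python
# Sketch of the name-deduplication problem: collapse obvious mistranscriptions
# onto an existing profile using stdlib fuzzy matching. The profile store and
# cutoff are illustrative.
from difflib import get_close_matches

known_profiles = {"John Smith": "profile_001", "Priya Patel": "profile_002"}


def resolve_name(transcribed: str, cutoff: float = 0.8) -> str | None:
    """Return the profile id of the closest known name, or None if nothing matches."""
    match = get_close_matches(transcribed, known_profiles.keys(), n=1, cutoff=cutoff)
    return known_profiles[match[0]] if match else None


print(resolve_name("Jon Smith"))     # -> profile_001
print(resolve_name("Priya Patell"))  # -> profile_002
print(resolve_name("Dave"))          # -> None (treated as a new person)
```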
Important Lessons
- Learnt how to integrate a lot of different technologies and choose/route between the best models for each workload,
- Designed ways to sync multiple backends and frameworks, especially given the fundamental infrastructure differences between vectorised semantic-search databases and traditional backends (see the sketch after this list),
- Balanced the benefits of lightweight, fast inference models against their lower accuracy,
- Developed a lot of ad-hoc teamwork skills while working on the same project under time pressure (minimising merge conflicts, documenting pull requests with tools like CodeRabbit, keeping the commit tree clean, ...).
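The sync lesson, sketched: keep Convex as the canonical store, mirror the text into ChromaDB for semantic search, and key both by the Convex document id so a vector hit can always be joined back to its record. The deployment URL and the "profiles:create" mutation (assumed to return the new document's id) are hypothetical placeholders:

```python
# Sketch of the cross-store sync pattern: Convex holds the canonical record,
# ChromaDB mirrors the text, and the Convex document id links the two.
# Deployment URL and mutation name are hypothetical.
import chromadb
from convex import ConvexClient

convex = ConvexClient("https://your-deployment.convex.cloud")
chroma = chromadb.PersistentClient(path="./memory_db")
profiles = chroma.get_or_create_collection("profiles")


def save_profile(name: str, summary: str) -> None:
    # 1) Write the canonical record to Convex; assume the mutation returns its id.
    convex_id = convex.mutation("profiles:create", {"name": name, "summary": summary})
    # 2) Mirror the summary into Chroma under the same id for semantic search.
    profiles.add(
        ids=[convex_id],
        documents=[summary],
        metadatas=[{"convexId": convex_id, "name": name}],
    )


save_profile("John Smith", "PM at Acme; owns the Q3 budget; prefers async updates.")
```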
What's Next
- Deep integration with smart glasses (Meta Ray-Bans, Omi Glass, Snapchat Spectacles),
- Potentially link 3rd party MCP providers,
- Connect with social media platforms for deeper context / automated actions based on the user's day,
- Time-sensitive summaries,
- Mobile, watch, and native versions of the dashboard.
Built With
- chromadb
- convex
- digitalocean
- express.js
- fastapi
- groq
- openai
- react
- shadcn
- tailwind
- typescript
- vite
- yolo