Inspiration
Selali has been working in the HCI space for nearly a decade, and Avi has been building multi-sensory, multi-modal GenAI experiences for the past couple of years.
The recent wave of AI wearables hints at a brand-new paradigm of HCI (human-computer interaction): agent-computer interfaces. The r1 was a $100M attempt at redefining our interaction with compute as we know it, but it didn't deliver on its promises.
Why wait for the future, when you can build it yourself?
What it does
m1: a cross-platform PWA that lets users create with the latest GenAI tools through a voice interface
- Core functionality (a minimal routing sketch follows this list):
  - Ultra-low-latency multimodal conversational agent
  - Generate songs on the fly with Suno and get a link to a web-app DAW
  - Generate research reports with Exa
  - Operate your computer from anywhere in the world with just your voice using OpenInterpreter
  - Control your Philips Hue lights through Milo
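Here's a minimal sketch, in TypeScript with the OpenAI Node SDK, of how a transcribed utterance could be routed to these tools via function calling. The tool names, parameter schemas, and model choice are illustrative assumptions for this sketch, not our exact implementation.

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative tool schemas; names and parameters are assumptions.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "generate_song",
      description: "Generate a song from a text prompt via Suno",
      parameters: {
        type: "object",
        properties: { prompt: { type: "string" } },
        required: ["prompt"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "set_hue_light",
      description: "Turn a Philips Hue light on or off",
      parameters: {
        type: "object",
        properties: { lightId: { type: "string" }, on: { type: "boolean" } },
        required: ["lightId", "on"],
      },
    },
  },
];

// Route one transcribed user utterance to a tool call (or a plain reply).
export async function routeUtterance(transcript: string) {
  const res = await openai.chat.completions.create({
    model: "gpt-4o", // model name is an assumption for this sketch
    messages: [{ role: "user", content: transcript }],
    tools,
  });
  return res.choices[0].message; // contains either text or tool_calls
}
```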
How we built it
TypeScript/Next.js, Deepgram/Cartesia (Vapi?), Whisper/Nova-2, Suno AI, Exa AI, OpenAI, Vercel, OpenInterpreter
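The core loop wires Deepgram speech-to-text into an OpenAI completion, then text-to-speech. A rough sketch of one turn, assuming the Deepgram v3 JS SDK; the TTS step is left as a comment rather than guessing at Cartesia's client API:

```typescript
import { createClient } from "@deepgram/sdk";
import OpenAI from "openai";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
const openai = new OpenAI();

// One turn of the voice loop: audio URL in, assistant text out.
export async function voiceTurn(audioUrl: string): Promise<string> {
  // 1. Speech-to-text with Deepgram's Nova-2 model.
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    { url: audioUrl },
    { model: "nova-2", smart_format: true }
  );
  if (error || !result) throw error ?? new Error("no transcription result");
  const transcript = result.results.channels[0].alternatives[0].transcript;

  // 2. LLM turn (system prompt omitted; see Challenges below).
  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // model choice is an assumption for this sketch
    messages: [{ role: "user", content: transcript }],
  });
  const reply = completion.choices[0].message.content ?? "";

  // 3. Text-to-speech would happen here (Cartesia in our stack); omitted
  //    so this sketch doesn't misstate its client API.
  return reply;
}
```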
Challenges we ran into
We initially started with the concept of "chat with a website," but it didn't seem exciting enough, so we pivoted toward IoT hardware. We were missing some cables, so we pivoted again, this time to multi-device, multi-agent interactions that maximize user experience.
We also ran into issues getting CrewAI working with our TypeScript framework. And most LLMs don't sound natural or conversational without extensive prompt engineering; an example of the kind of prompt this takes follows.
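For illustration, a persona-style system prompt of the sort we iterated on (the wording here is a sketch, not the prompt we shipped):

```typescript
// Illustrative system prompt; the shipped prompt went through many iterations.
const SYSTEM_PROMPT = `You are Milo, a friendly voice assistant.
Speak the way a person talks: short sentences, contractions, no bullet
points or markdown. Keep replies under three sentences unless the user
asks for detail. Never read out URLs; say you'll send a link instead.`;
```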
Accomplishments that we're proud of
Reducing technical debt for creators, and shipping a working demo.
Use Cases:
Agents and their use cases are a multi-trillion-dollar industry that will redefine work and culture as we know it. One answer to the search for meaning in this moment is to create and build, and GenAI and agentic systems reduce the technical debt of doing so. With the recent trend of AI wearables, we wanted to see how we could improve functionality and reduce costs, and what better way than a serverless, cross-platform PWA?
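On the PWA point: with the Next.js App Router, an `app/manifest.ts` route like the sketch below is all it takes to make the app installable on desktop and mobile. The name, colors, and icon paths here are placeholders, not our actual config.

```typescript
// app/manifest.ts — makes the Next.js app installable as a PWA.
// Name, colors, and icon paths below are placeholders.
import type { MetadataRoute } from "next";

export default function manifest(): MetadataRoute.Manifest {
  return {
    name: "m1",
    short_name: "m1",
    description: "Voice-first GenAI creation tools",
    start_url: "/",
    display: "standalone",
    background_color: "#000000",
    theme_color: "#000000",
    icons: [
      { src: "/icon-192.png", sizes: "192x192", type: "image/png" },
      { src: "/icon-512.png", sizes: "512x512", type: "image/png" },
    ],
  };
}
```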
What's next for Milo—M1
Feature Roadmap:
- OpenInterpreter
- MultiOn for browser interactions
- Ability to choose between Suno and Udio for generations
- 2D-to-3D image pipeline (HF API/Meshy)
- Rewind RAG
- OctoML (text-based image editing)
- LTX Studio / Runway / CrewAI Vid
- CoT Researcher
- Stock Analyst
Built With
- deepgram
- exa
- nextjs
- open-interpreter
- openai
- philips-hue
- suno
- vercel
- whisper