Inspiration
The Rabbit R1 was a good idea, but it's hobbled by being closed source
What it does
It allows GPT-4o to be embodied and use tools
How we built it
We used the following tech:
- Silero VAD running in the browser via ONNX and WebGPU
- Whisper-Base via WebGPU
- Deepgram Aura for TTS
- Mailgun for emails
- Spotify for song integration
- Suno.ai for song creation
- Next.js/React
- Web Streams for frontend-backend integration
- Clerk.dev for user authentication
Challenges we ran into
- Latency was a challenge, so we turned to WebGPU to get the time to first response as low as possible
- Midway through, we realized it was still too slow and had to rewrite our networking stack to use web streaming
- Our Raspberry Pi didn't support WebGPU, so we couldn't build the hardware component of our project :(
- Working with OAuth and authentication on a short timeline is really stressful!
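To illustrate the web streaming approach mentioned above, here is a minimal sketch of producing and consuming a response with the standard Web Streams API. It is not the project's actual networking code: `streamFromChunks` and `readStream` are hypothetical names, and the hard-coded chunks stand in for whatever the backend really sends.

```typescript
// Hypothetical producer: wraps text chunks in a ReadableStream of bytes,
// the way a backend route might stream a model reply.
function streamFromChunks(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });
}

// Hypothetical consumer: hands each chunk to onChunk as soon as it arrives
// instead of waiting for the whole response, and returns the full text.
async function readStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onChunk(tail);
  }
  return full;
}
```

In a browser, the same `readStream` loop could be pointed at the `body` of a `fetch` response, which is also a `ReadableStream<Uint8Array>`.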
Accomplishments that we're proud of
- We're beating the Rabbit R1 on latency since we allow the AI to respond in parts, rather than all at once
- We built out an agent that learns novel capabilities in context by self-assembling tools
- Doing transcription and VAD on device helps privacy tremendously: we're not sending raw audio to third parties, and we're less likely to pick up on and record non-consenting parties than most AI hardware projects these days
- We got a real domain for our project!
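One way to let an assistant "respond in parts" is to cut the streamed text into sentences and pass each one to TTS as soon as it is complete. The sketch below shows that idea only. `SentenceChunker` is a hypothetical helper, not the project's code, and the sentence boundary rule (`.`, `!`, or `?` followed by whitespace) is a simplifying assumption.

```typescript
// Hypothetical helper: buffers streamed text fragments and emits
// complete sentences as soon as they are available.
class SentenceChunker {
  private buffer = "";

  // Feed one streamed fragment; returns any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // Assumed boundary: ., !, or ? followed by whitespace.
    const boundary = /[.!?]\s+/;
    let match = boundary.exec(this.buffer);
    while (match !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.buffer.slice(0, end).trim());
      this.buffer = this.buffer.slice(match.index + match[0].length);
      match = boundary.exec(this.buffer);
    }
    return sentences;
  }

  // Call once the stream ends to emit whatever text remains.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}

// Usage: fragments arrive in arbitrary pieces from the model.
const chunker = new SentenceChunker();
const spoken: string[] = [];
for (const fragment of ["Hello the", "re. How are", " you?"]) {
  spoken.push(...chunker.push(fragment));
}
spoken.push(...chunker.flush());
```

With this approach the first sentence can be spoken while the rest of the reply is still being generated, which is where the perceived latency gain would come from.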
What we learned
- Latency is a huge factor in how it feels to use an agent
- Raspberry Pis have a long way to go on Vulkan/WebGPU support
What's next for Open Rabbit
- Clean up the UI
- Get this project running on a Raspberry Pi
- Share it with the world and potentially start a Kickstarter!