Inspiration

We've all had that moment while watching a podcast, a lecture, or a fireside chat: wishing we could pause and ask the speaker a question, challenge their logic, or say "wait, tell me more about that." GrokCast was born from that universal fantasy: what if you could actually talk to the people in the videos you watch?


What it does

GrokCast transforms passive video consumption into active conversation. Pick any video (Joe Rogan, Lex Fridman, Einstein, your favorite creator) and interrupt it. Hold the spacebar, speak your question, and Grok responds as the speaker, not about them. The AI reconstructs their reasoning patterns, tone, and worldview in real time. When the response ends, the video resumes where it left off. It's like having a private conversation with anyone who's ever been on camera.


How we built it

  • Grok Voice & Imagine (xAI) power the conversational intelligence and persona reconstruction
  • Next.js 16 + React 19 for the frontend, with a VisionOS-inspired UI
  • AssemblyAI for real-time speech-to-text (STT) transcription
  • A custom state machine orchestrates transitions between podcast playback, AI listening, and AI responding
  • A dual-video player cross-fades between clips for smooth, natural visual transitions
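
To make the state machine concrete, here is a minimal sketch of the three phases named above (playback, listening, responding) as a pure transition function. The type names, event names, and the rule that an empty transcript returns to playback are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical names throughout; only the three phases come from the write-up.
type PlayerState = "playing" | "listening" | "responding";

type PlayerEvent =
  | { type: "PUSH_TO_TALK_DOWN" }                  // user presses and holds spacebar
  | { type: "PUSH_TO_TALK_UP"; transcript: string } // user releases; STT text attached
  | { type: "RESPONSE_ENDED" };                     // AI response finished playing

interface PlayerMachine {
  state: PlayerState;
  pendingQuestion: string | null;
}

// Pure function: returns the next machine value and ignores events
// that are not valid in the current state.
function transition(m: PlayerMachine, e: PlayerEvent): PlayerMachine {
  switch (m.state) {
    case "playing":
      if (e.type === "PUSH_TO_TALK_DOWN") {
        return { state: "listening", pendingQuestion: null };
      }
      return m;
    case "listening":
      if (e.type === "PUSH_TO_TALK_UP") {
        const question = e.transcript.trim();
        // Assumption: an empty transcript resumes the video instead of querying the AI.
        return question.length > 0
          ? { state: "responding", pendingQuestion: question }
          : { state: "playing", pendingQuestion: null };
      }
      return m;
    case "responding":
      if (e.type === "RESPONSE_ENDED") {
        return { state: "playing", pendingQuestion: null };
      }
      return m;
  }
}
```

Keeping the transition logic pure like this means the UI layer only has to map key events and media callbacks to `PlayerEvent` values, and the video and audio side effects can be driven from the resulting state.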

Challenges we ran into

The biggest hurdle was latency. Generating AI video responses in real time while maintaining the illusion of natural conversation is hard. We solved this with a compositional approach: pre-generated micro-state video clips (listening, talking, nodding) that are sequenced at runtime, combined with aggressive caching. This gave us instant playback without sacrificing realism.
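
As a rough illustration of the caching side of this approach, the sketch below keeps pre-generated clips in a small least-recently-used cache keyed by persona and micro-state. The `ClipCache` class, its capacity limit, and the injected `load` function are all hypothetical. The write-up does not say how clips are stored or evicted, so treat this as one possible shape rather than the project's implementation.

```typescript
// The three micro-states come from the write-up; everything else is assumed.
type MicroState = "listening" | "talking" | "nodding";

class ClipCache<T> {
  // Map iterates in insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, T>();
  private readonly capacity: number;
  private readonly load: (persona: string, state: MicroState) => T;

  constructor(capacity: number, load: (persona: string, state: MicroState) => T) {
    this.capacity = capacity;
    this.load = load;
  }

  get(persona: string, state: MicroState): T {
    const key = `${persona}:${state}`;
    if (this.entries.has(key)) {
      // Cache hit: re-insert to mark the clip as most recently used.
      const hit = this.entries.get(key) as T;
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    // Cache miss: load the clip, then evict the oldest entry if over capacity.
    const clip = this.load(persona, state);
    this.entries.set(key, clip);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return clip;
  }

  // Warm the cache before the user interrupts, so the first transition is a hit.
  preload(persona: string, states: MicroState[]): void {
    for (const state of states) this.get(persona, state);
  }
}
```

In a browser, `T` might be a preloaded video element or an object URL, and `preload` would be called as soon as a persona is selected so that the listening clip is ready before the first interruption.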


Accomplishments that we're proud of

  • Built a working prototype where you can talk to an AI persona of Elon Musk or Joe Rogan
  • Achieved sub-400ms video state transitions that feel seamless
  • Created a voice-first UX that feels intuitive (just hold spacebar and talk)

What we learned

We learned that the hardest part isn't the AI—it's making the experience feel natural and responsive.


What's next for GrokCast

  • Upload any video, talk to anyone
  • Multi-speaker conversations—imagine Elon, Jobs, and Bezos in a room with you
  • Voice cloning for authentic audio responses
  • Apple Vision Pro / Meta Quest experiences
  • Group conversations—you + friends + AI personas

The endgame: a world where every piece of human knowledge isn't just watchable, but conversable.
