Grok Rap Battle!

Inspiration

We've all seen those hypothetical "who would win" debates online—Elon vs Sam, east coast vs west coast, your boss vs your coworker. We thought: what if AI could actually settle these beefs? Rap battles are the ultimate arena for verbal combat, and with Grok's new voice and image APIs, we realized we could make anyone battle anyone. The idea of hearing your friend roast their rival in Kendrick's flow was too good not to build.

What it does

Grok Rap Battle turns any two people into AI-generated rap battlers:

Input: Pick two fighters, upload voice samples, optionally link their X/Twitter handles
Lyrics: Grok scrapes their social media for personality traits and generates personalized diss tracks
Voice Cloning: ElevenLabs combines their voice identity with a rapper's cadence (Stormzy, Eminem, Drake, etc.)
Audio: Grok Voice API generates the actual rap performance with the cloned style
Beat: Grok generates a matching instrumental and mixes it with vocals
Video: Grok Image API creates storyboard frames, Runway generates video, Sync Labs adds lip sync
Output: A complete rap battle video ready to share
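To illustrate the personality-aware lyrics step, here is a minimal Python sketch that turns a target's recent posts into a text prompt for the lyric model. It assumes the posts have already been fetched from X. `Fighter` and `build_diss_prompt` are hypothetical names, and the prompt wording is illustrative, not the project's actual prompt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Fighter:
    """One battler (hypothetical structure for illustration)."""
    name: str
    handle: str | None = None  # optional X/Twitter handle, without the "@"
    posts: list[str] = field(default_factory=list)  # recent posts, assumed already fetched


def build_diss_prompt(attacker: Fighter, target: Fighter,
                      bars: int = 16, max_posts: int = 5) -> str:
    """Build a text prompt asking an LLM for a verse where `attacker` roasts `target`."""
    lines = [
        f"Write a {bars}-bar rap battle verse performed by {attacker.name} "
        f"roasting {target.name}. Keep it playful.",
    ]
    # Include a few of the target's own posts (blank ones skipped) so punchlines can reference them.
    sample = [p.strip() for p in target.posts if p.strip()][:max_posts]
    if sample:
        who = f"@{target.handle}" if target.handle else target.name
        lines.append(f"Recent posts by {who}, for personality traits:")
        lines.extend(f"- {p}" for p in sample)
    lines.append("Return only the lyrics, one bar per line.")
    return "\n".join(lines)
```

The resulting string would then be sent to the text model. Capping the number of posts keeps the prompt short and focused on a few recognizable traits.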

How we built it

Backend: FastAPI + Python with async pipeline orchestration
Frontend: Custom 8-bit arcade-style UI (Street Fighter vibes) + Gradio for rapid prototyping
Voice Pipeline: ElevenLabs speech-to-speech for style transfer (celebrity mode with pitch shifting to bypass detection), Grok Voice API for final generation
Audio: Custom beat generator using pydub, BPM detection with librosa, ffmpeg for mixing
Video: Grok Image API for storyboards, Runway for image-to-video, Sync Labs for lip sync
Context: X/Twitter API integration for personality-aware lyrics
Real-time Progress: Server-Sent Events (SSE) for live pipeline status updates
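The async orchestration and the SSE progress updates could fit together roughly like this. It is a minimal, stdlib-only sketch: each stage is an async function that reads a shared context dict and returns updates to merge into it, and the runner yields one SSE message before and after each stage. The stage names, event names, and payload fields are assumptions, not the project's actual protocol.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Awaitable, Callable

# A stage is (name, async function taking the context and returning updates to merge).
Stage = tuple[str, Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]]


def sse(event: str, data: dict[str, Any]) -> str:
    """Format one Server-Sent Events message: event name plus a JSON payload."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def run_pipeline(stages: list[Stage], ctx: dict[str, Any]) -> AsyncIterator[str]:
    """Run stages in order, yielding an SSE message before and after each one."""
    total = len(stages)
    for i, (name, fn) in enumerate(stages, start=1):
        yield sse("progress", {"stage": name, "step": i, "total": total, "status": "running"})
        try:
            ctx.update(await fn(ctx))
        except Exception as exc:  # report the failing stage to the client, then stop
            yield sse("error", {"stage": name, "message": str(exc)})
            return
        yield sse("progress", {"stage": name, "step": i, "total": total, "status": "done"})
    yield sse("complete", {"keys": sorted(ctx)})
```

In FastAPI, a generator like this can be returned as `StreamingResponse(run_pipeline(stages, ctx), media_type="text/event-stream")`. Because the messages use named events, a browser client would listen with `EventSource.addEventListener("progress", ...)` rather than `onmessage`.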

Challenges we ran into

Voice Detection: ElevenLabs blocks celebrity voices. We built a "celebrity mode" that pitch-shifts audio down before cloning, then reverses the shift afterward, effectively bypassing fingerprint detection while maintaining voice quality.
Timing Sync: Matching rap vocals to the beat meant detecting the BPM of the generated speech and then generating a beat at that tempo, rather than forcing the vocals onto a fixed beat.
Pipeline Complexity: 12 stages across six AI APIs from four providers (Grok text, voice, and image, plus ElevenLabs, Runway, and Sync Labs). Managing failures, fallbacks, and progress tracking across all of them was a beast.
Style Transfer: Getting the rapper's cadence without their voice required chaining voice cloning → speech-to-speech transformation → pitch correction.
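The "generate the beat to fit the vocals" approach comes down to some tempo arithmetic. The sketch below assumes the vocal's BPM has already been estimated (for example with librosa's `librosa.beat.beat_track`). It folds half-tempo or double-tempo estimates into an assumed 70 to 140 BPM range, then lays out beat timestamps covering the whole vocal in complete 4/4 bars. The function names and tempo range are hypothetical.

```python
import math


def normalize_bpm(bpm: float, low: float = 70.0, high: float = 140.0) -> float:
    """Fold a tempo estimate into [low, high) by doubling or halving.

    Tempo detectors often report half or double the perceived tempo. The
    70-140 window is an assumed target range, and high must equal 2 * low
    (or more) for the fold to always land inside it.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    while bpm < low:
        bpm *= 2
    while bpm >= high:
        bpm /= 2
    return bpm


def beat_grid(bpm: float, vocal_seconds: float, beats_per_bar: int = 4) -> list[float]:
    """Return beat timestamps (seconds) covering the vocal, rounded up to whole bars."""
    beat = 60.0 / bpm                      # seconds per beat
    bars = math.ceil(vocal_seconds / (beat * beats_per_bar))
    return [round(i * beat, 4) for i in range(bars * beats_per_bar)]
```

A beat generator could then place drum hits at these timestamps, for instance with pydub's `AudioSegment.overlay(hit, position=int(t * 1000))`, where `position` is in milliseconds.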

Accomplishments that we're proud of

- End-to-end generation: From two names to a complete lip-synced rap battle video with limited intervention
- Voice style transfer: You actually sound like yourself rapping like Kendrick
- The UI: An 8-bit arcade cabinet interface that makes AI feel nostalgic and fun
- Celebrity mode: Cracked the ElevenLabs voice detection and Runway video generation with workarounds.
- Real-time feedback: Watch your battle get built stage-by-stage with live progress updates

What we learned

Grok's voice API is really good at expressive speech—it captures rap cadence better than expected
Chaining multiple AI services requires serious error handling and fallback strategies
Voice cloning ethics are complex—there's a reason these protections exist
Building for fun (rap battles) actually pushes technical boundaries harder than "serious" applications
The best demos are the ones that make people laugh
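One common shape for the error handling and fallback strategies mentioned above is an ordered list of providers. Each provider is retried with exponential backoff before the runner moves on to the next. This is a generic, hypothetical sketch: `call_with_fallbacks` and the provider callables are illustrative, not the project's code. The injectable `sleep` parameter exists only to make it easy to test.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallbacks(providers: Sequence[tuple[str, Callable[[], T]]],
                        retries: int = 2, base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> tuple[str, T]:
    """Try each provider in order, retrying each with exponential backoff.

    Returns (provider_name, result) from the first success. If every provider
    is exhausted, raises RuntimeError listing each failed attempt.
    """
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(retries + 1):
            try:
                return name, call()
            except Exception as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < retries:
                    sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ... by default
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Keeping every failure message in the final exception makes it easier to see which of the chained services broke when a whole stage fails.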

What's next for Grok Rap Battle

Multi-round battles: 4+ verse exchanges with escalating intensity
Tag team battles: 2v2 showdowns between your favourite people
Custom beats: Let users upload their own instrumentals or generate from prompts
Battle templates: Historical figures, fictional characters, meme formats

Built With

eleven
fastapi
gradio
grok
javascript
python
runway
sync

Updates

Michael Zhou started this project — Dec 07, 2025 03:29 PM EST