Inspiration
I was sitting in a karaoke bar in Tokyo, singing with friends, when I realized something: the person next to me was from South Korea, someone else was from India, and we were all singing the same Taylor Swift song. Yet we could have done this from home. We could have connected with millions more fans around the world. That moment stuck with me.
I started thinking about what happens when you combine three powerful things: global fandom communities, the magic of karaoke culture, and modern VR technology. Taylor Swift fans would love to sing together across continents. Anime fans crave the experience of performing opening themes with people who actually get it. K-pop stans dream of trainee studio vibes with global participants. Hollywood lovers want that concert hall energy from their living room.
Today, that experience doesn't exist. VRChat is generic, Songbird is a single-player rhythm game, and nothing brings music lovers, characters, and culture together in one place.
KaraokeVerse is my answer.
What It Does
KaraokeVerse is a multiplayer VR karaoke platform where fans sing together in themed worlds. Pick a room based on your passion: anime Tokyo lounge, K-pop Seoul trainee studio, Bollywood Mumbai rooftop, Hollywood LA concert hall, or Taylor Swift Broadway stage.
Core MVP features: rooms of up to 4 players, grabbing the virtual microphone with your hand, singing along to songs embedded via YouTube iframes, lyrics displayed in real time, and a simple player profile so you can see who else is in the room with you. It's lightweight, it's immediate, and it feels like community.
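To make the room flow concrete, here's a minimal sketch of what a join endpoint with the four-player cap could look like. The in-memory store and the route and field names are illustrative assumptions, not the project's actual code.

```js
// Sketch only: a room-join endpoint with the 4-player cap described above.
// The in-memory `rooms` map and the route/field names are assumptions.
const express = require('express');
const app = express();
app.use(express.json());

const MAX_PLAYERS = 4;
const rooms = new Map(); // roomId -> { theme, players: [] }

app.post('/rooms/:roomId/join', (req, res) => {
  const { roomId } = req.params;
  const { playerName } = req.body;

  const room = rooms.get(roomId) || { theme: 'anime-tokyo-lounge', players: [] };
  if (room.players.length >= MAX_PLAYERS) {
    return res.status(409).json({ error: 'Room is full' });
  }

  room.players.push({ name: playerName, joinedAt: Date.now() });
  rooms.set(roomId, room);
  res.json({ roomId, theme: room.theme, players: room.players });
});

app.listen(3000);
```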
Planned future enhancements: real-time lyric translation, AI vocal feedback, cosmetic avatars, duet matchmaking, and leaderboards. For now, I'm focused on the core experience: singing together, no friction, pure connection.
How I Built It
This was a short sprint with tight deadlines. I started with the Immersive Web SDK and React because I wanted zero friction: no app store, just a browser link on Quest (or a WebXR-compatible browser on macOS while I iterate).
The frontend runs on React + Three.js for basic 3D rendering, with hand detection via the WebXR Hand Input API. I kept gestures simple: grab the mic, start singing. The backend is Node.js with Express, with PostgreSQL storing player profiles, room state, and song metadata. No complex cloud orchestration, just straightforward APIs.
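As an illustration of the grab gesture, here's a rough sketch of how a pinch-style "grab the mic" check can be done with the WebXR Hand Input API inside the XR frame loop. The distance threshold and the `attachMicToHand` stub are assumptions for illustration, not the project's actual code.

```js
// Sketch: treat a thumb-tip / index-tip pinch as "grab the mic".
const GRAB_DISTANCE = 0.02; // metres; illustrative threshold
let referenceSpace;         // set once session.requestReferenceSpace('local') resolves

function isGrabbing(frame, inputSource) {
  const hand = inputSource.hand;
  if (!hand) return false;

  const thumb = frame.getJointPose(hand.get('thumb-tip'), referenceSpace);
  const index = frame.getJointPose(hand.get('index-finger-tip'), referenceSpace);
  if (!thumb || !index) return false;

  const dx = thumb.transform.position.x - index.transform.position.x;
  const dy = thumb.transform.position.y - index.transform.position.y;
  const dz = thumb.transform.position.z - index.transform.position.z;
  return Math.hypot(dx, dy, dz) < GRAB_DISTANCE;
}

function attachMicToHand(handedness) {
  // App-specific placeholder: parent the mic mesh to this hand in the Three.js scene.
}

function onXRFrame(time, frame) {
  const session = frame.session;
  for (const inputSource of session.inputSources) {
    if (isGrabbing(frame, inputSource)) {
      attachMicToHand(inputSource.handedness);
    }
  }
  session.requestAnimationFrame(onXRFrame);
}
```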
For songs, I embed YouTube iframes instead of dealing with licensing complexity. Players can search and select from a growing YouTube catalog of karaoke tracks. It keeps the project agile.
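For reference, loading a selected track with the YouTube IFrame Player API looks roughly like this; the container id and video id are placeholders rather than values from the project.

```js
// Sketch: embed a karaoke track via the YouTube IFrame Player API.
const tag = document.createElement('script');
tag.src = 'https://www.youtube.com/iframe_api';
document.head.appendChild(tag);

let player;
window.onYouTubeIframeAPIReady = () => {
  player = new YT.Player('karaoke-player', {  // placeholder container id
    videoId: 'VIDEO_ID',                      // chosen from the user's search results
    playerVars: { autoplay: 0 },
    events: {
      onReady: () => player.playVideo(),
    },
  });
};
```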
Performance is decent in WebXR browsers. I'm currently testing on a laptop running macOS, aiming for smooth frame rates without heavy optimization.
Challenges I Ran Into
Time constraints: Building in a week meant cutting features ruthlessly. I initially wanted AI feedback and live translation, but those moved to future scope.
WebXR hand detection: Getting hand tracking to work reliably across browsers was finicky. I did a lot of debugging with fake hand data and simulations on macOS.
YouTube embedding: Getting iframes to play audio correctly in an immersive context required some creative work with audio context APIs.
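One common workaround for this class of problem (not necessarily the exact fix used here) is to unlock audio and start playback from the same user gesture that enters the immersive session; the button id is a placeholder and `player` refers to the embed sketch above.

```js
// Sketch: browsers block audio until a user gesture, so resume the AudioContext
// and kick off YouTube playback inside the click that starts the XR session.
const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
const enterVRButton = document.getElementById('enter-vr'); // placeholder id

enterVRButton.addEventListener('click', async () => {
  if (audioCtx.state === 'suspended') {
    await audioCtx.resume();            // unlock audio inside the gesture handler
  }
  const session = await navigator.xr.requestSession('immersive-vr', {
    optionalFeatures: ['hand-tracking'],
  });
  player.playVideo();                   // start the embedded track in the same gesture
  // ...hand `session` off to the Three.js renderer (renderer.xr.setSession)
});
```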
Database schema: Designing PostgreSQL tables for rooms, players, and song metadata quickly, without overcomplicating things, was a balancing act.
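For a sense of scale, a schema along these lines is about all the MVP needs; the table and column names below are illustrative assumptions, run once at startup with node-postgres.

```js
// Sketch: minimal rooms/players/songs schema, created with node-postgres.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function migrate() {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS rooms (
      id         SERIAL PRIMARY KEY,
      theme      TEXT NOT NULL,               -- e.g. 'kpop-seoul-studio'
      created_at TIMESTAMPTZ DEFAULT now()
    );

    CREATE TABLE IF NOT EXISTS players (
      id           SERIAL PRIMARY KEY,
      display_name TEXT NOT NULL,
      room_id      INTEGER REFERENCES rooms(id)
    );

    CREATE TABLE IF NOT EXISTS songs (
      id               SERIAL PRIMARY KEY,
      youtube_video_id TEXT NOT NULL,
      title            TEXT NOT NULL,
      artist           TEXT
    );
  `);
}

migrate().catch(console.error).finally(() => pool.end());
```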
Accomplishments I'm Proud Of
- Built a working multiplayer karaoke MVP in a few days using the Immersive Web SDK (completely new to me)
- Got hand tracking working on WebXR browsers without needing an actual Quest device
- Integrated YouTube embeds for instant access to massive song catalogs
- Created simple, clean UX for room creation and player matchmaking
- Demonstrated viability of browser-based multiplayer VR experiences with PostgreSQL backend
- Built it fast enough to iterate daily and ship something playable
What I Learned
Moving fast forces you to make smart cuts. I ditched the complex stuff (AI analysis, real-time translation) and focused on the core loop: pick a song, sing, feel connected.
YouTube embeds are your friend for music licensing. No need to reinvent the wheel or deal with licensing complexity when YouTube already has the catalog and permission infrastructure.
PostgreSQL is more than enough for MVP-stage multiplayer games. You don't need NoSQL or fancy real-time databases if you design your schema cleanly.
Hand tracking is powerful UI magic, even in its imperfect forms. Players immediately understand grab and point gestures, even with occasional detection glitches.
A few days is not a lot of time, but it's enough to prove an idea works. The pressure forces you to focus on what actually matters.
What's Next for KaraokeVerse
Immediately: expand the song catalog and test with more users. Get feedback on the core experience.
Short-term roadmap: add real-time lyric translation so international players see subtitles in their language. Implement simple cosmetic avatars so players feel more present.
Mid-term: build the AI vocal coach that gives pitch and timing feedback. Add seasonal cosmetics and leaderboards.
Long-term: duet matching algorithms, live streaming integration, mobile companion app for non-VR friends to vote on performances.
But honestly, I plan to move on this quickly now that the groundwork is set. The core mission is simple: bring music fans together around the artists and characters they love, in a space that feels like a concert, a fandom gathering, and a home studio all at once. Everything else builds from there.
Built With
- audio
- express.js
- html5
- immersive-web-sdk
- node.js
- postgresql
- react
- three.js
- web
- webxr


