LoopJam is a Multiplayer Music Collaboration Tool
When testing, make sure another person is testing at the same time from a different location.
Inspiration
The music-making process is more fun when shared with others! However, getting people into the same geographical location at the same time is difficult. As many musicians discovered during the COVID-19 pandemic, it is nearly impossible to make music together over an internet connection (such as over Zoom) without the latency inherent in live streaming causing musicians to desync. This means that to collaborate musically over the internet, you have to do all of your recording separately and then send files around for review and editing, which slows down the collaboration process.
To make a music tool that can work over the internet, the first problem to solve is how to ensure that everybody hears each sound play at the correct time in the measure, which, given unavoidable network latency, essentially means hearing audio playback on a delay. This implies some kind of regularly repeating loop, onto which every sound is correctly aligned.
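As a minimal illustration of this idea (a sketch, not LoopJam's actual code), a Unity script can use the sample-accurate audio DSP clock to start a clip exactly at the next loop boundary instead of on an arbitrary frame. The fixed loop length and shared start time here are assumptions:

```csharp
using UnityEngine;

// Sketch: schedule a clip so it starts exactly at the next loop boundary,
// using Unity's sample-accurate DSP clock rather than frame time.
// loopLengthSeconds and loopStartDspTime are assumed to be agreed on by
// all participants (e.g. distributed when the session starts).
public class LoopScheduler : MonoBehaviour
{
    public double loopLengthSeconds = 8.0; // assumed fixed loop length
    public double loopStartDspTime;        // DSP time when the shared loop began

    public void PlayAtNextLoopBoundary(AudioSource source, AudioClip clip)
    {
        double now = AudioSettings.dspTime;
        double elapsed = now - loopStartDspTime;
        double intoLoop = elapsed % loopLengthSeconds;
        double nextBoundary = now + (loopLengthSeconds - intoLoop);

        source.clip = clip;
        source.PlayScheduled(nextBoundary); // sample-accurate start
    }
}
```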
Greg and his wife Cecelia have been experimenting with analog looper pedals, which have several limitations. Only the more expensive pedals allow you to save more than one loop at a time, and on the cheaper pedals, once something is added to the loop, it is impossible to remove anything but the most recently recorded audio clip. As a result, traditional loopers run into a problem: once a looping arrangement is built up, it becomes static and repetitive, because the individual layers cannot be edited to create a more flexible arrangement out of the recorded clips.
What LoopJam does
LoopJam allows you to record audio clips while saving the time at which each recording should start, so that it is correctly aligned within the loop. Once a recording is finished, it is added to a list of clips that will be uploaded and distributed through the server to all other participants.
Upon receiving somebody else's recorded clip, you also receive data about when the clip should start and end, so that it plays back at the same point in the loop as when it was recorded. The clip is then displayed on our Loop (affectionately known as the Doughnut by our developers).
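For illustration, the per-clip data might look something like the following sketch; the field names are hypothetical, not LoopJam's actual schema:

```csharp
// Illustrative sketch of per-clip metadata sent alongside the raw samples.
// Field names are hypothetical; the real LoopJam schema may differ.
[System.Serializable]
public struct ClipMetadata
{
    public int ownerClientId;      // who recorded the clip
    public double loopStartOffset; // seconds into the shared loop where playback begins
    public double clipLength;      // seconds of audio in the clip
    public int sampleRate;         // e.g. 48000 samples per second
}
```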
Touching an audio clip on the Loop brings up a panel on your hand that lets you adjust the volume, toggle between playing the clip continuously on the loop and playing it only when touched, and delete the clip (or mute it, if it's somebody else's recording). In particular, the ability to easily mute and unmute specific audio clips allows for a more dynamic looper experience.
The Loop model serves as an anchor point for the multiplayer scene. All avatars are connected to the Loop, which can be grabbed, moved around, rotated, and scaled to a more convenient location or orientation. In addition, we use scene data to display where a user is in their own environment through a portal that is placed around them. The scene data behind your friends gets special materials applied that pulse to the beat. If scene data isn't shared, we use a default environment.
Tools We Used To Build It
The most important tool we used is the Microphone built into every headset. While sometimes taken for granted on all our devices, the microphone is one of the most versatile music production tools available. Everybody can make noise, whether it's singing, beatboxing, slapping your furniture, or playing a more traditional instrument.
We used the Meta Presence Platform to implement both hand and controller tracking as well as to get access to Scene data on the Quest. In particular we wanted to take advantage of hand tracking, which makes the tool much easier to use with alternate forms of music production such as hand percussion and allows you to keep your hands free for other activities while you jam with your friends.
LoopJam uses the Normcore SDK for its multiplayer implementation. We had to create custom solutions for several features, in particular player avatar positioning and sending audio clips through the server.
Sending Audio Through A Server
The first challenge we faced was how to record audio. In theory, the best approach would be to tap into Normcore's existing audio stream, but for some reason, while the data event worked fine on PC, when the app was built to the Quest we received only about a third of the data, and the resulting clips made us sound like chipmunks. This meant we had to use the Microphone C# class to record our own clip, which has some startup time and also interrupts Normcore's audio streaming: others cannot hear you while you record, and we have to manually restart the multiplayer voice chat after you finish recording.
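For reference, here is a minimal sketch of recording with Unity's Microphone class as described above; the 48,000 Hz rate matches the sample rate mentioned below, and everything else is illustrative:

```csharp
using UnityEngine;

// Sketch of recording with Unity's Microphone class. A null device name
// selects the default microphone.
public class ClipRecorder : MonoBehaviour
{
    private AudioClip recording;

    public void StartRecording(int maxSeconds = 10)
    {
        // Microphone.Start has some startup latency, so in practice you may
        // want to begin recording slightly early and trim afterwards.
        recording = Microphone.Start(null, loop: false, maxSeconds, 48000);
    }

    public float[] StopRecording()
    {
        Microphone.End(null);
        var samples = new float[recording.samples * recording.channels];
        recording.GetData(samples, 0); // raw floats, ready to upload
        return samples;
    }
}
```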
Once the clip is recorded, we have to actually send it to the other people in the server. To do this, we deconstruct the audio clip into an array of floats and begin the process of uploading it. Uncompressed audio is large: at 48,000 samples per second, even a short clip is hundreds of thousands of floats. When we first tried this with multiple people, having everyone upload their clips at the same time first reduced our framerate to a slideshow and then crashed the app outright, so we implemented a system that ensured only one clip was uploaded at a time, processed in order of when each recording finished. Even then, the amount of data we were pushing would dramatically lag whoever was uploading. To mitigate this, we reduced the amount of data sent per frame from 500 floats to 200 floats, which improved the framerate but lengthened the upload time. We found some success assuming a baseline value of 0 for all samples and skipping any 200-sample chunk whose values were all below a low threshold, which sped up uploads for clips containing pauses and silence. Once the upload finally finishes, an event is sent out for everyone else to rebuild the audio clip from the uploaded array.
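A hedged sketch of this upload scheme, assuming a hypothetical SendChunk callback in place of the actual Normcore transport:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

// Sketch of the chunked-upload scheme described above: one clip at a time,
// a small per-frame budget of floats, and silent chunks skipped entirely.
// SendChunk stands in for however the data actually reaches the server.
public class ClipUploader : MonoBehaviour
{
    public System.Action<int, float[]> SendChunk; // (startSampleIndex, samples)

    private readonly Queue<float[]> pendingClips = new Queue<float[]>();
    private bool uploading;

    private const int ChunkSize = 200;          // floats sent per frame
    private const float SilenceThreshold = 0.001f;

    public void EnqueueClip(float[] samples)
    {
        pendingClips.Enqueue(samples);
        if (!uploading) StartCoroutine(UploadNext());
    }

    private IEnumerator UploadNext()
    {
        uploading = true;
        while (pendingClips.Count > 0)
        {
            float[] samples = pendingClips.Dequeue();
            for (int i = 0; i < samples.Length; i += ChunkSize)
            {
                int count = Mathf.Min(ChunkSize, samples.Length - i);

                // Skip chunks that are effectively silent; the receiver
                // assumes a baseline of 0 for any samples it never gets.
                bool silent = true;
                for (int j = i; j < i + count; j++)
                {
                    if (Mathf.Abs(samples[j]) >= SilenceThreshold) { silent = false; break; }
                }
                if (silent) continue;

                var chunk = new float[count];
                System.Array.Copy(samples, i, chunk, 0, count);
                SendChunk?.Invoke(i, chunk);
                yield return null; // one chunk per frame keeps framerate acceptable
            }
        }
        uploading = false;
    }
}
```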
Creation of the Loop (a.k.a. the Doughnut)
Implementing multiplayer in passthrough presents its own set of challenges. From the beginning of ideation we knew there would be a shared zone where audio clips would be transferred. Our original design was a door-shaped portal that would spawn in; through that door you could look into a digital recreation of your friend's room built from their scene data. This door was the interface through which you would pass audio clips back and forth, and it was easy to reposition and take with you as you moved around your room.
This worked great for two people, but as soon as a third person joined, they would spawn in with their own door, which also had to be grabbed and moved around if you wanted to walk freely around your space. Once you had three friends, you were essentially trapped at the door location unless you went through the ordeal of repositioning all three doors somewhere new; on top of that, the doors took up a lot of visual space and blocked off our highly convenient passthrough.
What we needed was a way to move all of the people in the room at one time, making it easier to take your friends with you throughout your space. Through discussion we settled on replacing the portal doorways with a single torus-shaped Loop, to which everybody in the scene is anchored via their position, rotation, and scale relative to the Loop.
Grabbing and repositioning the Loop moves everybody with it, allowing you to take it wherever you want to go. In practice this means that if a person flips their Loop upside down, everybody else sees that person's avatar flip itself upside down. As a byproduct of letting users scale the Loop up and down for their own convenience, their avatar as seen by others also grows and shrinks based on their proportional size to the Loop. We left this in because it was fun, gave us spatial continuity, and meant we could high-five each other at any size.
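Conceptually, the anchoring works by storing each avatar's pose in the Loop's local space, something like the following sketch (assumed, not the actual implementation):

```csharp
using UnityEngine;

// Sketch of Loop-relative anchoring: each avatar's pose is synchronized in
// the Loop's local space, so moving, rotating, or scaling the Loop carries
// everyone anchored to it along.
public static class LoopAnchor
{
    // Convert a world-space avatar position into Loop-local coordinates to send.
    public static Vector3 ToLoopSpace(Transform loop, Vector3 worldPosition)
    {
        return loop.InverseTransformPoint(worldPosition);
    }

    // Reconstruct the world-space position on a remote client from their own
    // Loop transform. Because TransformPoint applies the Loop's scale,
    // scaling the Loop scales everyone proportionally, which is why avatars
    // grow and shrink together.
    public static Vector3 FromLoopSpace(Transform loop, Vector3 loopLocalPosition)
    {
        return loop.TransformPoint(loopLocalPosition);
    }
}
```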
The Loop also solved the problem of how to display large numbers of clips, which appear as rings rotating around the Loop. It's easy to scale the clip display for larger numbers of clips by redistributing them with less and less space between them.
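A sketch of that redistribution, placing clip markers at evenly spaced angles around the ring (names and layout are assumptions):

```csharp
using UnityEngine;

// Sketch: evenly redistribute clip markers around the torus. As clips are
// added, the angular spacing shrinks so everything still fits.
public static class ClipRing
{
    public static Vector3 ClipPosition(Transform loop, int index, int clipCount, float ringRadius)
    {
        float angle = index * Mathf.PI * 2f / clipCount;
        Vector3 local = new Vector3(Mathf.Cos(angle), 0f, Mathf.Sin(angle)) * ringRadius;
        return loop.TransformPoint(local);
    }
}
```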
Scene Data and Stencils
We wanted a virtual representation of the room you are in, so people can see you walking around your own space and understand how and why your avatar moves. When your local scene instantiates your scene data around you, a custom script instantiates multiplayer-synced copies of those objects.
To avoid having everybody's rooms visually overlap, we applied a material to the scene data that is only visible where the owner's clientID has been written into the stencil buffer for that pixel. Each player then instantiates their own object with a special material that writes their Normcore clientID integer into the stencil buffer for everyone else. This started as a rectangular portal door but shifted to what we call an "avocado" shape attached to the head position.
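In Unity terms, wiring this up at runtime might look like the following sketch; the "_StencilRef" property name is hypothetical and would depend on the actual shaders:

```csharp
using UnityEngine;

// Sketch of assigning a per-player stencil ID to materials at runtime.
// "_StencilRef" is a hypothetical shader property name. The idea: the
// player's "avocado" writes their clientID into the stencil buffer, and
// their scene-data material only renders where the stencil already equals
// that same clientID.
public class StencilIdAssigner : MonoBehaviour
{
    public Renderer avocadoRenderer;   // writes the stencil value
    public Renderer[] sceneRenderers;  // only visible where stencil matches

    public void AssignClientId(int clientId)
    {
        avocadoRenderer.material.SetInt("_StencilRef", clientId);
        foreach (var r in sceneRenderers)
            r.material.SetInt("_StencilRef", clientId);
    }
}
```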
The scene data is composed of cubes and planes, and while we technically have the capability to swap in special models based on the labels users assign, we decided instead to go for a more abstract visual style in which the walls move to the beat. The combination of fruit-themed avatars and uniquely colored background textures made it easier to identify who was who in the application. In the future we would consider driving the background textures with the volume of the user's clips in an audio-reactive way instead of playing a premade material animation, and we could take more advantage of user-defined volume labels for further customization. Importantly, if a user chooses not to share their scene data, we fall back to a default environment.
Audio and Variable Framerate
For a long time the metronome we created was highly inconsistent because we reset the timer to 0 when the loop retriggered instead of subtracting the expected loop length from the timer. The variability in frame rate (up to 50 ms of drift per loop) meant that recordings started just before the loop point would often desync from recordings started after it. It also made recording difficult, as our beat timer used the same reset-to-0 logic, which exacerbated the framerate timing issues.
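The fix itself is tiny; here is a sketch of the before and after (field names assumed):

```csharp
using UnityEngine;

// Sketch of the metronome fix: instead of resetting the timer to 0 at the
// loop point (which throws away however far past the boundary the frame
// landed), subtract the loop length so the overshoot carries over.
public class Metronome : MonoBehaviour
{
    public float loopLengthSeconds = 8f;
    private float loopTimer;

    public event System.Action OnLoopRetriggered;

    void Update()
    {
        loopTimer += Time.deltaTime;
        if (loopTimer >= loopLengthSeconds)
        {
            // BAD: loopTimer = 0f;  (drops up to a frame's worth of time each loop)
            loopTimer -= loopLengthSeconds; // keep the overshoot, stay in sync
            OnLoopRetriggered?.Invoke();
        }
    }
}
```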
Fixing the metronome timer highlighted related issues with playing back audio clips under variable framerate. We had automatically set each clip's beginning and ending times by finding the first and last sample with a non-zero value. But because we only checked whether to start a clip once per frame, we were regularly chopping off the first transient of the loop, so we made sure to include at least 50 ms of silent lead-in time in our clip data. That way, starting the audio slightly late because of a slow frame no longer clipped the transient.
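A sketch of that trimming logic, keeping roughly 50 ms of silent padding before the first audible sample (threshold and padding values are illustrative):

```csharp
using UnityEngine;

// Sketch: find where a recording's audible content starts, then back off by
// a short lead-in so a late frame start doesn't chop the first transient.
public static class ClipTrimmer
{
    public static int FindTrimmedStart(float[] samples, int sampleRate,
                                       float threshold = 0.001f,
                                       float leadInSeconds = 0.05f)
    {
        int firstAudible = 0;
        for (int i = 0; i < samples.Length; i++)
        {
            if (Mathf.Abs(samples[i]) >= threshold) { firstAudible = i; break; }
        }
        int leadInSamples = (int)(leadInSeconds * sampleRate);
        return Mathf.Max(0, firstAudible - leadInSamples);
    }
}
```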
The variable framerate also affected how often audio would loop, particularly for clips that filled the entire recording time. Originally the audio would not retrigger if it was still playing (to allow for clips longer than the loop time), but because of variable framerate, the retriggering event could fire just before the clip ended, missing the desired start point and waiting a whole extra loop before playing again.
This inconsistency in whether the audio repeated also made the waveform texture look as if it didn't match the playback. To solve this, when the play event happens we check whether we are near the end of the audio clip, and if so, we interrupt the clip to restart it. When that happens, we tell the waveform texture to duplicate itself to reflect the more frequent loops.
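A sketch of that near-end check (the window size is illustrative):

```csharp
using UnityEngine;

// Sketch of the near-end restart check: when the loop event fires, if a
// clip is within one slow frame of its end, restart it immediately instead
// of letting it miss the boundary and sit out a whole loop.
public class ClipRetrigger : MonoBehaviour
{
    public AudioSource source;
    public float nearEndWindowSeconds = 0.05f; // roughly one slow frame

    public void OnLoopRetriggered()
    {
        if (!source.isPlaying)
        {
            source.Play();
            return;
        }

        float remaining = source.clip.length - source.time;
        if (remaining <= nearEndWindowSeconds)
        {
            source.Stop();  // causes the audible pop mentioned below
            source.Play();
            // (also signal the waveform texture to duplicate itself here)
        }
    }
}
```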
This interruption does cause an audible pop when the audio is restarted. It could be solved by switching between two audio sources, letting the tail end of the old audio keep playing while a new loop starts, but we ran out of time to fix it before submission.
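For completeness, here is a sketch of that unimplemented fix, alternating between two AudioSources so the old tail can ring out:

```csharp
using UnityEngine;

// Hedged sketch of the fix described above (not implemented in LoopJam):
// alternate between two AudioSources so the previous loop's tail rings out
// while the new iteration starts, avoiding the hard-stop pop.
public class DoubleBufferedPlayer : MonoBehaviour
{
    public AudioSource sourceA;
    public AudioSource sourceB;
    private bool useA = true;

    public void RestartLoop(AudioClip clip, double dspStartTime)
    {
        AudioSource next = useA ? sourceA : sourceB;
        useA = !useA;

        next.clip = clip;
        next.PlayScheduled(dspStartTime); // old source keeps playing its tail
    }
}
```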
USER INTERFACE
LOOP AS UI: The central visual element is the doughnut-shaped Loop interface where recordings are visualized. This design maximizes the space for audio clip representation and facilitates communal interaction, since users can surround the doughnut and interact from all sides. Users can rotate and scale the doughnut UI to better fit their mixed reality environment, which helps them manage personal space and makes their interaction with the loops more engaging and personalized.
AVATARS: Users are represented by fruit-jam-themed avatars, adding a playful and creative visual element to the experience. These avatars are visible to others, fostering a sense of community and shared space. Avatars in LoopJam are also functionally distinct through color coding, which visually differentiates multiple users so each participant can be recognized by a unique color. This color coding extends to the representation of recordings and the outlines of the hands, making it easier to associate actions and sounds with the correct participant.
MICROPHONE: A virtual microphone is attached to the right hand of the user's avatar. This placement allows for natural interaction styles such as singing, rapping, or instrument recording. The microphone is always visible as a ghosted, semi-transparent version, subtly present without obstructing the user's view and serving as a reminder that the tool is available. When a user clenches their fist, the ghosted microphone transitions to a fully visible state and a recording circle appears, intuitively indicating that the microphone is active and ready for use. The user can then position the microphone within the recording circle to initiate recording, capturing sounds or music seamlessly within the immersive environment. This interaction is intentionally straightforward: by reducing complexity, LoopJam lets users of all skill levels engage with the music creation process effortlessly.
EDITING TOOLS: Located on the left hand, these tools let you delete recordings, adjust volume, and toggle a unique 'play on touch' mode that activates loops by touching them. This setup uses familiar gestural interfaces adapted to MR, making it intuitive even for new users.
UI BUTTON PLACEMENT: UI buttons are intentionally placed along the main nerve branches (ulnar, median, and radial) of the left hand. This placement takes advantage of the natural haptic feedback of touching your own hand, making every button press tactile and intuitive and keeping interactions physically engaging and memorable.
USER EXPERIENCE
GUIDED INTRODUCTION: At the outset of the LoopJam experience, users are welcomed with an intuitive onboarding process. This includes clear, concise text instructions coupled with 'ghosted' hand animations that demonstrate how to interact with the various UI elements.
MULTIPLAYER PASSTHROUGH EXPERIENCE: This feature embodies a significant part of LoopJam's UX. Users can see each other’s real-world spaces scanned and represented through avocado-shaped portals. This integration of real and virtual worlds not only enriches the visual experience but also deepens the users' connection to each other’s creative environments.
INTERACTIVITY: The direct manipulation of sound loops through touch and the intuitive placement of tools ensure a highly interactive experience. Users feel more connected to the music creation process as they can touch and modify the sounds within their virtual space.
SOCIAL AND COLLABORATIVE FEATURES: The entire setup of LoopJam is designed to promote collaboration and social interaction. The visibility of avatars and shared spaces encourages users to interact not just with the UI but with each other, fostering a collaborative and enjoyable musical creation process. The microphone's functionality, where users activate it by making a fist and placing it into the recording circle to start capturing sounds, is a key enabler of collaboration. It simplifies the process of adding layers to a communal music piece, allowing users to seamlessly build upon each other's contributions. This straightforward mechanism ensures that all participants, regardless of their technical proficiency or musical expertise, can engage equally and effectively.
AESTHETICS AND THEMING: The playful and colorful theme of fruit jams not only makes the UI appealing but also helps lower the barrier to entry by making the technology feel less intimidating and more inviting to users of all ages and backgrounds. The color palette in LoopJam is carefully chosen to enhance visual harmony and ensure clarity in the user experience. Bright, distinct colors and playful textures are used to differentiate user avatars and their corresponding inputs, aiding in navigation and recognition across the collaborative platform. These colors and textures are not only functional but also add a layer of aesthetic pleasure, making the interface delightful to interact with.
Next Steps for LoopJam
- Greater editing capabilities for the audio clips, including the ability to make small adjustments to timing to better align with other recordings
- Storing and retrieving audio clips from your headset's local storage space
- Storing and saving audio performances and sessions in a replayable way
- Displaying currently selected audio clip waveform in a more legible way on your hand UI
- Being able to switch to other multiplayer rooms
- Avatar customization (instead of your avatar, color, and background being assigned to you based on the order in which you joined)
- Boxes that allow you to apply digital signal effects to audio playback such as distortion, echo, or reverb
- Audio-reactive visuals for your environment and inside other users' scene data backgrounds, or otherwise having audio clips drive visuals
- Hands-free recording options (such as speech-to-text controls) for instruments that require both hands
- Allowing for continuous recording without pauses
- More avatar-to-avatar interactions (like high fiving)
- Better management of how much visual space large users take up (to avoid accidentally being blinded by someone being on top of you)
- Multiplatform access (such as from a web browser or mobile phone)






