Inspiration

As college students, we love traveling and exploring the world. Unfortunately, travel tends to be costly and time-consuming, making it difficult for us to tick items off our personal bucket lists. Coming from backgrounds in machine learning and 3D reconstruction, we wanted to solve this problem by providing immersive, photorealistic travel experiences to anyone through the Oculus VR headset. We were familiar with Gaussian splatting, a recently developed 3D reconstruction technique that enables high-fidelity rendering of a scene from any viewpoint, using nothing more than a video. Hoping to bring this technology to tourism, we set out to create an AI-native hub where people can share their travel experiences.

What it does

SplatMaps is an open-source content-sharing and consumption platform for 3D Gaussian splats, where users can upload videos of their favorite travel spots and immerse themselves in rendered Gaussian splats from the community.

How we built it

Under the hood, SplatMaps leverages 3D reconstruction libraries for Gaussian splat training and WebXR for a headset-compatible frontend. More specifically, we use NerfStudio to train Gaussian splat models: we first extract camera poses from a monocular video via COLMAP, then optimize the Gaussians over 30,000 training steps. This pipeline takes approximately fifteen minutes per scene and enables high-fidelity reconstruction, even from out-of-distribution camera poses. The trained model is exported to a .splat file, which is stored in an AWS S3 bucket.

The web app is deployed on Vercel, which serves both the backend and the frontend. The landing page references thumbnails and captions from metadata files stored on S3 to populate a scrolling waterfall of all available splats. When a splat is clicked, we load it along with a stored starting pose, so the viewer is initialized facing the correct direction. Rendering is handled by an open-source, WebXR-compatible Gaussian splat renderer built on A-Frame, a WebXR-ready 3D framework that itself runs on three.js.

For interactivity, we tap into A-Frame's Oculus controller API to receive callbacks both for moving through the Gaussian splat and for summoning our AI tour guide. When the user wants more information about a scene, we take a screenshot of the current frame and pass it to Claude along with a predefined prompt; Claude's response is then displayed on screen for the user to read.
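
The training step boils down to a handful of NerfStudio CLI invocations. Below is a minimal sketch of how a worker could drive them from Node; the video path, scene name, flags, and output layout are illustrative assumptions rather than our exact commands.

```typescript
// Minimal sketch of driving the NerfStudio CLI from a Node worker.
// Flags and output paths vary by NerfStudio version; the .ply -> .splat
// conversion step is elided. Treat all specifics below as assumptions.
import { execSync } from "node:child_process";

function trainSplat(videoPath: string, scene: string): void {
  // 1. Extract frames and recover per-frame camera poses via COLMAP.
  execSync(
    `ns-process-data video --data ${videoPath} --output-dir data/${scene}`,
    { stdio: "inherit" }
  );

  // 2. Optimize the Gaussians for 30,000 steps (~15 minutes per scene).
  execSync(
    `ns-train splatfacto --data data/${scene} --max-num-iterations 30000`,
    { stdio: "inherit" }
  );

  // 3. Export the trained model; we then convert the export to a
  //    web-friendly .splat file and upload it to our S3 bucket.
  //    (The config path is illustrative -- NerfStudio writes it under
  //    a timestamped run directory.)
  execSync(
    `ns-export gaussian-splat --load-config outputs/${scene}/splatfacto/config.yml --output-dir exports/${scene}`,
    { stdio: "inherit" }
  );
}

trainSplat("uploads/golden-gate.mp4", "golden-gate");
```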
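
On the viewer side, loading a scene amounts to attaching the splat renderer to an A-Frame entity and moving the camera rig to the saved starting pose. Here is a rough TypeScript sketch: the metadata shape and the /api/scenes endpoint are assumed conventions of ours, the #cameraRig id is purely illustrative, and the gaussian_splatting component name follows the open-source A-Frame splat renderer we built on.

```typescript
// Rough sketch of scene loading. SplatMeta and the endpoint are assumed
// conventions; #cameraRig is an illustrative element id.
interface SplatMeta {
  splatUrl: string;                                   // .splat file on S3
  startPosition: { x: number; y: number; z: number }; // meters
  startRotation: { x: number; y: number; z: number }; // degrees
}

async function loadSplat(sceneId: string): Promise<void> {
  // Fetch the per-scene metadata stored alongside the .splat on S3.
  const res = await fetch(`/api/scenes/${sceneId}`);
  const meta: SplatMeta = await res.json();

  // Attach the open-source splat renderer component to a new entity.
  const splat = document.createElement("a-entity");
  splat.setAttribute("gaussian_splatting", `src: ${meta.splatUrl}`);
  document.querySelector("a-scene")!.appendChild(splat);

  // Critically, place the camera rig at the saved starting pose so the
  // viewer begins facing the right direction.
  const rig = document.querySelector("#cameraRig")!;
  const p = meta.startPosition;
  const r = meta.startRotation;
  rig.setAttribute("position", `${p.x} ${p.y} ${p.z}`);
  rig.setAttribute("rotation", `${r.x} ${r.y} ${r.z}`);
}
```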
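
The tour guide itself is a simple round trip: capture the renderer's canvas, ship the PNG to a serverless function, and forward it to Claude through the Anthropic SDK. In this sketch, the prompt text, model id, and capture details are illustrative, and grabbing pixels from a WebGL canvas assumes the renderer preserves its drawing buffer; the two halves would live in separate files in practice.

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Client side: grab the current frame from the A-Frame WebGL canvas.
// Assumes the renderer was created with preserveDrawingBuffer enabled.
function captureFrame(): string {
  const canvas = document.querySelector("a-scene canvas") as HTMLCanvasElement;
  return canvas.toDataURL("image/png").split(",")[1]; // strip the data: prefix
}

// Illustrative stand-in for our predefined prompt.
const TOUR_GUIDE_PROMPT =
  "You are a friendly tour guide. Describe what the viewer is looking at " +
  "in this travel scene and share one interesting fact about it.";

// Server side (Vercel function): forward the screenshot to Claude and
// return the text for the frontend to display on screen.
export async function describeScene(pngBase64: string): Promise<string> {
  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
  const response = await client.messages.create({
    model: "claude-3-5-sonnet-latest", // illustrative model id
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: "image/png", data: pngBase64 },
          },
          { type: "text", text: TOUR_GUIDE_PROMPT },
        ],
      },
    ],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
```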

Challenges we ran into

On the Gaussian splatting side, we ran into challenges with COLMAP. Due to the rain during CalHacks, the videos we captured had water droplets on the lens, which made it harder for COLMAP to extract camera poses for each frame. As a result, not all of our recorded videos could be converted into Gaussian splats. We experimented with LiDAR recording, and although it was successful, we felt that supporting plain monocular video was the more exciting idea to pursue.

One of the main trade-offs we faced was choosing between a static site and a hosted site for the web app. The static site was faster and more efficient, which mattered because we wanted a high frame rate on the VR platform, but the hosted site made it much easier to integrate Claude and the Oculus headset. In the end, we chose the hosted site for the added features. We also faced difficulties integrating multiple services across a diverse array of platforms: juggling frameworks for full-stack development, VR, cloud compute, and storage was a challenge, but learning to make them work together was extremely rewarding. Finally, interfacing with hardware through A-Frame in a way that supported many devices added complexity to how users control and move around the world, whether in VR or via click-and-drag.

Accomplishments that we're proud of

We managed to build a working marketplace for 3D experiences using cutting-edge 3D reconstruction techniques. It is open-source, extensible, supports multiple input modalities, and needs only a single camera to produce 3D scenes with research-level accuracy. As AI researchers, we took inspiration from top Berkeley AI labs and made this technology accessible to anyone with just a phone, letting them contribute content and engage with the platform.

What's next for SplatMaps

In creating SplatMaps, we've realized that the streamlined pipeline we've set up for Gaussian splat training, rendering, and AI-based explanation can be applied well beyond tourism. Since the framework is highly extensible, we envision it powering product previews while shopping on a PC or VR headset, augmented reality systems that place digital asset overlays in real-world environments, and commonly challenging scenarios such as hunting for a new house. In all of these situations, the ability to visualize a high-fidelity recreation of a scene would be invaluable, saving time and creating a new way of interacting with our surroundings.

Built With

A-Frame, AWS S3, Claude, COLMAP, NerfStudio, three.js, Vercel, WebXR