What it does

Generates immersive VR experiences geared towards education. Imagine learning history as if you were living it—walking through the 1700s alongside George Washington! Picture exploring the rich culture of Korea or uncovering the hidden intricacies of farm operations. With our platform, your imagination becomes reality.

By simply entering a prompt, our advanced pipeline—powered by LangChain AI agents, NVIDIA LLMs, Perplexity search, and Luma AI—conducts in-depth research to create a detailed, engaging, and accurate learning experience. Within this immersive world, we integrate ElevenLabs' conversational AI, allowing real-time interactions with a personal AI guide, so you can ask questions and deepen your understanding.

A fully personalized, interactive way to explore and learn—wherever your curiosity decides to roam!

Inspiration

In a world of advancing technology and shrinking attention spans, education is a field still built largely around passive learning: textbooks, lectures, and coursework that struggle to keep students engaged and passionate about what they're learning. Our team wanted to reimagine the future of education as a world where learning is a highly personalized and immersive experience. We wanted to let students fully exercise their curiosity and make learning feel like an adventure. Recent developments in generative AI open up endless possibilities in education, and children could benefit from a more fun and visual representation of their coursework.

That's why we built 3Duroam, an imaginative educational platform adapted to each individual user. Our goal is to transform the way students engage with knowledge, moving beyond passive memorization and toward active exploration of important topics. We believe that curiosity should be nurtured, not stifled, and that every student deserves an experience tailored to their unique way of learning. By reframing education as a journey of discovery, we aim to inspire the next generation to think creatively and develop a lifelong passion for learning.

How we built it

For the development process, we started by creating our AI agent network, described in more detail in the Rox write-up section below. With this network, we integrated NVIDIA LLMs, Perplexity Sonar Pro, and Luma AI to transform natural-language outputs into detailed AI-generated visuals. We chose these tools because they fit our use case extremely well.
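
As a rough illustration, here is a minimal sketch of how the two language-model clients behind this network can be wired up with LangChain. The model names, environment variables, and prompts are examples rather than our exact configuration:

```python
# Sketch: connecting the NVIDIA and Perplexity models used by our agents.
# Assumes NVIDIA_API_KEY and PPLX_API_KEY are set; model names are illustrative.
import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_openai import ChatOpenAI

# Nemotron via NVIDIA's hosted endpoints, used by the information/context agents.
nemotron = ChatNVIDIA(model="nvidia/llama-3.1-nemotron-70b-instruct", temperature=0.2)

# Perplexity exposes an OpenAI-compatible API, so Sonar Pro can be reached with
# the standard OpenAI chat client pointed at api.perplexity.ai.
sonar = ChatOpenAI(
    model="sonar-pro",
    base_url="https://api.perplexity.ai",
    api_key=os.environ["PPLX_API_KEY"],
)

facts = nemotron.invoke("List the key events of Washington crossing the Delaware.")
check = sonar.invoke(f"Fact-check this summary and cite sources:\n{facts.content}")
```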

The VR environment was developed using Unity and the Meta Quest SDK, with careful attention paid to optimizing performance for mobile VR hardware. We implemented custom shaders and lighting systems to ensure the generated visual content could be rendered efficiently while maintaining high visual quality. The application's architecture was designed around a modular system that allows for easy integration of new educational content and features.

Our conversational AI system uses ElevenLabs' API, which we enhanced with a custom audio processing pipeline to handle the hardware-specific quirks of the Meta Quest 2. We implemented a real-time audio conversion system so users can ask questions aloud and hear spoken answers, which is great for learning and exploring!

Rox Write Up + Diagram

To go more in depth on our AI agent network: our team developed a multi-agent system with LangChain/LangGraph that transforms natural language queries about historical events into immersive virtual reality experiences. The pipeline begins with an information agent powered by NVIDIA's Nemotron model, which takes a prompt from our speech-to-text system and identifies the key historical moments it describes. It then communicates with the context agent and the verification agent. If the user gives a generic prompt, for example, the information agent may need more context before it can generate the key events and details needed for visual rendering later on; that is the role of our context agent. A specialized verification agent communicates with all of our natural-language LLM agents to verify the information they generate. It does this using Perplexity's Sonar Pro API, which can search and crawl the web to fact-check claims against the output of our initial information agent.

Once the information (and any additional context) is generated and verified to be accurate, it is passed to our visualization agent, which writes prompts for each of the events and details created earlier in the pipeline. These prompts go to our media generation agent, which uses the Luma AI API to create videos of the scenes the prompts describe. Those videos are then sent to the Meta Quest 2 VR headset and virtually mapped so that users can experience the immersive world.

This system relies on prompt orchestration: contextually relevant information flows through a chain of specialized agents, each building on the previous one's output while maintaining semantic coherence throughout the pipeline. Context management is another key strength of our system; the HistoricalState class we created maintains a context object that evolves throughout the pipeline, tracking not only the conversation state but also metadata about visual prompts, fact-checking results, and generation status.
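
To make that flow concrete, here is a simplified sketch of how such a pipeline can be wired with LangGraph. The state fields and node bodies are placeholders standing in for our actual HistoricalState class and agents (which call Nemotron, Sonar Pro, and Luma AI), so treat this as an outline rather than our exact implementation:

```python
# Simplified sketch of the agent pipeline as a LangGraph state graph.
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph


class HistoricalState(TypedDict, total=False):
    query: str                  # transcribed user prompt
    context: str                # extra context gathered for generic prompts
    events: List[str]           # key historical moments
    verified: bool              # result of Sonar Pro fact-checking
    visual_prompts: List[str]   # prompts for the media generation agent
    video_urls: List[str]       # Luma AI outputs mapped into the headset


def information_agent(state: HistoricalState) -> HistoricalState:
    # Nemotron would extract key events from the query here (placeholder).
    return {"events": [f"Key event for: {state['query']}"]}

def context_agent(state: HistoricalState) -> HistoricalState:
    return {"context": "additional background for a generic prompt"}

def verification_agent(state: HistoricalState) -> HistoricalState:
    # Sonar Pro would fact-check the generated events here (placeholder).
    return {"verified": True}

def visualization_agent(state: HistoricalState) -> HistoricalState:
    return {"visual_prompts": [f"Cinematic scene: {e}" for e in state["events"]]}

def media_generation_agent(state: HistoricalState) -> HistoricalState:
    # Luma AI would turn each visual prompt into a video here (placeholder).
    return {"video_urls": [f"video_{i}.mp4" for i, _ in enumerate(state["visual_prompts"])]}


graph = StateGraph(HistoricalState)
for name, fn in [
    ("information", information_agent),
    ("context", context_agent),
    ("verification", verification_agent),
    ("visualization", visualization_agent),
    ("media", media_generation_agent),
]:
    graph.add_node(name, fn)

graph.add_edge(START, "information")
graph.add_edge("information", "context")
graph.add_edge("context", "verification")
graph.add_edge("verification", "visualization")
graph.add_edge("visualization", "media")
graph.add_edge("media", END)

pipeline = graph.compile()
result = pipeline.invoke({"query": "George Washington crossing the Delaware"})
```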

Work that would typically require multiple humans, such as historians, fact-checkers, visual artists, and VR designers, we accomplished with LLMs and a network of specialized AI agents for natural-language understanding and generation. We have also attached a diagram for further visualization! :)

Challenges we ran into

Working with VR was definitely a challenge, as most members of our team had never developed on a headset before. It was difficult managing UI elements and navigating some hardware-specific features, especially audio. A big part of our project is the integration with ElevenLabs' conversational AI, but in order to communicate we had to modify our audio input: the Meta Quest 2 headset captures microphone audio at 48,000 Hz, while ElevenLabs' API was set to process 8,000 Hz. We spent a lot of time properly capturing audio through the Oculus and converting it to a format that ElevenLabs could pick up and register.
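
For illustration, the core of that conversion can be sketched in a few lines. Our actual implementation lives in the Unity layer of the app; this Python version using scipy just shows the 48 kHz to 8 kHz downsampling idea:

```python
# Sketch: downsampling Quest 2 microphone audio (48 kHz) to the 8 kHz the
# speech endpoint was configured for, using a polyphase resampler.
import numpy as np
from scipy.signal import resample_poly

MIC_RATE = 48_000     # Meta Quest 2 microphone sample rate
TARGET_RATE = 8_000   # sample rate the ElevenLabs endpoint expected

def downsample(samples: np.ndarray) -> np.ndarray:
    """Convert float32 mic samples at 48 kHz to 16-bit PCM at 8 kHz."""
    resampled = resample_poly(samples, up=1, down=6)  # 48,000 / 8,000 = 6:1
    return np.clip(resampled * 32767, -32768, 32767).astype(np.int16)

# Example: one second of mic audio (48,000 samples) becomes 8,000 samples.
one_second = np.zeros(MIC_RATE, dtype=np.float32)
assert downsample(one_second).shape[0] == TARGET_RATE
```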

We also ran into challenges in the realm of 3D generation. Our idea was very ambitious, and we experimented with various cutting-edge models, spending a lot of time training and deploying them. Ultimately, we decided to keep 3D generative AI as a future direction, since the models we were able to generate took a long time and weren't accurate enough to be fully viable in the app.

Accomplishments that we're proud of

Our team is proud of how much we learned this weekend. We set out with a very ambitious goal and came away with a product we're very satisfied with. Despite many challenges along the way, we adapted, pivoted, and created something that is meaningful to us. We worked through many technical difficulties and persevered when it seemed like some of the things we wanted to build weren't possible. Overall, we challenged ourselves while still having fun, and we developed a pretty cool app.

What we learned

This hackathon was a great lesson in how to learn. Our team was largely unfamiliar with VR development, and most of us had never used a headset before. Throughout the weekend we had to plan our workflow efficiently and pick up challenging technologies on the spot. We navigated novel AI agents, complex generative AI models, and unfamiliar hardware challenges that forced us to adapt and pivot as we built out the project. We grew a lot as developers, learning new skills in 2D and 3D image/asset generation, audio capture and conversion, and AI agent workflows.

What's next for 3Duroam

Throughout our development, we extensively experimented with various NVIDIA AI models and research papers—from Edify-3D and Edify-360-HDRI to Instant Neural Graphics Primitives and Instant Splat. Our goal was to push the boundaries of generating immersive, personalized worlds. While we found some success leveraging NVIDIA computing, such as training models with Brev to generate 3D meshes, we weren’t fully able to bring these worlds to life.

Moving forward, we’re excited to bridge that gap. With AI and computing capabilities advancing rapidly, we've been truly inspired by what’s possible. Our vision for 3Duroam is to develop a seamless pipeline that transforms text into interactive 3D worlds—whether for recreating historical events, simulating real-world locations like Italy or Tokyo, or simply for fun. Ultimately, we aim to be the go-to platform for education and exploration. Let's 3Duroam the universe together!

Built With

elevenlabs, langchain, langgraph, luma-ai, meta-quest, nvidia, perplexity, unity
