Inspiration

Growing up, we shared a common interest: wanting to explore the bounds of our imaginations with no limits. When we were younger, we were all able to express this through a game familiar to many, Minecraft. With its limited block palette, though, we could never build exactly what we envisioned and often had to settle for the next best thing we could approximate with blocks. This inspired us to make a sandbox game where all the assets are completely generated by AI. We also saw the value in a tool that helps VR developers prototype faster, so we jumped at the opportunity to build an AI-powered prototyping tool.

What it does

Dreamscape lets users generate 3D meshes just by speaking keywords. The voice command reaches our backend as text, where it is transformed into a 3D mesh by our AI-driven pipeline, powered by two AI models. The server takes roughly 30 seconds to generate and return a 3D mesh that the user can place in their world. Users can then keep generating assets, limited only by their imagination.
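For a sense of how the frontend talks to the backend, here is a minimal client-side sketch in Python (the endpoint path, payload, and response shape are illustrative assumptions; the real client is our Unity app on the Quest 3):

```python
import requests

# Hypothetical client-side call: send the recognized keyword to the backend
# and receive a URL to the generated .obj file. The endpoint name and JSON
# shape are assumptions for illustration only.
BACKEND_URL = "https://dreamscape-backend.example.com/generate"

def request_mesh(keyword: str) -> str:
    # Generation takes on the order of 30 seconds, so use a generous timeout.
    resp = requests.post(BACKEND_URL, json={"prompt": keyword}, timeout=120)
    resp.raise_for_status()
    return resp.json()["obj_url"]  # e.g. a pre-signed S3 URL to the .obj file

if __name__ == "__main__":
    print(request_mesh("wooden pirate ship"))
```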

How we built it

We built a powerful AI-driven pipeline using multiple cutting-edge technologies:

  • Frontend: Unity-based VR environment for the Meta Quest 3
  • AI Generation:
    • FLUX.1-schnell API for text-to-image generation
    • Locally-hosted TripoSR for image-to-3D mesh conversion
  • Backend:
    • FastAPI server for handling requests
    • Redis / AWS MemoryDB caching for storing pre-generated assets
    • AWS S3 Buckets for .obj file storage and delivery
  • Real-time Integration:
    • Runtime object exporter

We used two open-source AI models from Hugging Face to generate an image from a text prompt and then a 3D mesh from that image. First, the FLUX.1-schnell model generates an image from a keyword. The resulting image is then piped into a second model, TripoSR, which we downloaded from Hugging Face and host locally on our machine; it is an image-to-3D model that returns a 3D mesh for a given image. The generated mesh is then brought into the user's world on the Meta Quest 3 in real time. We also used FastAPI along with Redis caching to serve pre-generated assets that had previously been requested from the endpoint, and AWS S3 buckets to send .obj URLs to the Unity frontend so the meshes can be imported in real time.
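Putting those pieces together, a minimal sketch of the backend endpoint might look like the following (the bucket name, cache key layout, Hugging Face InferenceClient call, and TripoSR run.py invocation are assumptions for illustration, not our exact production code):

```python
# Minimal sketch of the text -> image -> 3D pipeline as a FastAPI endpoint.
import subprocess
import uuid
from pathlib import Path

import boto3
import redis
from fastapi import FastAPI
from huggingface_hub import InferenceClient

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379)
s3 = boto3.client("s3")
hf = InferenceClient()                 # hosted text-to-image inference
BUCKET = "dreamscape-assets"           # assumed bucket name
TRIPOSR_DIR = Path("./TripoSR")        # local clone of the TripoSR repo


@app.post("/generate")
def generate(prompt: str) -> dict:
    # 1. Serve a cached asset if this prompt was generated before.
    cached = cache.get(f"obj:{prompt}")
    if cached:
        return {"obj_url": cached.decode()}

    job_id = uuid.uuid4().hex
    image_path = Path(f"/tmp/{job_id}.png")
    out_dir = Path(f"/tmp/{job_id}")

    # 2. Text -> image with FLUX.1-schnell.
    image = hf.text_to_image(prompt, model="black-forest-labs/FLUX.1-schnell")
    image.save(image_path)

    # 3. Image -> 3D mesh with the locally hosted TripoSR model
    #    (its run.py script writes mesh files into --output-dir; exact
    #    arguments may differ from this sketch).
    subprocess.run(
        ["python", "run.py", str(image_path), "--output-dir", str(out_dir)],
        cwd=TRIPOSR_DIR,
        check=True,
    )
    obj_path = next(out_dir.rglob("*.obj"))

    # 4. Upload the .obj to S3 and hand Unity a time-limited download URL.
    key = f"meshes/{job_id}.obj"
    s3.upload_file(str(obj_path), BUCKET, key)
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=3600
    )

    # 5. Cache the URL so repeat prompts skip generation entirely.
    cache.set(f"obj:{prompt}", url)
    return {"obj_url": url}
```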

Challenges we ran into

Live Importing

The biggest and most important part of our entire project was live importing, and we ran into many issues with it that slowed development overall. Unity would constantly crash because it could not handle the high vertex counts of our AI-generated 3D meshes at runtime. To solve this, we imported a 3D model decimator, which reduces the number of vertices in the mesh and also smooths the model. We had tried shrinking file sizes and importing locally, but Unity would not import at runtime until we decimated and smoothed the models; the same models imported normally in the editor. Through this experimental debugging process we learned that dense, "fuzzy" meshes cost far more memory for Unity to build at runtime.
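For illustration, the decimation and smoothing step could look roughly like this, using Open3D as a stand-in for the decimator we imported (the target triangle count and smoothing iterations are placeholder values):

```python
# Sketch of the decimate-and-smooth preprocessing that made runtime
# importing work. Open3D is used here as a stand-in for our decimator.
import open3d as o3d

def decimate_mesh(in_path: str, out_path: str, target_triangles: int = 20_000) -> None:
    mesh = o3d.io.read_triangle_mesh(in_path)

    # Quadric decimation drops the triangle (and vertex) count so Unity's
    # runtime importer does not choke on an overly dense AI-generated mesh.
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=target_triangles)

    # Light Taubin smoothing removes the "fuzzy" surface noise that made
    # meshes expensive for Unity to build.
    mesh = mesh.filter_smooth_taubin(number_of_iterations=5)
    mesh.compute_vertex_normals()

    o3d.io.write_triangle_mesh(out_path, mesh)

decimate_mesh("generated.obj", "generated_decimated.obj")
```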

Implementing Vector Searching

In layman's terms, vector searching uses the distance between meaningful points in an embedding space to find the closest existing model to the one you are asking for. If you request a "Poodle" when a "Labrador" has already been generated, both are dogs, so their embeddings are close and the search returns the model that already exists in the database instead of generating a new one. We had a lot of challenges generating compatible vector embeddings and dialing in the VectorQuery distance ranges.
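As a toy illustration of the idea (the embedding model and similarity threshold below are stand-ins, not necessarily what we deployed):

```python
# Why "poodle" can resolve to an existing "labrador" asset: their text
# embeddings are close in vector space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
stored_prompts = ["labrador", "castle", "spaceship"]
stored_vecs = model.encode(stored_prompts, normalize_embeddings=True)

query_vec = model.encode("poodle", normalize_embeddings=True)
scores = util.cos_sim(query_vec, stored_vecs)[0]

best = int(scores.argmax())
if float(scores[best]) > 0.5:  # the "VectorQuery range" we had to tune
    print(f"Reuse cached asset for '{stored_prompts[best]}' (score={float(scores[best]):.2f})")
else:
    print("No close match; generate a new model")
```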

Accomplishments that we're proud of

3D Mesh Generation Speed

Our biggest accomplishment was the speed of our 3D mesh generation. We developed a pipeline that chains two open-source Hugging Face models to go from text -> image -> 3D mesh. The reason we did not use a model with direct text-to-3D capabilities was that (1) most open-source text-to-3D models were unreliable and (2) the good, reliable ones were behind a paywall. The pipeline we implemented ourselves produced above-average-quality 3D models in an average of roughly 30 seconds, which exceeded our expectations, especially since we were running the image-to-3D model locally. This allowed us to create a truly immersive and responsive world-building experience, where users can see their ideas come to life in 3D almost as fast as they can think them up.
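One simple way to sanity-check that kind of end-to-end latency is a timing harness like the sketch below (it reuses the hypothetical request_mesh helper from the earlier client sketch, so the prompts and helper name are illustrative):

```python
# Rough harness for measuring end-to-end generation time over a few prompts.
import statistics
import time

def benchmark(prompts, request_fn):
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        request_fn(prompt)  # full text -> image -> mesh -> S3 round trip
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), max(timings)

# mean_s, worst_s = benchmark(["dragon", "oak tree", "lighthouse"], request_mesh)
```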

Redis Database

We stepped up our game by adding vector searching with Redis. We converted our 3D models into vector embeddings and stored them in Redis. Now, when a user drops a prompt, our system embeds it, runs a similarity (KNN) search against the Redis database, and pulls up similar models in no time. This means we can instantly show models that are close to what the user wants, even if it's not a perfect match. It's like having a smart search that gets what you're after, even if you describe it in an unusual way. This Redis setup keeps our app fast and gives users quick results when they're trying to bring their ideas to life.
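Under the hood, the Redis side of this looks roughly like the sketch below (the index name, key prefix, embedding dimension, and field names are assumptions; it requires Redis Stack or MemoryDB with the search module and the redis-py client):

```python
# Sketch of the Redis vector-search setup and KNN query.
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)
DIM = 384  # embedding dimension of the text-embedding model (assumed)

# One-time index creation over hashes stored under model:<id>.
r.ft("models").create_index(
    [
        TextField("prompt"),
        TextField("obj_url"),
        VectorField(
            "embedding", "HNSW",
            {"TYPE": "FLOAT32", "DIM": DIM, "DISTANCE_METRIC": "COSINE"},
        ),
    ],
    definition=IndexDefinition(prefix=["model:"], index_type=IndexType.HASH),
)

def nearest_models(query_vec: np.ndarray, k: int = 3):
    """KNN search: return the k stored assets closest to the query embedding."""
    q = (
        Query(f"*=>[KNN {k} @embedding $vec AS score]")
        .sort_by("score")
        .return_fields("prompt", "obj_url", "score")
        .dialect(2)
    )
    return r.ft("models").search(
        q, query_params={"vec": query_vec.astype(np.float32).tobytes()}
    ).docs
```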

What we learned

Our project was an exploration of the intersection between XR and AI. We discovered XR's applications extend far beyond gaming, reaching into areas like healthcare, education, and business. For instance, XR could revolutionize nutrition tracking with AR interfaces, transform entertainment with immersive experiences, and enhance productivity through virtual assistants. In education, XR offers interactive learning environments, while in healthcare, it provides innovative tools for training and treatment. The technology's ability to blend digital and physical worlds presents exciting opportunities for retail, architecture, and remote collaboration. This exploration revealed XR's capacity to improve efficiency, engagement, and user experience in numerous aspects of daily life, highlighting its bright future and potential for continued innovation.

What's next for Dreamscapes

We believe this product could also drive innovation in AR experiences. One of our biggest ideas is to detect the floor of an empty room and design an entire interior for it. Using ground plane detection, users could scan their space and have our AI generate a complete interior design based on their preferences. They could place and manipulate AI-generated 3D furniture models in real time, experiment with different wall colors and lighting, and create virtual walkthroughs of their newly designed space. This AR integration would bridge the gap between imagination and reality, allowing users to not only generate 3D models but also see how they fit in their actual living spaces. It could revolutionize home decorating, interior design planning, and even real estate staging, offering a powerful tool for both professionals and enthusiasts.

AI-Generated Skyboxes

Defining a skybox and lighting is key to setting an ambiance. AI-generated skyboxes are easy to implement and would integrate smoothly into our voice-to-3D-model generation pipeline. This integration would also open our product up to game developers and event planners.

Full Map Generation

Another exciting future goal is to generate entire scenes or maps, not just individual objects. We aim to develop a system where users can describe or sketch complete landscapes, cities, or multi-level structures, and see them come to life. This would involve scaling up our AI generation pipeline to handle larger, more complex environments, potentially integrating procedural generation for terrain and urban layouts. We'd need to optimize performance and ensure logical placement of elements across vast areas. The ultimate goal is to create a sandbox where users can manifest entire worlds straight from their imagination, unleashing unlimited creativity and exploration. This feature could revolutionize gaming, virtual reality training, and interactive storytelling, putting the power of world-building directly into users' hands.
