VisuWorld

🌎 What Were We Thinking?

We were hard at work figuring out what to build for Bitcamp, and spent nearly a third of the competition in the ideation stage. VisuWorld is what came out of combining some of the cool new tech we've been exploring: GLSL graphics, Google Gemini, and voice-to-text.

🤔 What is VisuWorld?

Your own VisuWorld exploration begins with a speech-to-text prompt to Google's Gemini API, which generates GLSL code to represent just about anything that you can articulate out loud. With more than 25,000 graphics shader snippets in a RAG (retrieval-augmented generation) database backing Gemini, we've been able to see some truly impressive 3D visual landscapes over the course of this weekend.

⚡ Visualize This:

The pipeline for VisuWorld is fairly simple and can be broken down into three main steps:

1. First, we needed to get our data formatted. We scraped upwards of 34,000 code snippets from Shadertoy, a public site with general API access. We then embedded them with OpenAI's embedding models and stored the vectors in a ChromaDB database, which serves as the RAG knowledge base that makes Gemini even more powerful.
2. Using Google Chrome's native Web Speech API for speech-to-text, we let users speak their thoughts aloud, freeing them from manual dexterity limitations and truly letting their imagination take over. After a few spoken commands, Google Gemini (both 2.0 Flash and 2.5 Pro) uses our RAG knowledge system along with its own training data to generate GLSL code. Users can generate new VisuWorlds, or pass their current one to Gemini as context to explore every avenue of their imaginary worlds.
3. Once the GLSL code is generated, React and Next.js provide a seamless frontend, paired with Three.js to handle the WebGL GLSL shaders with ease. React also made the application more accessible over a web interface, and we built a really awesome Gallery where everyone can show how far their imagination can take them.
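
To make the last step a little more concrete, here is a minimal sketch of how generated GLSL could be prepared for Three.js. It assumes the generated code follows the Shadertoy convention of a `mainImage` entry point with `iResolution` and `iTime` inputs, which is plausible given the Shadertoy training snippets but is not confirmed by the project. The helper names `wrapShadertoySource` and `makeUniforms` are hypothetical.

```typescript
// Uniform names follow Shadertoy's convention. Treating Gemini's output as
// Shadertoy-style code is an assumption made for this sketch.
// No precision line is added: THREE.ShaderMaterial prepends one itself
// (THREE.RawShaderMaterial would not).
const SHADERTOY_HEADER = `
uniform vec3 iResolution; // viewport size in pixels
uniform float iTime;      // seconds since the shader started
`;

// Bridges the Shadertoy entry point to the standard fragment shader entry point.
const SHADERTOY_FOOTER = `
void main() {
  mainImage(gl_FragColor, gl_FragCoord.xy);
}
`;

/** Hypothetical helper: turn a Shadertoy-style body into a full fragment shader. */
function wrapShadertoySource(body: string): string {
  if (!/void\s+mainImage\s*\(/.test(body)) {
    throw new Error("expected a mainImage(out vec4, in vec2) entry point");
  }
  return `${SHADERTOY_HEADER}\n${body.trim()}\n${SHADERTOY_FOOTER}`;
}

/** Hypothetical helper: uniform values in the { value } shape Three.js materials use. */
function makeUniforms(width: number, height: number) {
  return {
    iResolution: { value: [width, height, 1] },
    iTime: { value: 0 },
  };
}
```

In a real app the wrapped string would be passed as `fragmentShader` to `new THREE.ShaderMaterial({ fragmentShader, uniforms })`, with `uniforms.iTime.value` updated every frame. Rejecting code without a `mainImage` function early gives a clearer error than a failed shader compile.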

⛰️ Jagged Cliffs...

Working with Gemini on GLSL code was really difficult at first, and no amount of prompt engineering was going to save us from diving deeper. This was the first time any of us had used RAG to empower LLMs, and we are sure glad we did. Even with only 500 code samples, having a retrieval system to query made Gemini MUCH better at making worlds that will make your jaw drop.
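
For readers new to RAG, the idea can be sketched in a few lines: find the stored snippets whose embeddings are closest to the request, then put them in front of the model as examples. The sketch below is an in-memory illustration only. In VisuWorld the search is handled by ChromaDB, and the cosine metric, the `Snippet` shape, and the prompt layout in `buildPrompt` are all assumptions made for the example.

```typescript
// A stored shader snippet with its embedding vector. Real embeddings have
// hundreds of dimensions or more; small toy vectors work the same way.
interface Snippet {
  id: string;
  code: string;
  embedding: number[];
}

/** Cosine similarity between two vectors of equal length. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / Math.sqrt(normA * normB);
}

/** Return the k snippets most similar to the query embedding, best first. */
function topK(query: number[], snippets: Snippet[], k: number): Snippet[] {
  return snippets
    .map((s) => ({ s, score: cosineSimilarity(query, s.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.s);
}

/** Hypothetical prompt layout: retrieved examples first, then the spoken request. */
function buildPrompt(request: string, examples: Snippet[]): string {
  const shots = examples
    .map((e, i) => `// Example ${i + 1}\n${e.code}`)
    .join("\n\n");
  return `Reference GLSL shaders:\n${shots}\n\nWrite a GLSL shader for: ${request}`;
}
```

Calling `topK(queryEmbedding, snippets, 3)` and passing the result to `buildPrompt` yields a prompt that carries the closest shader examples along with the user's request, which is the kind of extra context that made the difference for us.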

🏆 Our Victories

Three-dimensional generation is a hot topic in the AI space, with 2D-to-3D diffusion models being a popular area of exploration. However, we haven't seen many projects quite like VisuWorld that go from text to 3D landscapes. Realizing that graphics shaders were a valid surface for innovation was a thrilling "eureka" moment to stumble across.

🚀 Unexpected Discoveries

A whole lot about graphics. Like, a LOT. Who knew how much math went into these things, and just how demanding inefficient graphics code can be? Our laptop fans blew louder than any of us imagined they could.

🌅 New Horizons

We want to spend time optimizing every piece of the VisuWorld stack in hopes of deploying it for real. Most of the intense computing is offloaded to the web client, so we can (and will) support many users at a time. We want to share this experience with others, because it really is that cool to look at and a great time to explore.