Fuzzy Fusion | Devpost

AI generated "Burger"
AI generated "Corgi wizard holding a wand"
AI generated "Floating Castle"
AI generated "UFO"
AI generated "spaceship"
AI generated "sushi"
AI generated "three eared panda"

Inspiration

The inspiration for this project came from the contour design on the UofTHacks-X gift bag. A contour is created by removing a dimension from a 3D object, which got us thinking: is it possible to insert a dimension into an image to make it 3D?

DALL-E wowed the world with text-to-image generation earlier this year. In this project, we want to break the bounds of two dimensions and generate a 3D object from text, which the latest breakthroughs in neural radiance fields (NeRF) and text-to-image generation have made possible.

What it does

Our site takes a text prompt, creates a 3D textured mesh from it using Stable Diffusion techniques, and uploads the result to Google Cloud Storage for fast and secure storage. We are also open to the idea of trading these AI-generated objects as NFTs in the future.

How we built it

The full-stack web app was developed using Node.js and JavaScript. The JavaScript backend accepts user input and augments it using Cohere's language generation model, producing a more detailed prompt so that the object generation algorithm can generate more specific objects. After the object is generated, the mesh and texture files are rendered using WebGL and three.js so the user can manipulate the object directly. The object is also uploaded to Google Cloud Storage, which provides a URL that users can download it from.
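The flow described above (augment the prompt, generate the object, upload it, return a URL) can be sketched roughly as follows. This is only an illustration, written in Python to match the generator code rather than the JavaScript the backend actually uses. Every name here (`run_pipeline`, `GeneratedObject`, and the three injected callables) is hypothetical, and the callables stand in for the real Cohere, generator, and Google Cloud Storage calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeneratedObject:
    """Result handed back to the front end (hypothetical shape)."""
    prompt: str        # the augmented prompt actually sent to the generator
    mesh_path: str     # local path of the generated mesh
    download_url: str  # URL returned by the storage upload


def run_pipeline(
    user_text: str,
    augment: Callable[[str], str],   # stand-in for the language-model call
    generate: Callable[[str], str],  # stand-in for the 3D generator, returns a mesh path
    upload: Callable[[str], str],    # stand-in for the storage upload, returns a URL
) -> GeneratedObject:
    """Run the three stages in order and collect their outputs."""
    cleaned = user_text.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    prompt = augment(cleaned)
    mesh_path = generate(prompt)
    url = upload(mesh_path)
    return GeneratedObject(prompt=prompt, mesh_path=mesh_path, download_url=url)
```

Passing the stages in as functions keeps the sketch runnable without any network access, and it mirrors how each stage could be swapped or mocked during a demo.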

The 3D object generation algorithm was implemented in Python and CUDA based on DreamFusion's text-to-3D algorithm. The network uses Stable Diffusion techniques to generate multiple 2D images from the text input and fuses these 2D images using NeRF (neural radiance field) to create a 3D object.
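To give a feel for the NeRF side, here is a minimal sketch of the standard volume-rendering step a NeRF uses to turn density and color samples along one camera ray into a pixel. It is not the project's code: the function name `composite_ray` and its inputs are assumptions for illustration, and a real implementation would run this over many rays at once on the GPU.

```python
import math


def composite_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: density (sigma) at each sample point
    colors:    (r, g, b) at each sample point, values in [0, 1]
    deltas:    distance between consecutive samples
    Returns (pixel_rgb, opacity).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Probability that the ray is absorbed within this segment.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        # Light that survives this segment continues to the next one.
        transmittance *= 1.0 - alpha
    return pixel, 1.0 - transmittance
```

Samples with zero density contribute nothing, while a dense sample close to the camera hides everything behind it, which is how a fuzzy density field can still render as a solid surface.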

Challenges we ran into

We encountered numerous challenges with running the 3D object generator because the Stable Diffusion algorithm requires a huge amount of processing to turn a blurry, fuzzy point cloud into an object. For example, an RTX 3090 GPU with 24GB of VRAM needed to run for 45 minutes straight to generate a high-resolution mesh like the hamburger, which is too long for a demo. We therefore had to tweak model parameters such as the learning rate and the number of iterations to find the best balance between time and quality and keep the generation time under 15 minutes.
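As a rough illustration of the time side of that trade-off, the sketch below scales an iteration count down to fit a time budget. It assumes run time grows linearly with the number of iterations, which is only an approximation, and the baseline of 10,000 iterations is a made-up figure, not the value we used. The function name `iterations_for_budget` is likewise hypothetical.

```python
def iterations_for_budget(baseline_iters: int,
                          baseline_minutes: int,
                          budget_minutes: int) -> int:
    """Largest iteration count expected to fit in the time budget.

    Assumes time per iteration is constant, so the count scales
    linearly with the available time.
    """
    if baseline_iters <= 0 or baseline_minutes <= 0 or budget_minutes <= 0:
        raise ValueError("all arguments must be positive")
    # Integer arithmetic rounds down, so the estimate stays within budget.
    return (baseline_iters * budget_minutes) // baseline_minutes


# Hypothetical example: a 45-minute run at 10,000 iterations,
# scaled to the 15-minute demo budget.
demo_iters = iterations_for_budget(10_000, 45, 15)
```

Cutting iterations alone lowers quality, which is why the learning rate has to be retuned alongside it; that part has no simple formula and was found by trial and error.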

Accomplishments that we're proud of

We successfully learned and connected many new technologies in this project: Stable Diffusion for object generation, object rendering using three.js and WebGL, cloud storage, and language generation models. We came a long way.