Inspiration

Reading a manual is hard, especially when its images don't always make sense. Guides are much easier to follow when more information (like an extra dimension!) is available. 3Docs makes repair more accessible by presenting manuals in an intuitive way that can "make it click" for more people.

What it does

3Docs is a platform that transforms boring old PDF manuals into interactive 3D manuals, complete with voice guidance and a reference to the original PDF for each step of a process.

How we built it

  • Frontend: Built with React (Next.js). We used Three.js to render the .glb models directly in the browser.

  • Backend: We used Python to manage API calls, caching, and SQLite queries and insertions. To decide which images and instructions were useful, and to paraphrase the instructions, we used Gemini 3 Pro Preview and Gemini 2.5 Pro. We then matched the paraphrased instructions back to the loaded PDF so that each step stays synchronized with its place in the original manual. To generate 3D models from images, we tried several models but ended up using the Tripo3D API. To generate audio for each instruction, we used Fish Audio's API.
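The writeup does not say how paraphrased instructions are matched back to the PDF. The sketch below shows one simple approach, a word-overlap score that picks the page most likely to support a given step. It assumes the PDF's text has already been extracted into one string per page; that extraction step is not shown. The function names and the scoring rule are illustrative assumptions, not the project's actual method.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens longer than three characters (drops 'the', 'and', ...)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def match_step_to_page(instruction: str, pages: list[str]) -> tuple[int, float]:
    """Return (page_index, score) for the page sharing the most words with the instruction.

    The score is the fraction of the instruction's words that also appear on the page,
    so it lies in [0, 1]. Returns (-1, 0.0) when no page shares any word.
    """
    target = _words(instruction)
    if not target:
        return -1, 0.0
    best_index, best_score = -1, 0.0
    for index, page_text in enumerate(pages):
        score = len(target & _words(page_text)) / len(target)
        if score > best_score:  # ties keep the earlier page
            best_index, best_score = index, score
    return best_index, best_score


# Hypothetical example: text already extracted from a two-page manual.
pages = [
    "Safety information. Unplug the printer before servicing.",
    "Remove the four screws holding the rear panel, then lift the panel away.",
]
print(match_step_to_page("Unscrew the rear panel and lift it off.", pages))  # (1, 0.75)
```

Word overlap is crude. A paraphrase that replaces every keyword with a synonym will score poorly, so a fuzzier comparison would likely be more robust in practice.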

Challenges we ran into

  • Publicly available image-to-3D models did not produce good enough results, so we had to use a proprietary model accessed through an API.
  • Older Gemini models (such as 2.5 Pro) were poor at identifying what an image shows and whether it is useful.
  • Live audio processing and API calls for real-time audio generation had too much latency to work well.
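Because real-time audio generation was too slow, one workaround that fits the backend's caching and SQLite storage is to synthesize each step's audio once, ahead of time, and serve it from the database afterwards. This is a minimal sketch under assumptions. `synthesize` is a hypothetical placeholder for the text-to-speech request and does not reproduce Fish Audio's actual API. The table name `step_audio` and the helper names are also invented for illustration.

```python
import hashlib
import sqlite3
from typing import Callable, Iterable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with one table of per-step audio clips."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_audio (text_hash TEXT PRIMARY KEY, audio BLOB NOT NULL)"
    )
    return conn


def get_audio(conn: sqlite3.Connection, instruction: str,
              synthesize: Callable[[str], bytes]) -> bytes:
    """Return cached audio for an instruction, calling `synthesize` only on a cache miss."""
    key = hashlib.sha256(instruction.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT audio FROM step_audio WHERE text_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    audio = synthesize(instruction)  # the slow network call happens here, once per step
    conn.execute(
        "INSERT INTO step_audio (text_hash, audio) VALUES (?, ?)", (key, audio)
    )
    conn.commit()
    return audio


def pregenerate(conn: sqlite3.Connection, instructions: Iterable[str],
                synthesize: Callable[[str], bytes]) -> None:
    """Fill the cache for every step before the user opens the manual."""
    for instruction in instructions:
        get_audio(conn, instruction, synthesize)


# Usage with a hypothetical stand-in for a real text-to-speech request.
calls: list[str] = []


def fake_tts(text: str) -> bytes:
    calls.append(text)  # record every (slow) synthesis request
    return text.encode("utf-8")


conn = open_cache()
pregenerate(conn, ["Remove the rear panel.", "Disconnect the fan cable."], fake_tts)
get_audio(conn, "Remove the rear panel.", fake_tts)  # served from SQLite, no new call
print(len(calls))  # 2
```

Keying on a hash of the instruction text means an unchanged step is never synthesized twice, while an edited step automatically gets fresh audio.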

What's next for 3Docs

There are some features that 3Docs still lacks, such as input sanitization and live conversation. We could also benefit from a better, open-source image-to-3D model.
