MindScape AI - Project Description
Inspiration
I've always struggled with memorizing things. In college, I'd spend hours making flashcards, highlighting textbooks, and re-reading the same paragraphs over and over. A week later? Gone. It was frustrating because I knew I was putting in the work, but my brain just wasn't holding onto the information.
Then I stumbled across this ancient technique called the "method of loci" or memory palaces. The idea is simple: your brain is incredibly good at remembering places. You can close your eyes right now and walk through your childhood home in your mind, remembering where everything is. So instead of trying to memorize abstract facts, you imagine placing them in different rooms of a familiar building. It sounds weird, but it actually works—memory champions use this technique to memorize thousands of digits.
The problem? Building a memory palace manually takes forever. You have to create all the associations yourself, imagine the space, place each piece of information carefully. It's powerful but completely impractical for everyday learning.
That's when it hit me: what if AI could do the heavy lifting? What if you could just upload your study material and instantly get a fully-formed, immersive memory palace custom-built for that content? That's the spark that became MindScape AI.
What it does
MindScape AI transforms any PDF into an immersive 3D memory palace that you can explore in your browser.
Here's how it works: You upload a PDF—could be a textbook chapter, research paper, study guide, whatever. Within seconds, Gemini analyzes the content and extracts the key concepts and their relationships. Then Imagen generates a unique visual representation for each concept. Not generic clip art, but something meaningful—a glowing chloroplast for photosynthesis, a golden throne for a historical figure, a molecular structure for glucose.
These visuals are then placed in a beautiful 3D environment that matches your content's theme. Studying biology? You're in a tranquil garden. History? A grand palace. Physics? A modern laboratory. You can walk through this space, explore the concepts, and hear AI-generated narration explaining each one. There's even ambient music composed by Lyria that helps you focus.
The magic is that your brain naturally creates spatial memories. Instead of trying to remember "photosynthesis happens in chloroplasts," you remember "the glowing green thing was near the fountain in the garden." It's the same reason you never forget how to navigate your house, even in the dark.
How we built it
The entire application runs on Google Cloud Run, which was perfect for this because we needed something that could scale instantly and handle parallel AI processing without managing infrastructure.
The backend is a FastAPI service deployed on Cloud Run that orchestrates everything. When you upload a PDF, here's what happens:
Content Analysis: Gemini 2.0 Flash reads the PDF, extracts the key concepts and their relationships, and picks the best environment theme for the subject matter.
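To give a sense of the shape of this step, here's a stripped-down sketch using the google-genai Python SDK (the prompt and JSON schema here are simplified stand-ins, not our production prompt):

```python
import json
from google import genai

client = genai.Client()  # picks up credentials from the environment

def analyze_content(pdf_text: str) -> dict:
    """Ask Gemini for the key concepts, their relationships, and a theme."""
    prompt = (
        "From this study material, return JSON shaped like "
        '{"theme": str, "concepts": [{"name": str, "summary": str, '
        '"related_to": [str]}]}\n\n' + pdf_text
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config={"response_mime_type": "application/json"},  # force JSON back
    )
    return json.loads(response.text)
```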
Parallel Asset Generation: This is where it gets interesting. We fire off multiple AI requests simultaneously (see the sketch after this list):
- Imagen 4 creates unique visual representations for each concept
- Cloud Text-to-Speech generates narration for each concept
- Lyria composes ambient music that matches the environment theme
- Veo 2 generates the video footage for our demo
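The fan-out itself is plain asyncio. A simplified sketch of the idea, where generate_image, generate_narration, and generate_music are hypothetical wrappers around our Vertex AI calls:

```python
import asyncio

async def generate_assets(concepts: list[dict], theme: str) -> dict:
    """Kick off every asset request at once and wait for them together."""
    images = asyncio.gather(*(generate_image(c) for c in concepts))         # Imagen
    narration = asyncio.gather(*(generate_narration(c) for c in concepts))  # Cloud TTS
    music = generate_music(theme)                                           # Lyria

    image_list, narration_list, music_track = await asyncio.gather(
        images, narration, music
    )
    return {"images": image_list, "narration": narration_list, "music": music_track}
```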
Spatial Placement: A custom algorithm positions each concept object in 3D space, ensuring they're not too close together and that related concepts are near each other.
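Our real placement code has more heuristics, but the core idea is rejection sampling with a pull toward already-placed related concepts. A toy version:

```python
import math
import random

MIN_DIST = 4.0  # minimum spacing between concept objects, in scene units

def place_concepts(concepts: list[dict], radius: float = 25.0) -> dict[str, tuple]:
    """Scatter concepts on the ground plane, keeping spacing and pulling
    related concepts toward each other."""
    positions: dict[str, tuple] = {}
    for concept in concepts:
        # Bias the search toward the centroid of already-placed relations.
        anchors = [positions[r] for r in concept.get("related_to", []) if r in positions]
        cx = sum(p[0] for p in anchors) / len(anchors) if anchors else 0.0
        cz = sum(p[1] for p in anchors) / len(anchors) if anchors else 0.0
        for _ in range(200):  # rejection sampling: retry until spacing holds
            x = cx + random.uniform(-radius, radius) * 0.4
            z = cz + random.uniform(-radius, radius) * 0.4
            if all(math.dist((x, z), p) >= MIN_DIST for p in positions.values()):
                break
        positions[concept["name"]] = (x, z)  # accept last candidate if retries ran out
    return positions
```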
Caching: Everything gets stored in Cloud Storage so subsequent loads are instant.
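The cache is a simple check-then-write against a bucket, roughly like this (the bucket name and key scheme are illustrative):

```python
from google.cloud import storage

_bucket = storage.Client().bucket("mindscape-assets")  # illustrative bucket name

def cached_or_generate(key: str, generate) -> bytes:
    """Return a cached asset if it exists; otherwise generate and store it."""
    blob = _bucket.blob(key)
    if blob.exists():
        return blob.download_as_bytes()
    data = generate()
    blob.upload_from_string(data)
    return data
```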
The frontend is built with React and Three.js for the 3D rendering. We use WebSockets to stream the Lyria-generated music in real-time, and the camera smoothly transitions between concepts as you navigate.
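On the backend side, the music stream is just a FastAPI WebSocket endpoint relaying audio chunks as Lyria produces them. A sketch, where lyria_chunks is a hypothetical async generator wrapping our Lyria session:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/music/{palace_id}")
async def stream_music(websocket: WebSocket, palace_id: str):
    """Relay Lyria audio chunks to the browser as they arrive."""
    await websocket.accept()
    try:
        async for chunk in lyria_chunks(palace_id):  # hypothetical async generator
            await websocket.send_bytes(chunk)        # raw audio bytes to the client
    except WebSocketDisconnect:
        pass  # listener left; stop streaming
```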
The whole thing is serverless. No servers to manage, no infrastructure to worry about. Cloud Run handles the scaling automatically, and we only pay for what we use.
Challenges we ran into
The Imagen Resolution Reality Check: We originally planned to generate 2K and 4K textures for ultra-high-quality environments. Turns out Imagen 3 doesn't support custom resolution parameters—it generates at a fixed resolution around 1024x1024. We had to pivot our expectations and optimize for what was actually possible. The textures still look great, but it was a good lesson in validating assumptions early.
Generic Objects Problem: Our first approach generated generic environment objects (fountains, benches, thrones) and tried to associate concepts with them. It looked... weird. A concept about glucose metabolism next to a random fountain? It didn't make sense. We completely redesigned the system to generate unique objects for each concept instead. This "concept-driven" approach is much more meaningful, but it meant rewriting a significant portion of the backend.
Parallel Processing Complexity: Generating 10+ images, audio files, and music simultaneously while managing rate limits and failures was tricky. We had to implement proper async/await patterns, error handling, and fallbacks. If one concept's image fails to generate, the whole experience shouldn't break.
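The pattern that kept this sane was a shared semaphore for rate limiting plus a per-concept fallback, roughly like this (generate_image and fallback_image are hypothetical stand-ins for our Imagen wrapper and a neutral placeholder asset):

```python
import asyncio

_imagen_limit = asyncio.Semaphore(5)  # cap concurrent Imagen requests

async def safe_generate_image(concept: dict) -> bytes:
    """Generate one concept image; fall back to a placeholder on repeated
    failure so one bad request can't sink the whole palace."""
    async with _imagen_limit:
        for attempt in range(3):
            try:
                return await generate_image(concept)  # hypothetical Imagen wrapper
            except Exception:
                await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    return fallback_image(concept)  # hypothetical neutral placeholder asset
```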
Three.js Learning Curve: None of us were Three.js experts going in. Creating smooth camera transitions, proper lighting, and making sure the 3D objects looked good took a lot of trial and error. The documentation is great, but there's no substitute for just building and iterating.
Time Constraints: We started with an ambitious spec and quickly realized we needed to prioritize. We focused on getting the core experience right—PDF upload, concept extraction, environment generation, 3D navigation—before adding bells and whistles.
Accomplishments that we're proud of
It Actually Works: This sounds basic, but we built something that genuinely helps with learning. I tested it with a biology textbook chapter, and I can still remember the concepts days later by visualizing where they were in the garden. That's the whole point, and it's incredibly satisfying when the theory becomes reality.
Multi-Model Orchestration: We're using four different Google AI models (Gemini, Imagen, Lyria, Cloud TTS) in a coordinated workflow. Each one does what it's best at, and they all work together seamlessly. The parallel processing means you get your memory palace in seconds, not minutes.
Serverless Architecture: Everything runs on Cloud Run with zero infrastructure management. It scales from one user to thousands automatically. We can deploy updates in minutes. This is exactly what serverless is supposed to enable, and it feels great when it works this smoothly.
The Pivot to Concept-Driven: Recognizing that our initial approach wasn't working and completely redesigning it mid-hackathon was scary, but it was the right call. The concept-driven approach is so much better, and I'm proud we had the courage to throw away working code and start over.
Beautiful Environments: The AI-generated environments actually look good. The garden is peaceful, the palace is majestic, the library is cozy. Imagen 3 did an amazing job with the textures and skyboxes.
What we learned
Validate Assumptions Early: We spent time planning for 2K/4K textures before checking if Imagen actually supported that. Always verify what the APIs can actually do before building around assumptions.
User Experience Trumps Technical Complexity: The concept-driven approach is simpler technically but way better for users. Sometimes the "clever" solution isn't the right solution.
Serverless is Powerful: Cloud Run handled everything we threw at it. Auto-scaling, parallel processing, WebSocket connections—it just worked. We spent zero time managing infrastructure and all our time building features.
AI Models Have Personalities: Each AI model has its strengths. Gemini is incredible at understanding context and relationships. Imagen creates beautiful visuals but needs specific prompts. Lyria's music is surprisingly good at matching moods. Learning how to work with each one was key.
Spatial Memory is Real: Testing this with actual study material proved the concept. I genuinely remember things better when they're placed in a 3D space. The ancient Greeks were onto something.
Iteration is Everything: Our first version had generic objects. Our second version had concept-specific objects but poor placement. Our third version had better spatial algorithms. Each iteration taught us something and made the experience better.
What's next for MindScape AI
Collaborative Learning Spaces: Imagine multiple students exploring the same memory palace together, discussing concepts in real-time. We could add multiplayer support where you see other learners as avatars in the space.
Personalized Environments: Let users upload their own photos to create memory palaces based on places they actually know—their house, their school, their favorite park. The familiarity would make the memories even stronger.
Spaced Repetition Integration: The memory palace gets you to remember initially, but spaced repetition helps you remember long-term. We could track which concepts you visit and remind you to revisit ones you haven't seen in a while.
Mobile AR Version: Using ARCore/ARKit, you could place your memory palace in your actual physical space. Walk around your room and see the concepts floating in front of you. That would be wild.
More Content Types: Right now it's PDFs. But what about YouTube videos, podcasts, web articles? Any content could become a memory palace.
Assessment and Progress Tracking: Add quizzes that happen inside the 3D space. "Walk to the concept that explains photosynthesis." Track which concepts users spend time with and which they skip.
Community-Created Palaces: Let teachers create memory palaces for their classes and share them. Students could explore pre-built palaces for common subjects like AP Biology or World History.
Better Concept Relationships: Right now we show relationships with lines. We could make them more meaningful—a bridge between related concepts, a locked door for prerequisites, color-coding for concept types.
Voice Navigation: "Take me to the concept about mitochondria." Natural language navigation would make exploration even more intuitive.
Built for the Google Cloud Run Hackathon 2025
Built With
- 3d
- ai
- education
- fastapi
- gemini
- imagen
- lyria
- machine-learning
- google-cloud-run
- python
- react
- serverless
- three.js
- vertex-ai