Think Visually, Explore Infinitely
An AI-powered educational platform that transforms curiosity into interactive, personalized learning journeys through dynamically generated video content.
VideoGraph reimagines how people learn complex topics: not through static textbooks or pre-recorded lectures, but through interactive journeys that adapt to each person's understanding in real time. We wanted to empower learners to explore knowledge organically, giving them the freedom to ask questions, dive deeper, and have AI create personalized explanations exactly when they need them.
Our goal: educational videos that respond, adapt, and grow with you, creating a truly infinite learning experience that feels as natural as a conversation with a teacher.
- Generate fully animated educational videos from simple text prompts
- Manim-powered animations with graphs, vectors, molecules, and more
- AI-generated voiceovers perfectly synchronized with visuals
- Real-time rendering progress via Server-Sent Events
- Each video becomes a node in your personal knowledge graph
- Ask follow-up questions to create new branches
- Navigate through your learning journey visually
- Infinite exploration: never hit a dead end
- Context-aware generation that remembers what you've learned
- Difficulty adaptation based on your performance
- Personalized remediation when you need extra help
- Smart branching logic (questions → children, topics → siblings)
- Find concepts by meaning, not just keywords
- Vector embedding-powered search (<50ms response time)
- Search "how graphs work" and find "Introduction to Trees"
- Contextual quizzes at the end of learning branches
- Instant feedback and comprehension evaluation
- Automatic remediation videos for incorrect answers
- Explore pre-generated learning paths
- Example topics: Pythagorean Theorem, Photosynthesis, and more
- Visual thumbnails and AI-generated titles for easy navigation
- React + TypeScript + Vite
- @xyflow/react for interactive tree visualization
- TailwindCSS for styling
- Remotion for video composition
- FastAPI on Modal (serverless compute)
- Python for orchestration and processing
- Manim Community Edition for animation rendering
- Google Cloud Storage for asset caching
- Google Gemini for content generation and code synthesis
- Cerebras for fast inference on simple calls
- Grok (xAI) for auxiliary tasks
- Sentence Transformers for semantic embeddings
- PyTorch for ML operations
- ElevenLabs for text-to-speech
- Docker for containerization
- Real-time streaming via SSE (Server-Sent Events); a minimal endpoint sketch follows
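As an illustration of the SSE piece, here is a minimal sketch of how rendering progress could be streamed from a FastAPI endpoint. The route and event payload are illustrative assumptions, not VideoGraph's actual API:

```python
# Minimal SSE sketch; the route and payload are illustrative, not the real API.
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def render_progress(video_id: str):
    # The real pipeline would report actual Manim render progress; here we
    # fake a few ticks. SSE frames are "data: ..." lines ending in a blank line.
    for pct in (10, 40, 80, 100):
        yield f"data: {json.dumps({'video_id': video_id, 'progress': pct})}\n\n"
        await asyncio.sleep(1)

@app.get("/videos/{video_id}/progress")
async def progress(video_id: str):
    return StreamingResponse(render_progress(video_id), media_type="text/event-stream")
```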
- Node.js (v18+)
- Python (3.10+)
- Modal account (modal.com)
- API keys for:
- Google Gemini
- ElevenLabs
- Google Cloud Storage
# Clone the repository
git clone https://github.com/yourusername/videograph.git
cd videograph
# Install frontend dependencies
cd frontend
npm install
# Set up environment variables
cp env.template .env
# Edit .env with your API keys
# Start the development server
npm run dev
# In a separate terminal, deploy the backend
cd ../backend/modal
modal deploy main_video_generator.py
For detailed setup instructions, see the guides in docs/.
Type any question or topic you're curious about.
- Gemini creates a structured lesson plan
- Generates custom Manim animation code
- ElevenLabs synthesizes synchronized voiceover
- Multiple sections rendered in parallel on Modal
- View kinetic, video-essay style animations
- See your learning tree grow in real time
- Click any node to revisit previous concepts
- Ask follow-up questions at any point
- Take quizzes to test your understanding
- Get personalized help when you need it
- Use semantic search to find related concepts
- Browse cached sessions for inspiration
- Export or share your learning journey
videograph/
├── frontend/                  # React + TypeScript UI
│   ├── src/
│   │   ├── components/        # UI components (tree, overlays, etc.)
│   │   ├── controllers/       # Video controller logic
│   │   ├── services/          # API clients
│   │   └── remotion/          # Video rendering components
│   └── public/                # Static assets
├── backend/
│   └── modal/                 # FastAPI + Modal backend
│       ├── dev/               # Core logic modules
│       │   ├── api_logic.py
│       │   ├── generator_logic.py
│       │   └── reflection_logic.py
│       ├── services/          # Shared services
│       ├── llm/               # LLM provider abstractions
│       ├── tts/               # Text-to-speech services
│       └── embeddings.py
├── scripts/                   # Utility scripts
└── docs/                      # Documentation
Every user interaction is mapped to a tree structure (sketched below):
- Child nodes: Follow-up questions that dive deeper
- Sibling nodes: Related topics at the same level
- Persistent state: Tree survives page refreshes
- Concurrent generation: Ask multiple questions while videos render
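A minimal Python sketch of the branching rule; the production logic is the TypeScript state machine in VideoController.tsx, and the names here are illustrative:

```python
# Hypothetical mirror of the tree-branching rule; the real implementation
# lives in the frontend's VideoController.tsx.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VideoNode:
    id: str
    prompt: str
    parent: Optional["VideoNode"] = None
    children: list["VideoNode"] = field(default_factory=list)

def branch(current: VideoNode, prompt: str, is_followup: bool) -> VideoNode:
    # Follow-up questions dive deeper -> child of the current node.
    # Related topics stay at the same level -> sibling (child of current's parent).
    parent = current if is_followup else (current.parent or current)
    node = VideoNode(id=f"{parent.id}.{len(parent.children)}", prompt=prompt, parent=parent)
    parent.children.append(node)
    return node
```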
See CACHED_SESSIONS_GUIDE.md for details.
AI-generated Manim code can fail unpredictably. Our healing system (sketched below):
- Captures error traces and render failures
- Analyzes what went wrong (syntax, logic, assets)
- Repairs by sending context back to the LLM
- Retries up to N times with improved prompts
- Logs all failures for continuous improvement
See MULTI_VARIANT_RETRY_SYSTEM.md for implementation.
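A compact sketch of that loop; generate_manim_code, render_scene, and log_failure are hypothetical stand-ins for the real LLM, Manim, and logging calls:

```python
# Repair-loop sketch; the three helper calls are hypothetical stand-ins.
MAX_RETRIES = 3  # "up to N times"

def render_with_healing(lesson_prompt: str) -> bytes:
    """Generate Manim code, render it, and feed failures back to the LLM."""
    feedback = ""
    for attempt in range(1, MAX_RETRIES + 1):
        code = generate_manim_code(lesson_prompt, error_feedback=feedback)  # hypothetical LLM call
        try:
            return render_scene(code)  # hypothetical Manim render; returns video bytes
        except Exception as err:
            # Send the stack trace back so the next prompt can repair the code.
            feedback = f"Attempt {attempt} failed with:\n{err}"
            log_failure(code, err)  # hypothetical logger for continuous improvement
    raise RuntimeError(f"Rendering failed after {MAX_RETRIES} attempts")
```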
- ElevenLabs generates natural-sounding narration
- Timing alignment syncs voice with visual transitions
- Dynamic voice selection based on content type
- Streamed in real time during rendering
See TTS_PIPELINE.md and DYNAMIC_VOICE_ID_IMPLEMENTATION.md; a minimal call sketch follows.
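For illustration, a minimal synthesis call against the ElevenLabs REST text-to-speech endpoint; the model_id is an assumption, and the real pipeline layers timing alignment and dynamic voice selection on top:

```python
# Minimal TTS sketch against the ElevenLabs REST API; model_id is an assumption.
import requests

def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": api_key},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
    )
    resp.raise_for_status()
    return resp.content  # raw audio bytes, later aligned with the animation timeline
```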
- Sentence transformers convert text to 384D vectors
- Cosine similarity finds conceptually related nodes
- Embeddings cached for instant retrieval
- Searches entire tree in <50ms
See SEMANTIC_SEARCH_IMPLEMENTATION.md; a minimal sketch follows.
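A minimal sketch with sentence-transformers. The all-MiniLM-L6-v2 model is an assumption (a common choice that emits the 384-dimensional vectors mentioned above), and the real system caches node embeddings rather than re-encoding per query:

```python
# Semantic search sketch; the model choice is an assumption that matches
# the 384-dimensional embeddings described above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def search(query: str, node_titles: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    query_vec = model.encode(query, convert_to_tensor=True)
    node_vecs = model.encode(node_titles, convert_to_tensor=True)  # cached in practice
    scores = util.cos_sim(query_vec, node_vecs)[0]  # cosine similarity per node
    ranked = scores.argsort(descending=True)[:top_k].tolist()
    return [(node_titles[i], float(scores[i])) for i in ranked]

# e.g. search("how graphs work", ["Introduction to Trees", "Photosynthesis"])
```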
- ✅ Unified Learning Experience: Integrated AI, physics simulations, and video synthesis into one cohesive app
- ✅ True Infinite Learning: Users never hit a wall; ask any question, get a video
- ✅ Production-Ready AI Code: Robust validation and healing make LLM-generated code viable
- ✅ Sub-50ms Semantic Search: Vector embeddings enable intuitive concept discovery
- ✅ Serverless at Scale: Modal handles parallel rendering and bursty workloads efficiently
Building a procedurally generated, infinite tree required:
- A state machine in VideoController.tsx for atomic tree updates
- A queue system for concurrent video generation (see the sketch after this list)
- React Context for global state synchronization
- Custom React Flow rendering for visual polish
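The queue itself lives in the TypeScript frontend; as a language-agnostic illustration of the idea, here is the same pattern in Python with asyncio, where generate_video is a hypothetical async render call:

```python
# Concurrency sketch: enqueue prompts immediately so the tree keeps growing
# while earlier videos are still rendering. generate_video() is hypothetical.
import asyncio

async def worker(queue: asyncio.Queue) -> None:
    while True:
        prompt = await queue.get()
        await generate_video(prompt)  # hypothetical async render call
        queue.task_done()

async def render_all(prompts: list[str], concurrency: int = 3) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for p in prompts:
        queue.put_nowait(p)
    workers = [asyncio.create_task(worker(queue)) for _ in range(concurrency)]
    await queue.join()  # returns once every queued prompt has rendered
    for w in workers:
        w.cancel()
```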
Letting AI generate arbitrary Manim code meant handling unpredictable compilation and runtime errors via:
- Automated triage and stack-trace analysis
- Self-repair through LLM feedback loops (up to N retries)
- Partial rendering for graceful degradation
Making AI content reliable at scale involved:
- Structured output schemas (JSON) for predictable responses (sketched below)
- Multi-model orchestration (Gemini, Cerebras, Grok)
- Parallel rendering pipelines on Modal
- Asset caching in Google Cloud Storage
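As an example of the structured-output piece, a minimal Gemini call in JSON mode using the google-generativeai SDK; the model name and lesson-plan keys are assumptions:

```python
# Structured-output sketch; model name and keys are assumptions.
# Requires genai.configure(api_key=...) beforehand.
import json
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")
resp = model.generate_content(
    'Return a lesson plan for the Pythagorean theorem as JSON with keys '
    '"title" and "sections" (a list of section scripts).',
    generation_config={"response_mime_type": "application/json"},
)
plan = json.loads(resp.text)  # predictable, schema-shaped response
```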
- Speech-to-text input for hands-free learning
- Real-time dialogue with an AI tutor
- Voice cloning for personalized narration
- Public tree library for sharing knowledge graphs
- Tree templates curated by experts
- Export to PDF, Notion, or shareable links
- Embedding API for third-party platforms
- LMS integration (Canvas, Moodle, Blackboard)
- Analytics dashboard for teachers
- Curriculum alignment to learning standards
- Bulk generation for entire courses
- Quick Start Guide
- Frontend Architecture
- Backend Architecture
- Cached Sessions Guide
- Quiz Feature
- Thumbnail & Title Generation
- Voice Selection
- Optimization Summary
- Google Gemini for powerful content generation
- Manim Community for animation framework
- Modal for serverless infrastructure
- ElevenLabs for natural voice synthesis
- HackPrinceton for the opportunity to build and showcase this project