Inspiration
I always wanted to read books because I had heard from others that they are deeply engaging and immersive. But the idea of staring at plain pages filled with text, without visuals or interaction, often made me feel like I would get bored. That hesitation sparked the idea of Book Morph — a way to transform static book pages into something dynamic, visual, and alive.
Beyond engagement, we also wanted to solve real-world problems. Carrying physical books everywhere is not always convenient. Audiobooks can be expensive. And for people with vision disabilities, access to affordable reading alternatives is limited. Book Morph aims to make stories more accessible, portable, and immersive for everyone.
What it does
Book Morph converts an image of any book page into a fully interactive audiobook-style experience.
It extracts text from the image, uses the Gemini API to analyze it and identify scenes, characters, and dialogues, fetches relevant visuals for each scene, and produces realistic voice narration. Users can switch between scenes and dialogues and play or pause the narration, creating a seamless audio-visual storytelling experience.
Impact & Vision
Potential Impact
Book Morph has the potential to redefine how people consume written content. By turning static text into a visual and auditory experience, it bridges the gap between traditional books and modern digital storytelling. It can make reading more engaging for people who struggle with long text formats and provide an affordable alternative to premium audiobooks.
Who It Is For
- Students and readers who prefer interactive learning
- People with vision disabilities who need audio-based access
- Busy individuals who want to listen to books on the go
- Parents who want immersive storytelling for children
- Content creators exploring new storytelling formats
Real-World Evolution
In the future, Book Morph could evolve into:
- A full digital storytelling platform for publishers
- An accessibility tool integrated into educational institutions
- A mobile app for instant page-to-audio conversion
- A tool for converting old printed books into digital audiobooks
- A creator platform where authors can generate cinematic versions of their stories
How we built it
First, we use Tesseract OCR to extract raw text from uploaded book page images. Then, instead of simply reading the text aloud, we use AI to process the extracted content and intelligently:
- Detect and separate scenes
- Identify characters
- Extract dialogues
- Structure narration into organized JSON format
This structured understanding allows us to treat the book like a screenplay rather than plain text.
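As a rough illustration of that screenplay-like structure, the sketch below parses one page's analysis into typed records. The field names (`scenes`, `title`, `description`, `characters`, `dialogues`, `speaker`, `line`) and the sample page are assumptions made for this example, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dialogue:
    speaker: str
    line: str


@dataclass
class Scene:
    title: str
    description: str
    characters: List[str] = field(default_factory=list)
    dialogues: List[Dialogue] = field(default_factory=list)


def parse_page(raw_json: str) -> List[Scene]:
    """Turn a JSON string (hypothetical schema) into a list of Scene records."""
    data = json.loads(raw_json)
    scenes = []
    for item in data["scenes"]:
        # Each dialogue entry becomes a (speaker, line) record.
        dialogues = [Dialogue(d["speaker"], d["line"]) for d in item.get("dialogues", [])]
        scenes.append(
            Scene(
                title=item["title"],
                description=item["description"],
                characters=list(item.get("characters", [])),
                dialogues=dialogues,
            )
        )
    return scenes


# Invented sample data, used only to show the shape of the structure.
EXAMPLE = """
{"scenes": [{"title": "The Harbor",
             "description": "Fog rolls over the docks at dawn.",
             "characters": ["Mara", "Captain Iles"],
             "dialogues": [{"speaker": "Mara", "line": "Is the ship ready?"}]}]}
"""

scenes = parse_page(EXAMPLE)
print(scenes[0].title, len(scenes[0].dialogues))
```

Keeping dialogues as separate records, rather than one block of text, is what would let a player step through a scene line by line.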
For visuals, we generate scene-based search queries and fetch relevant images using SerpAPI. For narration, we integrate the ElevenLabs Text-to-Speech API to produce realistic and expressive audio for both dialogues and scene descriptions.
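One way the step from structured scene to search query and narration order might look is sketched below. The helper names `build_image_query` and `narration_segments`, the scene dictionary layout, and the "illustration" suffix are all hypothetical; the actual SerpAPI and ElevenLabs requests are deliberately left out, since their exact usage in the project is not described here.

```python
from typing import List, Tuple


def build_image_query(scene: dict, max_words: int = 12) -> str:
    """Compose a short image-search query from a scene's description and characters."""
    # Strip simple punctuation so the query is plain keywords.
    words = scene["description"].replace(".", " ").replace(",", " ").split()
    parts = words[:max_words] + list(scene.get("characters", []))
    return " ".join(parts) + " illustration"


def narration_segments(scene: dict) -> List[Tuple[str, str]]:
    """Return (voice, text) pairs: the description first, then each dialogue in order."""
    segments = [("narrator", scene["description"])]
    for d in scene.get("dialogues", []):
        segments.append((d["speaker"], d["line"]))
    return segments


# Invented sample scene, used only for illustration.
scene = {
    "description": "Fog rolls over the docks at dawn.",
    "characters": ["Mara"],
    "dialogues": [{"speaker": "Mara", "line": "Is the ship ready?"}],
}

print(build_image_query(scene))
for voice, text in narration_segments(scene):
    print(f"{voice}: {text}")
```

Each query string would then be sent to the image search, and each segment to the text-to-speech service.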
Challenges we ran into
One major challenge was handling imperfect OCR output, especially when images were slightly blurred or misaligned. Another significant difficulty was getting the LLM to consistently return accurate and properly structured JSON responses for scenes, characters, and dialogues. Sometimes the model would add extra text, formatting, or invalid JSON, which required careful prompt engineering, validation logic, and response sanitization.
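A minimal sketch of the kind of response sanitization and validation described above, using only Python's standard library. The function names, the required keys, and the `scenes` layout are assumptions for illustration; the project's real checks may differ.

```python
import json
import re

# Hypothetical set of keys every scene is expected to carry.
REQUIRED_SCENE_KEYS = {"title", "description", "dialogues"}


def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of a model reply that may contain extra text."""
    # Remove Markdown code-fence markers (three backticks, optionally followed by "json").
    cleaned = re.sub(r"`{3}(?:json)?", "", raw)
    start = cleaned.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and ignores any trailing text after it.
    obj, _ = json.JSONDecoder().raw_decode(cleaned[start:])
    return obj


def validate_page(page: dict) -> list:
    """Return a list of problems; an empty list means the structure looks usable."""
    scenes = page.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        return ["'scenes' must be a non-empty list"]
    problems = []
    for i, scene in enumerate(scenes):
        if not isinstance(scene, dict):
            problems.append(f"scene {i} is not an object")
            continue
        missing = REQUIRED_SCENE_KEYS - scene.keys()
        if missing:
            problems.append(f"scene {i} is missing: {sorted(missing)}")
    return problems
```

This sketch assumes the first `{` in the reply begins the JSON payload. Invalid JSON raises `json.JSONDecodeError`, a subclass of `ValueError`, so a caller could catch that one exception type and retry the request.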
Accomplishments that we're proud of
We successfully transformed static printed pages into an engaging multimedia experience. Integrating OCR, AI text analysis, image retrieval, and voice narration into one cohesive workflow was a major achievement.
What we learned
We learned about OCR and text extraction, and that prompt design plays a crucial role in extracting structured data like scenes and dialogues from raw text. We also learned the importance of validating AI-generated JSON before using it in production systems.
What's next for BOOK MORPH
Next, we aim to enhance emotional expression in narration, assign unique voices automatically to different characters, and improve scene visualization beyond static images.
We also plan to introduce multi-language support, better accessibility features, and the ability to export complete generated audiobooks.
Book Morph is just the beginning of reimagining how stories can be experienced — not just read, but seen and heard.