Inspiration
Vision impairment impacts so many lives. In the U.S. alone, 7 million people live with vision impairment, including 1 million who are completely blind. Among children under 18, about 6.8% have an eye condition, and 3% are blind or visually impaired even with glasses or contacts.
At the same time, research shows that children who are read to regularly hear 290,000 more words by the time they start kindergarten than those who aren’t. It’s also linked to better school performance, improved mental health, and less time spent glued to screens.
StoryVue is our way of making sure every child has a chance to explore books, learn, and grow.
What it does
StoryVue is a voice-powered reading assistant that helps visually impaired children read independently.
Using OCR (Tesseract) and OpenAI’s language model, the app reads printed text out loud and can even summarize or explain what’s on the page. It’s completely hands-free, so kids can interact using just their voice with no need to press buttons or see the screen.
Key Features
- Real-time OCR (Optical Character Recognition) to capture text instantly
- AI summaries to help kids understand complex content
- Speech-to-Text (STT) for full voice control
- Text-to-Speech (TTS) for natural, easy listening (a simplified voice-loop sketch follows this list)
- Node.js + Express.js backend to keep everything running smoothly
- TensorFlow to detect when a book is actually in the camera frame before OCR runs
- LiveKit API for future collaborative reading sessions
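To give a feel for the hands-free flow, here's a minimal sketch of a voice loop built on the browser's Web Speech API. The wake words and the `startReadingFlow()` helper are placeholders for illustration, not our exact code.

```javascript
// Minimal sketch of a hands-free voice loop with the Web Speech API.
// The command words and startReadingFlow() are illustrative placeholders.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.continuous = true;

function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 0.9; // a touch slower for young listeners
  window.speechSynthesis.speak(utterance);
}

recognition.onresult = (event) => {
  const phrase = event.results[event.results.length - 1][0].transcript.toLowerCase();
  if (phrase.includes('read')) {
    speak('Okay! Hold your book up to the camera and I will start reading.');
    // startReadingFlow(); // hypothetical helper that kicks off camera capture + OCR
  } else if (phrase.includes('stop')) {
    window.speechSynthesis.cancel(); // stop talking immediately
  }
};

recognition.start();
```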
How we built it
We started by combining real-time OCR (Optical Character Recognition) and AI language processing with voice technology to create a smooth, fully accessible experience. We've added a few simplified code sketches after the list below to show roughly how the pieces connect.
- TensorFlow – Used to train a model that detects whether a book is in the camera frame and provides a confidence rating before running OCR.
- Camera Capture – Streams live video so we can identify the book in real-time and send clean frames to the OCR system.
- Tesseract.js OCR – Extracts text from the captured images quickly and efficiently.
- Text Extraction Pipeline – Cleans and organizes the scanned text before sending it to the backend.
- Node.js + Express.js Backend – Acts as the core hub, connecting all services and managing requests.
- OpenAI GPT-3.5 API – Generates natural, conversational reading experiences, explains tricky concepts, and creates summaries for better understanding.
- LiveKit API – Handles real-time audio and video streaming for future collaborative reading sessions.
- HTML + CSS – Creates our accessible interface, giving kids the reading experience without needing to see or touch the screen.
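Here's a simplified sketch of the book-detection gate, assuming the classifier is exported as a TensorFlow.js LayersModel; the model path, 224×224 input size, and 0.8 confidence threshold are placeholders rather than our exact values.

```javascript
// Sketch of the "is a book in frame?" check with a TensorFlow.js
// LayersModel binary classifier. Model path, input size, and the 0.8
// threshold are placeholders.
import * as tf from '@tensorflow/tfjs';

const model = await tf.loadLayersModel('/models/book-detector/model.json');

function bookConfidence(videoElement) {
  return tf.tidy(() => {
    const pixels = tf.browser.fromPixels(videoElement).toFloat(); // current camera frame
    const input = tf.image.resizeBilinear(pixels, [224, 224])     // match the model's input size
      .div(255)                                                   // normalize pixels to [0, 1]
      .expandDims(0);                                             // add a batch dimension
    return model.predict(input).dataSync()[0];                    // confidence a book is visible
  });
}

const video = document.querySelector('video');
if (bookConfidence(video) > 0.8) {
  // Confident enough: hand this frame to the OCR step (see the next sketch).
}
```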
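The capture-to-text path looks roughly like this, using an offscreen canvas to snapshot the camera feed and Tesseract.js's v5 worker API; the cleanup rules shown stand in for our fuller extraction pipeline.

```javascript
// Sketch of camera capture -> Tesseract.js OCR -> text cleanup.
// Assumes a <video> element already fed by getUserMedia; the cleanup
// regexes are simplified stand-ins.
import { createWorker } from 'tesseract.js';

const worker = await createWorker('eng'); // downloads + initializes the English model

function captureFrame(videoElement) {
  // Snapshot the current frame onto an offscreen canvas and return a data URL.
  const canvas = document.createElement('canvas');
  canvas.width = videoElement.videoWidth;
  canvas.height = videoElement.videoHeight;
  canvas.getContext('2d').drawImage(videoElement, 0, 0);
  return canvas.toDataURL('image/png');
}

async function extractText(videoElement) {
  const { data } = await worker.recognize(captureFrame(videoElement));
  return data.text
    .replace(/-\n/g, '')   // rejoin words hyphenated across lines
    .replace(/\s+/g, ' ')  // collapse line breaks and extra spaces
    .trim();
}
```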
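On the backend, the hub boils down to routes like the one sketched below, forwarding the extracted text to GPT-3.5 for a kid-friendly summary; the route path, prompt wording, and port are illustrative only.

```javascript
// Sketch of the backend hub: an Express route that sends OCR'd text to
// GPT-3.5 and returns a kid-friendly summary. Route path, prompt, and
// port are illustrative.
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/summarize', async (req, res) => {
  try {
    const completion = await openai.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'system', content: 'You explain book pages to young children in short, friendly sentences.' },
        { role: 'user', content: `Summarize this page:\n\n${req.body.text}` },
      ],
    });
    res.json({ summary: completion.choices[0].message.content });
  } catch (err) {
    res.status(500).json({ error: 'Could not summarize this page right now.' });
  }
});

app.listen(3000);
```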
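And for the collaborative reading sessions we're planning, joining a shared room with the livekit-client SDK would look roughly like this; the server URL and token handling are placeholders.

```javascript
// Sketch of joining a shared reading room with livekit-client.
// The server URL and access token are placeholders; tokens would normally
// be minted by our backend.
import { Room, RoomEvent } from 'livekit-client';

const accessToken = '<token minted by the backend>'; // placeholder

const room = new Room();

// Play any audio track published by the other participant
// (e.g. a parent or reading buddy reading along remotely).
room.on(RoomEvent.TrackSubscribed, (track) => {
  if (track.kind === 'audio') {
    document.body.appendChild(track.attach());
  }
});

await room.connect('wss://example.livekit.cloud', accessToken); // placeholder URL
await room.localParticipant.setMicrophoneEnabled(true);         // share the child's voice
```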

Challenges we ran into
- Speech-to-Text worked great early on, but TTS gave us a lot of trouble and didn’t always sound natural or consistent.
- Tesseract would work one moment and completely crash the next due to outdated libraries and conflicting git pulls and pushes across the team.
- Getting all our libraries and services to play nicely together was a challenge in itself.
Accomplishments that we're proud of
- Integrating OpenAI’s LLM to make reading interactive and dynamic!
- Achieving real-time OCR!
- Building voice controls that make the app hands-free!
- With these achievements, we’ve made our most comprehensive app to date while also tackling an issue we are passionate about!!
What we learned
- How to put user-first design at the center when building for people with disabilities.
- How to troubleshoot and stabilize open-source libraries for real-time performance.
- How tricky it is to combine multiple advanced tools (OCR, AI, voice tech) into one smooth experience.
What's next for StoryVue
- Collaborative Reading: Allow multiple users to read together remotely using LiveKit.
- Parent/Teacher Portal: Track progress, identify challenging words, and suggest follow-up learning activities.
- New Subjects: Read and interpret math equations and describe images and diagrams for history and science so kids don’t miss out on visual learning.
Built With
- css
- express.js
- html
- javascript
- node.js
- openai
- tensorflow