Inspiration

I like reading. I even prefer a real physical book to an audiobook, but I'm often busy, so I listen to my books when I'm in the car. The problem comes when I've been reading for a while and need to switch back to the audiobook: finding my place is a pain.

What it does

I've built a website that can find your place in an audiobook from just a photo of the last page you read. To that end, I've also built tools for capturing the plain text from OverDrive audiobooks.

How I built it

Two main tools make up this process. First, a mapping is generated using forced alignment between the plain text of the book and its audio, so that each passage of text is tied to the time at which it is spoken.

On the search end, we start with a photo of the page. The photo is sent to a Node server, which uses tesseract.js to produce a transcript of the page. A line from that transcript is selected and located in the transcript of the audiobook. The alignment map from the first step then gives the audio position.
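The lookup step can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual Node code. The alignment map format (a sorted list of character-offset and timestamp pairs), the function name `find_audio_position`, and the sample text are all assumptions made for the example.

```python
from bisect import bisect_right

def find_audio_position(book_text, alignment, ocr_line):
    """Return the audio time in seconds for ocr_line, or None if not found.

    alignment is ASSUMED to be a list of (char_offset, seconds) pairs,
    sorted by char_offset, with one pair per aligned fragment of book_text.
    """
    # Exact substring search for the OCR'd line in the book text.
    offset = book_text.find(ocr_line.strip())
    if offset == -1:
        return None
    # Pick the last aligned fragment that starts at or before the match.
    offsets = [char_offset for char_offset, _ in alignment]
    index = bisect_right(offsets, offset) - 1
    if index < 0:
        return None
    return alignment[index][1]


# Hypothetical data: three sentences, each mapped to a start time.
book_text = "The rain had stopped. She closed the book. Morning came."
alignment = [(0, 0.0), (22, 2.5), (43, 5.0)]
position = find_audio_position(book_text, alignment, "She closed")
```

Here `position` is the start time of the aligned fragment that contains the matched line, which is where playback would resume. Because the search is an exact substring match, any OCR mistake in the chosen line makes the lookup return `None`.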

Challenges I ran into

Getting data was one of the hardest parts. I ended up writing a Firefox extension to capture the plain text of the ebook. Even then, I was only able to get a preview containing the first four chapters of the only physical book I had with me.

It also turned out that tesseract.js was very picky. I'd originally planned to run it on the client, but it crashed Firefox and just didn't work in Chrome. Running it in Node was far from painless, too.

Accomplishments that I'm proud of

I've honestly wanted to do this for a long time, and for one person working for less than 24 hours (I got here late), I'm really happy with how far I got.

What I learned

How to use tesseract.js and some of the various forced alignment libraries.

What's next for Book Sync

More ease-of-use features. It needs to let users submit their own transcript and audio files.

A better text search algorithm that supports fuzzy matching.
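As one possible direction, here is a rough Python sketch of fuzzy matching using the standard library's `difflib`. It is not something the project does today. The function name `fuzzy_find`, the 0.8 threshold, and the sample text are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def fuzzy_find(book_text, ocr_line, threshold=0.8):
    """Return the offset of the window of book_text most similar to
    ocr_line, or None if no window scores at least threshold."""
    query = ocr_line.strip().lower()
    text = book_text.lower()
    width = len(query)
    if width == 0 or width > len(text):
        return None
    best_offset, best_score = None, 0.0
    # Brute force: slide a window of the query's length across the text.
    for offset in range(len(text) - width + 1):
        window = text[offset:offset + width]
        score = SequenceMatcher(None, query, window).ratio()
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset if best_score >= threshold else None


# Hypothetical sample text; "c1osed" and "bock" imitate OCR mistakes.
book_text = "The rain had stopped. She closed the book. Morning came."
offset = fuzzy_find(book_text, "She c1osed the bock.")
```

The sliding window compares the query against every position in the text, which is simple but slow. A full-length book would probably need an index or a cheaper pre-filter before scoring candidates this way.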
