Inspiration
We were inspired by the MLH challenge to create an educational tool using AI. As students who know how hard it can be to study and keep up with class, we recognized that reading some of our professors' handwritten notes was a consistent struggle. So we decided to use Google Cloud's API to create an optical character recognition (OCR) application.
What it does
Our project is a website that takes a PDF of lecture slides, scans each slide, and outputs a shorter PDF containing the slides' content without the obfuscation of the author's handwriting.
How we built it
We first explored options with TensorFlow but decided to give Google Cloud a try because of its impressive accuracy and the simplicity of its API. We then built a website with Bootstrap and the standard web development tools: HTML, CSS, and JavaScript. We served the site from a Flask web server, which let it communicate with the OCR component of our project. This allows the user to upload a file through the website and receive a processed PDF in return. All of the image reading and processing was done in Python with the Google Vision AI API.
Challenges we ran into
One major issue was formatting: Google Vision AI returns the text it reads from an image as blocks, which made it difficult to correctly organize the sentences and notes from each slide. Another issue was accuracy, as it was very difficult to find a model that could reliably read both alphanumeric characters and mathematical notation. While Google Cloud's selection of models is not perfect, our testing showed it was the best option available to us.
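One common heuristic for the block-ordering problem (not necessarily what Note Ninja does) is to sort blocks by the top-left corner of their bounding boxes, grouping blocks whose tops are close together into the same visual row. The sketch assumes the blocks come from `client.document_text_detection(image=vision.Image(content=image_bytes))` in the `google-cloud-vision` Python client, whose `response.full_text_annotation.pages[...].blocks` carry `paragraphs`, `words`, `symbols`, and a `bounding_box`. `TextBlock`, `from_vision_block`, `reading_order`, and the 15-pixel `row_tolerance` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class TextBlock:
    """Simplified stand-in for one Vision API block: its text plus the
    (x, y) pixel vertices of its bounding box."""
    text: str
    vertices: Sequence[Tuple[int, int]]


def from_vision_block(block) -> TextBlock:
    """Flatten a block from `response.full_text_annotation.pages[i].blocks`.

    Words are rebuilt from their symbols and joined with single spaces, a
    simplification that ignores the API's detected_break hints.
    """
    words = (
        "".join(symbol.text for symbol in word.symbols)
        for paragraph in block.paragraphs
        for word in paragraph.words
    )
    vertices = [(v.x, v.y) for v in block.bounding_box.vertices]
    return TextBlock(" ".join(words), vertices)


def reading_order(blocks: Iterable[TextBlock], row_tolerance: int = 15) -> List[str]:
    """Order block texts top-to-bottom, then left-to-right within a row.

    A block whose top edge lies within `row_tolerance` pixels of the first
    block in the current row joins that row (a hypothetical threshold that
    would need tuning for the slide resolution).
    """
    # Anchor each block at its top-left corner: (top, left, text).
    anchored = sorted(
        (min(y for _, y in b.vertices), min(x for x, _ in b.vertices), b.text)
        for b in blocks
    )
    rows: List[List[Tuple[int, int, str]]] = []
    for top, left, text in anchored:
        # Start a new row when this block sits clearly below the current row.
        if not rows or top - rows[-1][0][0] > row_tolerance:
            rows.append([])
        rows[-1].append((top, left, text))

    # Within each row, read blocks from left to right.
    ordered: List[str] = []
    for row in rows:
        ordered.extend(text for _, _, text in sorted(row, key=lambda item: item[1]))
    return ordered
```

A fixed pixel tolerance breaks down on rotated photos or dense multi-column slides; scaling it to the median block height would be a natural refinement.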
What we learned
This was our first time working with a machine learning and OCR API, so we learned a lot about training and loss, the strengths and weaknesses of ML/AI, and how to combine different tools into a user-friendly application.
What's next for Note Ninja
As we learn more about ML, we hope to improve Note Ninja's success rate at parsing notes from lecture slides and to produce a more streamlined, polished final PDF. Another area for expansion is supporting a more diverse range of inputs, such as images of a chalkboard.
Built With
- bootstrap
- css
- flask
- google-ai
- google-cloud
- html
- javascript
- pillow
- python