Inspiration

Imagine it’s the week before finals at your school, and on top of your already busy class schedule, you feel behind on your studying. Walking from class to class, to your dorm, to the dining hall, and back to class might give you a chance to destress and reflect, but past a certain point, you might feel that your 45 minutes of daily walking could be spent preparing for the tests you’re stressing over. That inspired Notesify, an app we designed to help students get ahead of the anxiety of feeling unprepared by providing enhanced audio versions of their notes. With this app, we aim to give students a low-stress studying option while on the go.

What it does

Our platform provides AI-powered audio generation from a user’s notes. Users can upload notes in a range of file formats (LaTeX, PDF, DOCX, JPEG), and the platform generates a customized lecture-style summary of the notes. These mini-lectures can be adjusted to any desired listening length to fit how much time the user has. Notesify also supports uploading images of handwritten notes, and it uses various AI models and transformers to enhance and condense your notes by identifying and emphasizing key points and adding explanatory details. Your fifteen-minute walk no longer has to be spent suffering in confusion; instead, each minute can go toward calmly reviewing the material you learned in class.
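As a rough sketch of how the length control can work, a desired listening window maps to a word budget for the summary. The helper name and the 150-words-per-minute speaking rate below are illustrative assumptions, not fixed product values:

```python
# Hypothetical helper: convert a desired listening time into a word budget
# for the generated summary, assuming a TTS speaking rate of ~150 words/min.
WORDS_PER_MINUTE = 150

def target_word_count(listening_minutes: float) -> int:
    """Return the target script length for a given listening window."""
    return max(1, int(listening_minutes * WORDS_PER_MINUTE))

print(target_word_count(15))  # a 15-minute walk fits a ~2250-word mini-lecture
```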

How we built it

Frontend: Developed using ReactJS and Chakra UI to structure our app and integrate it with our backend.

Backend: Written in Python as a Flask REST API. We used the Donut (Document Understanding Transformer) model to convert uploaded images to raw text. For text modification, we used OpenAI’s GPT-4 API to convert the raw note text into lecture-formatted text, then applied Meta’s BART denoising transformer to summarize and condense the script to the desired length.

Database: MongoDB Atlas was pivotal to our app, storing the raw uploaded notes along with their corresponding lecture-script text files and audio files.
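As one concrete stage of that pipeline, the notes-to-lecture step boils down to a single chat-completion call. The prompt wording and function name below are an illustrative sketch, not our exact production code:

```python
# Sketch of the raw-notes -> lecture-script stage (prompt wording and
# function name are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gpt_lecture_format(raw_notes: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Rewrite the student's notes as a clear spoken "
                           "lecture: emphasize key points and add brief "
                           "explanations where the notes are terse.",
            },
            {"role": "user", "content": raw_notes},
        ],
    )
    return response.choices[0].message.content
```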

Challenges we ran into

TrOCR vs. Donut: We initially used TrOCR (Transformer-based Optical Character Recognition) to parse images of handwritten and printed text into raw text. However, TrOCR was slow and optimized for single-line text parsing only, so we switched to Donut, which provides whole-document understanding and multi-line parsing. The slight tradeoff in character-conversion accuracy for a significant boost in speed made the switch an overall improvement.
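For reference, this is roughly what the single-line TrOCR flow looks like in Hugging Face transformers (the checkpoint name is the public one; the image path is a placeholder):

```python
# Single-line handwriting recognition with TrOCR via Hugging Face transformers.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

line = Image.open("note_line.jpg").convert("RGB")  # one cropped line of text
pixel_values = processor(images=line, return_tensors="pt").pixel_values
ids = model.generate(pixel_values)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])

# Donut swaps in DonutProcessor plus a document-level task prompt and takes
# the full page image in one pass, so no per-line segmentation is needed
# (the exact prompt depends on the checkpoint).
```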

BART API Data Overload: With lecture scripts running to tens of thousands of words, the BART API struggled to summarize and condense our GPT-generated texts in one pass. Hence, we wrote a script to split the input into overlapping chunks and feed them to BART one at a time. This kept each request from failing on oversized inputs, while the overlap carried context across chunk boundaries, improving overall accuracy.
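The chunking script looked roughly like the sketch below, shown here against the local transformers pipeline rather than the hosted API. The chunk and overlap sizes are illustrative; they just need to keep each tokenized chunk under BART’s 1,024-token input limit:

```python
# Sketch of overlapping-chunk summarization (sizes are illustrative).
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def overlapping_chunks(text: str, chunk_words: int = 600, overlap: int = 75):
    """Slide a window over the script, repeating `overlap` words of context."""
    words = text.split()
    step = chunk_words - overlap
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), step)]

def condense(script: str) -> str:
    pieces = [
        summarizer(chunk, max_length=150, min_length=40, truncation=True)[0]["summary_text"]
        for chunk in overlapping_chunks(script)
    ]
    return " ".join(pieces)
```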

Accomplishments that we're proud of

We were proud to successfully implement several LLM and transformer models despite having little prior experience with them. From OpenAI’s GPT API to Meta’s BART summarizer, we learned how LLMs can be properly applied to intricate language-related challenges, and we developed skills we can carry into future AI projects.

Beyond the technical aspects of this project, we’re especially proud to have applied our own experiences and skills to address the post-lecture confusion that so many college students, ourselves included, know well. By translating anyone’s notes into a coherent, individualized format, Notesify also makes studying from peers’ notes practical, encouraging collaboration.

What we learned

Throughout this project, we learned a great deal about new technologies, especially in computer vision and AI. Working with the TrOCR and Donut models taught us how character-recognition models differ and how to preprocess our data for optimal model performance. We also learned how to extend LLMs beyond text generation and modification to producing audio output with the Google Cloud TTS API, and how to tune the output toward natural-sounding text and speech, for example by experimenting with different prompt structures. Lastly, integrating each of these APIs and technologies into one cohesive React web application was challenging, but it taught us how to stay organized in a large-scale implementation.
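For instance, the text-to-speech step reduces to one Google Cloud TTS call; the voice settings below are placeholder choices:

```python
# Turning a finished lecture script into audio with Google Cloud TTS.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

def synthesize(script: str, out_path: str = "lecture.mp3") -> None:
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=script),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
```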

What's next for Notesify

Our goal is for Notesify to grow into a Spotify-like platform for lecture notes. Students will be able to create private or public lecture playlists, share their favorite renditions with friends, and follow their favorite note-takers to stay up to date on a subject. Recordings will be recommended based on each student’s history and popular topics, and will be searchable by topic and length. Ultimately, we want to build a platform for learning uninhibited by physical location.
