Inspiration & Goals of the Hack

We've all heard of services that transcribe your lectures and meetings and even generate summaries for you. We wanted to go further: create video summaries that highlight the most important parts, and let you query for specific moments across the video content in your library.

How we built it

Our goal was to use the multimodal architecture described in the HiREST paper: separate encoders for the video, the audio (as transcribed text), and the query, whose outputs are combined to find the portion of a lecture video that best matches the query. However, the model would first need to be finetuned on a corpus of lecture videos, which we could not do within the time of the hackathon; if we continue this project past the hackathon, building such a corpus is the natural next step. For text summarization, we extract the transcript of the lecture video and pass it through an LLM (Llama 13B) to list the key points of the lecture. Rough sketches of both steps are below.
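To make the retrieval idea concrete, here is a minimal sketch of one way to approximate that step with text only: split the transcript into timestamped segments and rank them against the query with an off-the-shelf sentence-embedding model. The model (`all-MiniLM-L6-v2` via sentence-transformers) and the segment format are stand-ins we chose for illustration, not HiREST's actual pipeline, which also encodes the video frames.

```python
# Text-only approximation of the retrieval step: rank transcript segments
# against a query with a sentence-embedding model. Segment format and model
# choice are our assumptions; HiREST additionally encodes the video frames.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def find_best_segment(segments, query):
    """segments: dicts like {"start": 0.0, "end": 60.0, "text": "..."}."""
    seg_embs = model.encode([s["text"] for s in segments], convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, seg_embs)[0]  # one similarity per segment
    best = int(scores.argmax())
    return segments[best], float(scores[best])

segments = [
    {"start": 0.0, "end": 60.0, "text": "Today we introduce gradient descent."},
    {"start": 60.0, "end": 120.0, "text": "Backpropagation computes gradients layer by layer."},
]
seg, score = find_best_segment(segments, "how does backpropagation work?")
print(f"Best match: {seg['start']:.0f}s-{seg['end']:.0f}s (score {score:.2f})")
```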
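For the summarization step, here is a minimal sketch assuming the Llama 2 13B chat checkpoint served through Hugging Face transformers; the write-up only specifies "Llama 13B", so the exact variant, prompt wording, and serving stack are our guesses.

```python
# Sketch of the summarization step: prompt a Llama 13B model to list the
# key points of a lecture transcript. The chat variant, prompt wording, and
# generation settings are our assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-13b-chat-hf",  # assumed checkpoint
    device_map="auto",
)

def summarize(transcript: str) -> str:
    prompt = (
        "Below is the transcript of a lecture. List its key points as "
        "short bullet points.\n\nTranscript:\n" + transcript + "\n\nKey points:\n"
    )
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    # The pipeline returns the prompt plus the continuation; keep only the latter.
    return out[0]["generated_text"][len(prompt):]

print(summarize("Gradient descent minimizes a loss by stepping opposite the gradient..."))
```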

Challenges we ran into

We ran into challenges using the pretrained HiREST model for this task: it quickly became clear that the data it was trained on (shorter instructional material, e.g. recipe videos) does not transfer to long-form lecture content.

Accomplishments that we're proud of

We identified a real gap in existing research: current moment-retrieval models and datasets target short instructional clips, and there is no comparable corpus for long-form content like lectures.

What we learned

None of us had much prior experience with LLMs and prompting, so we got to dabble with both while building our approximation demo.

What's next for SnipStudy

Build a corpus of long-form content such as lectures and meetings, then train a model to understand this type of data and achieve the original goals of the project. A sketch of what one annotated corpus entry might look like follows.
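Here is a minimal sketch of one possible entry in such a corpus; all field names are hypothetical, loosely mirroring the (video, query, moment-span) annotations that HiREST-style models train on.

```python
# One possible schema for an annotated lecture-retrieval corpus entry,
# loosely mirroring the (video, query, moment-span) annotations that
# HiREST-style models train on. All field names are our assumptions.
from dataclasses import dataclass

@dataclass
class CorpusEntry:
    video_id: str     # e.g. a file name or YouTube ID
    transcript: str   # full ASR transcript of the recording
    query: str        # a natural-language question a viewer might ask
    start_sec: float  # start of the span that answers the query
    end_sec: float    # end of that span

example = CorpusEntry(
    video_id="lecture_03_backprop",
    transcript="...full transcript text...",
    query="how does backpropagation compute gradients?",
    start_sec=612.0,
    end_sec=745.0,
)
```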

Built With

HiREST, Llama 13B