Inspiration

When I watch documentaries, shows, or movies, I sometimes don't understand a part, or I just want to ask a question about the series. Google's AI Overview is the closest thing I have, but it still gives information that is inaccurate or too general.

What it does

VidBud can answer questions about the video currently playing, including questions about specific time segments.

We also have a summary feature, which uses both image data and dialog from the surrounding 30 seconds to summarize a scene (a 1-minute segment).
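
As a rough illustration of the scene windowing described above, the sketch below picks the transcript lines that fall within 30 seconds of the current playback time. Reading "surrounding 30 seconds" as 30 seconds on each side is an assumption, as are the function names, the `(start_seconds, text)` transcript shape, and the clamping to the video bounds. None of this is taken from VidBud's actual code.

```python
from typing import List, Tuple

# Assumed: 30 s on each side of the current time, giving a 1-minute scene.
WINDOW_SECONDS = 30.0


def scene_window(current: float, duration: float,
                 half_width: float = WINDOW_SECONDS) -> Tuple[float, float]:
    """Return (start, end) of the scene around `current`, clamped to the video."""
    start = max(0.0, current - half_width)
    end = min(duration, current + half_width)
    return start, end


def dialog_in_window(transcript: List[Tuple[float, str]],
                     start: float, end: float) -> List[str]:
    """Keep the transcript lines whose start time falls inside [start, end]."""
    return [text for t, text in transcript if start <= t <= end]
```

The selected dialog, together with a few frames sampled from the same window, could then be passed to the model as the context for the summary.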

How we built it

I used Flask for the backend and plain HTML/CSS/JS for the extension. The extension injects its modal directly into the page, so you can keep watching while you ask questions.
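
To give a feel for the backend side, here is a minimal sketch of a Flask endpoint that the injected modal could post questions to. The `/ask` route, the JSON field names (`video_id`, `question`, `timestamp`), and the `answer_question` stub are all hypothetical and only stand in for whatever VidBud really does.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def answer_question(video_id: str, question: str, timestamp: float) -> str:
    """Hypothetical stub: a real backend would call a model with transcript and frames."""
    return f"(stub) question about {video_id} at {timestamp:.0f}s: {question}"


@app.route("/ask", methods=["POST"])  # hypothetical route name
def ask():
    # Parse the JSON body sent by the extension; fall back to an empty dict.
    payload = request.get_json(silent=True) or {}
    question = payload.get("question", "").strip()
    if not question:
        return jsonify(error="question is required"), 400
    answer = answer_question(
        payload.get("video_id", ""),
        question,
        float(payload.get("timestamp", 0)),
    )
    return jsonify(answer=answer)
```

The extension would send the current video ID and playback time along with the question, so the backend knows which part of the video is being asked about.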

Challenges we ran into

There were many issues with dependencies, and even more with the YouTube APIs.

Accomplishments that we're proud of

This was my first time using the OpenAI vision API, and the quickest (1 day?) I've made an app.

What's next for VidBud

Supporting more sites beyond YouTube.
