Inspiration
When I watch documentaries, shows, or movies, I sometimes don't understand parts of them, or I just want to ask a question about the series. Google's AI Overview is the closest thing I have, but it still gives information that is inaccurate or too general.
What it does
It can answer questions about the video currently playing, including questions about specific time segments.
It also has a summary feature, which uses both image data and dialog from the surrounding 30 seconds to summarize a scene (a 1-minute segment).
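As a rough illustration of the scene window, here is a minimal sketch that assumes "the surrounding 30 seconds" means 30 seconds on either side of the current playback time, which gives the 1-minute segment. The `Caption` type and the function names are hypothetical and are not taken from the VidBud code; the sketch only covers picking the dialog, not the image data or the model call.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """One subtitle line (hypothetical shape, not VidBud's actual data model)."""
    start: float     # seconds from the start of the video
    duration: float  # how long the line stays on screen, in seconds
    text: str


def window_bounds(current_time, video_length, half_window=30.0):
    """Return (start, end) of the scene window, clamped to the video's length."""
    start = max(0.0, current_time - half_window)
    end = min(video_length, current_time + half_window)
    return start, end


def captions_in_window(captions, start, end):
    """Keep every caption whose on-screen interval overlaps [start, end)."""
    return [c for c in captions if c.start < end and c.start + c.duration > start]


def dialog_for_scene(captions, current_time, video_length):
    """Join the dialog spoken in the window around current_time into one string."""
    start, end = window_bounds(current_time, video_length)
    return " ".join(c.text for c in captions_in_window(captions, start, end))
```

The joined dialog, together with frames sampled from the same window, would then be what gets sent to the model for summarizing.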
How we built it
I used Flask for the backend and plain HTML/CSS/JS for the extension. The extension injects the modal directly onto the site, so you can keep watching while asking away.
Challenges we ran into
There were many issues with dependencies, and even more with the YouTube APIs.
Accomplishments that we're proud of
This was my first time using the OpenAI vision API, and it's the quickest (1 day?) I've made an app.
What's next for VidBud
Making it support more sites beyond YouTube.