Inspiration
We wanted Double Vision to enable a continuous learning experience: watching a video while simultaneously reading up on topics of interest.
What it does
Double Vision uses audio and image recognition along with natural language processing to generate smart context (e.g. links to related Wikipedia articles) while you watch a video.
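As a rough illustration of the "smart context" step, here is a minimal sketch that turns a detected tag into a Wikipedia link and summary using Wikipedia's public page-summary endpoint. The `get_context` helper and the choice of endpoint are assumptions for illustration, not necessarily what Double Vision actually calls.

```python
import requests

def get_context(tag: str) -> dict:
    """Illustrative helper: fetch a short Wikipedia summary for a detected tag."""
    # Wikipedia's public REST summary endpoint; article titles use underscores.
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{tag.replace(' ', '_')}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return {
        "title": data["title"],
        "summary": data["extract"],                       # one-paragraph summary
        "link": data["content_urls"]["desktop"]["page"],  # canonical article URL
    }

# e.g. get_context("Neural network") -> title, short summary, and article link
```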
How I built it
The pipeline, sketched in the snippets below:
1. Download the video and extract frames using OpenCV.
2. Assign image tags to the sampled frames using TensorFlow.
3. Extract the audio track and transcribe it to text using the Wit.ai API.
4. Pull meaningful tags out of the transcript using AlchemyAPI.
5. Select the most important tags, ranked by the confidence each recognizer reports, and display related Wikipedia articles alongside Bing thumbnails.
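Two of these steps, sketched under assumptions (function names, the sampling interval, and the token placeholder are illustrative, not the project's actual code). First, sampling frames from the downloaded video with OpenCV:

```python
import cv2

def extract_frames(video_path: str, every_n_seconds: float = 2.0) -> list:
    """Sample one frame every `every_n_seconds` for the tagging step."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unreported
    step = max(1, int(fps * every_n_seconds))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)  # BGR ndarray, ready for the TensorFlow tagger
        index += 1
    cap.release()
    return frames
```

Second, transcribing the extracted audio with the Wit.ai speech endpoint. The response shape has varied across Wit.ai API versions (older versions return a single JSON object with a `_text` field), so adjust for the version you target:

```python
import requests

WIT_TOKEN = "YOUR_WIT_AI_SERVER_TOKEN"  # placeholder: your Wit.ai server access token

def transcribe(audio_path: str) -> str:
    """Send a short WAV clip to Wit.ai and return the transcript."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    resp = requests.post(
        "https://api.wit.ai/speech",
        headers={
            "Authorization": f"Bearer {WIT_TOKEN}",
            "Content-Type": "audio/wav",
        },
        data=audio,
    )
    resp.raise_for_status()
    body = resp.json()
    # Field name differs by API version: "text" in newer, "_text" in older responses.
    return body.get("text") or body.get("_text", "")
```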
Challenges I ran into
Integrating many different APIs into one pipeline.
Accomplishments that I'm proud of
Being able to leverage both the audio and the visual content of a video to generate meaningful results.
What I learned
To be patient and persistent.
What's next for DOUBLEVISION
Optimize running time, possibly turn DOUBLEVISION into a Chrome extension, build an annotation community around videos, and develop a more interactive UI.

