Inspiration
After an incident involving a close family member, we spent a lot of time searching video footage for the "red car" that had passed by them. We thought there must be a way to analyze the video automatically and extract the information we needed.
What it does
We built a video analysis system that processes a video and extracts the information we need from it: a description of what happens and the entities that appear (for example, a red car).
How we built it
We combined computer vision and natural language processing techniques. First, we extracted frames from the video. Next, we used a pre-trained vision language model to generate a description of each frame. We then summarized the per-frame descriptions into a single summary of the video. Finally, we used a pre-trained language model to extract entities from that summary.
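The four steps above can be sketched as a small pipeline. This is only an illustration of the data flow, not our actual code: `sample_indices`, `analyze_video`, and the three callables (`describe_frame`, `summarize`, `extract_entities`) are hypothetical names, and the callables stand in for whatever vision language model and language model are plugged in.

```python
from typing import Callable, List, Sequence


def sample_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Return indices of frames spaced roughly `every_seconds` apart."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def analyze_video(
    frames: Sequence[object],
    fps: float,
    describe_frame: Callable[[object], str],        # stand-in for the vision language model
    summarize: Callable[[List[str]], str],          # stand-in for the summarization step
    extract_entities: Callable[[str], List[str]],   # stand-in for the language model
    every_seconds: float = 1.0,
) -> dict:
    """Run the frame -> description -> summary -> entities flow."""
    # Step 1: pick a subset of frames instead of describing every one.
    indices = sample_indices(len(frames), fps, every_seconds)
    # Step 2: describe each selected frame.
    descriptions = [describe_frame(frames[i]) for i in indices]
    # Step 3: condense the per-frame descriptions into one summary.
    summary = summarize(descriptions)
    # Step 4: pull entities out of the summary.
    entities = extract_entities(summary)
    return {
        "frame_indices": indices,
        "descriptions": descriptions,
        "summary": summary,
        "entities": entities,
    }
```

Passing the models in as plain callables keeps the flow testable with simple stand-in functions before any real model is loaded.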
Challenges we ran into
1. Hardware constraints when running the vision model.
2. Too many frames to process.
3. Surveillance video has no audio and, most of the time, no movement.
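Challenges 2 and 3 point at one possible mitigation: since most surveillance frames show no movement, frames that barely differ from the last kept frame can be skipped before they reach the vision model. The sketch below is one simple way to do that with frame differencing, not necessarily what we used. It assumes frames are already decoded into grayscale rows of 0-255 integers, and the function names and the default `threshold` are illustrative.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale image as rows of 0-255 values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_changed_frames(frames: Sequence[Frame], threshold: float = 5.0) -> List[int]:
    """Keep the first frame, then only frames that differ enough from the last kept one."""
    if not frames:
        return []
    kept = [0]
    reference = frames[0]
    for i in range(1, len(frames)):
        if mean_abs_diff(reference, frames[i]) > threshold:
            kept.append(i)
            reference = frames[i]  # compare later frames against this one
    return kept
```

Comparing against the last kept frame rather than the immediately preceding one means slow, gradual changes still trigger a new frame eventually.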
Accomplishments that we're proud of
1. Learned how to build a video analysis system.
2. Learned how to use a vision language model.
3. Learned how to use a language model.
4. Learned how to use a computer vision model.
5. Quantized the model so that it runs on an edge device.
What we learned
1. How to build a video analysis system.
2. How to use a vision language model.
3. How to use a language model.
4. How to use a computer vision model.
5. How to quantize a model so that it runs on an edge device.
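To illustrate the idea behind quantization, here is a minimal sketch of symmetric 8-bit quantization applied to a plain list of weights. It is a generic textbook scheme chosen for illustration, not the specific method or tooling we used, and `quantize_int8` and `dequantize` are hypothetical names. Each weight is stored as an integer in [-127, 127] together with one shared scale factor, so it takes 1 byte instead of the 4 bytes of a 32-bit float.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]
```

The recovered weights are only approximations: each one can be off by up to half of `scale`, which is the accuracy given up in exchange for the smaller memory footprint.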
What's next for video analysis
1. JSON response for the video analysis.
2. Face detection and recognition.
3. Motion detection and tracking, including direction of movement.
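As a rough idea of what items 1 and 3 could look like together, the sketch below builds a JSON response that includes a coarse direction of travel for each detected object. Nothing here is implemented yet: the `Detection` fields, `direction_of_travel`, `build_response`, and the response layout are all assumptions made for illustration. The direction is derived from the first and last position of an object in image coordinates, where y grows downward.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str           # e.g. "red car"
    first_seen_s: float  # timestamp of first appearance, in seconds
    last_seen_s: float   # timestamp of last appearance, in seconds
    direction: str       # coarse direction of travel


def direction_of_travel(start: Tuple[float, float], end: Tuple[float, float]) -> str:
    """Classify movement between two (x, y) image positions into a coarse direction."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if dx == 0 and dy == 0:
        return "stationary"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"  # image y axis points downward


def build_response(video_id: str, summary: str, detections: List[Detection]) -> str:
    """Serialize the analysis result as a JSON string."""
    payload = {
        "video_id": video_id,
        "summary": summary,
        "detections": [asdict(d) for d in detections],
    }
    return json.dumps(payload, indent=2)
```

A structured response like this would let a caller filter for, say, every detection labeled "red car" moving left, instead of reading through a free-text summary.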