Inspiration

After an incident involving a close family member, we spent a lot of time searching video footage for the "red car" that passed by them. We thought there must be a way to analyze the video automatically and extract the information we need.

What it does

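We built a video analysis system that describes what happens in a video, summarizes it, and extracts the entities that appear in it, so relevant details can be found without watching the whole video.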

How we built it

We combined computer vision and natural language processing techniques. First, we extracted the frames from the video. Next, we used a pre-trained vision-language model to generate a description of each frame. We then summarized the per-frame descriptions into a summary of the whole video. Finally, we used a pre-trained language model to extract entities from that summary.
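The four steps above can be sketched as a small pipeline. This is only an illustration of the data flow, not our actual code: the function name `analyze_video` and its three callable parameters are hypothetical placeholders for the vision-language model, the summarizer, and the entity-extraction language model, and frame extraction is assumed to have happened already.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

Frame = TypeVar("Frame")  # whatever type the frame extractor produces


def analyze_video(
    frames: Iterable[Frame],
    describe_frame: Callable[[Frame], str],        # stand-in for the vision-language model
    summarize: Callable[[List[str]], str],         # stand-in for the summarization step
    extract_entities: Callable[[str], List[str]],  # stand-in for the language model
) -> Dict[str, object]:
    """Run the describe -> summarize -> extract-entities flow over extracted frames."""
    # Step 2: one text description per frame.
    descriptions = [describe_frame(frame) for frame in frames]
    # Step 3: collapse the per-frame descriptions into one video summary.
    summary = summarize(descriptions)
    # Step 4: pull entities out of the summary.
    entities = extract_entities(summary)
    return {"descriptions": descriptions, "summary": summary, "entities": entities}


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model.
    result = analyze_video(
        frames=["frame-0", "frame-1"],
        describe_frame=lambda f: f"a red car in {f}",
        summarize=lambda ds: "; ".join(ds),
        extract_entities=lambda s: ["red car"] if "red car" in s else [],
    )
    print(result["entities"])
```

Passing the models in as callables keeps the flow testable on a machine that cannot load the real models.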

Challenges we ran into

1. Hardware constraints when running the vision model.
2. Too many frames to process.
3. Surveillance video has no audio, and most of the time nothing moves in it.
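One common way to ease the frame-count and no-movement problems is to sample every Nth frame and skip frames that barely differ from the last one kept. The sketch below illustrates that idea and is not necessarily what we shipped. It assumes each frame is a non-empty flat sequence of grayscale pixel values (0 to 255); the names `select_frames`, `stride`, and `threshold` are hypothetical, and the default values are arbitrary.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple

Frame = Sequence[int]  # assumption: flat list of grayscale pixels, 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(
    frames: Iterable[Frame],
    stride: int = 5,          # look at every 5th frame only (arbitrary default)
    threshold: float = 8.0,   # minimum change to count as "something moved"
) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for sampled frames that differ from the last kept one."""
    last_kept: Optional[Frame] = None
    for index, frame in enumerate(frames):
        if index % stride != 0:
            continue  # temporal subsampling
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            last_kept = frame
            yield index, frame
```

Only the frames that survive this filter would need to go through the vision model, which matters most on constrained hardware.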

Accomplishments that we're proud of

1. Learned how to build a video analysis system.
2. Learned how to use a vision-language model.
3. Learned how to use a language model.
4. Learned how to use a computer vision model.
5. Quantized the model so it can run on an edge device.

What we learned

1. How to build a video analysis system.
2. How to use a vision-language model.
3. How to use a language model.
4. How to use a computer vision model.
5. How to quantize a model so it can run on an edge device.
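To show what quantization means in the simplest case, here is a toy sketch of symmetric int8 quantization on a plain list of weights. It is not the scheme or tooling we used for the actual model; the function names are hypothetical and real frameworks quantize per tensor or per channel with far more care.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero when all weights are zero (or the list is empty).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    q, scale = quantize_int8([0.5, -1.0, 0.25, 0.0])
    print(q, scale)
    print(dequantize(q, scale))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight.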

What's next for video analysis

1. JSON response for the video analysis.
2. Face detection and recognition.
3. Motion detection and tracking, including direction of movement.
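As a starting point for the planned JSON response, one possible shape is sketched below. The format is not designed yet, so every field name here (`summary`, `entities`, `frames`, `index`, `description`) is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FrameResult:
    index: int          # position of the frame in the video (hypothetical field)
    description: str    # text produced for that frame


@dataclass
class VideoAnalysis:
    summary: str
    entities: List[str] = field(default_factory=list)
    frames: List[FrameResult] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so frames serialize too.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    analysis = VideoAnalysis(
        summary="A red car passes the camera.",
        entities=["red car"],
        frames=[FrameResult(index=120, description="a red car on the street")],
    )
    print(analysis.to_json())
```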
