OpenCV Project – Multi-Person Pose Estimation and Tracker
Welcome to the future of computer vision and human motion analysis. In this project, we are going to develop a Real-time Multi-Person Pose Estimation and Tracking System.
These cutting-edge technologies have revolutionized our ability to understand and interact with the complex movements of multiple individuals simultaneously. This system opens up a world of possibilities in various domains, from sports and healthcare to entertainment and beyond.
MoveNet Model
The MoveNet model is like a smart computer program that can look at videos of people moving and figure out how their bodies are positioned and moving in real time. It’s like having a virtual assistant that can track multiple people at once, making it really useful for things like sports analysis, fitness tracking, and even special effects in movies. The MoveNet model is a key player in the world of real-time multi-person pose estimation and tracking systems, helping us understand and interact with human movement like never before.
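Before writing any code, it helps to know the shape of the data MoveNet produces. The multipose "lightning" model emits one tensor of shape [1, 6, 56] per frame: up to 6 people, each described by 17 keypoints times (y, x, score) = 51 values, followed by 5 bounding-box values. The sketch below uses a dummy NumPy array in place of a real model result to show how that tensor splits apart; the helper name `parse_multipose_output` is our own for illustration, not part of the model API.

```python
import numpy as np

# COCO-style keypoint order used by MoveNet:
KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

def parse_multipose_output(output_0):
    """Split the raw [1, 6, 56] model output into per-person
    keypoints and bounding boxes."""
    # 17 keypoints * (y, x, score) = 51 values per person
    keypoints = output_0[:, :, :51].reshape((6, 17, 3))
    # remaining 5 values per person: [ymin, xmin, ymax, xmax, score]
    boxes = output_0[:, :, 51:].reshape((6, 5))
    return keypoints, boxes

# Demo with a dummy tensor standing in for a real model result
dummy = np.random.rand(1, 6, 56).astype(np.float32)
kps, boxes = parse_multipose_output(dummy)
print(kps.shape, boxes.shape)  # (6, 17, 3) (6, 5)
```

This is the same reshape the detection loop performs later when it slices `output_0` down to the first 51 values per person.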
Prerequisites For Multi-Person Pose Estimation Using OpenCV
Solid knowledge of Python programming and the TensorFlow and OpenCV libraries is required. Apart from this, the following system configuration is needed.
- Python 3.7 (64-bit) and above
- Any Python editor (VS Code, PyCharm)
- A graphics card (minimum 4 GB of memory, for higher FPS)
Download OpenCV Multi-Person Pose Estimation Project
Please download the source code of OpenCV Multi-Person Pose Estimation Project: OpenCV Multi-Person Pose Estimation Project Code.
Installation
Open the Windows command prompt (cmd) as administrator.
1. To install the opencv-python library, run the following command from the cmd.
pip install opencv-python
2. To install the tensorflow library, run the following command from the cmd.
pip install tensorflow
3. To install the tensorflow_hub library run the command from the cmd.
pip install tensorflow_hub
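To confirm all three packages installed correctly before moving on, a quick availability check can be run. This is a convenience sketch, not part of the project code; the helper name `check_packages` is our own.

```python
import importlib.util

def check_packages(names):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# The three packages this project depends on
status = check_packages(['cv2', 'tensorflow', 'tensorflow_hub'])
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING - install it with pip'}")
```

If any package prints MISSING, rerun the corresponding pip command above before continuing.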
Let’s Implement
1. Import all the packages.
import cv2
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import tensorflow_hub as hub
2. This function takes the frame, a set of keypoints and a confidence threshold as input parameters. It draws circles on the input frame at the positions of the keypoints whose confidence is higher than the specified threshold.
def keypoints(frame, keypoints, threshold):
    y, x, c = frame.shape
    # Scale normalized (y, x) keypoints to pixel coordinates
    Shape = np.squeeze(np.multiply(keypoints, [y, x, 1]))
    for key_point in Shape:
        ky, kx, key_point_confidence = key_point
        if key_point_confidence > threshold:
            cv2.circle(frame, (int(kx), int(ky)), 7, (255, 0, 0), -1)

3. It defines a dictionary called EDGES that represents the connections between body parts in a pose estimation system.
EDGES = {
    (0, 1): 'm', (0, 2): 'c', (1, 3): 'm', (2, 4): 'c', (0, 5): 'm', (0, 6): 'c',
    (5, 7): 'm', (7, 9): 'm', (6, 8): 'c', (8, 10): 'c', (5, 6): 'y', (5, 11): 'm',
    (6, 12): 'c', (11, 12): 'y', (11, 13): 'm', (13, 15): 'm', (12, 14): 'c', (14, 16): 'c'
}

4. This function uses the keypoints and a confidence threshold to draw green lines connecting the specified body parts in an input frame, helping visualize the pose skeleton when confidence is high.
def make_connections(frame, keypoints, edges, threshold):
    y, x, c = frame.shape
    # Scale normalized (y, x) keypoints to pixel coordinates
    Shape = np.squeeze(np.multiply(keypoints, [y, x, 1]))
    for edge, color in edges.items():
        p1, p2 = edge
        y_1, x_1, conf_1 = Shape[p1]
        y_2, x_2, conf_2 = Shape[p2]
        # Only draw a connection when both endpoints are confident
        if (threshold < conf_1) and (threshold < conf_2):
            cv2.line(frame, (int(x_1), int(y_1)), (int(x_2), int(y_2)), (0, 255, 0), 4)

5. This function processes a frame with keypoints and scores: it makes the connections, marks the keypoints and enables pose visualization in the frame based on the set threshold.
def detect_people(frame, score_of_keypoints, edges, threshold):
    for person in score_of_keypoints:
        make_connections(frame, person, edges, threshold)
        keypoints(frame, person, threshold)

6. It loads the MoveNet multi-pose model from TensorFlow Hub, selects the model’s serving signature ‘serving_default’, and initializes video capture from the camera by specifying an index of 0.
Model = hub.load('https://tfhub.dev/google/movenet/multipose/lightning/1')
Movenet_Model = Model.signatures['serving_default']
cap = cv2.VideoCapture(0)

7. It continuously captures video frames from the camera, processes each frame with the MoveNet model and visualizes the result. It resizes each frame, extracts the pose keypoints and calls the detect_people() function to annotate and display the frame in real time. The loop exits when the ‘q’ key is pressed; it then closes all windows and releases the hardware resources.
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    img = frame.copy()
    # Pad and resize to the input size expected by the model
    img = tf.image.resize_with_pad(tf.expand_dims(img, axis=0), 384, 640)
    input_img = tf.cast(img, dtype=tf.int32)
    results = Movenet_Model(input_img)
    # Keep the 51 keypoint values per person: 6 people x 17 keypoints x (y, x, score)
    score_of_keypoints = results['output_0'].numpy()[:, :, :51].reshape((6, 17, 3))
    detect_people(frame, score_of_keypoints, EDGES, 0.1)
    cv2.imshow('DataFlair', frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

OpenCV Multi-Person Pose Estimation Output
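If the drawn skeleton looks wrong, the coordinate math inside keypoints() and make_connections() can be checked in isolation, without a camera. MoveNet returns normalized (y, x) values in [0, 1], and the drawing functions scale them to pixel positions with np.multiply(keypoints, [y, x, 1]) before filtering by confidence. A minimal NumPy-only sketch of that scaling and filtering (the helper name `scale_and_filter` is our own):

```python
import numpy as np

def scale_and_filter(person_keypoints, frame_height, frame_width, threshold):
    """Scale normalized (y, x, score) keypoints to pixel coordinates and
    keep only the ones above the confidence threshold."""
    scaled = np.multiply(person_keypoints, [frame_height, frame_width, 1])
    return scaled[scaled[:, 2] > threshold]

# One fake person: first keypoint at the image centre with high confidence,
# second keypoint with confidence below the threshold.
person = np.array([
    [0.5, 0.5, 0.9],    # y, x, score -> kept
    [0.25, 0.75, 0.05]  # filtered out at threshold 0.1
])
visible = scale_and_filter(person, frame_height=480, frame_width=640, threshold=0.1)
print(visible)  # one row: [240., 320., 0.9]
```

The surviving row, (240, 320), is exactly the pixel centre of a 640x480 frame, which matches what cv2.circle receives in the drawing function above.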
Conclusion
In conclusion, the OpenCV Real-time Multi-Person Pose Estimation and Tracking System using MoveNet is a significant advancement in computer vision. Its lightweight, accurate design has diverse applications, from sports analysis to security. While challenges remain, it holds promise for revolutionizing industries and enhancing daily life, highlighting the importance of computer vision and deep learning in our future.

