Inspiration

In the wake of the COVID-19 pandemic, fitness has become increasingly popular as an effective way to recover from the physical and mental health burdens caused by lockdowns, isolation, and fear. However, learning new exercises that target different muscle groups can be a daunting and time-consuming task, creating a barrier to entry for many beginners. That is why our team built “GYMPOSE”: to make the start of a fitness journey less intimidating and, in turn, provide an effective means of improving overall well-being.

What it does

GYMPOSE is an AI web application that classifies an exercise from a video recording. Using our platform, individuals can record themselves, or others, performing an exercise, and within seconds our AI provides the exercise’s name, its pros and cons, alternatives, and tutorials. Instead of spending hours browsing the internet to research exercises, users simply record the exercise they wish to learn about and GYMPOSE brings all the relevant information to their fingertips.

How we built it

We had to create a multi-class classification model. This required us to begin by determining the specific features necessary for training the model. Initially, we hypothesized that the best approach was to build our features on top of the pose-estimation coordinates obtained in Python with OpenCV. The pose-estimation model provides more than 31 points on the body, which we quickly concluded was unnecessary and excessive for our use case. We therefore reduced these points to 13 crucial locations on the body by excluding the points for the face, fingers, and feet. The model we initially trained was a Sequential neural network, and the first features we fed it were the starting position of each exercise. Unfortunately, this proved inefficient and problematic, since certain exercises look very similar from a 2D perspective, such as the shoulder press and bench press. Consequently, this model only reached a 20-30% accuracy score.
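As a rough illustration, the landmark reduction looked something like the sketch below. This is a minimal sketch assuming a MediaPipe-style 33-landmark pose output; the exact index list and helper name are illustrative assumptions, not our production code.

    # Minimal sketch: reduce a full-body pose estimate to the 13 keypoints we keep.
    # Assumes MediaPipe-style landmarks (33 points, each with normalised x/y);
    # the index list below (nose, shoulders, elbows, wrists, hips, knees, ankles)
    # skips the face, finger, and foot landmarks.
    KEYPOINT_INDICES = [0, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28]

    def reduce_landmarks(landmarks):
        """Return a flat [x0, y0, x1, y1, ...] feature vector of the 13 keypoints."""
        features = []
        for idx in KEYPOINT_INDICES:
            point = landmarks[idx]
            features.extend([point.x, point.y])
        return features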

We then decided to include another set of features that took into account the end point of the exercise (the end of a single repetition of the movement). This increased the accuracy to 60-70%. The final feature we added to our Sequential neural network was the curve of best fit for the movement of the exercise. After examining several curves, we determined that a degree-six polynomial gave the best results, since lower or higher degrees led to underfitting or overfitting, respectively. This translated into an average accuracy of 75-85% for this model. Furthermore, we made sure to keep the dataset balanced: certain exercise videos did not include the full body, which caused imbalance, so we relied on the pose estimator’s ability to infer the positions of body parts even when they are absent from the frame.
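The trajectory feature can be sketched roughly as follows. This is a hedged example using NumPy’s polyfit; the degree-6 fit matches what we describe above, but the function and variable names are illustrative, not our exact code.

    import numpy as np

    def trajectory_features(keypoint_series, degree=6):
        """Fit a degree-6 polynomial to each keypoint coordinate over time and
        return the coefficients as features (illustrative sketch).

        keypoint_series: array of shape (num_frames, 26) -- the 13 keypoints'
        x/y coordinates per frame.
        """
        keypoint_series = np.asarray(keypoint_series)
        t = np.linspace(0.0, 1.0, keypoint_series.shape[0])   # normalised time axis

        coeffs = []
        for column in keypoint_series.T:                       # one coordinate at a time
            coeffs.extend(np.polyfit(t, column, degree))       # degree + 1 coefficients
        return np.array(coeffs)

    def build_feature_vector(keypoint_series):
        """Concatenate start pose, end pose, and trajectory coefficients."""
        keypoint_series = np.asarray(keypoint_series)
        start_pose = keypoint_series[0]
        end_pose = keypoint_series[-1]
        return np.concatenate([start_pose, end_pose, trajectory_features(keypoint_series)])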

Even though we had a working model with the Sequential neural network, we were determined to improve our implementation. Through research, we learnt about the CNN + LSTM combination, which is used extensively in activity-recognition software. A CNN (Convolutional Neural Network) is a type of deep neural network designed to work with images: filters (also known as kernels) are applied across the image being analysed to generate feature maps, and stacking multiple layers of filters lets the network capture all the features of an image. The LSTM, in contrast, works with sequential data where time matters, taking previous inputs into account when generating the current output. This makes the combination well suited to activity recognition, since videos carry temporal as well as spatial information: the CNN extracts the spatial features of each video frame, while the LSTM builds on the CNN’s output to leverage the temporal change across frames.
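For reference, a CNN + LSTM of the kind described above can be expressed in Keras roughly as below. This is a minimal sketch, not our exact architecture; the frame size, sequence length, and layer sizes are placeholder assumptions.

    from tensorflow.keras import layers, models

    NUM_CLASSES = 6                    # one class per exercise (assumption for illustration)
    FRAMES, H, W, C = 20, 64, 64, 3    # placeholder sequence length and frame size

    model = models.Sequential([
        # CNN applied to every frame independently to extract spatial features.
        layers.TimeDistributed(layers.Conv2D(16, (3, 3), activation="relu"),
                               input_shape=(FRAMES, H, W, C)),
        layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
        layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
        layers.TimeDistributed(layers.Flatten()),
        # LSTM consumes the per-frame feature vectors to model temporal change.
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])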

Since this was the first time our team had encountered the CNN + LSTM architecture, we ran into challenges while implementing it during the hackathon. Because of our inexperience with it, our model was not able to learn the features of a video properly: its accuracy was only 37.5%, and it took much longer to make a prediction. In the future, with a less constrained timeline and more experience with this architecture, we intend to implement it fully for GYMPOSE. Given the time constraints, we were only able to record 30 videos for the six exercises we targeted. Although our work has shown promising results, access to more datasets and videos of different exercises would not only allow us to classify a greater range of exercises but also significantly increase the accuracy of our model.

Challenges we ran into

  • Implementing a React frontend with a Python backend (a rough sketch of such an endpoint follows this list)
  • Implementing API calls for live video
  • Displaying a live pose-estimated video
  • Implementing a CNN + LSTM model in less than a day
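As an illustration of the frontend/backend split, a video-classification endpoint in Flask might look roughly like this. It is a hedged sketch: the route name, the predict_exercise helper, and the response fields are assumptions, not our actual code.

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def predict_exercise(video_file):
        # Placeholder: in the real app this would run pose estimation and the model.
        return "bench press"

    @app.route("/classify", methods=["POST"])
    def classify():
        # The React frontend would POST the recorded clip as multipart form data.
        video = request.files.get("video")
        if video is None:
            return jsonify({"error": "no video uploaded"}), 400

        label = predict_exercise(video)
        return jsonify({"exercise": label})

    if __name__ == "__main__":
        app.run(debug=True)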

Accomplishments that we're proud of

  • Creating a beautiful UI that complements the AI
  • Reaching an ~80% accuracy score with very little data
  • Avoiding an imbalanced dataset

What we learned

  • We were introduced to different models and their strengths in specific scenarios.
  • How to create an API with Flask
  • How a CNN + LSTM model works

What's next for GYMPOSE

We would like to fully implement a working CNN + LSTM model to increase the overall accuracy of the AI. We would also like to give users the option to upload videos of new exercises, which would then be used to retrain the model and expand our dataset.
