The goal of this project was to use machine learning to predict the performance of NBA players in future games based on data from their previous games. Specifically, we created models to predict player points, assists, rebounds, and turnovers. To accomplish this, we utilised a large dataset covering NBA games played between 2003 and 2021. We are targeting the sports betting market, and we also aim to give NBA agents key indicators against which to evaluate their players so they can make more informed decisions.

We began by pre-processing and cleaning the data to remove irrelevant information and outliers. We then used feature engineering to create new data points that would be useful in determining player performance. These include, amongst many others, a player's recent performance in earlier games, their long-term performance, their current form, and whether they have recently been injured. Additionally, we looked at the amount of rest time players had between games, as well as the distance they had to travel to the next game.

Once the data was cleaned and pre-processed, we had almost half a gigabyte of data. We used the commercial data science platform RapidMiner to quickly apply various machine learning algorithms. While we were able to train multiple linear regression models, time constraints did not let us use the boosted decision tree or artificial neural network we had hoped to try. Importantly, for these models we split our data by season into three sets (training, testing, and holdout) in a 70-20-10 ratio. We fine-tuned the hyperparameters of the models to improve their performance, and they achieved a strong correlation and low root mean squared errors in in-sample, out-of-sample, and holdout testing. Given these strong results, there is clearly great potential for developing a tool that provides intelligent NBA insights to both fantasy basketball fans and sports betting enthusiasts.
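The feature engineering, season-based split, and error metrics described above can be illustrated with a small sketch. This is not the RapidMiner workflow we built. It is a minimal pure-Python illustration under stated assumptions: the function names (`add_features`, `split_by_season`, `rmse`, `pearson`), the five-game window, the field names `date` and `points`, and the choice to assign seasons to the three sets in chronological order are all hypothetical.

```python
import math
from datetime import date
from statistics import mean


def add_features(games, window=5):
    """Add per-game features for ONE player.

    `games` is a list of dicts with 'date' (datetime.date) and 'points',
    sorted by date. Only earlier games are used, so a row never sees
    its own result. The window size is an assumption.
    """
    rows = []
    for i, game in enumerate(games):
        earlier = games[:i]
        recent = earlier[-window:]
        rows.append({
            **game,
            # Recent form: average over the last `window` earlier games.
            "recent_avg_points": mean(g["points"] for g in recent) if recent else None,
            # Long-term performance: average over all earlier games.
            "long_term_avg_points": mean(g["points"] for g in earlier) if earlier else None,
            # Rest time: days since the previous game.
            "rest_days": (game["date"] - games[i - 1]["date"]).days if i else None,
        })
    return rows


def split_by_season(seasons, train=0.7, test=0.2):
    """Assign whole seasons to train/test/holdout (chronological order assumed)."""
    ordered = sorted(set(seasons))
    n_train = round(len(ordered) * train)
    n_test = round(len(ordered) * test)
    return {
        "train": ordered[:n_train],
        "test": ordered[n_train:n_train + n_test],
        "holdout": ordered[n_train + n_test:],
    }


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up example games, for illustration only.
    games = [
        {"date": date(2020, 1, 1), "points": 10},
        {"date": date(2020, 1, 3), "points": 20},
        {"date": date(2020, 1, 4), "points": 30},
    ]
    for row in add_features(games, window=2):
        print(row)
    print(split_by_season(range(2003, 2022)))
```

Splitting on whole seasons rather than on individual games keeps every game of a given season in the same set, so the testing and holdout scores are not inflated by games from a season the model has already seen.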
We hope to add models for the remaining major stat categories (e.g., blocks and steals), and we want to aggregate the results of all the models into a new model that predicts the outcome of entire games.