LightGBM: Predicting Titanic survivors with Gradient Boosting

Fredrik Olsson — Thu, 12 Sep 2019 09:11:49 GMT

2019-08-02

One of the most powerful techniques for building predictive Machine Learning models is the Gradient Boosting Machine. Gradient boosting is widely used both in industry and by winners of Machine Learning competitions, and the method applies to many kinds of problems, such as regression, binary classification and multi-class classification. In this post, we turn our focus to gradient boosting: we try to understand what the algorithm does and how it works, and then apply it to the Titanic disaster dataset using the gradient boosting framework LightGBM.

Titanic disaster dataset

The Titanic disaster dataset contains information about the passengers on board the Titanic and whether they survived or not. More precisely, the dataset contains the following information for every passenger:

Survived: Whether the passenger survived or not
Sex: The sex of the passenger
Age: The age of the passenger
Fare: How much the passenger paid for the ticket
Ticket class: Whether the ticket was a 1st, 2nd or 3rd class ticket
Family size: How many family members the passenger had on board the ship
Title: Whether the title of the passenger was Mr, Mrs, Master or Miss (indicating if the passenger was married or not)
Embarked: Which of the three places the passenger embarked on the ship

Age, Fare and Family size are numerical features, while Survived and Sex are binary features (False/True, 0/1). Ticket class, Title and Embarked are all categorical features, i.e. they can take a fixed number of values (larger than two). To represent these categorical features, we use one-hot encoding. This means that each categorical feature is split into one binary feature for every category. For example, a passenger who bought a 2nd class ticket is represented by three binary features: 1st class = 0, 2nd class = 1 and 3rd class = 0.
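
As a minimal sketch of one-hot encoding for the ticket class feature, the snippet below maps a category to one binary feature per category. The function name and the labels class_1, class_2 and class_3 are made up for illustration and are not taken from the dataset.

```python
def one_hot(value, categories):
    """Return one binary feature per category; exactly one of them is 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {category: int(category == value) for category in categories}


# Hypothetical category labels for the ticket class feature.
TICKET_CLASSES = ["class_1", "class_2", "class_3"]

encoded = one_hot("class_2", TICKET_CLASSES)
print(encoded)  # {'class_1': 0, 'class_2': 1, 'class_3': 0}
```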

In total, this dataset contains 892 data points (892 passengers). It has been split into a training set of size 792 that will be used to train the model, and a test set of size 100 that we will use to test the performance of the model.
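
The post does not say how the split was made. One common approach, assumed here, is a seeded random shuffle followed by holding out a fixed number of rows. The helper name and the integer stand-ins for passenger records are hypothetical.

```python
import random


def train_test_split(rows, test_size, seed=0):
    """Shuffle a copy of rows and hold out test_size of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


passengers = list(range(892))  # stand-in for the 892 passenger records
train, test = train_test_split(passengers, test_size=100)
print(len(train), len(test))  # 792 100
```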

Our target is the Survived variable. Since this is a binary variable, we have a binary classification problem: we want to predict whether a passenger survived the Titanic disaster or not.

Baseline model

To have something to compare our gradient boosting model's performance with, we start with a standard method for binary classification, namely logistic regression. As an evaluation metric, we will use the weighted F1-score. The F1-score is based on precision and recall, and can for each class be computed as:

F1 = 2 · precision · recall / (precision + recall)

The weighted F1-score is a weighted sum of the F1-scores for each class:

weighted F1 = W₀ · F1(class 0) + W₁ · F1(class 1)

where W₀ and W₁ are the proportion of data points belonging to class 0 and class 1 respectively, in the test set.
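
To make the metric concrete, here is a small sketch that computes precision, recall and F1 per class and then weights them by class proportion. The labels below are made up for illustration and are not the post's test set.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Sum of per-class F1-scores, weighted by each class's share of y_true."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        weight = sum(1 for t in y_true if t == label) / total
        score += weight * f1_for_class(y_true, y_pred, label)
    return score


# Made-up labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.667
```

In this toy example class 0 has F1 = 0.75 with weight 4/6, and class 1 has F1 = 0.5 with weight 2/6, which gives 0.667.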

Training a logistic regression model on the training set, and then evaluating the model on the test set gives us the following result:

We get the following metrics:

and since there are 64 data points in class 0 and 36 data points in class 1 in the test set, so that W₀ = 0.64 and W₁ = 0.36, we get a weighted F1-score of 0.88 for the logistic regression model.

This result is already really good, but hopefully we will be able to beat it with our gradient boosting model.

Gradient boosting

Before we move on to implementing a gradient boosting model on the Titanic disaster dataset, we start by explaining what gradient boosting does and how it works.

Gradient boosting uses an ensemble of decision tree learners. If you have heard of the Random Forest algorithm, the concept of using an ensemble of decision tree learners may sound familiar, since random forests use this as well. There is, however, a big difference in how the tree learners are constructed. In a random forest, each tree is created independently of the others, and the trees' individual results are combined with equal weight. In gradient boosting, we let each new tree be based on the predictions of the previous trees, so that it can learn from the mistakes the previous trees made. This is the fundamental idea in boosting: converting weak learners into strong learners.

By a weak learner we mean a model whose performance is at least slightly better than random chance. In a random forest each tree is created independently, so these trees are all strong learners by themselves, but in gradient boosting we want room for improvement so that the trees can learn from each other. To make sure that a decision tree is indeed a weak learner, we can limit its depth, limit its number of leaves and set a minimal number of data points that need to be in a leaf.
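
The idea of each new tree learning from the mistakes of the previous ones can be sketched in a few lines. The example below is a simplification under stated assumptions: it uses regression with squared error (where the negative gradient is simply the residual) rather than the classification setup of this post, a single numerical feature, depth-1 trees ("stumps") as weak learners, and made-up data. It is not LightGBM's implementation.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: one threshold, one constant prediction per side.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Made-up data: a step from roughly 1 to roughly 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

one_tree = boost(xs, ys, n_trees=1)
many_trees = boost(xs, ys, n_trees=20)
print(mse(one_tree, xs, ys) > mse(many_trees, xs, ys))  # True
```

A single stump scaled by the learning rate is a weak model on its own, but the training error shrinks as more stumps are added, each one fitted to what the ensemble still gets wrong.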

LightGBM model

Hopefully, we now have a pretty good understanding of what the gradient boosting method does. Now it is time to implement a gradient boosting model on the Titanic disaster dataset. There are several frameworks one can use, and we will use LightGBM, a gradient boosting framework developed by Microsoft that includes several adjustments to improve things like time and memory efficiency, accuracy and parallel learning.
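
As a rough sketch of what a LightGBM configuration can look like, the dictionary below uses LightGBM parameter names that correspond to the weak-learner constraints mentioned earlier (depth, number of leaves, minimal number of data points in a leaf). The values are illustrative assumptions; the post does not list its tuned parameters, and X_train and y_train in the comment are hypothetical names.

```python
# Illustrative values only; not the tuned parameters from this post.
params = {
    "objective": "binary",    # binary classification (survived or not)
    "learning_rate": 0.05,    # shrinkage applied to each new tree
    "max_depth": 4,           # limit the depth of each tree
    "num_leaves": 15,         # limit the number of leaves per tree
    "min_data_in_leaf": 20,   # minimal number of data points in a leaf
}

# With LightGBM installed and training data loaded, the dictionary would be
# passed to the training API roughly like this:
#   import lightgbm as lgb
#   booster = lgb.train(params, lgb.Dataset(X_train, label=y_train),
#                       num_boost_round=200)

# Sanity check: a tree of depth d can have at most 2**d leaves.
assert params["num_leaves"] <= 2 ** params["max_depth"]
```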

For many machine learning algorithms, there are only a few parameters and the default settings often work best, or at least very well. This makes these algorithms easy to implement. For example, this is the case for our logistic regression model. In gradient boosting, however, we have a lot of different parameters that we can specify, and the default settings are almost never the best solution. To do the parameter tuning, one often does a cross-validation analysis using e.g. grid search; that is, we train the model on many combinations of different parameter values and see which combination performs best on a cross-validation set according to some evaluation metric.
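
The grid search procedure can be sketched as follows. This is a generic outline, not the tuning code used for this post: toy_fit_and_score is a hypothetical stand-in for "train a model with these parameters on the training folds and return its score on the validation fold", and the grid values are arbitrary.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


def grid_search(param_grid, fit_and_score, n_samples, k=5):
    """Return the parameter combination with the best mean validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, validation)
                  for train, validation in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_fit_and_score(params, train, validation):
    """Hypothetical scorer that peaks at max_depth=4, learning_rate=0.1."""
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_fit_and_score, n_samples=792)
print(best_params)  # {'learning_rate': 0.1, 'max_depth': 4}
```

In practice the scorer would train the model on the training indices and return, for example, the weighted F1-score on the validation indices. The number of model fits grows as the product of the grid sizes times the number of folds, which is why tuning many parameters this way gets expensive quickly.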

After plenty of parameter tuning, we got the following result by applying our LightGBM model on the Titanic disaster dataset:

and calculating precision, recall and F1-score for each class in the same way as we did for the logistic regression model, we get the following weighted F1-score:

which is slightly better than the weighted F1-score of 0.88 we got with logistic regression.

Custom loss function

So, we were able to improve the results compared to the baseline model. Perhaps we can do even better. Looking at the predictions from the LightGBM model, we see that there are considerably more false negatives than false positives. Maybe we can improve the results with a loss function that punishes false negatives more than false positives. As you might have noticed, the gradient boosting algorithm is not expressed in terms of a specific loss function, but in terms of a general loss function L. This makes it possible for gradient boosting to tackle several different problems like regression, binary classification and multi-class classification, but it also enables us to choose our own loss function. Since we need the gradient of the loss function, we just need to make sure that our loss function is differentiable.

https://medium.com/media/969d84b1f28802925d8dddebd7e213f6/href
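
The embedded snippet is not reproduced here. As a sketch of what such a loss could look like, the code below assumes a log loss in which the term for the positive class (survived) is multiplied by a factor β, so that false negatives cost β times more than false positives. This form is an assumption for illustration and may differ from the loss actually used in the post. Gradient boosting frameworks that accept custom objectives typically need the first and second derivative of the loss with respect to the raw score, which is what grad_hess returns for a single data point.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_log_loss(z, y, beta):
    """Assumed loss: log loss with the positive-class term scaled by beta."""
    p = sigmoid(z)
    return -(beta * y * math.log(p) + (1 - y) * math.log(1 - p))


def grad_hess(z, y, beta):
    """First and second derivative of the assumed loss w.r.t. the raw score z."""
    p = sigmoid(z)
    weight = beta * y + (1 - y)  # beta for positives, 1 for negatives
    grad = p * weight - beta * y
    hess = weight * p * (1 - p)
    return grad, hess


# An equally confident mistake costs beta times more when the true label is 1.
false_negative = weighted_log_loss(-1.0, 1, beta=1.5)
false_positive = weighted_log_loss(1.0, 0, beta=1.5)
print(round(false_negative / false_positive, 6))  # 1.5
```

With β = 1 this reduces to the ordinary log loss, and the loss stays differentiable for any β > 0.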

Now we can try out our custom loss function in the LightGBM model for different values of β. With β = 2.5, we get the same number of false negatives and false positives, but the overall performance decreases (F1-score = 0.86), so this is not a better solution. However, with β = 1.5, we are actually able to improve on the previous model, while still having a certain imbalance between false negatives and false positives. We get the following results:

yielding the weighted F1-score below:

which is a slight improvement on our first LightGBM model.

LightGBM: Predicting Titanic survivors with Gradient Boosting was originally published in Backtick Technologies on Medium, where people are continuing the conversation by highlighting and responding to this story.

Formalizing Backtick Technologies AB

Oskar Handmark — Wed, 21 Nov 2018 08:59:55 GMT

We have some very exciting news to share!

In late September, the formalization of Backtick Technologies as a company was completed. Backtick Technologies will focus on creating exciting, impactful software in areas related to data engineering, machine learning, artificial intelligence and full stack development.

https://backtick.se/news/formalizing-backtick-technologies-ab

Formalizing Backtick Technologies AB was originally published in Backtick Technologies on Medium, where people are continuing the conversation by highlighting and responding to this story.
