Machine Learning Algorithms


In this article, we will learn about Machine Learning Algorithms. Let’s start!!!

Types of machine learning

1. Supervised learning algorithms

In supervised machine learning, the machine learns under supervision. The algorithm is given a set of input variables, also known as attributes or features, and predicts an output, known as the output variable. It uses labelled input and output data.

In supervised learning, the machine learning algorithm is trained on a labelled dataset. This means that for each example in the dataset, the algorithm knows what the correct output is, and it uses these examples to generalise to new examples it has never seen before.

Using the labelled inputs and outputs, the model can measure the accuracy and learn over time.

2. Unsupervised learning

Unsupervised learning is where the machine learning algorithm is not given any labels at all. Instead, these algorithms discover hidden patterns in data without any human intervention.

Unsupervised machine learning models are used for three main tasks. They are clustering, association and dimensionality reduction.

3. Reinforcement learning

This type of learning is based on a reward signal for the most recent state-action combination.

For example, in a game-playing setting, we have an input frame and run it through a neural network model that produces an output action, say move up or move down. The difference from supervised learning is that we don't actually know the target label; the model only receives a reward for the actions it takes.

4. Semi-supervised learning

  • Pick up a large unlabelled dataset.
  • Label a small portion of the dataset.
  • Put the unlabelled dataset into clusters using an unsupervised machine learning algorithm.
  • Build your model to use the labelled data to label and classify the remaining unlabelled data (see the sketch below).
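
A rough sketch of this workflow, assuming scikit-learn is available: KMeans clusters the mostly unlabelled data, and a majority vote over the few labelled points names each cluster. The dataset, sizes and variable names are made up purely for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: 300 points, of which we pretend only 15 are labelled.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
labelled_idx = np.random.RandomState(0).choice(len(X), size=15, replace=False)

# Step 3: cluster the full, mostly unlabelled dataset.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 4: name each cluster by a majority vote over its few labelled points,
# then propagate that label to every point in the cluster.
y_pred = np.full(len(X), -1)
for c in range(3):
    members = labelled_idx[clusters[labelled_idx] == c]
    if len(members) == 0:                 # no labelled point fell in this cluster
        continue
    y_pred[clusters == c] = np.bincount(y_true[members]).argmax()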

Commonly used machine learning algorithms

Let us learn about some of the most commonly used machine learning algorithms.

1. Linear regression

Consider x variables and y variables. The independent variable is on the x-axis, and the dependent variable, y, is on the y-axis. We try to form a relation between these two variables and draw a straight line.

As the independent variable changes on this line, the dependent variable either goes up or down accordingly.

Suppose the dependent variable increases as the independent variable increases. In that case, there is said to be a positive relationship. On the other hand, if the dependent variable decreases as the independent variable increases, the variables have a negative relationship.
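
A minimal scikit-learn sketch (the numbers are invented for illustration); the sign of the fitted coefficient tells you whether the relationship is positive or negative.

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y grows roughly linearly with x (positive relationship).
x = np.array([[1], [2], [3], [4], [5]])        # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])        # dependent variable

model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)        # slope (~2) and intercept
print(model.predict([[6]]))                    # predict y for a new x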

2. Logistic regression

Logistic regression is a special case of regression analysis. It is used when the dependent variable is nominal or ordinally scaled.

Dichotomous variables (0 or 1) can be predicted using logistic regression.

It estimates the probability that a characteristic occurs (1 means the characteristic is present, 0 means it is absent).

For example, a common goal in medicine is determining which variables impact the disease.

In this case, 0 could stand for “not disease” and 1 for “disease”, and the influence of age, gender and smoking status on this particular disease is estimated.

The logistic model is based on the logistic function. The important thing about the logistic function is that it only produces values between 0 and 1.
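
A minimal scikit-learn sketch in the spirit of the disease example above, with an invented toy dataset where age is the only predictor (0 = no disease, 1 = disease).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: older patients are more likely to have the disease.
age = np.array([[25], [30], [35], [40], [50], [55], [60], [65]])
disease = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(age, disease)

# The model outputs a probability between 0 and 1 (the logistic function).
print(clf.predict_proba([[45]])[0, 1])   # P(disease | age = 45)
print(clf.predict([[45]]))               # predicted class, 0 or 1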

3. Decision trees

Decision trees are a type of supervised machine learning algorithm we use for classification problems. They can operate on both continuous and categorical variables. The population is divided into two or more homogeneous sets based on the most significant attributes.

Consider the classic example of deciding whether a child should play outside based on multiple attributes. First, we have the outlook attribute, which can be sunny, overcast or rainy.
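
A minimal scikit-learn sketch of the play/don't-play example; the outlook values are encoded as integers because DecisionTreeClassifier expects numeric input, and the toy rows are invented.

from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data: outlook encoded as 0 = sunny, 1 = overcast, 2 = rainy.
X = [[0], [0], [1], [1], [2], [2]]
y = ["no", "no", "yes", "yes", "yes", "no"]   # play or not

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["outlook"]))   # readable split rules
print(tree.predict([[1]]))                            # prediction for overcast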

4. SVM- support vector machine

SVM is a classification method. Each object you want to classify is represented as a point in an n-dimensional space. The coordinates of this point are usually called features.

SVMs perform classification by drawing a hyperplane (a line in a 2D plane, a plane in 3D) so that all points of one category lie on one side of the hyperplane and all points of the other category lie on the other side. There can be multiple such hyperplanes, and the SVM picks the one with the largest margin.

The name support vector classifier comes from the fact that the observations on the edge of and within the soft margin are called support vectors.
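
A minimal scikit-learn sketch with a linear SVM on an invented, roughly separable 2-D dataset; the C parameter controls how soft the margin is.

from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Illustrative 2-D data with two well-separated classes.
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The points that define the margin are the support vectors.
print(clf.support_vectors_.shape)
print(clf.predict([[0.0, 2.0]]))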

5. Naive Bayes

Naive Bayes is another classification technique based on the Bayes theorem. It assumes independence among features. We can make a simplifying assumption that the elements of the feature vector are conditionally independent of each other, given the classification.

This is a great simplification over evaluating the full probability, so it might be surprising that the naive Bayes classifier has shown comparable results to other classification methods in certain domains.

The naive Bayes classifier produces the exact MAP (maximum a posteriori) classification when the simplifying assumption holds, that is, when the features really are conditionally independent of each other.
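
A minimal scikit-learn sketch using Gaussian naive Bayes; the built-in iris dataset is chosen only for illustration.

from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gaussian naive Bayes: features are treated as conditionally independent
# given the class, each modelled with a normal distribution.
nb = GaussianNB().fit(X_train, y_train)
print(nb.score(X_test, y_test))   # accuracy on held-out data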

6. KNN- K- nearest neighbours

The idea behind K-nearest neighbours (KNN) is very simple. For each record to be classified or predicted:

  • Find K records that have similar features.
  • For classification, find the majority class among those similar records and assign that class to the new record.
  • For prediction, find the average among those similar records, and predict that average for the new record (a minimal sketch follows this list).
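
A minimal scikit-learn sketch showing both uses on an invented toy dataset: a majority vote for classification and the neighbours' average for prediction.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]])
y_class = np.array([0, 0, 0, 1, 1, 1])                 # labels for classification
y_value = np.array([1.0, 1.2, 1.1, 8.9, 9.1, 9.0])     # targets for prediction

# Classification: majority class among the 3 nearest neighbours.
print(KNeighborsClassifier(n_neighbors=3).fit(X, y_class).predict([[2, 1]]))

# Prediction (regression): average target of the 3 nearest neighbours.
print(KNeighborsRegressor(n_neighbors=3).fit(X, y_value).predict([[2, 1]]))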

7. K-means

Clustering is a technique to divide data into different groups where the records in each group are similar. The goal of clustering is to identify meaningful groups of data. The groups can be used directly, analysed in more depth, or passed as a feature or an outcome of a predictive regression and classification model.

K-means was the first clustering method to be developed. It is still widely used owing to the relative simplicity of the algorithm and its ability to scale to large datasets.
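
A minimal scikit-learn sketch on invented blob data with three natural groups; n_clusters is the K we choose up front.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data with three natural groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)     # one centre per cluster
print(km.labels_[:10])         # cluster assignment of the first 10 records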

8. Random forest

A random forest refers to a collection of multiple decision trees and is much less sensitive to the training data.

We use multiple trees, and hence it has the name forest.

Process of creating a random forest:

The first step is to build new datasets from our original data by randomly selecting rows from it.

Here we perform random sampling with replacement: after selecting a row, we put it back into the data, so the same row can be picked more than once.

The process that we just followed to create new data is called Bootstrapping.

Next, we train a decision tree on each of these datasets separately.

We randomly select a subset of the features for each tree and use only them for training.

Make predictions

We pass this new data point through each tree and note down the prediction. Finally, we combine all the predictions, and the majority voting is taken.

This overall process of bootstrapping the data and aggregating the predictions is called bagging.
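
A minimal scikit-learn sketch; bootstrap sampling and the random feature subset per split are the two ideas described above, and the dataset choice is only for illustration.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (bootstrap=True) and considers
# only a random subset of features at every split (max_features="sqrt");
# the forest then takes a majority vote over all trees.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0).fit(X_train, y_train)
print(rf.score(X_test, y_test))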

9. Dimensionality Reduction Algorithms

PCA is a technique to find how numeric variables covary, that is, vary together. Some of the variation in one variable is associated with variation in another; restaurant checks and tips are a classic example.

It helps you reduce a large number of dimensions to a smaller number of new dimensions, on which we can then apply machine learning algorithms.

A very large number of dimensions is often called the "curse of dimensionality", since it directly impacts a model's accuracy and cost.
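
A minimal scikit-learn sketch reducing the 4-dimensional iris data to 2 principal components; the dataset and component count are illustrative.

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)     # 4 original dimensions

pca = PCA(n_components=2)             # reduce to 2 dimensions
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)    # variance kept by each component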

10. Gradient Boosting Algorithms

Gradient boosting is another boosting technique in which learning happens by optimising a loss function. It uses two types of base estimators: the first is a simple average-type model used as the initial prediction, and the second is a decision tree fitted to the errors that remain.

It is used for classification and regression.
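
A minimal scikit-learn sketch using GradientBoostingRegressor on invented regression data; the n_estimators, learning_rate and max_depth values are illustrative.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added one after another, each fitted to the errors of the ensemble
# so far; learning_rate scales every tree's contribution to the loss reduction.
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X_train, y_train)
print(gbr.score(X_test, y_test))    # R^2 on held-out data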

XGBoost

XGBoost is a framework that can run in multiple languages. It has good language support in R, Python, Julia, C++, etc., and is portable across different platforms.

It uses a method called boosting. Boosting combines weak learners sequentially so that each new tree corrects the errors of the previous one.

For any step m, gradient-boosted trees produce a model such that the ensemble at step m equals the ensemble at step m-1 plus the learning rate times the weak learner fitted at step m: F_m(x) = F_{m-1}(x) + eta * h_m(x).

We fit the gradient-boosted trees model using the XGBoost library.
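
A minimal usage sketch, assuming the xgboost package is installed (pip install xgboost) and using its scikit-learn-style wrapper; the dataset and parameter values are only for illustration.

from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree corrects the errors of the ensemble built so far;
# learning_rate is the eta factor applied to every weak learner.
model = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))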

LightGBM

LightGBM is a fast, distributed, high-performance gradient boosting framework based on the decision tree algorithm. It splits the tree leaf-wise.

Since the tree grows leaf-wise, it can lead to overfitting, which we minimise by limiting the depth allowed for splitting.

Histogram-based algorithm: each continuous feature is bucketed into discrete bins. Now, to compute the best split, we only need to iterate over the bins instead of over every data point.

The histogram implementation can be easily optimised for sparse data, and most of the datasets we will deal with are sparse.

The trees are grown leaf-wise (best-first) rather than level by level: LightGBM chooses the leaf with the maximum delta loss to grow and does not have to grow the whole level.
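
A minimal usage sketch, assuming the lightgbm package is installed (pip install lightgbm); num_leaves and max_depth are the illustrative knobs that limit leaf-wise growth and so reduce overfitting.

from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limit the number of leaves and the depth to keep leaf-wise growth in check.
model = LGBMClassifier(n_estimators=200, num_leaves=31, max_depth=6)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))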

CatBoost

CatBoost is another gradient boosting library, developed by Yandex. The main ideas are to prevent overfitting and to provide good default parameters.

Major improvements:

  • More sophisticated handling of categorical variables and good default parameters.
  • Fights the “gradient bias.”

Major ideas:

  • Use oblivious trees (symmetric trees that apply the same split condition across an entire level).
  • For a single tree, use a random order of observations. Then, while computing the gradient for an observation, use only the observations that precede it in that order; don't use the current one or the ones that follow (see the sketch after this list).
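
A minimal usage sketch, assuming the catboost package is installed (pip install catboost); the tiny dataset is invented, and the cat_features argument shows how CatBoost handles a categorical column natively.

import pandas as pd
from catboost import CatBoostClassifier

# Tiny illustrative dataset with a categorical column handled natively.
df = pd.DataFrame({
    "city":   ["london", "paris", "paris", "delhi", "london", "delhi"],
    "age":    [25, 31, 47, 35, 52, 23],
    "bought": [0, 1, 1, 0, 1, 0],
})

model = CatBoostClassifier(iterations=50, verbose=0)
model.fit(df[["city", "age"]], df["bought"], cat_features=["city"])

new = pd.DataFrame({"city": ["paris"], "age": [40]})
print(model.predict(new))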

Discrete Hopfield Network

A Hopfield network is an old conceptualisation of neural networks. The goal is to store the patterns in the neural network.

A content-addressable memory (CAM) system can take a part of a pattern and produce the most likely match from memory.

In 1982, John Hopfield published a famous paper in which he proposed a method for using neural networks as a CAM: the network learns the patterns and converges to the closest stored pattern when shown a partial pattern.
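
A minimal NumPy sketch of a discrete Hopfield network: two binary patterns are stored with the Hebbian rule, and a corrupted copy of one of them is recovered. The patterns and the number of update steps are arbitrary illustrative choices.

import numpy as np

# Store two binary (+1/-1) patterns with Hebbian learning.
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])

n = patterns.shape[1]
W = np.zeros((n, n))
for p in patterns:                 # Hebbian rule: W += p p^T
    W += np.outer(p, p)
np.fill_diagonal(W, 0)             # no self-connections

# A noisy copy of the first pattern (last bit flipped).
state = np.array([1, -1, 1, -1, 1, 1])
for _ in range(5):                 # repeated updates until the state settles
    state = np.sign(W @ state)
    state[state == 0] = 1

print(state)                       # converges to the closest stored pattern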

Backpropagation

In the case of neural networks, backpropagation is a common method to train neural networks by adjusting weights through error calculation in each iteration.

In this example, we use one input layer, one hidden layer and one output layer with two neurons each. We feed our neural networks with two numbers, and we are looking to predict two outputs.

First, the weights are initialised randomly. Then, we also set a bias in each layer. The first phase is forward propagation. We use the sigmoid function as our activation function.

We compute each neuron's value as the weighted sum of its inputs plus the bias, passed through the activation function. You can also assign different weight values at initialisation, change the inputs and outputs, and feed your neural network.

We calculate the error contributed by each part of our neural network and adjust the weights to get closer to the desired output values.
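
A minimal NumPy sketch of the 2-2-2 network described above, with sigmoid activations and a plain squared-error loss; the input, target and learning-rate values are illustrative, not taken from the article.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two inputs, one hidden layer with two neurons, two outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input  -> hidden
W2, b2 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden -> output

x = np.array([0.05, 0.10])        # two input numbers (illustrative)
target = np.array([0.01, 0.99])   # two desired outputs (illustrative)
lr = 0.5

for step in range(1000):
    # Forward pass: weighted sums followed by the sigmoid activation.
    h = sigmoid(W1 @ x + b1)
    out = sigmoid(W2 @ h + b2)

    # Backward pass: propagate the error and adjust the weights.
    d_out = (out - target) * out * (1 - out)   # output-layer error term
    d_h = (W2.T @ d_out) * h * (1 - h)         # hidden-layer error term
    W2 -= lr * np.outer(d_out, h)
    b2 -= lr * d_out
    W1 -= lr * np.outer(d_h, x)
    b1 -= lr * d_h

print(out)   # should be close to the target after training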

Hierarchical Clustering

It begins with each individual item in its own cluster and iteratively merges clusters until all items belong to one cluster.

A bottom-up approach is followed to merge the clusters together. Dendrograms are used to represent HAC (hierarchical agglomerative clustering) pictorially. The distance between clusters can be measured in several ways (a minimal sketch follows the list below):

  • S: single linkage (nearest distance)
  • C: complete linkage (farthest distance)
  • A: average linkage (average distance)
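
A minimal SciPy sketch, assuming scipy is installed; the method argument selects the single, complete or average linkage listed above.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Illustrative 2-D points forming two obvious groups.
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.3],
              [8, 8], [8.2, 7.9], [7.8, 8.1]])

# method can be "single", "complete" or "average" (the S, C, A options above).
Z = linkage(X, method="average")

# Cut the dendrogram into 2 flat clusters.
print(fcluster(Z, t=2, criterion="maxclust"))

# dendrogram(Z) draws the merge tree when used together with matplotlib.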

Conclusion

Hence, today we discussed some of the most influential machine learning algorithms. Let me know which one you find the most interesting. Also, if you have any further suggestions, let us know in the comments below. We will get back to you. Thank you!
