From the course: Machine Learning with Python: Logistic Regression
What is regression? - Python Tutorial
From the course: Machine Learning with Python: Logistic Regression
What is regression?
- [Instructor] Regression or regression analysis refers to a family of machine learning algorithms that are used to quantify the size and strength of the relationship between two or more numerical values. Regression is one of two major categories of supervised machine learning. The other is known as classification. Classification problems are supervised machine learning problems where the dependent variable is categorical or qualitative. For example, a machine learning model that predicts whether a tumor is benign or malignant is a classification model. The values benign or malignant are categorical. In contrast to classification, regression problems are supervised machine learning problems where the dependent variable is continuous or quantitative. For example, a machine learning model that predicts the annual sales numbers for a particular product based on advertising spend is a regression model. Annual sales is a continuous value. It has an infinite number of possible values between the lower and upper bounds. To further illustrate how regression analysis is used, let's assume that we work for a bike rental company and are trying to build a machine learning model that estimates how many bikes to deliver to a location to meet anticipated customer demand. To build such a model, we need some historical data or what is known as ground truth data. Suppose that over the last month, our company kept a record of the average daily temperature and the number of bikes rented. Shown here is a 10 day sample of that data. To build a regression model using this data, we could assume that the average daily temperature has a direct impact on the number of bikes rented. As a result, we will designate the column that holds the average daily temperature as the independent variable. Because our objective is to predict the number of rentals based on temperature, the rentals column would serve as the dependent variable. Using the independent and dependent variables as input, a regression algorithm would attempt to estimate a function, F of X beta, that models the relationship between the values of the independent variable and the values of the dependent variable. The estimated function is what we refer to as a regression model. With a regression model, we can do one of two things. The first is prediction. If we know the estimated daily temperature for any given day, we can simply pass it to our regression model and it'll predict the number of bikes it expects customers to rent on that day. Regression models are also useful for inference. With a regression model, we can approximate the impact that a unit change in a predictive variable would have on the response. For example, we can use our bike rental model to answer a question such as how many more or how many fewer bikes would customers rent if the average daily temperature rose by one degree?