## What is regression? 

Regression, or regression analysis, refers to a family of machine learning algorithms used to quantify the size and strength of the relationship between two or more numerical variables.

Regression is one of two major categories of supervised machine learning. The other is known as classification. 

Classification problems are supervised machine learning problems where the dependent variable is categorical or qualitative. For example, a machine learning model that predicts whether a tumor is benign or malignant is a classification model. The values benign or malignant are categorical. 

In contrast to classification, regression problems are supervised machine learning problems where the dependent variable is continuous or quantitative. For example, a machine learning model that predicts the annual sales numbers for a particular product based on advertising spend is a regression model. Annual sales is a continuous value. It has an infinite number of possible values between the lower and upper bounds. 

To further illustrate how regression analysis is used, let's assume that we work for a bike rental company and are trying to build a machine learning model that estimates how many bikes to deliver to a location to meet anticipated customer demand. To build such a model, we need historical data, also known as ground truth data. Suppose that over the last month, our company kept a record of the average daily temperature and the number of bikes rented each day. Shown here is a 10-day sample of that data.

To build a regression model using this data, we could assume that the average daily temperature has a direct impact on the number of bikes rented. We therefore designate the column that holds the average daily temperature as the independent variable. Because our objective is to predict the number of rentals based on temperature, the rentals column serves as the dependent variable.

Using the independent and dependent variables as input, a regression algorithm would attempt to estimate a function, $f(X, \beta)$, that models the relationship between the values of the independent variable and the values of the dependent variable.

The estimated function is what we refer to as a regression model. With a regression model, we can do one of two things: 
+ The first is prediction. If we know the forecast average daily temperature for a given day, we can pass it to our regression model, and it will predict the number of bikes it expects customers to rent on that day.
+ Regression models are also useful for inference. With a regression model, we can approximate the impact that a unit change in a predictor variable would have on the response. For example, we can use our bike rental model to answer a question such as: how many more, or how many fewer, bikes would customers rent if the average daily temperature rose by one degree?
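Both uses can be sketched with a fitted line of the form intercept + slope × temperature. The coefficient values below are hypothetical placeholders, not estimates from the bike rental data, and the function names are invented for illustration.

```python
# Hypothetical coefficients for illustration only; a real model would
# estimate them from the historical temperature and rental data.
BETA0 = 20.0  # intercept (hypothetical)
BETA1 = 3.5   # slope: change in rentals per one-degree change (hypothetical)


def predict_rentals(temperature: float) -> float:
    """Prediction: pass in a temperature, get the expected number of rentals."""
    return BETA0 + BETA1 * temperature


def effect_of_change(delta_temperature: float) -> float:
    """Inference: how many more (or fewer) rentals a temperature change implies."""
    return BETA1 * delta_temperature


print(predict_rentals(25.0))  # 20 + 3.5 * 25 = 107.5
print(effect_of_change(1.0))  # 3.5 more bikes per extra degree
```

For a linear model, the answer to the "one more degree" question is simply the slope, whatever the starting temperature.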

## The anatomy of a regression model

By quantifying the size and strength of the relationship between two or more numerical values, regression models allow us to predict or forecast an output value based on a set of input values. 

The anatomy of a regression model is made up of three components:

+ The first is a continuous value, Y, that we intend to predict. This is the dependent variable, and it's also known as the response variable.
+ The second component is a collection of one or more numeric variables, X, that we intend to use to predict the response variable. These are known as the predictors or independent variables.
+ The third component is a set of coefficients, beta (β), which describe the relationships between the predictors and the response variable.

To help motivate our understanding of the anatomy of a regression model, let's assume that we work for a bike rental company and would like to build a regression model that estimates how many bikes to deliver to a location to meet anticipated customer demand. 

Suppose that what is shown here is a sample of the historical data that we intend to build a model with. The column with the values that we're trying to predict, rentals, represents the first component, Y, of a regression model. This is the response, and its values are known.

The second component, X, of a regression model is represented by the temperature column. This is the predictor, and its values are also known.

The third component of a regression model, beta, is unknown. It is estimated based on the values of both the predictor and the response. 

How does a regression algorithm estimate these values? 

An illustration would help. If we create a scatter plot of the historical data with the predictor on the X-axis and the response on the Y-axis, we get a plot that looks like this. 

Many regression models, including the ones covered here, are parametric models. Parametric models require that we make assumptions about the nature of the data in order to choose the right function to model it. If we assume that the relationship between the predictor and the response is linear, then we use a linear regression algorithm to find the straight line that best fits the data.

Mathematically, the equation for the line of best fit is usually written in this format:

$$\hat{Y} = \beta_0 + \beta_1 X$$

The goal of a linear regression algorithm is to estimate the optimal values for beta given a set of X and Y values. The position and slope of the line vary depending on the values for beta. Beta zero ($\beta_0$) is the intercept, and it determines where the line crosses the Y-axis. Increasing its value shifts the line upward, and reducing it shifts the line downward. Beta one ($\beta_1$) is the slope, and it determines the line's angle of tilt.

A positive value implies an upward slope, while a negative value implies a downward slope. As you can imagine, there are infinitely many possible values for beta zero and beta one, with each combination resulting in a different line.

Linear regression algorithms often use an approach known as ordinary least squares, or OLS, to estimate the optimal values for beta zero and beta one. The mechanics of OLS are beyond the scope of this course. However, the general idea is that the optimal beta values are the ones that produce the line with the smallest sum of squared vertical distances, or residuals, between the observed data points and the corresponding points on the regression line.
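For the single-predictor case, minimizing the sum of squared residuals has a well-known closed-form solution: $\beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$. The sketch below implements that formula in plain Python; the function names and the five-day temperature/rental sample are invented for illustration.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Estimate (beta0, beta1) for y ~ beta0 + beta1 * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Numerator: sum of cross-deviations; denominator: sum of squared x-deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean  # the OLS line passes through (x_mean, y_mean)
    return beta0, beta1


def sum_squared_residuals(x, y, beta0, beta1):
    """The quantity OLS minimizes: squared vertical distances to the line, summed."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical 5-day sample (temperature, rentals), invented for illustration.
temps = [10, 15, 20, 25, 30]
rentals = [40, 55, 75, 88, 102]
b0, b1 = fit_simple_ols(temps, rentals)
print(b0, b1)  # roughly 9.2 and 3.14
```

Nudging either coefficient away from the returned values increases `sum_squared_residuals`, which is exactly what "best fit" means under OLS.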



## Common types of regression

Depending on the nature of our data and the type of value we want to predict, we can use one of several forms of regression. 

Let's assume once again that we work for a bike rental company and are trying to build a regression model that estimates how many bikes will be rented based on weather conditions. 

### Simple Linear Regression

If our historical data consists of a single predictor variable, X, and we assume that the relationship between the predictor variable and the response, Y, is linear, then we use simple linear regression to model this relationship.

The line equation for this approach is $\hat{Y} = \beta_0 + \beta_1 X$, as shown here.

Simple linear regression is useful when we only have one predictor variable in our ground truth data. However, we often have to consider several predictors in order to reliably estimate the values of a response variable. To accomplish this, we use a different type of linear regression.

### Multiple Linear Regression

If we have more than one predictor variable, X, and assume that the relationship between the predictor variables and the response, Y, is linear, then we use the approach known as multiple linear regression to model the relationship.

The multiple linear regression line equation is $\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$, where $p$ is the number of predictors we intend to consider.
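With several predictors, OLS is usually solved with linear algebra rather than a per-coefficient formula. A minimal sketch, assuming NumPy is available: a column of ones is prepended so the first coefficient plays the role of $\beta_0$, and `numpy.linalg.lstsq` finds the coefficients that minimize the sum of squared residuals. The predictors (temperature and humidity) and every data value are invented for illustration.

```python
import numpy as np


def fit_multiple_ols(X, y):
    """Return [beta0, beta1, ..., beta_p] for y ~ beta0 + beta1*X1 + ... + beta_p*Xp."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single predictor column
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept beta0.
    design = np.column_stack([np.ones(X.shape[0]), X])
    # lstsq minimizes ||design @ beta - y||^2, i.e. the sum of squared residuals.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical predictors: [temperature, humidity] for six days (invented values).
X = np.array([[10, 80], [15, 60], [20, 65], [25, 40], [30, 55], [18, 70]])
# Invented responses generated from 5 + 2*temperature - 0.5*humidity, with no noise.
y = 5 + 2 * X[:, 0] - 0.5 * X[:, 1]
print(fit_multiple_ols(X, y))  # approximately [5.0, 2.0, -0.5]
```

Because the invented responses contain no noise, the fit recovers the generating coefficients; with real data the estimates would only approximate the underlying relationship.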

With linear regression, the linear relationship between the predictor and the response implies that a constant change in the predictor variable leads to a constant change in the response variable.

Let's consider the regression line shown here, which models the relationship between the average daily temperature and the number of bikes that customers rent. According to the regression line, a constant change in temperature leads to a constant change in the number of bikes rented. In other words, if delta1 is equal to delta2, then delta3 will also be equal to delta4. Linear regression also assumes that, for a given value of the predictor, the response variable is normally distributed around the regression line, which means it can vary indefinitely in either direction, with no lower bound at zero.

However, we know that these two assumptions are not quite true in this scenario. For example, the value of the response variable cannot be negative: customers cannot rent a negative number of bikes. A more appropriate expectation of our data is that a constant change in the predictor variable would result in a geometric, or exponential, change in the response variable. For example, a 10-degree change in temperature would more likely result in a doubling or tripling of the number of bikes rented, as shown here. In other words, delta3 is not equal to delta4, even though delta1 is equal to delta2.
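The contrast between the two behaviours can be written out directly. Here $\Delta$ is a fixed change in the predictor, and the log-linear line is the form that Poisson regression uses:

$$
\begin{aligned}
\text{Linear:}\quad & \hat{Y}(x + \Delta) - \hat{Y}(x) = \big(\beta_0 + \beta_1 (x + \Delta)\big) - \big(\beta_0 + \beta_1 x\big) = \beta_1 \Delta \\
\text{Log-linear:}\quad & \frac{\hat{Y}(x + \Delta)}{\hat{Y}(x)} = \frac{e^{\beta_0 + \beta_1 (x + \Delta)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1 \Delta}
\end{aligned}
$$

In the linear case, every change of $\Delta$ adds the same amount, $\beta_1 \Delta$, to the response regardless of $x$. In the log-linear case, every change of $\Delta$ multiplies the response by the same factor, $e^{\beta_1 \Delta}$, so the absolute change grows as the response grows. For example, a hypothetical slope of $\beta_1 = \ln(2)/10 \approx 0.069$ per degree would make each 10-degree rise double the expected number of rentals.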

To model a response variable that is never negative and that varies exponentially in response to a constant change in a predictor variable, we need a different type of regression technique.

### Poisson Regression

If we assume that the relationship between the predictor variable, X, and the response, Y, is exponential, or log-linear, then we use the approach known as Poisson regression to model the relationship.

The Poisson regression line equation is shown here:

$$\log(\hat{Y}) = \beta_0 + \beta_1 X$$

Notice that it is very similar to the linear regression equation, with the difference being that instead of estimating Y, we estimate the log of Y. Poisson regression is especially useful when the response variable is a count: a non-negative whole number that ranges in value from zero to infinity. For example, we can use Poisson regression to predict the number of people who buy tickets to a concert based on predictors such as ticket price, time of year, and the number of people who follow the band on social media.
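Poisson regression coefficients are typically estimated by maximum likelihood rather than OLS. A minimal sketch, assuming NumPy and using Newton-Raphson (one common fitting method; statistical libraries may use other variants): at the solution, the score $X^\top(y - \mu)$ is zero, where $\mu = e^{X\beta}$ are the expected counts. The function name and the temperature/rental counts are invented for illustration.

```python
import numpy as np


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Fit log(E[y]) = beta0 + beta1*X1 + ... by maximizing the Poisson log-likelihood.

    Uses Newton-Raphson and returns [beta0, beta1, ..., beta_p].
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    beta = np.zeros(design.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit (needs mean(y) > 0)
    for _ in range(max_iter):
        mu = np.exp(design @ beta)                   # expected counts, always positive
        gradient = design.T @ (y - mu)               # score of the log-likelihood
        hessian = design.T @ (design * mu[:, None])  # negative Hessian: X^T diag(mu) X
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical daily data: temperature and bikes rented (invented counts).
temps = np.array([5, 10, 15, 20, 25, 30])
rentals = np.array([3, 6, 11, 20, 38, 70])
beta0, beta1 = fit_poisson(temps, rentals)
# exp(beta1) is the multiplicative change in expected rentals per extra degree.
print(beta0, beta1, np.exp(beta1))
```

Because the model predicts $\log(\hat{Y})$, the fitted counts $e^{\beta_0 + \beta_1 X}$ can never be negative, which matches the constraint that customers cannot rent a negative number of bikes.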

Now let's consider a slight variation on the bike rental scenario. What if, instead of trying to predict how many bikes customers would rent, we decide to predict whether a given customer will or will not rent an e-bike based on their age? A scatter plot of the historical data for this type of problem could look like this. Unlike in the previous scenario, the response values are binary rather than continuous. A linear regression model would fit this data poorly, and so would a Poisson regression model. To model this type of relationship, we need a technique that allows us to create an S-shaped curve like this one, bounded on both ends. This type of curve is known as a sigmoid curve.
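The standard sigmoid (logistic) function is $\sigma(z) = 1 / (1 + e^{-z})$. A minimal Python sketch showing that its output stays strictly between 0 and 1 for moderate inputs, approaching those bounds only as $z$ grows very large in either direction:

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in math.exp(-z).
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-10, -2, 0, 2, 10):
    print(z, round(sigmoid(z), 4))  # sigmoid(0) is exactly 0.5
```

The curve is symmetric around $z = 0$, where it equals 0.5, so $\sigma(-z) = 1 - \sigma(z)$.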

### Logistic Regression

If we assume that the response variable, Y, is binary, or dichotomous, then we use the approach known as logistic regression to model its relationship with the predictor variable, X.
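A minimal sketch of fitting logistic regression by maximum likelihood, assuming NumPy. It uses Newton-Raphson, one common fitting method among several; the function names and the age/rental history are invented for illustration. The model is $P(Y = 1) = \sigma(\beta_0 + \beta_1 X)$. Note that if the two classes were perfectly separable by age, the maximum-likelihood coefficients would not exist (they grow without bound), so the example data deliberately overlap.

```python
import numpy as np


def _design(X):
    """Prepend a column of ones so the first coefficient is the intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(X.shape[0]), X])


def predict_proba(beta, X):
    """P(y = 1) = sigmoid(beta0 + beta1*X1 + ...)."""
    return 1.0 / (1.0 + np.exp(-(_design(X) @ beta)))


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    design = _design(X)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))  # predicted P(y = 1)
        gradient = design.T @ (y - p)                # score of the log-likelihood
        weights = p * (1.0 - p)
        hessian = design.T @ (design * weights[:, None])  # negative Hessian
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical history: customer age and whether they rented an e-bike (1) or not (0).
ages = np.array([18, 22, 25, 30, 35, 40, 45, 50, 55, 60])
rented = np.array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0])
beta = fit_logistic(ages, rented)
print(beta, predict_proba(beta, [20, 40, 60]))
```

In this invented sample, younger customers rent e-bikes more often, so the fitted slope is negative and the predicted probability falls along an S-shaped curve as age increases.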