# Regression Analysis


> "_Nature has established patterns originating in the return of events, but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary._"  - **Gottfried Wilhelm Leibniz**


This notebook shows the different types of regression and how they can be used to solve various problems. Regression comes with a number of caveats and common biases; we will explore only a handful of them in our discussion.

Before we begin with regression, we will take a dive into three estimation approaches:
 
 - **Ordinary Least Squares (OLS)**
 - **Maximum Likelihood Estimation (MLE)**
 - **Bayesian (Univariate & Multivariate)**
 
and we'll make efforts to describe them in greater detail.

We will then explore several types of regression, namely:

 1. **Linear Regression**
 2. **Ridge Regression**
 3. **Lasso Regression**
 4. **Bayesian Linear Regression**
 5. **Logistic Regression**
 
**NB:** This series is a summary of **Part II: Early Computer-Age Methods** found in **CASI: Computer Age Statistical Inference**.


***


### [Ordinary Least Squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares)

Ordinary Least Squares (OLS) is a statistical method for estimating the ___unknown parameters___ of a linear regression model. It chooses the ___parameters___ of a linear function of a set of ___explanatory variables___ by the principle of ___least squares___:

> *... minimizing the sum of the squares in the differences between the observed ___dependent variable___ (values of the variable being predicted) in the given dataset and those predicted by the linear function ...*
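
As a minimal sketch of this principle, the snippet below fits a linear function to synthetic data by minimizing the sum of squared differences. The data, the "true" coefficients, and the helper name `sum_of_squares` are assumptions made for illustration only; `np.linalg.lstsq` is used here as one convenient least-squares solver, not as the only way to compute the estimate.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not taken from the notebook.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))               # n observations of p explanatory variables
beta_true = np.array([1.5, -2.0, 0.5])    # hypothetical "true" parameters
y = X @ beta_true + rng.normal(scale=0.1, size=n)  # dependent variable with noise


def sum_of_squares(beta, X, y):
    """Sum of squared differences between observed and predicted values."""
    residuals = y - X @ beta
    return float(residuals @ residuals)


# Least-squares estimate: the beta that minimizes sum_of_squares(beta, X, y).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated parameters:", beta_hat)
print("sum of squares at estimate:", sum_of_squares(beta_hat, X, y))
```

Because `beta_hat` minimizes the criterion, no other parameter vector (including `beta_true`) can give a smaller sum of squares on this dataset.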

The OLS estimator is consistent when the ___regressors___ are exogenous (i.e., uncorrelated with the error term) in the linear model. By the Gauss-Markov theorem, it is also optimal among linear unbiased estimators when the errors are ___homoscedastic___ (have the same finite variance, a.k.a. homogeneity of variance) and uncorrelated with one another. Under these conditions, OLS provides ___minimum-variance mean-unbiased___ estimation when the errors have finite variances.

If we add the assumption that the errors are normally distributed (i.e., follow a [Gaussian](https://en.wikipedia.org/wiki/Normal_distribution) distribution), OLS is the ___maximum likelihood estimator___.
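
A brief sketch of why this holds, assuming the model is written as $y_i = x_i^\top\beta + \epsilon_i$ with independent errors $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ (the symbols $x_i$, $\beta$ and $\sigma^2$ are introduced here only for this illustration). The log-likelihood of the $n$ observations is then

$$\ell(\beta, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2.$$

For any fixed $\sigma^2 > 0$, only the second term depends on $\beta$, so maximizing $\ell$ over $\beta$ is the same as minimizing the sum of squared residuals $\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2$, which is exactly the least-squares criterion.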

The linear formulation:

Suppose our data has $n$ observations $\{y_i, x_i\}^n_{i=1}$, where each observation $i$ includes a scalar response $y_i$ and a column vector $x_i$ of values of $p$ predictors (regressors) $x_{ij}$ for $j = 1, \dots, p$. In a linear regression model, the response variable, $y_i$, is a linear function of the regressors:

> $y_i = \beta_1x_{i1} + \beta_2x_{i2} + ... + \beta_px_{ip} + \epsilon_i,$