# Overview

In this notebook we explore the notion of correlation and the correlation coefficient.

Agenda:
1. Prerequisite Topics
2. Types Of Correlation
3. Pearson Correlation Coefficient

# 1. Prerequisite Topics

## 1.1 Linearity
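Linearity is a property of a mathematical relationship that can be represented graphically as a straight line. For two variables, this means that a unit change in one corresponds to a constant change in the other.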

https://en.wikipedia.org/wiki/Linearity

## 1.2 Deviation
Deviation is a measurement of the distance between a value of a variable and a central measurement (typically the mean) of its distribution.

<center><img src="deviation_one_dimension.png" alt="One Dimensional Deviation" style="width: 400px"/></center>

As we will see, deviation is a foundational measurement used in a wide range of descriptive statistics. For example, the standard deviation is based on the concept of deviation. Deviation is loosely interpreted as a measurement of how normal or irregular a particular value is when compared to the distribution. There are a number of different measurements for deviation.

The simplest formula is given as:

$$ d = x - \mu $$

A positive value indicates that a particular value $x \in X$ is larger than $\mu$, and a negative value indicates that it is smaller.
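As a minimal sketch of the formula above, the following computes the signed deviation of each value from the mean. The sample values are made up for illustration, and the sample mean stands in for $\mu$.

```python
# Hypothetical sample; the values are made up for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

# Mean of the sample, standing in for mu.
mu = sum(data) / len(data)

# Signed deviation d = x - mu for each value.
deviations = [x - mu for x in data]

print(mu)          # 5.0
print(deviations)  # [-3.0, -1.0, -1.0, 0.0, 5.0]
```

Note that the signed deviations sum to zero, which is one reason the statistics in the following sections square them before averaging.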

## 1.3 Variance
Variance is a descriptive statistic that attempts to describe how spread out a data sample is. It is expressed as an expected value, and its core measurement is based on the square of a deviation from the mean. This allows it to plug into probability/likelihood frameworks. It also "penalizes" larger deviations due to the squaring; i.e. the contribution of a given deviation to the variance is disproportionately larger for a large deviation than for a small one. The formula is given as:


$$ Var(X)= \sigma^2 = \mathbb{E} \left[ (X-\mu )^{2} \right] $$


Another common representation of the formula can be derived as follows:

$$ = \mathbb{E} \left[( X - \mathbb{E}[X])^2  \right] $$

$$ = \mathbb{E} \left[ X^2 - 2X\mathbb{E}[X] + \mathbb{E}[X]^2  \right] $$

$$ = \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + \mathbb{E}[X]^2 $$

$$ = \mathbb{E}[X^2] - 2\mathbb{E}[X]^2 + \mathbb{E}[X]^2 $$

$$ = \mathbb{E}[X^2] - \mathbb{E}[X]^2 $$
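The equivalence of the two forms can be checked numerically. This is a small sketch, not a library implementation: the data values and the `expectation` helper are hypothetical, and every value is assumed to be equally likely.

```python
import math

# Hypothetical sample; the values are made up for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

def expectation(values):
    """Expected value, assuming every value has the same probability 1/n."""
    return sum(values) / len(values)

mu = expectation(data)

# Definition: E[(X - mu)^2]
var_definition = expectation([(x - mu) ** 2 for x in data])

# Derived form: E[X^2] - E[X]^2
var_shortcut = expectation([x ** 2 for x in data]) - mu ** 2

# Both forms agree up to floating-point rounding (about 7.2 here).
print(var_definition, var_shortcut)
print(math.isclose(var_definition, var_shortcut))
```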

If we expand the Expectation operator $\mathbb{E}$ for a discrete random variable we would have:

$$ \mathbb{E}[X] = \sum_i p(x_i) \, x_i $$

If we assume a uniform distribution, with $p(x_i)=\frac{1}{n}$ for every value, we derive:

$$ \mathbb{E}[X] = \frac{1}{n}\sum x_i $$
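To illustrate the two cases, here is a minimal sketch of the expectation of a discrete random variable, first with unequal probabilities and then with the uniform assumption. The outcomes, the probabilities, and the `expectation` helper are all made up for this example.

```python
import math

# Hypothetical discrete random variable: outcomes and their probabilities.
outcomes = [1, 2, 3, 4]
weighted_probs = [0.1, 0.2, 0.3, 0.4]

def expectation(values, probs):
    """E[X] as the sum of p(x_i) * x_i over all outcomes."""
    assert math.isclose(sum(probs), 1.0), "probabilities must sum to 1"
    return sum(p * x for p, x in zip(probs, values))

# General case: each outcome has its own probability.
e_weighted = expectation(outcomes, weighted_probs)

# Uniform case: p = 1/n for every outcome, so E[X] is the arithmetic mean.
n = len(outcomes)
e_uniform = expectation(outcomes, [1 / n] * n)

print(e_weighted)                      # approximately 3.0
print(e_uniform, sum(outcomes) / n)    # both are the arithmetic mean
```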


## 1.4 Covariance
Covariance is an attempt to extend variance to two dimensions, or two variables. Given two variables X and Y, the covariance multiplies the deviation of X by the deviation of Y. In a two-dimensional space this would resemble the following:

<center><img src="deviation_two_dimension.png" alt="Two Dimensional Deviation" style="width: 400px;"/></center>

Here, the horizontal and vertical lines represent the means of the Y and X variables respectively.

The geometric implications here are important to consider. A square has the largest area of any rectangle with the same perimeter. As such, if the two variables have exactly the same deviations, the product of the deviations forms a square and is thus the largest possible value for that perimeter. If one variable deviates while the other does not, the area will be very small.

The formal definition is given as:

$$ Cov(X,Y) = \sigma_{X,Y} := \mathbb{E}[(X-\mu_X)(Y-\mu_Y)] $$

$$ = \mathbb{E} [ XY - X \mathbb{E}[Y] - \mathbb{E}[X]Y + \mathbb{E}[X]\mathbb{E}[Y] ] $$

$$ = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] - \mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y] $$

$$ = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] $$

If we expand the Expectation operator $\mathbb{E}$ for a discrete uniform random variable, with $p=\frac{1}{n}$, we would have:

$$ Cov(X,Y) = \frac{1}{n}\sum(x_i-\mu_X)(y_i - \mu_Y) $$
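The following sketch computes the covariance in both forms derived above, under the same uniform $\frac{1}{n}$ assumption. The paired data and the helper names `mean` and `covariance` are hypothetical.

```python
import math

# Hypothetical paired observations; the values are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Covariance as (1/n) * sum((x_i - mu_X) * (y_i - mu_Y))."""
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)

cov_definition = covariance(xs, ys)

# Derived form: E[XY] - E[X]E[Y]
cov_shortcut = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

print(cov_definition, cov_shortcut)  # both approximately 1.2

# The covariance of a variable with itself is its variance.
print(covariance(xs, xs))
```

Library routines often divide by $n - 1$ rather than $n$ (the sample covariance), so their results can differ slightly from this version.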

# 2. Types of Correlation

There are several types of correlation:
- Pearson Correlation
- Kendall Rank Correlation
- Spearman Correlation
- Point-Biserial Correlation

https://www.statisticssolutions.com/correlation-pearson-kendall-spearman/

# 3. Pearson Correlation
## 3.1. Overview
The Pearson correlation coefficient is the most widely used correlation measure. It assumes a linear relationship between two variables and measures the strength and direction of that relationship.

## 3.2 Definition

The correlation coefficient is expressed through the following formula:

$$ Cor(X,Y) = \frac{Cov(X,Y)}{Std. \ Dev(X) \ Std. \ Dev(Y)}  $$

$$ \rho = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} $$

It is common to see this expanded for a uniform discrete variable:

$$ =\frac{\frac{1}{n}\sum{(x_i - \mu_X)(y_i-\mu_Y)}}{\sqrt{\frac{1}{n}\sum{(x_i-\mu_X)^2}}\sqrt{\frac{1}{n}\sum{(y_i-\mu_Y)^2}}} $$

If we simplify the formula by cancelling out the $1/n$ terms (i.e. removing the probability consideration), we have:

$$ =\frac{\sum{(x_i - \mu_X)(y_i-\mu_Y)}}{\sqrt{\sum{(x_i-\mu_X)^2}}\sqrt{\sum{(y_i-\mu_Y)^2}}} $$

$$  =\frac{\sum{(x_i - \mu_X)(y_i-\mu_Y)}}{\sqrt{\sum{(x_i-\mu_X)^2}\sum{(y_i-\mu_Y)^2}}} $$

$$ =\frac{\sum d_x d_y}{\sqrt{\sum d_x^2}\sqrt{\sum d_y^2}} $$

From this simplified equation we see that the formula is expressed as a fraction of deviations: the co-deviation divided by the total deviation.
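As a rough check that the $1/n$ terms really do cancel, the sketch below computes the coefficient both ways. The data and the function names `pearson_full` and `pearson_simplified` are hypothetical, and both functions assume that neither variable is constant (otherwise the denominator is zero).

```python
import math

# Hypothetical paired observations; the values are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def deviations(values):
    mu = sum(values) / len(values)
    return [v - mu for v in values]

def pearson_full(xs, ys):
    """Cov(X, Y) / (sigma_X * sigma_Y), keeping the 1/n terms."""
    n = len(xs)
    dx, dy = deviations(xs), deviations(ys)
    cov = sum(a * b for a, b in zip(dx, dy)) / n
    std_x = math.sqrt(sum(a * a for a in dx) / n)
    std_y = math.sqrt(sum(b * b for b in dy) / n)
    return cov / (std_x * std_y)

def pearson_simplified(xs, ys):
    """sum(d_x * d_y) / (sqrt(sum(d_x^2)) * sqrt(sum(d_y^2)))."""
    dx, dy = deviations(xs), deviations(ys)
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator

r_full = pearson_full(xs, ys)
r_simplified = pearson_simplified(xs, ys)

# The two forms agree up to floating-point rounding (roughly 0.77 here).
print(r_full, r_simplified)
```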

## 3.3. Intuition and Interpretation

I think it's helpful to consider this fraction of deviation measurements. To understand the correlation coefficient, we need to see it as a fraction, or proportion, of deviations.

Thinking about this geometrically helps explain the concept. 
If we think about deviation in one dimension we would have the following:

<img src="deviation_one_dimension.png" alt="One Dimensional Deviation" style="width: 400px;"/>

If we think about the problem in two dimensions we would have the following:
    
<img src="deviation_two_dimension.png" alt="Two Dimensional Deviation" style="width: 400px;"/>

Referring back to the equation, when we look at the numerator

$$\sum d_x d_y$$

We can rearrange the product of two deviations as a single term (i.e. $d_{x_1} \cdot d_{y_1} = d_1$). This can be interpreted as a rectangle representing the deviation caused by the two variables. We can think about the sum of the rectangles as follows:

<img src="deviation_two_dimension_geom.png" alt="Two Dimensional Deviation Sum simple" style="width: 400px;"/>
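The rectangle interpretation can be sketched in code by computing one signed area per point. The data here is made up for illustration. A rectangle is positive when both deviations have the same sign and negative when they differ, so the numerator is a sum of signed areas.

```python
# Hypothetical paired observations; the values are made up for illustration.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [1.0, 4.0, 2.0, 5.0]

mu_x = sum(xs) / len(xs)
mu_y = sum(ys) / len(ys)

# One signed rectangle area d_x * d_y per point.
areas = [(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)]

# The numerator of the coefficient is the sum of the signed areas.
numerator = sum(areas)

print(areas)      # [4.0, -1.0, -1.0, 4.0]
print(numerator)  # 6.0
```

The two negative rectangles belong to points where one variable is above its mean while the other is below it, and they reduce the total.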

Now if we consider the denominator of the equation

$$\sqrt{\sum d_x^2}\sqrt{\sum d_y^2}$$

It can also be interpreted as an area. This area is intended to represent the maximum possible total deviations of both x and y.



To understand this point, consider the geometric implications. An important characteristic of the square is that it has the largest area of any rectangle with the same perimeter. This means that the squared deviations represent the largest possible area for the two-dimensional deviation. Taking the square root of this square gives us the optimal length of an arbitrary side.

<img src="deviation_two_dimension_sum.png" alt="Two Dimensional Deviation Sum" style="width: 400px;"/>

Thus the denominator is again representing the maximum possible total deviation from both variables.

So the correlation coefficient is essentially a fraction of covariance to total possible variance!
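That the denominator acts as an upper bound is an instance of the Cauchy-Schwarz inequality: $|\sum d_x d_y| \le \sqrt{\sum d_x^2}\sqrt{\sum d_y^2}$. The sketch below illustrates this numerically; the `pearson` helper and the data are hypothetical, and the function assumes neither variable is constant. Data lying exactly on a line reaches the bound, while other data falls strictly inside it.

```python
import math

def pearson(xs, ys):
    """sum(d_x * d_y) / (sqrt(sum(d_x^2)) * sqrt(sum(d_y^2)))."""
    mu_x = sum(xs) / len(xs)
    mu_y = sum(ys) / len(ys)
    dx = [x - mu_x for x in xs]
    dy = [y - mu_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical data; the values are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

r_up = pearson(xs, [2 * x + 1 for x in xs])      # exact increasing line
r_down = pearson(xs, [-3 * x + 7 for x in xs])   # exact decreasing line
r_noisy = pearson(xs, [2.0, 4.0, 5.0, 4.0, 5.0])  # not exactly linear

print(r_up)     # approximately 1
print(r_down)   # approximately -1
print(r_noisy)  # strictly between -1 and 1
```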

## 3.4. Additional Notes

An interesting caveat in this calculation is the way that we determine the total deviation for a specific variable. We square the deviations to obtain non-negative magnitudes rather than using the absolute value function. There are other correlation measurements which instead use the absolute value. We will look at those and their effects separately.