# Covariance
Covariance is a statistical measure that indicates the extent to which two variables change together. 
If the variables tend to increase or decrease together, the covariance is positive. 
If one increases while the other decreases, the covariance is negative. 
A covariance close to zero suggests no linear relationship between the variables.

Mathematically, the covariance between two random variables X and Y is defined as:
$$ 
\begin{align}
\mathrm{Cov}(X, Y) = \sigma_{XY} &= \mathrm{E}[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])] \\
&= \mathrm{E}[XY - Y\mathrm{E}[X] - X\mathrm{E}[Y] + \mathrm{E}[Y]\mathrm{E}[X]] \\
&= \mathrm{E}[XY] - \mathrm{E}[Y\mathrm{E}[X]] - \mathrm{E}[X\mathrm{E}[Y]] + \mathrm{E}[\mathrm{E}[Y]\mathrm{E}[X]]\\
&= \mathrm{E}[XY] - \mathrm{E}[Y]\mathrm{E}[X] - \mathrm{E}[X]\mathrm{E}[Y] + \mathrm{E}[Y]\mathrm{E}[X] \\
&= \mathrm{E}[XY] - \mathrm{E}[Y]\mathrm{E}[X]
\end{align}
$$
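The identity $\mathrm{Cov}(X,Y) = \mathrm{E}[XY] - \mathrm{E}[X]\mathrm{E}[Y]$ derived above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data values and the names `cov_definition` and `cov_shortcut` are made up for illustration, and expectations are replaced by plain averages (all points equally likely).

```python
import numpy as np

# Hypothetical data, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Definition: E[(X - E[X])(Y - E[Y])]
cov_definition = np.mean((x - x.mean()) * (y - y.mean()))

# Shortcut from the last line of the derivation: E[XY] - E[X]E[Y]
cov_shortcut = np.mean(x * y) - x.mean() * y.mean()

print(cov_definition, cov_shortcut)  # both are 1.2 up to floating-point rounding
```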
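For a sample of size $n$, the sample covariance is:
$$ \mathrm{Cov}(X, Y) = \sigma_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$
The covariance can also be expressed without directly referring to the means. Note that the pairwise form below is normalized by $n$ rather than $n-1$, so it equals $\frac{n-1}{n}$ times the sample covariance above:
$$  
\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n\frac{1}{2}(x_i-x_j)(y_i-y_j) = 
\frac{1}{n^2}\sum_{i=1}^n\sum_{j>i}(x_i-x_j)(y_i-y_j)
$$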
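To see why the pairwise form needs no means, expand the double sum: each mixed term factors into a product of single sums. A short derivation, writing $\bar{x}$ and $\bar{y}$ for the sample means (the result carries the $\frac{1}{n}$ normalization):

$$
\begin{align}
\frac{1}{2n^2}\sum_{i=1}^n\sum_{j=1}^n (x_i-x_j)(y_i-y_j)
&= \frac{1}{2n^2}\sum_{i=1}^n\sum_{j=1}^n \left(x_i y_i - x_i y_j - x_j y_i + x_j y_j\right) \\
&= \frac{1}{2n^2}\left(2n\sum_{i=1}^n x_i y_i - 2\sum_{i=1}^n x_i \sum_{j=1}^n y_j\right) \\
&= \frac{1}{n}\sum_{i=1}^n x_i y_i - \bar{x}\bar{y} \\
&= \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})
\end{align}
$$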
More generally, if there are $n$ possible realizations of $(X,Y)$, namely $(x_i,y_i)$, with possibly unequal probabilities 
$p_i$ for $i=1,\ldots,n$, then the covariance is
$$ \mathrm{Cov}(X,Y) = \sum_{i=1}^n p_i (x_i - \mathrm{E}[X])(y_i - \mathrm{E}[Y]) $$
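The probability-weighted formula can be sketched in a few lines of NumPy. The realizations and probabilities below are hypothetical, picked only so that the arithmetic is easy to follow; the expectations are computed as weighted sums with the same $p_i$.

```python
import numpy as np

# Hypothetical realizations (x_i, y_i) and their probabilities p_i
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])
p = np.array([0.5, 0.25, 0.25])  # probabilities must sum to 1

# Expectations are probability-weighted averages
ex = np.sum(p * x)
ey = np.sum(p * y)

# Cov(X, Y) = sum_i p_i (x_i - E[X]) (y_i - E[Y])
cov_xy = np.sum(p * (x - ex) * (y - ey))
print(cov_xy)  # 0.4375
```

With equal probabilities $p_i = \frac{1}{n}$, this reduces to the $\frac{1}{n}$-normalized covariance.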

When two discrete random variables $X$ and $Y$ have a joint probability distribution with entries 
$p_{ij} = P(X=x_{i},Y=y_{j})$, where $X$ takes $n$ values and $Y$ takes $m$ values, 
the covariance is calculated using a double summation over the indices of the joint probability table:
$$ \mathrm{Cov}(X,Y) = \sum_{i=1}^n \sum_{j=1}^m p_{ij} (x_{i} - \mathrm{E}[X])(y_{j} - \mathrm{E}[Y]) $$
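The double sum over a joint probability table might look as follows in NumPy. This is only a sketch: the support points and the table `p` are invented for the example, and the table is deliberately not square to show that $X$ and $Y$ may take different numbers of values.

```python
import numpy as np

# Hypothetical support points and joint probability table
x_vals = np.array([0.0, 1.0])        # values x_i
y_vals = np.array([0.0, 1.0, 2.0])   # values y_j
p = np.array([[0.2, 0.2, 0.1],       # p[i, j] = P(X = x_i, Y = y_j)
              [0.1, 0.1, 0.3]])

# Marginal distributions: sum the joint table over the other variable
px = p.sum(axis=1)
py = p.sum(axis=0)
ex = np.sum(px * x_vals)
ey = np.sum(py * y_vals)

# Double sum over i and j, written with an outer product of deviations
cov_xy = np.sum(p * np.outer(x_vals - ex, y_vals - ey))
print(cov_xy)  # 0.15 up to floating-point rounding
```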

## Interpretation of covariance
- Covariance > 0: X and Y tend to increase together.
- Covariance < 0: When X increases, Y tends to decrease (and vice versa).
- Covariance ≈ 0: No linear relationship.

Covariance is used in statistics and data analysis to measure the relationship between two variables, 
and is a key component in constructing the covariance matrix for multivariate data.
The Pearson correlation coefficient ($\rho$) normalizes the covariance by dividing it by the geometric mean of the variances of the two random variables, that is, by the product of their standard deviations $\sigma_X \sigma_Y$.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which in addition to serving as a descriptor of the sample, also serves as an estimated value of the population parameter.
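In practice the distinction often shows up as a choice of normalization: dividing by $n$ describes the data at hand (or a population listed in full), while dividing by $n-1$ gives the usual unbiased estimate of the population parameter. A minimal sketch using `np.cov` and its `ddof` argument; the data values are hypothetical.

```python
import numpy as np

# Hypothetical sample, for illustration only
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])
n = x.shape[0]

# np.cov returns a 2x2 matrix; entry [0, 1] is the covariance of x and y
cov_sample = np.cov(x, y, ddof=1)[0, 1]      # divides by n - 1
cov_population = np.cov(x, y, ddof=0)[0, 1]  # divides by n

# The two differ exactly by the factor (n - 1) / n
print(cov_sample, cov_population)
```

Mixing the two conventions in one calculation leads to small but visible discrepancies, so it helps to check which default a library uses.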

The variance is a special case of the covariance in which the two variables are identical:
$$\mathrm{Cov}(X,X)=\mathrm{Var} (X)\equiv \sigma ^{2}(X)\equiv \sigma _{X}^{2}.$$

## Uncorrelatedness and independence
Random variables whose covariance is zero are uncorrelated. If $X$ and $Y$ are independent, then their covariance is zero. This follows because under independence:

$$\mathrm{E}[XY] = \mathrm{E}[X] \cdot \mathrm{E}[Y] $$ 

and therefore:
$$\mathrm{Cov}(X,Y) = \mathrm{E}[XY] - \mathrm{E}[X]\mathrm{E}[Y] = 0$$

The converse, however, is not generally true. For example, let $X$ be uniformly distributed in $[-1,1]$ and let 
$Y=X^2$. Clearly, $X$ and $Y$ are not independent, but:
$$ 
\begin{align}
\mathrm{Cov}(X, Y) &= \mathrm{Cov}(X, X^2) \\
&= \mathrm{E}[XY] - \mathrm{E}[X]\mathrm{E}[Y] \\
&= \mathrm{E}[XX^2] - \mathrm{E}[X]\mathrm{E}[X^2]\\
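&= \mathrm{E}[X^3] - \mathrm{E}[X]\mathrm{E}[X^2] \\
&= 0 - 0 \cdot \mathrm{E}[X^2] = 0 \qquad \text{(odd moments of a distribution symmetric about zero vanish)}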
\end{align}
$$

In this case, the relationship between $Y$ and $X$ is non-linear, while correlation and covariance are measures of linear dependence between two random variables. This example shows that if two random variables are uncorrelated, that does not in general imply that they are independent.
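The example can also be explored by simulation. The sketch below draws samples with NumPy (the seed and sample size are arbitrary choices) and shows that the sample covariance of $X$ and $X^2$ is close to zero, while a non-linear feature such as $|X|$ reveals the dependence.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# X ~ Uniform[-1, 1] and Y = X^2, so Y is fully determined by X
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x**2

# Sample covariance and correlation are close to zero ...
cov_xy = np.cov(x, y)[0, 1]
corr_xy = np.corrcoef(x, y)[0, 1]

# ... yet |X| and Y are strongly correlated, exposing the dependence
corr_abs = np.corrcoef(np.abs(x), y)[0, 1]
print(cov_xy, corr_xy, corr_abs)
```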

Variables $X$ and $Y$ whose covariance is positive are called positively correlated: when 
$X>\mathrm{E}[X]$, it is likely that $Y>\mathrm{E}[Y]$. Conversely, $X$ and $Y$ with negative covariance are negatively correlated: when 
$X>\mathrm{E}[X]$, it is likely that $Y<\mathrm{E}[Y]$.



## TODO
- Mathematical properties of covariance and linear combinations
- "Covariance of linear combinations" section of https://en.wikipedia.org/wiki/Covariance

## Examples

In [None]:
from sklearn.datasets import load_diabetes

In [2]:
df = load_diabetes(as_frame=True, scaled=False)
df.frame

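|     | age  | sex | bmi  | bp     | s1    | s2    | s3   | s4   | s5     | s6    | target |
|-----|------|-----|------|--------|-------|-------|------|------|--------|-------|--------|
| 0   | 59.0 | 2.0 | 32.1 | 101.00 | 157.0 | 93.2  | 38.0 | 4.00 | 4.8598 | 87.0  | 151.0  |
| 1   | 48.0 | 1.0 | 21.6 | 87.00  | 183.0 | 103.2 | 70.0 | 3.00 | 3.8918 | 69.0  | 75.0   |
| 2   | 72.0 | 2.0 | 30.5 | 93.00  | 156.0 | 93.6  | 41.0 | 4.00 | 4.6728 | 85.0  | 141.0  |
| 3   | 24.0 | 1.0 | 25.3 | 84.00  | 198.0 | 131.4 | 40.0 | 5.00 | 4.8903 | 89.0  | 206.0  |
| 4   | 50.0 | 1.0 | 23.0 | 101.00 | 192.0 | 125.4 | 52.0 | 4.00 | 4.2905 | 80.0  | 135.0  |
| ... | ...  | ... | ...  | ...    | ...   | ...   | ...  | ...  | ...    | ...   | ...    |
| 437 | 60.0 | 2.0 | 28.2 | 112.00 | 185.0 | 113.8 | 42.0 | 4.00 | 4.9836 | 93.0  | 178.0  |
| 438 | 47.0 | 2.0 | 24.9 | 75.00  | 225.0 | 166.0 | 42.0 | 5.00 | 4.4427 | 102.0 | 104.0  |
| 439 | 60.0 | 2.0 | 24.9 | 99.67  | 162.0 | 106.6 | 43.0 | 3.77 | 4.1271 | 95.0  | 132.0  |
| 440 | 36.0 | 1.0 | 30.0 | 95.00  | 201.0 | 125.2 | 42.0 | 4.79 | 5.1299 | 85.0  | 220.0  |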


In [3]:
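import numpy as np

def compute_covariance(X, Y):
    """
    Compute the covariance between two vectors X and Y, dividing by n
    (population normalization, not the n - 1 used by pandas).
    X and Y should be 1D arrays or pandas Series of equal length.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    xm = X.mean()
    ym = Y.mean()
    # Average product of the deviations from the means
    cov = ((X - xm) * (Y - ym)).sum() / X.shape[0]
    return cov

def compute_covariance_method2(X, Y):
    """
    Compute the covariance between two vectors X and Y, dividing by n.
    X and Y should be 1D arrays or pandas Series of equal length.
    This method uses pairwise differences instead of the means.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # n x n matrices holding every pairwise difference x_j - x_i and y_j - y_i
    x_diff = X.reshape(1, -1) - X.reshape(-1, 1)
    y_diff = Y.reshape(1, -1) - Y.reshape(-1, 1)
    prods = x_diff * y_diff
    # Each unordered pair appears twice in the full double sum, hence the 0.5
    cov = 0.5 * prods.sum() / X.shape[0]**2
    return cov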

In [4]:
cov_age_bmi = compute_covariance(df.data['age'], df.data['bmi'])
cov_age_bp = compute_covariance(df.data['age'], df.data['bp'])
cov_age_age = compute_covariance(df.data['age'], df.data['age'])

cov_age_bmi_m2 = compute_covariance_method2(df.data['age'].values, df.data['bmi'].values)
cov_age_bp_m2 = compute_covariance_method2(df.data['age'].values, df.data['bp'].values)
cov_age_age_m2 = compute_covariance_method2(df.data['age'].values, df.data['age'].values)

var_age = df.data['age'].var()  # pandas default ddof=1: divides by n - 1
print(f"Covariance between age and bmi: m1: {cov_age_bmi:.2f}; m2: {cov_age_bmi_m2:.2f}")
print(f"Covariance between age and bp: m1: {cov_age_bp:.2f}; m2: {cov_age_bp_m2:.2f}")
print(f"Covariance between age and age: m1: {cov_age_age:.2f}; m2: {cov_age_age_m2:.2f}")
print(f"Variance of age: {var_age:.2f}")

Covariance between age and bmi: m1: 10.70; m2: 10.70
Covariance between age and bp: m1: 60.68; m2: 60.68
Covariance between age and age: m1: 171.46; m2: 171.46
Variance of age: 171.85


Notice that the covariance of age with itself is the variance of age: the covariance of a variable with itself equals
its variance. The two printed values differ slightly (171.46 versus 171.85) only because `compute_covariance` divides by $n$, whereas the pandas `var()` method divides by $n-1$ by default.

The results above also showcase one of the problems with the covariance: what does a covariance of 10.7 between age and bmi mean? How do we interpret the result? Suppose we try to assign units to the result: Body Mass Index (BMI) is in $\frac{kg}{m^2}$ whereas age is in years (y), so the product of the units is $\frac{y \cdot kg}{m^2}$, which is hard to interpret.

Let's check the covariance functions above by computing the covariance with library functions or methods:

In [5]:
# Covariance matrix of age, bmi and bp
df.frame[["age","bmi","bp"]].cov()

|     | age       | bmi       | bp         |
|-----|-----------|-----------|------------|
| age | 171.84661 | 10.7196   | 60.817945  |
| bmi | 10.7196   | 19.519798 | 24.162884  |
| bp  | 60.817945 | 24.162884 | 191.304401 |


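The result is a covariance matrix, which is a square matrix giving the covariance between each pair of features 
of the data. The diagonal elements are the variances of each feature. The covariance matrix is symmetric 
because $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$. The entries are slightly larger than the values printed earlier (for example, 10.72 versus 10.70 for age and bmi) because the pandas `cov()` method divides by $n-1$, while the functions defined above divide by $n$.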
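These two properties of a covariance matrix (symmetry, and variances on the diagonal) can be checked on any data set. A small sketch with NumPy on synthetic data; the array `data` and the seed are arbitrary and not related to the diabetes data set.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical data: 200 observations (rows) of 3 features (columns)
data = rng.normal(size=(200, 3))

# rowvar=False tells NumPy that columns are the variables;
# by default np.cov divides by n - 1
cov_matrix = np.cov(data, rowvar=False)

# Symmetric: Cov(X_i, X_j) == Cov(X_j, X_i)
print(np.allclose(cov_matrix, cov_matrix.T))
# Diagonal entries are the per-feature sample variances
print(np.allclose(np.diag(cov_matrix), data.var(axis=0, ddof=1)))
```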

A common way to make covariances interpretable is to standardize them by dividing by the product of the standard deviations of the two variables. This ensures the result lies between -1 and 1 and has no units, that is, it is dimensionless. The resulting quantity is known as the Pearson correlation coefficient.

In [6]:
r_age_bmi = cov_age_bmi / (df.data['age'].std() * df.data['bmi'].std())
r_age_bp = cov_age_bp / (df.data['age'].std() * df.data['bp'].std())
r_age_age = cov_age_age / (df.data['age'].std() * df.data['age'].std())

print(f"Correlation between age and bmi: {r_age_bmi:.3f}")
print(f"Correlation between age and bp: {r_age_bp:.3f}")
print(f"Correlation between age and age: {r_age_age:.3f}")

Correlation between age and bmi: 0.185
Correlation between age and bp: 0.335
Correlation between age and age: 0.998


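Just as there is a covariance matrix, there is a correlation matrix with similar properties.
The correlation matrix is a square matrix that contains the correlation coefficients between all pairs of variables in the dataset.
The diagonal elements of the correlation matrix are always equal to 1, as they represent the correlation of each variable with itself. The value of 0.998 printed above for age with itself falls short of 1 only because the numerator (from `compute_covariance`) is normalized by $n$ while the pandas `std()` method normalizes by $n-1$, which scales every coefficient computed above by $\frac{n-1}{n}$.

Let's check that the correlations above match the ones computed through the correlation matrix: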

In [7]:
# correlation matrix of age, bmi and bp
df.frame[["age","bmi","bp"]].corr()

|     | age      | bmi      | bp       |
|-----|----------|----------|----------|
| age | 1.0      | 0.185085 | 0.335428 |
| bmi | 0.185085 | 1.0      | 0.395411 |
| bp  | 0.335428 | 0.395411 | 1.0      |


The Pearson correlation coefficient is a measure of the linear correlation between two variables. 
It is defined as the covariance of the two variables divided by the product of their standard deviations. 
The value of the Pearson correlation coefficient ranges from -1 to 1, where:
* $-1$ indicates a perfect negative linear correlation
* $0$ indicates no linear correlation
* $1$ indicates a perfect positive linear correlation

The larger the absolute value of the correlation, the better a linear function of one variable predicts the other.
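The definition can be sketched as a small function. `pearson_r` below is a hypothetical helper written for this example, not a library function; it uses the same $\frac{1}{n}$ normalization for the covariance and for both standard deviations so that the factor cancels, and its result is compared against NumPy's `np.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation; hypothetical helper written for this example."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Same normalization (1/n) in numerator and denominator, so it cancels
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std(ddof=0) * y.std(ddof=0))

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

print(pearson_r(x, y), np.corrcoef(x, y)[0, 1])  # the two values agree
print(pearson_r(x, x))               # perfect positive linear relationship
print(pearson_r(x, -2.0 * x + 1.0))  # perfect negative linear relationship
```

Any consistent normalization gives the same coefficient; only mixing $\frac{1}{n}$ and $\frac{1}{n-1}$ between numerator and denominator changes the result.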


## TODO
- [mathematical properties of covariance](https://en.wikipedia.org/wiki/Covariance)
- [Covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix)