# Overview

In this notebook we examine variance. It assumes the user has read the [expectation notebook](./Expectation.ipynb).

# 1. General Definition

The variance is a descriptive statistic that summarizes the "spread" of our data. A larger variance indicates that the values our variable might take on are more widely dispersed around the mean; conversely, a smaller variance indicates that they are more tightly clustered.

The general formula expresses variance as the expected value $\mathbb{E}$ of the squared deviations:

$$ Var(X)= \sigma^2 = \mathbb{E} \left[ (X-\mu )^{2} \right] $$

Squaring the deviations has deeper mathematical implications, which we will return to later. In short, the sum of squares overrepresents larger deviations and underrepresents smaller ones.
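As a small illustration of the definition, we can compute the variance by hand as the average squared deviation. This is only a sketch: the data values are made up for the example, and we assume every observation is weighted equally. The last lines compare how much the largest deviation contributes before and after squaring.

```python
import numpy

# Hypothetical sample data, chosen only for illustration
x = numpy.array([1.0, 2.0, 6.0])

# Mean of the data, assuming equal weight for each observation
mu = x.mean()

# Deviations from the mean and their squares
dev = x - mu
squared_dev = dev ** 2

# Variance as the average squared deviation
var_by_hand = squared_dev.mean()

# numpy.var uses the same 1/n weighting by default (ddof=0)
var_numpy = numpy.var(x)

# Share of the total contributed by each observation,
# using absolute deviations versus squared deviations
share_abs = numpy.abs(dev) / numpy.abs(dev).sum()
share_squared = squared_dev / squared_dev.sum()

print(var_by_hand, var_numpy)
print(share_abs.max(), share_squared.max())
```

In this data the largest deviation accounts for half of the total absolute deviation but a larger share of the sum of squares, which is the emphasis on large deviations described above.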

Another common representation of the formula can be derived by expanding the compact formula:

$$ = \mathbb{E} \left[ X^2 - 2X\mathbb{E}[X] + \mathbb{E}[X]^2  \right] $$

$$ = \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + \mathbb{E}[X]^2 $$

$$ = \mathbb{E}[X^2] - 2\mathbb{E}[X]^2 + \mathbb{E}[X]^2 $$

$$ = \mathbb{E}[X^2] - \mathbb{E}[X]^2 $$

As noted in the [expectations notebook](./expectations.ipynb), a probability distribution can be attached to the expectation operator $\mathbb{E}$. We will cover distribution-specific derivations separately.
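The expanded form $\mathbb{E}[X^2] - \mathbb{E}[X]^2$ can be checked numerically against the compact definition. The sketch below assumes equally weighted observations and uses made-up data; it is a sanity check, not a proof.

```python
import numpy

# Hypothetical sample data, chosen only for illustration
x = numpy.array([1.0, 2.0, 6.0])

# Compact definition: expected squared deviation from the mean
var_definition = numpy.mean((x - x.mean()) ** 2)

# Expanded form: E[X^2] - E[X]^2
var_expanded = numpy.mean(x ** 2) - numpy.mean(x) ** 2

print(var_definition, var_expanded)
```

Both expressions should agree up to floating-point rounding.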

# 2. Univariate Case
## 2.2. Matrix Notation
For the univariate case, we can also write the variance in matrix notation. We collect the $n$ observations of our variable into a row vector $X$, so that the squared deviation becomes the product of the deviation vector with its transpose:

$$ X := \begin{bmatrix}
x_1, &
x_2, &
\cdots, &
x_n
\end{bmatrix}
$$

$$ Var(X) := \mathbb{E} \left[ (X-\mu)^{2} \right] $$

$$ = \mathbb{E} \left[ (X-\mu)(X-\mu)^T \right] $$


$$ = \mathbb{E} \left[
\begin{bmatrix}
x_1 - \mu & 
x_2 - \mu & 
\cdots &
x_n - \mu
\end{bmatrix}
\begin{bmatrix}
x_1 - \mu \\
x_2 - \mu \\
\vdots \\
x_n - \mu
\end{bmatrix}
\right]
$$

$$ = \mathbb{E} \left[
(x_1 - \mu)^2 +
(x_2 - \mu)^2 +
\cdots +
(x_n - \mu)^2
\right] $$
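The row-times-column product above can be sketched in numpy. We assume made-up data stored as a $1 \times n$ row vector and equal weighting of the observations, so that the expectation reduces to dividing the sum of squares by $n$.

```python
import numpy

# Hypothetical observations stored as a 1 x n row vector
x = numpy.array([[1.0, 2.0, 6.0]])
n = x.shape[1]

# Deviation vector (1 x n)
d = x - x.mean()

# Row vector times column vector gives a 1 x 1 matrix:
# the sum of squared deviations
sum_sq = d @ d.T

# Assuming equal weights, the expectation divides by n
var = sum_sq.item() / n

print(sum_sq.shape, var)
```

The product collapses to a single number, which is why the univariate variance is a scalar.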

# 3. Multivariate Case

For the multivariate case, the general notation still holds. However, the abstract notation hides some of the details of the underlying mathematical structures and, as a result, can lead to spurious algebraic manipulations. In other words: matrix algebra and multidimensional spaces behave differently than the univariate spaces we learned in grade school, so let's use a more explicit notation to prevent mistakes or bad assumptions later.

We define our data matrix $X$ as a row of column vectors, where each column $X_i$ holds the observations of one variable:

$$ X := \begin{bmatrix}
X_1, &
X_2, &
\cdots, &
X_n
\end{bmatrix}
$$

$$X_1 := \begin{bmatrix}
x_{1,1} \\
x_{1,2} \\
\vdots \\
x_{1,n}
\end{bmatrix}
$$

We define the mean as a row vector with one entry per variable:

$$ \mu := \begin{bmatrix}
\mu_1, &
\mu_2, &
\cdots, &
\mu_n
\end{bmatrix}
$$

$$ \mu_i := \mathbb{E}[X_i], \quad i = 1, \dots, n $$

We start with the univariate definition and swap out $\sigma^2$ for $\Sigma$. As we will see, the multivariate variance matrix $\Sigma$ contains more than just variances.

$$ Var(X)= \Sigma = \mathbb{E} \left[ (X-\mu)^{2} \right] $$

We expand our squaring operator to be explicit about our multiplication operations using matrices.

$$ = \mathbb{E} \left[ (X-\mu)^T(X-\mu) \right] $$

We can go a step further by expanding the expression within the expectation to account for the individual elements within the matrix:


$$\Sigma = \mathbb{E}\left[
\begin{bmatrix}
X_1 - \mu_1 \\
X_2 - \mu_2 \\
\vdots \\
X_n - \mu_n
\end{bmatrix}
\begin{bmatrix}
X_1 - \mu_1, &
X_2 - \mu_2, &
\cdots, &
X_n - \mu_n
\end{bmatrix}
\right]$$



$$ = \mathbb{E}
\begin{bmatrix}
(X_1 - \mu_1)^2               & \cdots & (X_1 - \mu_1)(X_n - \mu_n) \\
\vdots                        & \ddots & \vdots            \\
(X_n - \mu_n)(X_1 - \mu_1)    & \cdots & (X_n - \mu_n)^2 \\
\end{bmatrix}$$

Note: We have ensured we have the right dimensions for addition/subtraction. The dimensions will be such that $(1 \times n) - (1 \times n ) = (1 \times n) $.

$$ \mathbb{E}\left[(X_1 - \mu_1)^2\right] = 
\mathbb{E}\left[
\begin{bmatrix}
x_{1,1} - \mu_1 &
x_{1,2} - \mu_1 &
\cdots &
x_{1,n} - \mu_1
\end{bmatrix}
\begin{bmatrix}
x_{1,1} - \mu_1 \\
x_{1,2} - \mu_1 \\
\vdots \\
x_{1,n} - \mu_1
\end{bmatrix}
\right]
$$

$$ = \mathbb{E}\left[
(x_{1,1} - \mu_1)^2 +
(x_{1,2} - \mu_1)^2 +
\cdots +
(x_{1,n} - \mu_1)^2
\right]$$

When we apply a distribution through the expectation operator, we can reduce this expression to the variance of the first variable:

$$ = \sigma_1^2 $$

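Repeating this for every entry gives the full matrix, with the variances $\sigma_i^2$ on the diagonal and the covariances $\sigma_{i,j} := \mathbb{E}\left[(X_i - \mu_i)(X_j - \mu_j)\right]$ off the diagonal:

$$ \Sigma = \begin{bmatrix}
\sigma_{1}^2     & \cdots & \sigma_{1,n} \\
\vdots           & \ddots & \vdots       \\
\sigma_{n,1}     & \cdots & \sigma_{n}^2 \\
\end{bmatrix}$$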
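Each entry of $\Sigma$ is the expected product of two deviations, so the matrix can also be filled in one entry at a time. The sketch below does this with explicit loops on made-up data, assuming equal $1/n$ weighting of the observations (the population form). The names `variables` and `k` are introduced here only for illustration.

```python
import numpy

# Hypothetical data: two variables with three observations each
X1 = numpy.array([1.0, 2.0, 6.0])
X2 = numpy.array([2.0, 3.0, 5.0])
variables = [X1, X2]
k = len(variables)

# Fill Sigma entry by entry: Sigma[i, j] = E[(X_i - mu_i)(X_j - mu_j)]
Sigma = numpy.zeros((k, k))
for i in range(k):
    for j in range(k):
        dev_i = variables[i] - variables[i].mean()
        dev_j = variables[j] - variables[j].mean()
        # Equal 1/n weighting of the observations
        Sigma[i, j] = numpy.mean(dev_i * dev_j)

# The diagonal holds the variance of each variable
diagonal_is_variance = numpy.allclose(
    numpy.diag(Sigma), [numpy.var(X1), numpy.var(X2)]
)

# The matrix is symmetric because multiplication commutes
is_symmetric = numpy.allclose(Sigma, Sigma.T)

print(Sigma)
print(diagonal_is_variance, is_symmetric)
```

The loop form is slower than the matrix product, but it makes the meaning of each entry explicit.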

## Python example
We can see a Python example of calculating the covariance matrix using matrix algebra:

In [190]:
import numpy

# ===========================
# Calculate by hand

# Define our data
X1 = numpy.array([1,2,6])
X2 = numpy.array([2,3,5])
X = numpy.array([X1, X2]).T

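# Calculate the mean of each variable
mu_1 = X1.mean()
mu_2 = X2.mean()
mu = numpy.array([mu_1, mu_2])

# Number of observations (rows of X), not variables (columns)
n = X.shape[0]
# Divide by n - 1 to match the default normalization of numpy.cov
Sigma = 1/(n - 1) * (X - mu).T @ (X - mu)

# ===========================
# Compare with library function

Sigma_check = numpy.cov(X.T)
# Compare with a tolerance rather than exact floating-point equality
numpy.allclose(Sigma, Sigma_check)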

True

In [183]:
import numpy

# ===========================
# Calculate by hand

# Define our data
X1 = numpy.array([1,2,6])
X2 = numpy.array([2,3,5])
X3 = numpy.array([4,6,9])
X = numpy.array([X1, X2, X3]).T

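# Calculate the mean of each variable
mu_1 = X1.mean()
mu_2 = X2.mean()
mu_3 = X3.mean()
mu = numpy.array([mu_1, mu_2, mu_3])

# Number of observations (rows of X), not variables (columns)
n = X.shape[0]
# Divide by n - 1 to match the default normalization of numpy.cov
Sigma = 1/(n - 1) * (X - mu).T @ (X - mu)

# ===========================
# Compare with library function

Sigma_check = numpy.cov(X.T)
# Compare with a tolerance rather than exact floating-point equality
numpy.allclose(Sigma, Sigma_check)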

True
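One detail worth making explicit is the normalization. By default `numpy.cov` divides by the number of observations minus one (the sample form), while passing `bias=True` divides by the number of observations (the population form). The sketch below, on made-up data, computes both by hand; the names `Sigma_population` and `Sigma_sample` are ours.

```python
import numpy

# Hypothetical data: rows are observations, columns are variables
X = numpy.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [6.0, 5.0],
])
m = X.shape[0]  # number of observations

# Center each column on its own mean
D = X - X.mean(axis=0)

# Population form: divide by the number of observations
Sigma_population = D.T @ D / m

# Sample form: divide by the number of observations minus one
Sigma_sample = D.T @ D / (m - 1)

# numpy.cov expects variables in rows unless rowvar=False is passed
matches_population = numpy.allclose(
    Sigma_population, numpy.cov(X, rowvar=False, bias=True)
)
matches_sample = numpy.allclose(Sigma_sample, numpy.cov(X, rowvar=False))

print(matches_population, matches_sample)
```

Which form is appropriate depends on whether the data is treated as the full population or as a sample from it.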

# 4. Variance Examples

## 4.1. Uniform Expectation
Depending on the probability distribution attached to the expectation operator $\mathbb{E}$, we can derive a number of different formulas. For sample or population statistics, it is common to weight the observed values $x_i \in X$ equally. As such, we can use the probability mass function of the uniform random variable:

$$ \mathbb{E}_f(X) := \sum x_i f(x_i)$$

$$ X \sim \mathcal{U}(0,  |X|) \rightarrow f_\mathcal{U} := \frac{1}{|X|} $$

In this case we see the expectation for discrete variables expanded as:

$$ \mathbb{E} \left[ (X-\mu )^{2} \right] = \sum_{i=1}^{n} \frac{1}{n} (x_i-\mu )^{2}, \quad n := |X| $$
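The discrete expansion can be written as an explicit weighted sum, with the uniform probability mass $1/n$ as the weight of every observation. This is a minimal sketch on made-up data; the helper `expectation` is a hypothetical name introduced for the example.

```python
import numpy

def expectation(values, pmf):
    """Expectation of a discrete variable: sum of value times probability."""
    return numpy.sum(values * pmf)

# Hypothetical observed values
x = numpy.array([1.0, 2.0, 6.0])
n = x.size

# Uniform probability mass: every observation has weight 1/n
f = numpy.full(n, 1.0 / n)

# Mean and variance as expectations under the uniform weights
mu = expectation(x, f)
var = expectation((x - mu) ** 2, f)

print(mu, var)
```

Swapping `f` for a different set of weights that sums to one would give the variance under a non-uniform distribution with the same code.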

Likewise, continuous variables are expanded as:

$$ \mathbb{E} \left[ (X-\mu )^{2} \right] = \int \frac{1}{n} (x-\mu )^{2} \, dx $$
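For the continuous case we can approximate the integral numerically. The sketch below assumes the variable is uniform on the interval $[0, n]$, so the density is $1/n$ and the mean is $n/2$; the interval, the value of `n`, and the midpoint-rule discretization are our assumptions for illustration. The result is compared with the known closed form $n^2/12$ for the variance of a uniform variable on an interval of length $n$.

```python
import numpy

# Assumed interval length: X is uniform on [0, n]
n = 3.0
mu = n / 2  # mean of a uniform variable on [0, n]

# Midpoint rule: split [0, n] into equal bins and evaluate at bin centers
num_bins = 100_000
edges = numpy.linspace(0.0, n, num_bins + 1)
midpoints = (edges[:-1] + edges[1:]) / 2
dx = n / num_bins

# Approximate the integral of (1/n) * (x - mu)^2 over [0, n]
var_numeric = numpy.sum((1.0 / n) * (midpoints - mu) ** 2 * dx)

# Closed form for a uniform variable on an interval of length n
var_closed_form = n ** 2 / 12

print(var_numeric, var_closed_form)
```

The two values should agree closely, with the remaining difference coming from the discretization.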