# 7. QLS - Variance

## Measures of Dispersion

Dispersion measures how spread out a set of data is. This matters because one of the main ways risk is measured is by how spread out returns have been historically. If returns have been tightly clustered around a value, that's good. If they have been all over the place, the asset is risky.

Data with low dispersion is clustered around the mean, while data with high dispersion contains many values far above or below it.

In [1]:
import numpy as np

Let's generate an array of random integers to work with.

In [2]:
np.random.seed(121)

# generate 20 random integers < 100
X = np.random.randint(100, size=20)

# sort them
X = np.sort(X)
print("X:", X)

mu = np.mean(X)
print("Mean of X:", mu)

X: [ 3  8 34 39 46 52 52 52 54 57 60 65 66 75 83 85 88 94 95 96]
Mean of X: 60.2


### Range

The range is the difference between the maximum and minimum values in a dataset. It's very sensitive to outliers, since a single extreme value changes it directly. We use `numpy`'s peak-to-peak function, `np.ptp`, for this.

In [3]:
print("Range of X:", np.ptp(X))

Range of X: 93
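To make the outlier sensitivity concrete, here is a minimal sketch. The values of `X` are hard-coded from the output above so the snippet runs on its own, and the appended value `1000` is a made-up outlier used purely for illustration.

```python
import numpy as np

# Same values as X above, hard-coded so this snippet is self-contained
X = np.array([3, 8, 34, 39, 46, 52, 52, 52, 54, 57,
              60, 65, 66, 75, 83, 85, 88, 94, 95, 96])

# Hypothetical outlier, not part of the original data
X_outlier = np.append(X, 1000)

range_X = np.ptp(X)
range_outlier = np.ptp(X_outlier)

print("Range of X:", range_X)
print("Range of X with one outlier:", range_outlier)
```

A single extra observation moves the range from 93 to 997, even though the other 20 values are unchanged.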


### Mean Absolute Deviation (MAD)

The mean absolute deviation is the average of the distances of observations from the arithmetic mean. We use the absolute value of the deviation (so that 5 above and 5 below the mean both contribute 5) because otherwise the positive and negative deviations would cancel and always sum to 0.

$$MAD = \frac{\sum_{i=1}^{n} \left| X_{i} - \mu \right|}{n}$$

Where $n$ is the number of observations and $\mu$ is their mean.

In [4]:
abs_dispersion = [np.abs(mu - x) for x in X]
MAD = np.sum(abs_dispersion)/len(abs_dispersion)
print("Mean Absolute Deviation of X:", MAD)

Mean Absolute Deviation of X: 20.520000000000003
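As a quick check of why the absolute value is needed, the sketch below sums the signed deviations and then computes the MAD in vectorized form. The values of `X` are hard-coded from the output above; the names `signed_sum` and `mad` are just for this illustration.

```python
import numpy as np

# Same values as X above, hard-coded so this snippet is self-contained
X = np.array([3, 8, 34, 39, 46, 52, 52, 52, 54, 57,
              60, 65, 66, 75, 83, 85, 88, 94, 95, 96])
mu = X.mean()

# Signed deviations cancel out, so their sum is zero up to floating point error
signed_sum = np.sum(X - mu)

# Taking absolute values first gives the mean absolute deviation
mad = np.mean(np.abs(X - mu))

print("Sum of signed deviations:", signed_sum)
print("Mean Absolute Deviation of X:", mad)
```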


### Variance & Standard Deviation

The variance $\sigma^{2}$ is defined as the average of the squared deviations around the mean:

$$\sigma^{2} = \frac{\sum_{i=1}^{n} (X_{i} - \mu)^{2}}{n}$$

This is sometimes more convenient than the mean absolute deviation because squaring is smooth and differentiable.

The standard deviation $\sigma$ is defined as the square root of the variance, and it's the easier of the two to interpret because it's in the same units as the observations.

In [5]:
print("Variance of X:", np.var(X))
print("Standard deviation of X:", np.std(X))

Variance of X: 670.16
Standard deviation of X: 25.887448696231154
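The formula can be checked directly against `np.var`. The sketch below computes the variance from the definition, using the values of `X` hard-coded from the output above. It also shows the `ddof` parameter of `np.var`: the default `ddof=0` divides by $n$ as in the formula, while `ddof=1` divides by $n - 1$, which is a common choice when estimating a population variance from a sample. The `ddof=1` comparison is an addition for illustration, not something the cells above use.

```python
import numpy as np

# Same values as X above, hard-coded so this snippet is self-contained
X = np.array([3, 8, 34, 39, 46, 52, 52, 52, 54, 57,
              60, 65, 66, 75, 83, 85, 88, 94, 95, 96])
mu = X.mean()
n = len(X)

# Variance straight from the definition: average squared deviation from the mean
var_by_hand = np.sum((X - mu) ** 2) / n
std_by_hand = np.sqrt(var_by_hand)

print("Variance by hand:", var_by_hand)
print("np.var(X):", np.var(X))
print("Standard deviation by hand:", std_by_hand)

# ddof=1 divides by n - 1 instead of n
sample_var = np.var(X, ddof=1)
print("Variance with ddof=1:", sample_var)
```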


One way to interpret standard deviation is by referring to Chebyshev's inequality. This tells us that the proportion of samples within $k$ standard deviations of the mean is at least $1 - \frac{1}{k^{2}}$ for all $k > 1$.

Let's check that this is true for our data set.

In [6]:
k = 1.25
# maximum distance from the mean that counts as "within k standard deviations"
dist = k*np.std(X)
within = [x for x in X if abs(x - mu) <= dist]
print("Observations within", k, "stds of mean:", within)
print("Confirming that", len(within)/len(X), '>', 1 - 1/k**2)

Observations within 1.25 stds of mean: [34, 39, 46, 52, 52, 52, 54, 57, 60, 65, 66, 75, 83, 85, 88]
Confirming that 0.75 > 0.36


The bound given by Chebyshev's inequality seems fairly loose in this case. This bound is rarely tight, but it's useful because it holds for all data sets and distributions.
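To see how loose the bound is across different values of $k$, the sketch below repeats the check for a few choices. The values of `X` are hard-coded from the output above, and the particular values of `k` are arbitrary picks for illustration.

```python
import numpy as np

# Same values as X above, hard-coded so this snippet is self-contained
X = np.array([3, 8, 34, 39, 46, 52, 52, 52, 54, 57,
              60, 65, 66, 75, 83, 85, 88, 94, 95, 96])
mu = X.mean()
sigma = X.std()

results = []
for k in (1.25, 1.5, 2.0, 3.0):  # arbitrary illustrative values, all > 1
    # fraction of observations within k standard deviations of the mean
    fraction = np.mean(np.abs(X - mu) <= k * sigma)
    bound = 1 - 1 / k**2
    results.append((k, fraction, bound))
    print("k =", k, "observed:", fraction, "Chebyshev bound:", round(bound, 4))
```

In every case the observed fraction should sit at or above the bound, usually with plenty of room to spare.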

#### Semivariance & Semideviation

Although variance and standard deviation tell us how volatile a quantity is, they don't differentiate between upward and downward volatility. With returns on an asset, we're mostly worried about downward deviation. This is addressed by semivariance and semideviation, which only count the observations that fall below the mean. Semivariance is defined as:

$$\frac{\sum_{X_{i}< \mu} (X_{i} - \mu)^{2}}{n_{<}}$$

Where $n_{<}$ is the number of observations which are smaller than the mean. Semideviation is the square root of the semivariance.

In [7]:
# because there's no built-in semideviation, we'll compute it ourselves
# keep only the observations strictly below the mean, matching the formula
lows = X[X < mu]

semivar = np.sum((lows - mu)**2)/len(lows)

print("Semivariance of X:", semivar)
print("Semideviation of X:", np.sqrt(semivar))

Semivariance of X: 689.5127272727273
Semideviation of X: 26.258574357202395
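One way to see that variance ignores direction while semivariance does not is to mirror the data around its mean. The sketch below does this with the values of `X` hard-coded from the output above. The mirrored array `Y` and the helper `semivariance` are constructions for this illustration only.

```python
import numpy as np

# Same values as X above, hard-coded so this snippet is self-contained
X = np.array([3, 8, 34, 39, 46, 52, 52, 52, 54, 57,
              60, 65, 66, 75, 83, 85, 88, 94, 95, 96])
mu = X.mean()

def semivariance(data):
    """Illustrative helper: mean squared deviation of values strictly below the mean."""
    m = data.mean()
    lows = data[data < m]
    return np.sum((lows - m) ** 2) / len(lows)

# Reflect every observation around the mean: what was above is now below
Y = 2 * mu - X

print("Variance of X and Y:", np.var(X), np.var(Y))
print("Semivariance of X and Y:", semivariance(X), semivariance(Y))
```

The two arrays have the same mean and variance, but their semivariances differ because the downside of `Y` is the upside of `X`.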


A related notion is target semivariance (and target semideviation), where we average the squared distances from a target $B$ over the $n_{<B}$ observations that fall below that target:

$$\frac{\sum_{X_{i}< B} (X_{i} - B)^{2}}{n_{< B}}$$

In [8]:
B = 19
# keep only the observations strictly below the target, matching the formula
lows_B = X[X < B]
semivar_B = np.sum((lows_B - B)**2)/len(lows_B)

print("Target semivariance of X:", semivar_B)
print("Target semideviation of X:", np.sqrt(semivar_B))

Target semivariance of X: 188.5
Target semideviation of X: 13.729530217745982
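Since the plain and target versions differ only in the reference point, both can be wrapped in one small function. This is a minimal sketch: the name `target_semivariance` is hypothetical, and returning `nan` when no observation falls below the target is an assumption made here, not a standard convention.

```python
import numpy as np

def target_semivariance(x, target):
    """Hypothetical helper: mean squared shortfall of values strictly below `target`."""
    x = np.asarray(x, dtype=float)
    lows = x[x < target]
    if lows.size == 0:
        # Assumption: nothing falls below the target, so the measure is undefined
        return float("nan")
    return np.mean((lows - target) ** 2)

# Same values as X above, hard-coded so this snippet is self-contained
X = np.array([3, 8, 34, 39, 46, 52, 52, 52, 54, 57,
              60, 65, 66, 75, 83, 85, 88, 94, 95, 96])

print("Target semivariance (B = 19):", target_semivariance(X, 19))
print("Target semideviation (B = 19):", np.sqrt(target_semivariance(X, 19)))
# Using the mean as the target recovers the ordinary semivariance
print("Semivariance:", target_semivariance(X, X.mean()))
```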


## These are Only Estimates

All of these computations give you sample statistics, that is, the standard deviation of a sample of data. Whether this reflects the current true population standard deviation is not always obvious, and more effort has to be put into determining that. This is especially problematic in finance because all data are time series and the mean and variance may change over time. There are many different techniques and subtleties here.

In general, do not assume that because something is true of your sample, it will remain true going forward.
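A small simulation illustrates how a sample statistic can mislead when the underlying variance changes over time. The data below is entirely synthetic: the two volatility levels (0.01 and 0.03), the sample sizes, and the seed are arbitrary assumptions chosen to make the effect visible, not estimates from any real asset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "returns": a calm regime followed by a more volatile one
calm = rng.normal(loc=0.0, scale=0.01, size=500)
volatile = rng.normal(loc=0.0, scale=0.03, size=500)
returns = np.concatenate([calm, volatile])

std_first = np.std(returns[:500])   # estimate from the first half only
std_second = np.std(returns[500:])  # what actually happened afterwards
std_full = np.std(returns)          # blends both regimes

print("Std of first half: ", std_first)
print("Std of second half:", std_second)
print("Std of full sample:", std_full)
```

Someone who measured the standard deviation on the first half would have badly underestimated the dispersion of the second half, and the full-sample figure describes neither regime well.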