# Descriptive Statistics: Measures of Dispersion (Absolute)

## Standard Deviation

The standard deviation measures how far the data points typically lie from the mean.
It is often used as a rough way to identify outliers: data points that lie more than
two or three standard deviations from the mean can be considered unusual.

The standard deviation is the square root of the variance.
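
As a rough sketch of the outlier idea above, the snippet below flags values whose distance from the mean exceeds `k` standard deviations. The helper name `flag_unusual`, the threshold `k=2`, and the `readings` data are all assumptions chosen for illustration; they are not part of the original examples.

```python
import numpy as np


def flag_unusual(values, k=2.0):
    """Return the values lying more than k population standard deviations from the mean.

    Hypothetical helper: the threshold k is an assumption, not a fixed rule.
    """
    data = np.asarray(values, dtype=float)
    mean = data.mean()
    std = data.std()  # ddof=0, i.e. the population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be unusual
    z_scores = (data - mean) / std
    return data[np.abs(z_scores) > k].tolist()


readings = [10, 12, 11, 13, 12, 11, 40]  # made-up data with one extreme value
print(flag_unusual(readings))  # [40.0]
```

A smaller `k` flags more points; with roughly bell-shaped data, `k=1` would already flag about a third of the observations, which is why larger thresholds are the more common choice.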

### Formula (standard deviation)

Population standard deviation:

$$ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} $$

Sample standard deviation:

$$
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
$$
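
To make the difference between the two divisors concrete, here is a minimal pure-Python sketch of both formulas. The function names `population_std` and `sample_std` and the `data` list are illustrative assumptions; in practice the library functions used in the examples below do the same job.

```python
import math


def population_std(values):
    """Square root of the mean squared deviation, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_std(values):
    """Square root of the summed squared deviations, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
print(population_std(data))  # 2.0
print(sample_std(data))      # slightly larger, because of the n - 1 divisor
```

Dividing by `n - 1` always gives a value at least as large as dividing by `N`, and the gap shrinks as the number of observations grows.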

### Examples (standard deviation)

In [24]:
import numpy as np
import pandas as pd
from scipy import stats
import statistics as sts

#### Example 1 (standard deviation)

In [2]:
day = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
time = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]

In [3]:
df = pd.DataFrame({"Day": day, "Time": time})
df

|   | Day | Time |
|---|-----|------|
| 0 | 1 | 39 |
| 1 | 2 | 29 |
| 2 | 3 | 43 |
| 3 | 4 | 52 |
| 4 | 5 | 39 |
| 5 | 6 | 44 |
| 6 | 7 | 40 |
| 7 | 8 | 31 |
| 8 | 9 | 44 |
| 9 | 10 | 35 |


In [4]:
df.describe()

|       | Day | Time |
|-------|-----|------|
| count | 10.0 | 10.0 |
| mean | 5.5 | 39.6 |
| std | 3.02765 | 6.769211 |
| min | 1.0 | 29.0 |
| 25% | 3.25 | 36.0 |
| 50% | 5.5 | 39.5 |
| 75% | 7.75 | 43.75 |
| max | 10.0 | 52.0 |


In [5]:
np.var(time)  # population variance (NumPy defaults to ddof=0)

np.float64(41.24)

In [6]:
sts.pvariance(time)  # population variance

41.24

In [7]:
sts.variance(time)  # sample variance (divides by n - 1)

45.82222222222222

In [8]:
float(f"{stats.tstd(time):.2f}")  # sample standard deviation, the square root of the sample variance

6.77
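
The `std` row reported by `df.describe()` (6.769211 for `Time`) matches the sample standard deviation, not the square root of `np.var(time)`. The reason is that pandas defaults to `ddof=1` while NumPy defaults to `ddof=0`. The sketch below shows both conventions side by side; the variable names are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

time = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
series = pd.Series(time)

pd_sample_std = series.std()            # pandas default: ddof=1 (sample)
pd_population_std = series.std(ddof=0)  # population form, on request

np_population_std = np.std(time)        # NumPy default: ddof=0 (population)
np_sample_std = np.std(time, ddof=1)    # sample form, on request

print(f"sample:     {pd_sample_std:.2f} {np_sample_std:.2f}")
print(f"population: {pd_population_std:.2f} {np_population_std:.2f}")
```

Passing `ddof` explicitly in both libraries avoids accidentally mixing the two conventions.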

#### Example 2 (standard deviation)

In [9]:
hurricanes = [18, 21, 13, 19, 24, 17, 14, 12, 15, 14]

In [10]:
float(f"{sts.pstdev(hurricanes):.2f}") # population standard deviation

3.63

In [11]:
float(f"{np.std(hurricanes):.2f}") # population standard deviation

3.63

In [12]:
float(f"{stats.tstd(hurricanes):.2f}") # sample standard deviation

3.83

In [13]:
float(f"{sts.stdev(hurricanes):.2f}")  # sample standard deviation

3.83

#### Example 3 (standard deviation)

In [14]:
students = [18, 22, 25, 26, 15]

In [15]:
stud_mean = np.mean(students)
stud_sub = np.subtract(students, stud_mean)
stud_var = np.sum(np.square(stud_sub)) / 5
np.sqrt(stud_var)

np.float64(4.166533331199932)

In [16]:
sts.pstdev(students)

4.166533331199932

In [17]:
np.std(students)

np.float64(4.166533331199932)

### Extra: Coefficient of Variation

#### Formula (coefficient of variation)

$$ CV = \frac{\sigma}{\mu} $$

##### Estimation

When only a sample of data from a population is available, the population CV can be estimated using the ratio of the sample standard deviation $s$ to the sample mean $\bar{x}$:

$$
\widehat{CV} = \frac{s}{\bar{x}}
$$

Use the `variation()` function from `scipy.stats` to calculate the coefficient of variation.
This function is equivalent to:
```Python
np.std(x, axis=axis, ddof=ddof) / np.mean(x)
```
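The `ddof` parameter is the Delta Degrees of Freedom: the standard deviation is computed with the divisor $N - \text{ddof}$. The default `ddof=0` gives the population form $\sigma / \mu$, while `ddof=1` gives the sample estimate $s / \bar{x}$.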
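
As a short illustration, the sketch below calls `scipy.stats.variation()` on the hurricane data from the standard deviation examples and checks it against the NumPy expression. It assumes a SciPy version in which `variation()` accepts the `ddof` argument; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

hurricanes = [18, 21, 13, 19, 24, 17, 14, 12, 15, 14]

# Population CV (ddof=0 is the default)
cv_population = stats.variation(hurricanes)

# Sample CV, using the sample standard deviation in the numerator
cv_sample = stats.variation(hurricanes, ddof=1)

# The same quantities written out with NumPy
manual_population = np.std(hurricanes) / np.mean(hurricanes)
manual_sample = np.std(hurricanes, ddof=1) / np.mean(hurricanes)

print(f"population CV: {cv_population:.4f}")
print(f"sample CV:     {cv_sample:.4f}")
```

The CV is a unitless ratio, so multiplying by 100 expresses it as a percentage of the mean.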

#### Example 1 (coefficient of variation)

In [19]:
hurricanes = [18, 21, 13, 19, 24, 17, 14, 12, 15, 14]
hurr_mean = np.mean(hurricanes)
hurr_sub = np.subtract(hurricanes, hurr_mean)
hurr_var = np.sum(np.square(hurr_sub)) / 10
hurr_std = np.sqrt(hurr_var)
hurr_cv = hurr_std / hurr_mean
hurr_cv


np.float64(0.21763810593276947)

In [20]:
hurr_cv * 100

np.float64(21.763810593276947)

#### Example 2 (coefficient of variation)

In [21]:
students = [18, 22, 25, 26, 15]

In [22]:
stud_mean = np.mean(students)
stud_sub = np.subtract(students, stud_mean)
stud_var = np.sum(np.square(stud_sub)) / 5  # population variance
stud_std = np.sqrt(stud_var)
stud_cv = stud_std / stud_mean
stud_cv

np.float64(0.1965345910943364)

In [23]:
stud_cv * 100

np.float64(19.65345910943364)