# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash course or refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [1]:
import random

random.seed(0)
salaries = [round(random.random()*1000000, -3) for _ in range(100)]

## Exercise 5: Calculating statistics and verifying
### mean

In [2]:
total_salary = sum(salaries)
length = len(salaries)
mean_salary = total_salary / length
mean_salary

from statistics import mean

mean_salary == mean(salaries)

True

### median

In [3]:
import math

def find_median(x):
    x.sort()  # sorts in place, so the caller's list stays sorted afterward
    mid_point = (len(x) + 1) / 2 - 1
    if len(x) % 2:
        return x[int(mid_point)]
    else:
        return (x[math.floor(mid_point)] + x[math.ceil(mid_point)]) / 2

In [4]:
median_salary = find_median(salaries)

from statistics import median

median_salary == median(salaries)

True
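
Because `find_median()` calls `x.sort()`, `salaries` is left sorted for every later cell. If that side effect is unwanted, a non-mutating variant is one option. The following is only a sketch, not part of the original solution; `find_median_sorted_copy` is a hypothetical name and the two short lists are made-up test values.

```python
from statistics import median

def find_median_sorted_copy(values):
    """Return the median without modifying the input (hypothetical helper)."""
    ordered = sorted(values)  # sorted() builds a new list; the input is untouched
    n = len(ordered)
    mid = n // 2
    if n % 2:
        # odd length: the single middle element
        return ordered[mid]
    # even length: average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [5, 1, 3]      # made-up values
even = [4, 1, 3, 2]  # made-up values
print(find_median_sorted_copy(odd), median(odd))
print(find_median_sorted_copy(even), median(even))
print(odd)  # still in its original order
```

Swapping this in for `find_median()` would change which elements later slices such as `[:5]` show, since `salaries` would no longer be sorted by that call.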

### mode

In [5]:
from statistics import mode
from collections import Counter

mode_salary = Counter(salaries).most_common(1)[0][0]

mode_salary == mode(salaries)

True
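
`Counter.most_common(1)` and `statistics.mode()` each report a single value, so a tie for the highest count is not visible in the check above. As a hedged illustration on made-up data (not the `salaries` list), `statistics.multimode()` from Python 3.8+ lists every value that shares the top count:

```python
from collections import Counter
from statistics import multimode

data = [1, 1, 2, 2, 3]  # made-up values with two modes

counts = Counter(data)
top_value, top_count = counts.most_common(1)[0]  # reports only one of the tied values
all_modes = multimode(data)                      # every value with the highest count

print(top_value, top_count)
print(all_modes)
```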

### sample variance
Remember to use Bessel's correction.

In [6]:
sample_variance = sum([(x - mean_salary)**2 for x in salaries]) / (len(salaries) - 1)

from statistics import variance

sample_variance == variance(salaries)

True
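
To see what Bessel's correction changes, the sketch below computes both variants on a small made-up list: dividing the summed squared deviations by `n` gives the population variance, while dividing by `n - 1` gives the sample variance. The names `population_variance` and `sample_variance_demo` are illustrative only.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
n = len(data)
mean_value = sum(data) / n
squared_deviations = sum((x - mean_value) ** 2 for x in data)

population_variance = squared_deviations / n          # no correction
sample_variance_demo = squared_deviations / (n - 1)   # Bessel's correction

# the standard library offers both: pvariance() divides by n, variance() by n - 1
print(population_variance, pvariance(data))
print(sample_variance_demo, variance(data))
```

The sample variance is always the larger of the two, by a factor of `n / (n - 1)`.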

### sample standard deviation
Remember to use Bessel's correction.

In [7]:
sample_std_dev = math.sqrt(sample_variance)

from statistics import stdev

sample_std_dev == stdev(salaries)

True

## Exercise 6: Calculating more statistics
### range

In [8]:
salary_range = max(salaries) - min(salaries)
salary_range

995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

In [9]:
coefficient_of_variation = sample_std_dev / mean_salary
coefficient_of_variation

0.45386998894439035

### interquartile range

In [10]:
import math

def quantile(x, pct):
    x.sort()  # sorts in place; the index below assumes ascending order
    index = (len(x) + 1) * pct - 1
    if len(x) % 2:
        return x[int(index)]
    else:
        return (x[math.floor(index)] + x[math.ceil(index)]) / 2

In [11]:
q3, q1 = quantile(salaries, 0.75), quantile(salaries, 0.25)
iqr = q3 - q1
iqr

417500.0
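
There are several conventions for computing quantiles, so a cross-check against a library routine is only approximate. As a sketch, `statistics.quantiles()` (Python 3.8+) returns the three quartile cut points when called with `n=4`. Its default method interpolates linearly between neighboring values, whereas `quantile()` above averages them, so the two IQR values are not expected to match exactly. The `_alt` names are illustrative.

```python
import random
from statistics import median, quantiles

# regenerate the same data as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

# n=4 splits the data into four groups and returns the three cut points
q1_alt, q2_alt, q3_alt = quantiles(salaries, n=4)
iqr_alt = q3_alt - q1_alt

print(q1_alt, q3_alt, iqr_alt)
print(q2_alt, median(salaries))  # the middle cut point is the median
```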

### quartile coefficient of dispersion

In [12]:
quartile_coefficient_of_dispersion = iqr / (q1 + q3)
quartile_coefficient_of_dispersion

0.3417928776094965

## Exercise 7: Scaling data
### min-max scaling

In [13]:
minimum_salary = min(salaries)
min_max_scale = [(x - minimum_salary) / salary_range for x in salaries]
min_max_scale[:5]

[0.0,
 0.01306532663316583,
 0.07939698492462312,
 0.0814070351758794,
 0.08944723618090453]
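
The first scaled value above is `0.0` because `salaries` was sorted in place by the earlier `find_median()` and `quantile()` calls, so the minimum comes first. A quick sanity check for min-max scaling is that the smallest input maps to 0 and the largest to 1. A minimal sketch on made-up values:

```python
values = [30, 10, 20, 50]  # made-up, unsorted values
lowest, highest = min(values), max(values)

# subtract the minimum, then divide by the range
scaled = [(x - lowest) / (highest - lowest) for x in values]

print(scaled)
print(min(scaled), max(scaled))
```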

### standardizing

In [14]:
standardized = [(x - mean_salary) / sample_std_dev for x in salaries]
standardized[:5]

[-2.199512275430514,
 -2.150608309943509,
 -1.9023266390094862,
 -1.8948029520114855,
 -1.8647082040194827]
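
Standardized values (z-scores) should have a mean of 0 and, when the sample standard deviation is used for scaling, a sample standard deviation of 1, up to floating-point error. The sketch below checks this on made-up values; `z_scores` is an illustrative name.

```python
import math
from statistics import mean, stdev

values = [30, 10, 20, 50]  # made-up values
center, spread = mean(values), stdev(values)

# subtract the mean, then divide by the sample standard deviation
z_scores = [(x - center) / spread for x in values]

# compare with a tolerance rather than ==, since the results are floats
print(math.isclose(mean(z_scores), 0, abs_tol=1e-9))
print(math.isclose(stdev(z_scores), 1))
```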

## Exercise 8: Calculating covariance and correlation
### covariance

In [15]:
import numpy as np
np.cov(min_max_scale, standardized)

array([[0.07137603, 0.26716293],
       [0.26716293, 1.        ]])

In [16]:
from statistics import mean

running_total = [
    (x - mean(min_max_scale)) * (y - mean(standardized))
    for x, y in zip(min_max_scale, standardized)
]

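# mean() divides by n, so this is the population covariance;
# np.cov() divides by n - 1 by default, hence the slightly larger 0.26716293 above
cov = mean(running_total)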
cov

0.26449129918250414
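
The manual result differs from the off-diagonal entry of `np.cov()` only in the denominator: averaging the products divides by `n`, while `np.cov()` uses `n - 1` by default. The sketch below rebuilds the two scaled lists from the Exercise 4 data and computes both versions; the variable names are illustrative and differ from the cells above.

```python
import random
from statistics import mean, stdev

# regenerate the same data as in Exercise 4
random.seed(0)
salaries = [round(random.random() * 1000000, -3) for _ in range(100)]

low, high = min(salaries), max(salaries)
avg, std = mean(salaries), stdev(salaries)
scaled = [(x - low) / (high - low) for x in salaries]  # min-max scaled
z = [(x - avg) / std for x in salaries]                # standardized

mean_scaled, mean_z = mean(scaled), mean(z)  # computed once, outside the loop
products = [(x - mean_scaled) * (y - mean_z) for x, y in zip(scaled, z)]
n = len(products)

population_cov = sum(products) / n      # what averaging the products gives
sample_cov = sum(products) / (n - 1)    # same denominator as np.cov's default

print(population_cov, sample_cov)
```

With 100 observations the two values differ by a factor of 99/100.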

### Pearson correlation coefficient ($\rho$)

In [17]:
from statistics import stdev

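# cov above divides by n while stdev() divides by n - 1, so p comes out as
# (n - 1) / n = 0.99 rather than exactly 1 for these linearly related lists
p = cov / (stdev(min_max_scale) * stdev(standardized))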
p

0.9900000000000001
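
Both scaled lists are linear transformations of the same data, so their correlation should be 1. The value 0.99 appears because a covariance divided by `n` is combined with standard deviations divided by `n - 1`. The sketch below, on made-up and perfectly linear data, shows that matching the denominators gives 1, while mixing them gives `(n - 1) / n`.

```python
import math
from statistics import mean, pstdev, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]       # made-up values
y = [2 * value + 1 for value in x]  # perfectly linear in x
n = len(x)

mean_x, mean_y = mean(x), mean(y)
co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# sample covariance with sample standard deviations
r_sample = (co_deviation / (n - 1)) / (stdev(x) * stdev(y))
# population covariance with population standard deviations
r_population = (co_deviation / n) / (pstdev(x) * pstdev(y))
# population covariance with sample standard deviations (mismatched)
r_mixed = (co_deviation / n) / (stdev(x) * stdev(y))

print(r_sample, r_population, r_mixed)
```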
