# Coding exercises
Exercises 1-3 are thought exercises that don't require coding. If you need a Python crash course or refresher, work through the [`python_101.ipynb`](./python_101.ipynb) notebook in chapter 1.

## Exercise 4: Generate the data by running this cell
This will give you a list of numbers to work with in the remaining exercises.

In [102]:
import random
import statistics

random.seed(0)
salaries = list(round(random.random()*1000000, -3) for _ in range(100))

def check_equality(value1, value2):
    return value1 == value2
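
`check_equality` compares floats with `==`, which happens to succeed for the statistics below but can fail when two computations differ only by rounding error. As a minimal sketch of a tolerance-based alternative, the hypothetical helper `check_close` (not part of the original exercises) wraps `math.isclose`:

```python
import math

def check_close(value1, value2, rel_tol=1e-9):
    """Return True when two floats agree to within a relative tolerance."""
    return math.isclose(value1, value2, rel_tol=rel_tol)

# Two expressions for the same quantity that differ only by rounding error
a = 0.1 + 0.2
b = 0.3
print(a == b)             # exact comparison
print(check_close(a, b))  # tolerance-based comparison
```

The exact comparison prints `False` and the tolerance-based one prints `True`, which is why a tolerance is usually the safer check for floating-point results.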


## Exercise 5: Calculating statistics and verifying
### mean

In [103]:
'''Using custom functions'''

def calc_mean(sequence):
    return sum(sequence) / len(sequence)

customMean = calc_mean(salaries)

'''Using the statistics library'''

mean = statistics.mean(salaries)

'''Testing equality between the last 2 values'''

print(check_equality(customMean, mean))

True


### median

In [104]:
'''Using custom functions'''
from math import floor, ceil

def is_even(x):
    return x % 2 == 0

def calc_median(sequence):
    medianIndex = (len(sequence) + 1)/2 - 1
    if is_even(len(sequence)):
        return (sequence[floor(medianIndex)] + sequence[ceil(medianIndex)])/2
    return sequence[floor(medianIndex)]

customMedian = calc_median(sorted(salaries))

'''Using the statistics library'''

median = statistics.median(sorted(salaries))

'''Testing equality between the last 2 values'''

print(check_equality(customMedian, median))


True
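
The same median can be computed with integer indexing only, which avoids the fractional index and the `floor`/`ceil` calls. This is a sketch on small made-up lists, and `calc_median_int` is a hypothetical name, not part of the original solution:

```python
import statistics

def calc_median_int(sorted_sequence):
    """Median of an already-sorted sequence using integer indexing only."""
    n = len(sorted_sequence)
    mid = n // 2
    if n % 2 == 0:
        # even length: average the two middle values
        return (sorted_sequence[mid - 1] + sorted_sequence[mid]) / 2
    # odd length: the single middle value
    return sorted_sequence[mid]

odd_case = sorted([3, 1, 2])
even_case = sorted([4, 1, 3, 2])
print(calc_median_int(odd_case) == statistics.median(odd_case))
print(calc_median_int(even_case) == statistics.median(even_case))
```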


### mode

In [105]:
'''Using custom functions'''

from collections import Counter

def calc_mode(sequence):
    return Counter(sequence).most_common(1)[0][0]

customMode = calc_mode(salaries)

'''Using the statistics library'''

mode = statistics.mode(salaries)

'''Testing equality between the last 2 values'''

print(check_equality(customMode, mode))


True
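
Both approaches above return a single value even when several values tie for the highest count. The sketch below, on a small made-up list, collects every tied value. The helper name `calc_modes` is hypothetical, and `statistics.multimode` assumes Python 3.8 or later:

```python
import statistics
from collections import Counter

def calc_modes(sequence):
    """Return every value that shares the highest count, in first-seen order."""
    counts = Counter(sequence)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

data = [1, 2, 2, 3, 3]  # 2 and 3 both appear twice
print(calc_modes(data))
print(statistics.multimode(data))
```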


### sample variance
Remember to use Bessel's correction.

In [106]:
'''Using custom functions'''

def calc_variance(sequence):
    mean = calc_mean(sequence)
    numerator = sum(
        (item - mean)**2 
        for item in sequence
        )
    denominator = len(sequence) - 1
    return numerator / denominator

customVariance = calc_variance(salaries)

'''Using the statistics library'''

variance = statistics.variance(salaries)

'''Testing equality between the last 2 values'''

print(check_equality(customVariance, variance))


True
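
Bessel's correction only changes the denominator from $n$ to $n - 1$, so the sample variance equals the population variance multiplied by $n / (n - 1)$. A short sketch on made-up data (not the `salaries` list) checks that relationship with the `statistics` module:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # illustrative values only
n = len(data)

population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1

# Rescaling the population variance recovers the sample variance
rescaled = population_variance * n / (n - 1)
print(math.isclose(rescaled, sample_variance))
```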


### sample standard deviation
Remember to use Bessel's correction.

In [107]:
'''Using custom functions'''

import math

def calc_stdev(sequence):
    return math.sqrt(calc_variance(sequence))

customStdev = calc_stdev(salaries)

'''Using the statistics library'''

stdev = statistics.stdev(salaries)

'''Testing equality between the last 2 values'''

print(check_equality(customStdev, stdev))

True


## Exercise 6: Calculating more statistics
### range

In [108]:
def calc_range(sequence):
    return max(sequence) - min(sequence)

range_ = calc_range(salaries)
range_


995000.0

### coefficient of variation
Make sure to use the sample standard deviation.

In [109]:
def calc_coefficient_of_variation(mean, stdev):
    return stdev / mean

CV = calc_coefficient_of_variation(customMean, customStdev)
CV

0.45386998894439035

### interquartile range

In [110]:
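def calc_quantile(sequence, percentage):
    # expects a sorted sequence; index is the zero-based position of the quantile
    index = (len(sequence) + 1)*percentage - 1
    if is_even(len(sequence)):
        # even length: average the two elements around the position
        return (sequence[floor(index)] + sequence[ceil(index)]) / 2
    return sequence[floor(index)]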

def calc_iqr(sequence):
    return calc_quantile(sequence, 0.75) - calc_quantile(sequence, 0.25)

IQR = calc_iqr(sorted(salaries))
IQR


417500.0
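
For comparison, the standard library can compute quartiles directly. This sketch assumes Python 3.8 or later, where `statistics.quantiles` is available, and uses small made-up data. The library interpolates linearly between neighboring values, whereas `calc_quantile` above averages them, so the two approaches need not produce identical quartiles:

```python
import statistics

data = sorted([1, 2, 3, 4, 5, 6, 7, 8])  # illustrative values only

# n=4 splits the data into quartiles and returns the three cut points
q1, q2, q3 = statistics.quantiles(data, n=4, method='exclusive')
iqr = q3 - q1
print(q1, q2, q3, iqr)
```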

### quartile coefficient of dispersion

In [111]:
def calc_qcd(sequence):
    num = calc_quantile(sequence, 0.75) - calc_quantile(sequence, 0.25)
    den = calc_quantile(sequence, 0.75) + calc_quantile(sequence, 0.25)
    return num / den

QCD = calc_qcd(sorted(salaries))
QCD

0.3417928776094965

## Exercise 7: Scaling data
### min-max scaling

In [112]:
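def min_max_scale(sequence):
    # compute the minimum and the range once instead of once per item
    minimum = min(sequence)
    data_range = calc_range(sequence)
    return [(item - minimum) / data_range for item in sequence]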

normalizedSalaries = min_max_scale(salaries)
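
A quick sanity check for min-max scaling is that the smallest value maps to 0 and the largest to 1. The sketch below repeats the calculation on made-up data so it can run on its own:

```python
data = [10.0, 20.0, 30.0, 40.0, 100.0]  # illustrative values only

low, high = min(data), max(data)
scaled = [(x - low) / (high - low) for x in data]

# Every scaled value should fall between 0 and 1 inclusive
print(min(scaled), max(scaled))
print(all(0 <= value <= 1 for value in scaled))
```

Note that the calculation divides by the range, so it fails with a `ZeroDivisionError` when all values are identical.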

### standardizing

In [113]:
def standarize(sequence):
    mean = calc_mean(sequence)
    stdev = calc_stdev(sequence)
    return list(
        (item  - mean)/stdev 
        for item in sequence
        )

standarizedSalaries = standarize(salaries)
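
Standardized data should have a mean of approximately 0 and a sample standard deviation of approximately 1. This sketch verifies both properties on made-up data, using a tolerance because the results are floats:

```python
import math
import statistics

data = [10.0, 20.0, 30.0, 40.0, 100.0]  # illustrative values only

mean = statistics.mean(data)
stdev = statistics.stdev(data)  # sample standard deviation
z_scores = [(x - mean) / stdev for x in data]

print(math.isclose(statistics.mean(z_scores), 0.0, abs_tol=1e-12))
print(math.isclose(statistics.stdev(z_scores), 1.0))
```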

## Exercise 8: Calculating covariance and correlation
### covariance

In [122]:
def calc_covariance(sequence1, sequence2):
    # population covariance: the mean of the products of deviations (divides by n)
    mean1 = calc_mean(sequence1)
    mean2 = calc_mean(sequence2)
    products = [
        (x - mean1)*(y - mean2)
        for x, y in zip(sequence1, sequence2)
    ]
    return calc_mean(products)

COV = calc_covariance(normalizedSalaries, standarizedSalaries)
COV

0.2644912991825042

### Pearson correlation coefficient ($\rho$)

In [121]:
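def calc_pearson_correlation(sequence1, sequence2):
    # note: calc_covariance divides by n while calc_stdev divides by n - 1,
    # so the result is scaled by (n - 1)/n; with n = 100, perfectly
    # correlated data gives 0.99 rather than 1
    num = calc_covariance(sequence1, sequence2)
    den = calc_stdev(sequence1) * calc_stdev(sequence2)
    return num / den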

RHO = calc_pearson_correlation(normalizedSalaries, standarizedSalaries)
RHO  

0.9900000000000004
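
The two scaled series are linear transformations of the same data, so their correlation is 1 in theory. The value of roughly 0.99 arises because the covariance above divides by $n$ while the standard deviations divide by $n - 1$, leaving a factor of $(n - 1)/n = 99/100$. A sketch with a consistent $n - 1$ denominator, on made-up data and with the hypothetical helper names `sample_covariance` and `pearson`, might look like this:

```python
import math
import statistics

def sample_covariance(xs, ys):
    """Covariance with Bessel's correction (divides by n - 1)."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation using sample covariance and sample stdevs."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
ys = [2 * x + 1 for x in xs]    # an exact linear function of xs

print(math.isclose(pearson(xs, ys), 1.0))
```

Using population versions of both the covariance and the standard deviations would work equally well; what matters is that the denominators match.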
