# DAML 05 - Stats Exercises - Solutions

Michal Grochmal <michal.grochmal@city.ac.uk>

Exercises rating:

★☆☆ - You should be able to solve it based on Python knowledge plus the lecture contents.

★★☆ - You will need to do extra thinking and some extra reading/searching.

★★★ - The answer is difficult to find by a simple search,
      requires you to do a considerable amount of extra work by yourself
      (feel free to ignore these exercises if you're short on time).

We implemented the mean function directly with `NumPy` vectorized operations.
Let's do the same for the other basic statistical functions, using vectors
similar to the ones from the lecture.
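The solutions below rely on broadcasting: when a scalar is subtracted from an array, `NumPy` subtracts it from every element, so $x_i - \bar{x}$ needs no explicit loop. This is a minimal sketch on a small illustrative array (not the lecture data). It also shows a useful sanity check: the deviations from the mean always sum to zero, up to rounding.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0])
x_bar = x.sum() / len(x)          # scalar mean: 12 / 4 = 3.0

# Broadcasting stretches the scalar x_bar across every element of x,
# giving the element-wise deviations x_i - x_bar in one expression.
deviations = x - x_bar
print(deviations)                 # the values -2, -1, 0, 3
print(deviations.sum())           # 0.0
```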

Note: It is fine to reuse previous solutions in later exercises.
It is *not* fine to use `NumPy`'s `mean`, `std`, `var`, `cov`, or `corrcoef`.

In [1]:
import numpy as np

arr = np.arange(30, 90, 2)
acr = np.arange(60, 120, 2) + np.random.rand(30)*3 - 1
arr, acr

(array([30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62,
        64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88]),
 array([ 60.96703967,  62.18125373,  64.4334965 ,  66.10791364,
         67.26541309,  69.57897034,  72.91379496,  74.51500873,
         75.59291143,  79.08318652,  80.36096987,  82.1903539 ,
         85.31076292,  86.42975623,  88.41459985,  91.03162943,
         91.12738769,  95.40597141,  96.09740929,  99.02464732,
         99.40094322, 103.78173011, 105.86424584, 106.96963744,
        109.46852711, 111.81404758, 111.74777638, 115.65662469,
        117.86512143, 118.85518312]))
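The noise in `acr` comes from the global `np.random` state, so the values above change on every run. If you want repeatable numbers, one option is a seeded `Generator` from `np.random.default_rng`. This is a sketch; the seed value and the helper name `make_acr` are arbitrary choices for illustration, and the resulting values will differ from the printed output above.

```python
import numpy as np

SEED = 42  # arbitrary; any fixed integer gives a repeatable stream


def make_acr(seed):
    # Same construction as acr above, but the noise comes from a seeded
    # Generator instead of the global np.random state.
    rng = np.random.default_rng(seed)
    return np.arange(60, 120, 2) + rng.random(30) * 3 - 1


a = make_acr(SEED)
b = make_acr(SEED)
print(np.array_equal(a, b))   # True: same seed, same values
```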

### 1. Mean (already solved in lecture).

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
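To see how the vectorized code maps onto the formula, here is a sketch that compares a literal loop over $x_1, \dots, x_N$ with the `x.sum() / len(x)` form. The helper names `loop_mean` and `vectorized_mean` are only illustrative.

```python
import numpy as np


def loop_mean(x):
    # Literal transcription of the formula: accumulate x_1 ... x_N, divide by N.
    total = 0.0
    for value in x:
        total += value
    return total / len(x)


def vectorized_mean(x):
    # Same formula; NumPy performs the summation in compiled code.
    return x.sum() / len(x)


x = np.arange(30, 90, 2)                  # 30, 32, ..., 88 (30 values summing to 1770)
print(loop_mean(x), vectorized_mean(x))   # 59.0 59.0
```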

In [2]:
def daml_mean(x):
    return x.sum() / len(x)


# test
print(arr.mean())
print(daml_mean(arr))

59.0
59.0


### 2. Variance (★☆☆)

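$$\sigma^2 = \frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

where $N$ is the number of samples and $d$ is the delta degrees of freedom (`ddof`): $d = 0$ gives the population variance and $d = 1$ the sample variance.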
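A small worked example makes the effect of $d$ concrete. The sketch below re-states the variance solution so the block runs on its own, and applies it to the toy array `[2, 4, 4, 4, 5, 5, 7, 9]` (an illustrative choice): the mean is $40 / 8 = 5$ and the squared deviations sum to $32$.

```python
import numpy as np


def daml_var(x, ddof=0):
    # Sum of squared deviations from the mean, divided by N - ddof.
    x_bar = x.sum() / len(x)
    return ((x - x_bar) ** 2).sum() / (len(x) - ddof)


x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
# squared deviations: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
print(daml_var(x, ddof=0))   # 32 / 8 = 4.0 (population variance)
print(daml_var(x, ddof=1))   # 32 / 7, about 4.571 (sample variance)
```

With $d = 1$ the denominator shrinks, so the sample variance is always slightly larger than the population variance; the gap vanishes as $N$ grows.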

In [3]:
def daml_var(x, ddof=0):
    # Reuse daml_mean: the exercise forbids NumPy's own mean.
    return ((x - daml_mean(x)) ** 2).sum() / (len(x) - ddof)


# test
print(arr.var(ddof=0))
print(arr.var(ddof=1))
print(daml_var(arr, 0))
print(daml_var(arr, 1))

299.6666666666667
310.0
299.6666666666667
310.0


### 3. Standard Deviation (★☆☆)

$$\sigma = \sqrt{\frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
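Two properties follow directly from this formula: adding a constant to every $x_i$ leaves the deviations $x_i - \bar{x}$ unchanged, and multiplying every $x_i$ by $c$ multiplies $\sigma$ by $|c|$. A minimal sketch checking both on the same illustrative toy array as in the variance example (the function is re-stated so the block is self-contained):

```python
import numpy as np


def daml_std(x, ddof=0):
    # Square root of the variance formula.
    x_bar = x.sum() / len(x)
    return np.sqrt(((x - x_bar) ** 2).sum() / (len(x) - ddof))


x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(daml_std(x))            # 2.0, the square root of the population variance 4.0
print(daml_std(x + 100.0))    # 2.0: a shift does not change the spread
print(daml_std(-3.0 * x))     # 6.0: scaling by c = -3 multiplies the std by |c| = 3
```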

In [4]:
def daml_std(x, ddof=0):
    return np.sqrt(daml_var(x, ddof))


def daml_full_std(x, ddof=0):
    x_bar = x.sum() / len(x)  # mean written out, without NumPy's mean
    return np.sqrt(((x - x_bar) ** 2).sum() / (len(x) - ddof))


# test
print(arr.std(ddof=0))
print(arr.std(ddof=1))
print(daml_std(arr, 0))
print(daml_std(arr, 1))
print(daml_full_std(arr, 0))
print(daml_full_std(arr, 1))

17.31088289679838
17.60681686165901
17.31088289679838
17.60681686165901
17.31088289679838
17.60681686165901


### 4. Covariance (★☆☆)

$$cov(X, Y) = \sigma_{xy} = \frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$

Note: You only need to calculate the covariance *between* the two arrays, i.e. the off-diagonal element.
There is no need to calculate the diagonal of the covariance matrix, which holds the variances of each array.
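For reference, the full covariance matrix can be built with the same broadcasting idea: center each row, then take a matrix product. This is a sketch, not part of the required solution; `cov_matrix` is a hypothetical helper and the toy arrays are illustrative. Its diagonal reproduces the variances and its off-diagonal reproduces the covariance between the two arrays.

```python
import numpy as np


def cov_matrix(X, ddof=0):
    # X holds one variable per row and one observation per column,
    # the same layout np.cov uses by default.
    n = X.shape[1]
    means = X.sum(axis=1, keepdims=True) / n   # column vector of row means
    centered = X - means                       # broadcasting subtracts each row's mean
    return centered @ centered.T / (n - ddof)


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 9.0])
C = cov_matrix(np.vstack([x, y]))
# diagonal: var(x) = 1.25, var(y) = 6.5; off-diagonal: cov(x, y) = 2.75
print(C)
```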

In [5]:
def daml_cov(x, y, ddof=0):
    # Reuse daml_mean instead of NumPy's mean.
    return ((x - daml_mean(x)) * (y - daml_mean(y))).sum() / (len(x) - ddof)


# test
print(np.cov([arr, acr], ddof=0)[0, 1])
print(np.cov([arr, acr], ddof=1)[0, 1])
print(daml_cov(arr, acr, 0))
print(daml_cov(arr, acr, 1))

305.9194944930895
316.4684425790581
305.9194944930895
316.4684425790581


### 5. Correlation (★★☆)

$$corr(X, Y) = r = \frac{cov(X, Y)}{\sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

Note: You only need to implement the correlation coefficient between two arrays.
There is no need to compute the entire correlation coefficient matrix. Also, the degrees
of freedom do not affect the correlation: the same $1/(N - d)$ factor appears in the
covariance and in both standard deviations, so it cancels out.
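A quick way to confirm both claims is to compute $r$ with different `ddof` values and on perfectly linear data. This is a sketch; `corr` is a hypothetical helper and the toy arrays are illustrative. Results are compared "up to rounding" because the square roots are not exact in floating point.

```python
import numpy as np


def corr(x, y, ddof=0):
    # The 1 / (n - ddof) factor appears in the covariance and in both
    # standard deviations, so it cancels and ddof does not change r.
    n = len(x)
    dx = x - x.sum() / n
    dy = y - y.sum() / n
    cov = (dx * dy).sum() / (n - ddof)
    sx = np.sqrt((dx ** 2).sum() / (n - ddof))
    sy = np.sqrt((dy ** 2).sum() / (n - ddof))
    return cov / (sx * sy)


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 9.0])
print(corr(x, y, ddof=0), corr(x, y, ddof=1))  # equal up to rounding
print(corr(x, 2 * x + 3))   # about 1.0: perfect increasing linear relation
print(corr(x, -x))          # about -1.0: perfect decreasing linear relation
```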

In [6]:
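def daml_corr(x, y):
    # Any ddof gives the same r because the 1/(N - ddof) factors cancel;
    # ddof = N - 1 makes every denominator 1, so only raw sums are used.
    ddof = len(x) - 1
    return daml_cov(x, y, ddof) / (daml_std(x, ddof) * daml_std(y, ddof))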


def daml_full_corr(x, y):
    n = len(x)
    x_bar = x.sum() / n  # means written out, without NumPy's mean
    y_bar = y.sum() / n
    cov = ((x - x_bar) * (y - y_bar)).sum() / n
    std_x = np.sqrt(((x - x_bar) ** 2).sum() / n)
    std_y = np.sqrt(((y - y_bar) ** 2).sum() / n)
    return cov / (std_x * std_y)

# test
print(np.corrcoef([arr, acr])[0, 1])
print(daml_corr(arr, acr))
print(daml_full_corr(arr, acr))

0.9991840136269693
0.9991840136269692
0.9991840136269692
