# Minister's 2019 mathematics

In [None]:
# Don't change this cell; just run it.
import numpy as np  # The array library.
import pandas as pd
# Safe settings for Pandas.
pd.set_option('mode.chained_assignment', 'raise')

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# The OKpy testing system.
from client.api.notebook import Notebook
ok = Notebook('ministers_math.ok')

Recall the [education minister's problem](https://uob-ds.github.io/cfd2021/confidence/havana_math).

The Cuban education minister is worried that mathematics teaching became less
effective in 2019, due to the loss of teachers to emigration.

[Ena Elsa Velázquez
Cobiella](https://www.theewf.org/speakers/view/ena-elsa-vel%C3%A1zquez-cobiella)
has been the education minister in Cuba since 2008.

She knows the distribution of all 7300 or so of the 2018 school year marks
from the final mathematics exams from Havana schools.

In [None]:
havana_2018 = pd.read_csv('havana_math_2018.csv')
# Drop missing marks.
marks_2018 = havana_2018['mark'].dropna()
# Plot
marks_2018.hist(bins=100);

This gives her, among many other summaries, the 2018 mean:

In [None]:
mean_2018 = marks_2018.mean()
mean_2018

She wants an early indication of whether the marks have dropped for 2019, so
she has taken a random sample of 50 of the 2019 scripts and had them marked
urgently. This gives her the following random sample of the 2019 marks:

In [None]:
sample_2019 = pd.read_csv('havana_math_2019_sample.csv')
sample_marks = sample_2019['mark']
sample_marks.hist(bins=100);
sample_marks.head()

In [None]:
n_sample = len(sample_marks)
print("Number of scripts in sample", n_sample)
mean_sample = sample_marks.mean()
print("Mean of sample marks", mean_sample)

This is the mean for the *sample*, but her main interest is in the mean for
the *population*.  The population in this case is the eventual 7000 or so
marks from all Havana schools.

So, her interest is in using the *sample* to get an idea of what the
*population* mean will be.  She can't tell precisely what the mean will be,
from the sample, but she plans to use the bootstrap to give her a reasonable
*range* (an interval) for the eventual population mean.  She decides she wants
the interval to give her an 80% chance of capturing the eventual population
mean.
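
To make the 80% figure concrete: an interval covering the middle 80% of a distribution leaves 10% in each tail, so its limits are the 10th and 90th percentiles. The sketch below shows this with `np.percentile` on made-up values; `toy_values`, `toy_left` and `toy_right` are illustrative names, not part of the assignment.

```python
import numpy as np

# Made-up values 0, 1, ..., 100, standing in for a distribution of means.
toy_values = np.arange(101)

# The middle 80% leaves 10% in each tail, so ask for
# the 10th and 90th percentiles.
toy_left, toy_right = np.percentile(toy_values, [10, 90])
print("Left", toy_left)
print("Right", toy_right)
```

Roughly 80% of the toy values lie between the two limits.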

As you remember, her procedure will be:

* Take many *bootstrap samples* from this sample.
* Build up a bootstrap sampling distribution of the mean.
* Use percentiles from this distribution to give the interval.

Recall that a *bootstrap sample* is a sample of the same size as the original sample, drawn randomly from the original sample *with replacement*.
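
As a minimal sketch of that definition, the cell below draws one bootstrap sample from a small made-up array. The data and the names `toy_sample`, `toy_boot_sample` and `toy_boot_mean` are hypothetical, and NumPy's `default_rng` generator is only one of several ways to sample with replacement; it is not necessarily the method the course uses.

```python
import numpy as np

# Made-up data, standing in for a real sample.
toy_sample = np.array([3, 5, 7, 9, 11])

# A seeded random number generator, so the example is repeatable.
rng = np.random.default_rng(seed=42)

# One bootstrap sample: same size as the original, drawn with replacement,
# so some values can appear more than once and others not at all.
toy_boot_sample = rng.choice(toy_sample, size=len(toy_sample), replace=True)

# The statistic of interest for this one trial.
toy_boot_mean = np.mean(toy_boot_sample)
print("Bootstrap sample", toy_boot_sample)
print("Bootstrap mean", toy_boot_mean)
```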

As usual, we start with a cell to implement a single *trial*, that is, a single bootstrap sample. Store the sample as `first_boot_sample`.  Then take the mean and store it as `first_boot_mean`.

In [None]:
first_boot_sample = ...
first_boot_mean = ...
first_boot_mean

In [None]:
_ = ok.grade('q_first_boot_sample')

A histogram of your new bootstrap sample:

In [None]:
plt.hist(first_boot_sample)
plt.title("First bootstrap sample");

Now we build up the bootstrap sampling distribution:

In [None]:
# Build bootstrap sampling distribution
n_replications = 10000
boot_means = np.zeros(n_replications)
...
# Plot the bootstrap sampling distribution
plt.hist(boot_means, bins=100);

In [None]:
_ = ok.grade('q_boot_means')

Calculate the left and right limits of the interval that captures the population mean 80% of the time. Store them as `left` and `right`.

In [None]:
...
print("Left", left)
print("Right", right)

In [None]:
_ = ok.grade('q_left_right')

**For reflection**.  What do you think about this interval?  How confident are
you that the marks distribution center (the mean) is different in 2019
compared to 2018?

## Done.

Congratulations, you're done with the assignment!  Be sure to:

- **run all the tests** (the next cell has a shortcut for that).
- **Save and Checkpoint** from the `File` menu.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]