## Exercises on the three-girl simulation problem

These exercises follow on from the three girls solution in [filling arrays](https://matthew-brett.github.io/cfd2020/arrays/filling_arrays), and the [reply to the Supreme
Court](https://matthew-brett.github.io/cfd2020/iteration/reply_supreme).

You may want to refer back to those pages for inspiration.

In [None]:
# Don't change this cell; just run it.
import numpy as np  # The array library.

# The OKpy testing system.
from client.api.notebook import Notebook
ok = Notebook('three_girls.ok')

### Three girls in a family of four

Remember the solution to this problem, from [filling
arrays](https://matthew-brett.github.io/cfd2020/arrays/filling_arrays)?  Here is a version of that
solution:

As usual, we first build a single trial, to make sure we are on the right track.

In [None]:
# A single trial is one family (of four children).
# Each child is 1 (girl) or 0 (boy), with equal chance.
family = np.random.randint(0, 2, 4)
count = np.sum(family)
# The count should be between 0 and 4 inclusive.
count
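
`np.random.randint(low, high, size)` draws whole numbers from `low` up to, but not including, `high`. That is why `np.random.randint(0, 2, 4)` gives four values, each 0 or 1, and why `np.sum` of a family counts the 1s, the girls. A quick check, as a sketch (1000 children is an arbitrary choice):

```python
import numpy as np

# Many simulated children: high=2 is excluded, so each value is 0 or 1.
children = np.random.randint(0, 2, 1000)

# Only 0 (boy) and 1 (girl) should appear.
print(np.unique(children))

# Summing 0s and 1s gives the number of 1s, i.e. the number of girls.
n_girls = np.sum(children)
print(n_girls == np.count_nonzero(children == 1))
```

The same `high` rule matters when you change the family size: the third argument is the number of children, while the first two stay `0, 2`.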

Here is the full solution, with the trial logic inside the for-loop.

In [None]:
# Make an array of 10000 zeros, one slot for each family's count.
counts_girls = np.zeros(10000)

# Repeat the indented stuff 10000 times.
for i in np.arange(10000):
    # The procedure for one family.
    family = np.random.randint(0, 2, 4)
    count = np.sum(family)
    # Fill in the corresponding value in the counts_girls array.
    counts_girls[i] = count

# Proportion of families with exactly 3 girls.
p_3_girls = np.count_nonzero(counts_girls == 3) / 10000
p_3_girls
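
The comparison `counts_girls == 3` gives a Boolean array with one `True` or `False` per family, and `np.count_nonzero` counts the `True` values. Because `True` counts as 1 and `False` as 0, the mean of the Boolean array is the same proportion. A small sketch, where `example_counts` holds invented values rather than simulation results:

```python
import numpy as np

# Hypothetical counts of girls for six families (invented for illustration).
example_counts = np.array([2, 3, 1, 3, 4, 2])

# Boolean array: True where the family has exactly 3 girls.
is_three = example_counts == 3

# Two equivalent ways to get the proportion of such families.
p_from_count = np.count_nonzero(is_three) / len(example_counts)
p_from_mean = np.mean(is_three)
print(p_from_count, p_from_mean)
```

Dividing by `len(...)` instead of a typed-in `10000` means the proportion stays right if you later change the number of trials.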

### No girls in a family of four

Estimate the chances that a 4-child family will have no girls.  You can copy
the code from the cell above and modify it, or you may be able to get the
answer from variables that the code above has already made, without repeating
the simulation.

In [None]:
#- Calculate proportion with 0 girls.
p_no_girls = ...
p_no_girls

In [None]:
_ = ok.grade('q_1_no_girls')

For extra points: the answer above is easier to work out with probability
than the chance of three girls.  What is the exact answer, from probability?
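
As a hint, under the same assumption the simulation makes (each child is a girl or a boy with chance 1/2, independently of the other children), the chance of one particular combination is the chances for each child multiplied together.  For example, in a smaller family of two children:

$$
P(\text{no girls in 2 children}) = P(\text{boy}) \times P(\text{boy}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}
$$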

Edit this cell to give your answer.


### Three girls in a family of five

Simulate the chances that a family with 5 children will have
exactly 3 girls.

In [None]:
#- Logic for a single trial.
#- A single trial is one family (of five children)
family = np.random.randint(...)
count = np.sum(family)
# The count should be between 0 and 5 inclusive.
count

Now adapt the for-loop from a few cells above, putting in the single-trial
logic from the cell just above, to simulate the chances of 3 girls from 5
children.
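
One way to make the loop easy to adapt is to keep the family size and the number of trials in variables, so that moving from four to five children is a single edit.  A sketch, where `simulate_girl_counts` is a hypothetical helper name, not part of the course code:

```python
import numpy as np

def simulate_girl_counts(n_children, n_trials):
    """Return an array with the number of girls in each of n_trials families."""
    counts = np.zeros(n_trials)
    for i in np.arange(n_trials):
        # One family: 1 for a girl, 0 for a boy, each with chance 0.5.
        family = np.random.randint(0, 2, n_children)
        counts[i] = np.sum(family)
    return counts

# Example use with families of four, matching the loop above.
counts_4 = simulate_girl_counts(4, 10000)
p_3_of_4 = np.count_nonzero(counts_4 == 3) / len(counts_4)
print(p_3_of_4)
```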

In [None]:
#- Simulate 10000 families of 5 children.
#- Show proportion with 3 girls.
#- Your code below
p_3_of_5 = ...
p_3_of_5

In [None]:
_ = ok.grade('q_2_three_of_five')

### Three or fewer girls in a family of five

Estimate the chances that a family with 5 children will have 3 or fewer girls, using your simulation.

Hint: you may remember from the [Comparison page](https://matthew-brett.github.io/cfd2020/data-types/Comparison) that `<=` tests whether the thing on the left is
*less than or equal to* the thing on the right.

In [None]:
3 <= 4

In [None]:
3 <= 3

In [None]:
3 <= 2

In [None]:
my_array = np.array([1, 2, 3, 4])
my_array <= 2
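
Combining `<=` with `np.count_nonzero` counts the elements that pass the test.  Because the counts are whole numbers, "3 or fewer" covers exactly the values 0, 1, 2 and 3, so one `<=` comparison gives the same total as adding up the separate `==` counts.  A sketch with an invented array:

```python
import numpy as np

# Hypothetical girl counts for eight families of five (invented values).
example_counts = np.array([0, 5, 3, 2, 4, 1, 3, 5])

# Families with 3 or fewer girls, counted in one step.
n_le_3 = np.count_nonzero(example_counts <= 3)

# The same total, built from the separate exact counts.
n_by_parts = sum(np.count_nonzero(example_counts == k) for k in [0, 1, 2, 3])
print(n_le_3, n_by_parts)
```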

Hint: you have already done a simulation above that you can use to answer
this question.  Use the results from that simulation below.

In [None]:
#- Use your simulation of 10000 families, each of 5 children.
#- Show the proportion of families of 5 children with 3 or fewer girls.
#- Your code below
p_3_or_fewer = ...
p_3_or_fewer

In [None]:
_ = ok.grade('q_3_three_or_fewer')

### More realistic simulation

Now we are back to the situation of exactly 3 girls in a family of 4.

In fact, when you have a child, the probability of having a girl is slightly
less than 0.5.

The [proportion of boys born in the
UK](https://www.gov.uk/government/statistics/gender-ratios-at-birth-in-great-britain-2010-to-2014)
is 0.513.  Hence the proportion of girls is 1 - 0.513 = 0.487.

With that probability of having a girl, what are the chances of having exactly
three girls in a family of four?

Hint 1: you may want to use `np.random.uniform`.  Check the help with
`np.random.uniform?` followed by Shift-Enter in a new cell.  It works like
this:

In [None]:
# An array of four random numbers between 0 and 1.
np.random.uniform(0, 1, 4)

Hint 2: Let's say I have a random number `x` between 0 and 1:

In [None]:
x = np.random.uniform(0, 1)
x

The probability that `x` will be less than, say, 0.25 is 0.25.
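You can check this claim by simulation: draw many uniform random numbers and see what proportion falls below 0.25.  A sketch (100000 draws is an arbitrary choice):

```python
import numpy as np

# Many random numbers, each from 0 up to (but not including) 1.
xs = np.random.uniform(0, 1, 100000)

# Proportion below 0.25; this should be close to 0.25.
p_below = np.count_nonzero(xs < 0.25) / len(xs)
print(p_below)
```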

Here's a variable with the chance of any given child being a girl. Run this cell:

In [None]:
# Run this cell.
chance_of_girl = 0.487
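
The same idea works for any chance.  As an analogy (a hypothetical coin that lands heads with probability 0.3, not the girl/boy case), comparing uniform random numbers with the chance gives a Boolean array, `True` for heads, and `np.count_nonzero` counts the `True` values:

```python
import numpy as np

chance_of_heads = 0.3  # Hypothetical probability, for illustration only.

# One trial: ten tosses of the biased coin.
tosses = np.random.uniform(0, 1, 10)
are_heads = tosses < chance_of_heads  # Each is True with probability 0.3.
n_heads = np.count_nonzero(are_heads)
print(n_heads)
```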

In [None]:
#- Cell for the logic of one trial.
#- One trial is a family of four children.
family = np.random...
#- Make a Boolean array identifying girls (True) and boys (False)
are_girls = ...
#- Count the number of Trues you got.
count = ...
# The new count should be between 0 and 4 inclusive, but now there should be a
# very slightly reduced chance of higher numbers.
count

Now put this logic into your for-loop, to simulate the numbers with the modified chance of a girl.

In [None]:
#- Estimate more accurate chance of having exactly 3 girls
#- in a family of four.
p_r3_of_4 = ...
p_r3_of_4

In [None]:
# It is possible, but highly unlikely, that this test will fail even
# when you have the right calculation, because your answer depends on
# random numbers, and you may have got an unusual result.  If the test
# fails, and you think your answer is right, rerun your simulation and
# then this test, a few times.
_ = ok.grade('q_4_r_three_of_four')

## Done.

Congratulations, you're done with the assignment!  Be sure to:

- **Run all the tests** (the next cell has a shortcut for that).
- **Save and Checkpoint** from the `File` menu.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]