In [1]:
#: the usual suspects
import babypandas as bpd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# Probability and Simulation

### Swain vs. Alabama, 1965
* Talladega County, Alabama
* Robert Swain, a black man, was convicted of a crime
* Appeal: one factor was the all-white jury
* Only men 21 years or older were allowed to serve
* 26% of this population were black
* Swain’s jury panel consisted of 100 men
* 8 men on the panel were black


### Supreme Court Ruling

* About disparities between the percentages in the eligible population and the jury panel, the Supreme Court wrote:

> "... the overall percentage disparity has been small and reflects no studied attempt to include or exclude a specified number of Negroes"

* The Supreme Court denied Robert Swain’s appeal
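* What's the probability that a disparity this large would arise by chance, if the panel had been drawn at random from the eligible population?
* If that probability is very small, chance is a poor explanation, and something is up.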
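One way to put a number on this question, sketched with the simulation tools introduced later in this lecture: assume each of the 100 panel members is drawn independently, with a 26% chance of being black, and count how often a random panel has 8 or fewer black members. Treating the draws as independent, and using the `p` argument of `np.random.choice` (which sets the probability of each option), are modeling assumptions of this sketch, not part of the case record.

```python
import numpy as np

# Modeling assumption: each of the 100 panel members is drawn independently
# from an eligible population that is 26% black.
def swain_panel_experiment():
    panel = np.random.choice(['Black', 'Other'], 100, p=[0.26, 0.74])
    return np.count_nonzero(panel == 'Black')

n_repetitions = 10000
black_counts = np.array([])
for i in np.arange(n_repetitions):
    black_counts = np.append(black_counts, swain_panel_experiment())

# Fraction of simulated panels with 8 or fewer black members
prop_8_or_fewer = np.count_nonzero(black_counts <= 8) / n_repetitions
prop_8_or_fewer
```

If this proportion comes out at or near zero, random selection alone is a poor explanation for a panel like Swain's.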

## Probability Theory

- Some things in life *seem* random. E.g., coin flip.
- The *probability* of seeing "Heads" is 50%.
- Probability: if we flipped the coin infinitely many times, the proportion of heads would approach 50%.
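A quick sketch of this long-run idea: flip a simulated fair coin more and more times and watch the proportion of heads settle near 0.5. `np.random.choice` is introduced later in this lecture; the particular flip counts below are arbitrary choices.

```python
import numpy as np

proportions = []
for n_flips in [10, 1000, 100000]:
    # Simulate n_flips flips of a fair coin
    flips = np.random.choice(['Heads', 'Tails'], n_flips)
    proportion_heads = np.count_nonzero(flips == 'Heads') / n_flips
    proportions.append(proportion_heads)
    print(n_flips, 'flips, proportion of heads:', proportion_heads)
```

With only 10 flips the proportion can be far from 0.5; with 100,000 flips it is typically very close.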

## Terminology

- **Experiment**: The thing whose result is random.
    - e.g., rolling a die.
    - e.g., flipping a coin twice.
- **Outcome**: The result of an experiment.
    - e.g., the possible outcomes of rolling a 6-sided die are 1, 2, 3, 4, 5, 6
    - e.g., the possible outcomes of flipping a coin twice are HH, HT, TH, TT
- **Event**: A set of outcomes.
    - e.g., the event that the die lands on an even number is the collection of outcomes {2, 4, 6}.
    - e.g., the event that there was at least one head in two flips: {HH, HT, TH}

## Terminology

- **Probability**: A number between 0 and 1 that describes how likely an event is.
    - 1 if that event always happens
    - 0 if that event never happens
    - Notation: if $X$ is an event, $P(X)$ is the probability of the event.

## Equally-likely outcomes

- If all outcomes are equally likely, computing probabilities is done by counting:

$$
P(A) = \frac{
    \text{\# of outcomes that make $A$ happen}
}{
    \text{total \# of outcomes}
}
$$
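As a sketch of the counting formula, here are the six equally-likely outcomes of a die roll stored in a NumPy array, with the event "roll is even" written as a condition on that array. `np.count_nonzero` counts the outcomes where the condition is `True`.

```python
import numpy as np

# The six equally-likely outcomes of rolling a fair die
outcomes = np.arange(1, 7)

# Event A: the roll is even, i.e., the outcomes {2, 4, 6}
makes_A_happen = outcomes % 2 == 0

# P(A) = (# of outcomes that make A happen) / (total # of outcomes)
p_A = np.count_nonzero(makes_A_happen) / len(outcomes)
p_A
```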

## Discussion question

I have three cards: red, blue, and green. I choose a card at random, and then, without putting it back, I choose another card at random. What is the chance that the first card is green and the second card is red?

- A) 1/9
- B) 1/6
- C) 1/3
- D) 2/3
- E) None of the above.

## Discussion question solved

- The possible outcomes are: RG, RB, GR, GB, BR, BG.
- The outcomes are equally-likely.
- There is only one outcome which makes the event happen: GR.
- Hence the probability is $1/6$.
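We can also check this answer by simulation, a technique covered later in this lecture. The sketch below draws two of the three cards without replacement (`replace=False`) many times and records how often the draw is green, then red. The number of repetitions is an arbitrary choice.

```python
import numpy as np

cards = ['red', 'blue', 'green']
n_repetitions = 10000
successes = 0

for i in np.arange(n_repetitions):
    # Draw two cards; the first card is not put back
    first, second = np.random.choice(cards, 2, replace=False)
    if first == 'green' and second == 'red':
        successes = successes + 1

estimate = successes / n_repetitions
estimate  # should be close to 1/6
```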

## Conditional probabilities

- Two events $A$ and $B$ can both happen in the same experiment.
    - e.g., when rolling a six-sided die: $A$ is the event "roll is 3 or less", $B$ is the event "roll is even".
- Suppose that we know $A$ has happened, but we don't know if $B$ has.
- The conditional probability of $B$ given $A$ is:

$$
P(B \text{ given } A)
= \frac{
    \text{\# of outcomes satisfying both $A$ and $B$}
}{
    \text{\# of outcomes satisfying $A$}
}
$$

## Discussion question

$$
P(B \text{ given } A)
= \frac{
    \text{\# of outcomes satisfying both $A$ and $B$}
}{
    \text{\# of outcomes satisfying $A$}
}
$$

I roll a six-sided die and don't tell you what the result is, but I tell you that it is less than or equal to three. What is the probability that the result is even?

- A) 1/2
- B) 1/3
- C) 1/4
- D) None of the above.

## Discussion question solved

$$
P(B \text{ given } A)
= \frac{
    \text{\# of outcomes satisfying both $A$ and $B$}
}{
    \text{\# of outcomes satisfying $A$}
}
$$

- There are three outcomes where the roll is three or less: 1, 2, 3.
- There is only one outcome where both $A$ and $B$ happen: 2.
- So $P(B \text{ given } A) = 1/3$.
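The same count can be done in code. In this sketch, each event is a boolean array over the six outcomes, and `&` combines two boolean arrays element by element, so `A & B` marks the outcomes where both events happen.

```python
import numpy as np

outcomes = np.arange(1, 7)   # a fair six-sided die
A = outcomes <= 3            # event A: roll is 3 or less
B = outcomes % 2 == 0        # event B: roll is even

# P(B given A) = (# satisfying both A and B) / (# satisfying A)
p_B_given_A = np.count_nonzero(A & B) / np.count_nonzero(A)
p_B_given_A
```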

## Probability that two events both happen

$$
P(A \text{ and } B) = \frac{
    \text{\# of outcomes satisfying both $A$ and $B$}
}{
    \text{total \# of outcomes}
}
$$

What is the probability that the roll is even and $\leq$ three?

- Only 1 outcome satisfies: rolling a two.
- Six total outcomes.
- Probability is $1/6$.

## Probability that two events both happen: equivalent formula

$$
    P(A \text{ and } B)
    =
    P(A \text{ given } B) \cdot P(B)
$$

What is the probability that the roll is even and $\leq$ three?

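- Let $A$ be "roll is even" and $B$ be "roll is $\leq$ three".
- We saw that the probability of an even roll given a roll $\leq$ three is 1/3, so $P(A \text{ given } B) = 1/3$.
- The probability of a roll $\leq$ three is $P(B) = 3/6 = 1/2$.
- So $P(A \text{ and } B) = 1/3 \cdot 1/2 = 1/6$.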
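A short counting sketch confirming that both routes give the same answer: count the outcomes that are even and at most three directly, then compare with the product of the conditional probability and the probability of a roll at most three. The variable names are our own.

```python
import numpy as np

outcomes = np.arange(1, 7)
even = outcomes % 2 == 0
at_most_three = outcomes <= 3

# Direct count: P(even and <= three)
p_both = np.count_nonzero(even & at_most_three) / len(outcomes)

# Multiplication rule: P(even given <= three) * P(<= three)
p_even_given_le3 = np.count_nonzero(even & at_most_three) / np.count_nonzero(at_most_three)
p_le3 = np.count_nonzero(at_most_three) / len(outcomes)
p_product = p_even_given_le3 * p_le3

p_both, p_product
```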

## What if $B$ isn't affected by $A$?

- We have found that $P(A \text{ and } B) = P(A \text{ given } B)\cdot P(B)$.
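- Sometimes knowing that $B$ happened doesn't change the chance of $A$: $P(A \text{ given } B) = P(A)$. Such events are called *independent*, and then $P(A \text{ and } B) = P(A) \cdot P(B)$.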
- Example: Suppose we flip a fair coin three times.
    - The probability that the second flip is heads doesn't depend on the result of the first flip.
- What is the probability of getting tails three times in a row?
    - $1/2 \cdot 1/2 \cdot 1/2 = 1/8$
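A simulation sketch of the three-flip example: repeat "flip a fair coin three times" many times and record how often all three flips are tails. The number of repetitions is an arbitrary choice.

```python
import numpy as np

n_repetitions = 10000
all_tails = 0

for i in np.arange(n_repetitions):
    flips = np.random.choice(['Heads', 'Tails'], 3)
    if np.count_nonzero(flips == 'Tails') == 3:
        all_tails = all_tails + 1

all_tails / n_repetitions  # should be close to 1/8 = 0.125
```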

## Probability of either of two events happening

$$
P(A \text{ or } B) = \frac{
    \text{\# of outcomes satisfying either $A$ or $B$}
    }{
    \text{total \# of outcomes}
    }
$$

## Mutual exclusivity

- Suppose that if $A$ happens, then $B$ doesn't, and if $B$ happens, then $A$ doesn't. Such events are called *mutually exclusive*.
- Then the # of outcomes satisfying either $A$ or $B$ is just:
$$
    (\text{\# of outcomes satisfying $A$})
    +
    (\text{\# of outcomes satisfying $B$})
$$
- So **if** $A$ and $B$ are mutually exclusive:

$$
\begin{align*}
    P(A \text{ or } B) 
    &= \frac{
        \text{\# of outcomes satisfying either $A$ or $B$}
        }{
        \text{total \# of outcomes}
        }
        \\[1em]
    &= \frac{
            (\text{\# of outcomes satisfying $A$})
            +
            (\text{\# of outcomes satisfying $B$})
        }{
        \text{total \# of outcomes}
        }
        \\[1em]
    &= \frac{
            (\text{\# of outcomes satisfying $A$})
        }{
        \text{total \# of outcomes}
        }
        +
        \frac{
            (\text{\# of outcomes satisfying $B$})
        }{
        \text{total \# of outcomes}
        }
    \\[1em]
    &= P(A) + P(B)
\end{align*}
$$
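To see why the *if* matters, here is a counting sketch on a die roll. "Roll is 1" and "roll is 6" are mutually exclusive, so their probabilities add. "Roll is 3 or less" and "roll is even" are not (both include the outcome 2), so adding their probabilities counts that outcome twice. The helper `prob` is our own, and `|` combines boolean arrays with an element-wise *or*.

```python
import numpy as np

outcomes = np.arange(1, 7)

def prob(event):
    # Equally-likely outcomes: count the outcomes in the event and divide
    return np.count_nonzero(event) / len(outcomes)

# Mutually exclusive: a single roll can't be both 1 and 6
A = outcomes == 1
B = outcomes == 6
p_A_or_B = prob(A | B)
p_A_plus_p_B = prob(A) + prob(B)

# Not mutually exclusive: "roll <= 3" and "roll is even" share the outcome 2
C = outcomes <= 3
D = outcomes % 2 == 0
prob(C | D), prob(C) + prob(D)  # 5/6 vs. 1: the sum double-counts the 2
```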

## Probability that an event *doesn't* happen

- The probability that $A$ doesn't happen is just $1 - P(A)$.
- Example:
    - If the probability of a sunny day is 0.85, then the probability of a non-sunny day is 0.15.

## Discussion question

Every time I call my grandma, the probability that she answers her phone is 1/3. If I call my grandma three times today, what is the chance that I will talk to her at least once?

- A) 1/3
- B) 2/3
- C) 1/2
- D) 1
- E) None of the above.

## Discussion question solved

- Assume the calls are independent. First, calculate the probability that she doesn't answer any of the three calls.
- $2/3 \cdot 2/3 \cdot 2/3 = 8/27$.
- But we want the probability of her answering *at least* once. So we subtract this from one.
- $1 - 8/27 = 19/27$; none of the above!
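A simulation sketch of the grandma problem, assuming (as in the solution) that each call is answered with probability 1/3, independently of the other calls. The `p` argument of `np.random.choice` sets the probability of each option.

```python
import numpy as np

n_repetitions = 10000
talked = 0

for i in np.arange(n_repetitions):
    # Three calls, each answered with probability 1/3
    calls = np.random.choice(['Answers', 'No answer'], 3, p=[1/3, 2/3])
    if np.count_nonzero(calls == 'Answers') >= 1:
        talked = talked + 1

talked / n_repetitions  # should be close to 19/27 ≈ 0.704
```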

# Simulation

## Finding probabilities with computers

## Simulation

- What is the probability of getting 60 or more heads if I flip 100 coins?
- Approximation through simulation:
    1. Figure out how to do one experiment (i.e., flip 100 coins).
    2. Run the experiment a bunch of times.
    3. Find the fraction of times where number of heads >= 60.

## Making a random choice (e.g., flipping a coin)

- `np.random.choice(options)`
- Returns a random element of `options`; each element is equally likely.

In [2]:
# simulate a coin flip
np.random.choice(['Heads', 'Tails'])

## Making multiple random choices

- `np.random.choice(options, n)`

In [3]:
#: simulate 10 coin flips
np.random.choice(['Heads', 'Tails'], 10)

## Replacement vs. without replacement

- By default, this selects *with* replacement.
- That is, after an option is selected, it is still available to be selected again.
- If an option can only be selected once, select *without* replacement by passing `replace=False`.

In [4]:
#: make a random team, without replacement
people = ['Winona', 'Xanthippe', 'Yvonne', 'Zelda']
np.random.choice(people, 3, replace=False)
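A sketch contrasting the two modes: with replacement, the same person can be picked repeatedly, so we can even ask for more picks than there are people. Without replacement, every pick is distinct, and asking for more picks than people raises a `ValueError`.

```python
import numpy as np

people = ['Winona', 'Xanthippe', 'Yvonne', 'Zelda']

# With replacement: 6 picks from 4 people is fine (repeats are possible)
with_replacement = np.random.choice(people, 6)

# Without replacement: a team of all 4 people, each exactly once
team = np.random.choice(people, 4, replace=False)

# Without replacement, 6 picks from 4 people is impossible
try:
    np.random.choice(people, 6, replace=False)
    too_many_ok = True
except ValueError:
    too_many_ok = False
```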

## Flipping coins

- What is the probability of getting 60 or more heads if I flip 100 coins?

## Running the experiment once...

- Use `np.random.choice` to flip 100 coins
- Use `np.count_nonzero` to count number of heads.
    - Counts number of entries which are `True`.

In [5]:
coins = np.random.choice(['Heads', 'Tails'], 100)
coins

In [6]:
coins == 'Heads'

## Put it into a function

Make it easier to run the experiment again.

In [7]:
def coin_experiment():
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

In [8]:
coin_experiment()

## Repeating the experiment

- We can repeat this process many times by using a `for`-loop
- Store the results in an array with `np.append`. It returns a *new* array rather than changing the old one, so we reassign the result each time.

In [9]:
# make head_counts array
n_repetitions = 10000

head_counts = np.array([])

for i in np.arange(n_repetitions):
    head_count = coin_experiment()
    head_counts = np.append(head_counts, head_count)

In [10]:
# in how many trials was the number of heads >= 60?
at_least_60 = np.count_nonzero(head_counts >= 60)
at_least_60

In [11]:
# what is this as a proportion?
at_least_60 / n_repetitions
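The simulation gives an *estimate*. Because each of the $2^{100}$ possible head/tail sequences is equally likely, the counting formula from earlier also gives the exact answer, provided we can count the sequences with 60 or more heads. As a sketch using Python's standard library (not part of this lecture's toolkit), `math.comb(100, k)` counts the sequences with exactly `k` heads.

```python
from math import comb

# Total number of equally-likely head/tail sequences of length 100
n_sequences = 2 ** 100

# Number of sequences with k heads, summed over k = 60, 61, ..., 100
favorable = sum(comb(100, k) for k in range(60, 101))

exact = favorable / n_sequences
exact
```

The exact value should be close to the simulated proportion above; the two differ only because of randomness in the simulation.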

## Visualizing the distribution

In [12]:
#: visualize distribution of trial results
bpd.DataFrame().assign(
    Number_of_Heads=head_counts
).plot(kind='hist', bins=np.arange(30.5,70), density=True)
# plt.axvline(60, color='C1')

## The "Monty Hall" Problem

<img src="data/monty_1.svg" width=75% />

<img src="data/monty_2.svg" width=75% />

<img src="data/monty_3.svg" width=75% />

## Discussion question

- You originally selected door #2.
- The host reveals door #3 to have a goat behind it.
- You should:

    - A) stick with door #2; it has just as high a chance of winning as door #1.
    - B) switch to door #1; it has a higher chance of winning than door #2.

## Let's see

- We'll compute:
    - probability of winning if we switch.
    - probability of winning if we stay.
        - it's just 1 - (probability of winning if we switch), since in every game exactly one of the two strategies wins.
- Whichever strategy has higher probability of winning is best.

# Simulate

- *Simulate* the Monty Hall problem many times to *estimate* the probability of winning with each strategy.

    1. Figure out how to simulate one game of Monty Hall.
    2. Play a bunch of games.
    3. Count the proportion of wins for each strategy (stay or switch).

## 1) Simulate a single game

When a contestant picks their door, there are three equally-likely outcomes:

1. Goat #1
2. Goat #2
3. Car

In [13]:
behind_picked_door = np.random.choice(['Car', 'Goat 1', 'Goat 2'])
behind_picked_door

## 1) Simulate a single game

Suppose we can see what is behind their door (but the contestant can't).

- If it is a car, they will win if they stay.
- If it is a goat, they will win if they switch.

## 1) Simulate a single game


In [14]:
#- determine winning_strategy ('Stay' or 'Switch') based on what is behind_picked_door
if behind_picked_door == 'Car':
    winning_strategy = 'Stay'
else:
    # a goat was behind the picked door.
    # Monty will reveal the other goat. 
    # Switching wins:
    winning_strategy = 'Switch'

## 1) Simulate a single game

Turn it into a function to make it easier to repeat:

In [15]:
def simulate_monty_hall():
    behind_picked_door = np.random.choice(['Car', 'Goat 1', 'Goat 2'])
    
    if behind_picked_door == 'Car':
        winning_strategy = 'Stay'
    else:
        winning_strategy = 'Switch'
        
    print(behind_picked_door, 'was behind the door. Winning strategy:', winning_strategy)
    return winning_strategy

In [16]:
simulate_monty_hall()

## 2) Play a bunch of times

In [17]:
n_repetitions = 100

for i in np.arange(n_repetitions):
    simulate_monty_hall()

## 2) Play a bunch of times

We should save the winning strategies. Use `np.append`:

In [18]:
#: many simulations

n_repetitions = 10000

winning_strategies = np.array([])
for i in np.arange(n_repetitions):
    winning_strategy = simulate_monty_hall()
    winning_strategies = np.append(winning_strategies, winning_strategy)

## 3) Count the proportion of wins for each strategy (stay or switch).

In [19]:
winning_strategies

In [20]:
np.count_nonzero(winning_strategies == 'Switch')

In [21]:
np.count_nonzero(winning_strategies == 'Switch') / n_repetitions
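The simulation above takes a shortcut: it reasons about what the host will do instead of simulating it. As a cross-check, here is a hypothetical, more literal version (`simulate_monty_hall_explicit` is our own name) that places the car, makes a pick, has the host open a goat door, and then looks at the door you would switch to. If the shortcut is right, switching should win about 2/3 of the time here too.

```python
import numpy as np

def simulate_monty_hall_explicit():
    doors = np.array([1, 2, 3])
    car_door = np.random.choice(doors)      # the car is placed at random
    picked_door = np.random.choice(doors)   # the contestant picks at random

    # The host opens a door that is neither the picked door nor the car
    can_open = doors[(doors != picked_door) & (doors != car_door)]
    opened_door = np.random.choice(can_open)

    # Switching means moving to the one door that is still closed
    switch_door = doors[(doors != picked_door) & (doors != opened_door)][0]

    if switch_door == car_door:
        return 'Switch'
    else:
        return 'Stay'

n_repetitions = 10000
switch_wins = 0
for i in np.arange(n_repetitions):
    if simulate_monty_hall_explicit() == 'Switch':
        switch_wins = switch_wins + 1

switch_wins / n_repetitions  # should be close to 2/3
```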

## Marilyn vos Savant's column


<div style="display: flex; margin-top: .5in">
<div style="width: 45%;">
    <ul>
        <li>A reader posed the question in vos Savant's "Ask Marilyn" column in <i>Parade</i> magazine.</li>
        <li>She stated the correct answer: <i>switch</i>.</li>
        <li>She received over 10,000 letters in disagreement.</li>
        <li>Over 1,000 of those letters came from people with Ph.D.s.</li>
    </ul>
</div>
<div style="width: 50%;">
    <img src="data/vos_savant.jpg" width=75%>
</div>
</div>


# Simulation Summary

1. Make a function that runs the experiment once.
2. Run that function a bunch of times with a `for`-loop, and save the results in an array with `np.append`.
3. Count how many times an outcome occurs with `np.count_nonzero`.
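The three steps can be packaged into one hypothetical helper, `estimate_probability` (our own name, not a library function). This sketch assumes the experiment returns a number, and that `event_happened` takes the array of results and returns an array of `True`/`False` values.

```python
import numpy as np

def estimate_probability(experiment, event_happened, n_repetitions):
    # Step 2: run the experiment many times, saving each result
    results = np.array([])
    for i in np.arange(n_repetitions):
        results = np.append(results, experiment())
    # Step 3: the fraction of runs in which the event happened
    return np.count_nonzero(event_happened(results)) / n_repetitions

# Step 1: one run of the experiment (the coin experiment from this lecture)
def coin_experiment():
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

def at_least_60(head_counts):
    return head_counts >= 60

estimate = estimate_probability(coin_experiment, at_least_60, 10000)
estimate
```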