## Introduction

Peter Norvig has created another [wonderful notebook](http://nbviewer.ipython.org/url/norvig.com/ipython/Probability.ipynb). This time he explains probability precisely and clearly, and demonstrates persuasively how to use Python to calculate probabilities. He also introduces some paradoxes of probability and discusses "the reasonable people principle".

I realized that I could learn a lot by working through Norvig's notebooks, studying how he reasons, and running his code.

This notebook does two things:
* Shows how to use Python to calculate probabilities.
* Documents famous probability problems and their calculations.

## Probability

For a refresher on probability, see "[Introduction to Probability](http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/pdf.html)".

Over 200 years ago, Laplace wrote:

>The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible, when [the cases are] equally possible. ... Probability is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.

If you want to untangle a probability problem, all you have to do is be methodical about defining exactly what the cases are, and then careful in counting the number of favorable and total cases. We'll start being methodical by defining terms:

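* [Experiment](https://en.wikipedia.org/wiki/Experiment_(probability_theory%29): an occurrence with an uncertain outcome that we can observe, for example rolling a die.
* Outcome: the result of an experiment; what Laplace calls a "case". For example, 4.
* Sample Space: the set of all possible outcomes of the experiment, for example {1, 2, 3, 4, 5, 6}.
* Event: a subset of the sample space whose outcomes share a property we are interested in. For example, the event "even roll" is {2, 4, 6}.
* Probability: the number of favorable outcomes (those in the event) divided by the number of outcomes in the sample space, assuming all outcomes are equally likely. For example, the probability of an even roll is 3/6 = 1/2.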
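To make these terms concrete, here is a minimal sketch (not from Norvig's notebook) that applies Laplace's ratio to a different experiment: rolling two fair dice, with the event "the faces sum to 7". It counts cases directly with the standard library; `sample_space` and `sum_is_7` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# Experiment: roll two fair six-sided dice.
# Sample space: all 36 ordered pairs (first die, second die), each equally likely.
sample_space = set(product(range(1, 7), repeat=2))

# Event: the subset of outcomes whose faces sum to 7.
sum_is_7 = {outcome for outcome in sample_space if sum(outcome) == 7}

# Laplace's ratio: favorable cases / all possible cases.
probability = Fraction(len(sum_is_7), len(sample_space))

print(len(sample_space), len(sum_is_7), probability)  # 36 6 1/6
```

Using ordered pairs matters here: the 11 possible sums are not equally likely, so they would not be valid "cases" in Laplace's sense.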


## Definition of $P$ for probability in Python

In [2]:
from __future__ import division  # must be the first statement in a cell; a no-op in Python 3
from fractions import Fraction

def P(event, space):
    """The probability of an event, given a sample space of equiprobable outcomes."""
    return Fraction(len(event & space), len(space))

Probability is a fraction whose numerator is the number of favorable cases (the outcomes in the intersection of the sample space and the event) and whose denominator is the number of all possible cases (the size of the sample space).
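Because the numerator uses the intersection, outcomes that appear in an event but not in the sample space are simply ignored. A small check of that behavior, repeating the same definition of P so the snippet runs on its own:

```python
from fractions import Fraction

def P(event, space):
    """The probability of an event, given a sample space of equiprobable outcomes."""
    return Fraction(len(event & space), len(space))

D = {1, 2, 3, 4, 5, 6}

# 8 is not a possible roll of a six-sided die, so the intersection drops it.
with_impossible = P({2, 4, 6, 8}, D)
# An event containing no possible outcomes has probability 0.
impossible = P({7, 8}, D)

print(with_impossible, impossible)  # 1/2 0
```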

## Die Roll

The probability of rolling an even number on a six-sided die can be calculated as follows:

In [4]:
D = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
P(even, D)

Fraction(1, 2)

## Revised version of $P$ accepting a predicate

So far, to calculate the probability of an event we have to enumerate all the favorable cases by hand, as in the example above. Instead, we can modify the function P to also accept a predicate: a function that is true for the outcomes in the event, which we use to **filter** the sample space.

In [3]:
from __future__ import division  # must be the first statement in a cell; a no-op in Python 3
from fractions import Fraction

def P(event, space):
    """The probability of an event, given a sample space of equiprobable outcomes.
    The event can be a set of outcomes or a predicate (a function of one outcome)."""
    if callable(event):
        event = such_that(event, space)
    return Fraction(len(event & space), len(space))


def such_that(predicate, collection):
    """The subset of elements in the collection for which the predicate is true."""
    return {e for e in collection if predicate(e)}

In [7]:
def even(n): return n % 2 == 0

such_that(even, D)

{2, 4, 6}

In [8]:
P(even, D)

Fraction(1, 2)

In [9]:
D12 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
such_that(even, D12)

{2, 4, 6, 8, 10, 12}

In [10]:
P(even, D12)

Fraction(1, 2)
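A predicate does not need a name: any one-argument function that returns a boolean works, including an inline `lambda`. A self-contained sketch repeating the definitions above; the events "roll higher than 4" and "multiple of 4" are illustrative choices:

```python
from fractions import Fraction

def such_that(predicate, collection):
    """The subset of elements in the collection for which the predicate is true."""
    return {e for e in collection if predicate(e)}

def P(event, space):
    """Probability of an event (a set or a predicate) in a space of equiprobable outcomes."""
    if callable(event):
        event = such_that(event, space)
    return Fraction(len(event & space), len(space))

D = {1, 2, 3, 4, 5, 6}
D12 = set(range(1, 13))

high_roll = P(lambda n: n > 4, D)             # outcomes 5 and 6
multiple_of_4 = P(lambda n: n % 4 == 0, D12)  # outcomes 4, 8 and 12

print(high_roll, multiple_of_4)  # 1/3 1/4
```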

## The two-child paradoxes

[Martin Gardner](https://en.wikipedia.org/wiki/Martin_Gardner) posed these two problems:

* **Problem 1**: Mr. Jones has two children. The older is a boy. What is the probability that both children are boys?
* **Problem 2**: Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

And in 2010, Gary Foshee added this one:

* **Problem 3**: I have two children. At least one of them is a boy born on Tuesday. What is the probability that both children are boys?

Problems 2 and 3 are considered paradoxes because they have surprising answers that people argue about.

### Problem 1: Older child is a boy. What is the probability both are boys?

We use 'BG' to denote the outcome in which the older child is a boy and the younger a girl. The sample space, S, is:

In [11]:
S = {'BG', 'BB', 'GB', 'GG'}
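Writing the four outcomes by hand is easy here, but enumerating them mechanically scales to larger spaces and avoids omissions. A sketch of that alternative using the standard library's `itertools.product` (not the notebook's own code):

```python
from itertools import product

# Each child is 'B' or 'G'; the first letter is the older child.
S = {older + younger for older, younger in product('BG', repeat=2)}

print(sorted(S))  # ['BB', 'BG', 'GB', 'GG']
```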

Let's define predicates for the conditions of having two boys and of the older child being a boy:

In [12]:
def two_boys(outcome): return outcome.count('B') == 2

def older_is_a_boy(outcome): return outcome.startswith('B')

Now we can answer Problem 1:

In [13]:
P(two_boys, such_that(older_is_a_boy, S))

Fraction(1, 2)
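Restricting the sample space with `such_that` gives the same result as the textbook definition of conditional probability, P(A | B) = P(A and B) / P(B). A sketch checking this for Problems 1 and 2; `prob` and `conditional` are hypothetical helper names introduced only for this illustration:

```python
from fractions import Fraction

S = {'BG', 'BB', 'GB', 'GG'}

def two_boys(outcome): return outcome.count('B') == 2
def older_is_a_boy(outcome): return outcome.startswith('B')

def prob(predicate, space):
    """Fraction of the equiprobable outcomes in space that satisfy predicate."""
    return Fraction(sum(1 for o in space if predicate(o)), len(space))

def conditional(a, b, space):
    """P(a | b) = P(a and b) / P(b), computed over the full sample space."""
    return prob(lambda o: a(o) and b(o), space) / prob(b, space)

answer1 = conditional(two_boys, older_is_a_boy, S)          # Problem 1
answer2 = conditional(two_boys, lambda o: 'B' in o, S)      # Problem 2

print(answer1, answer2)  # 1/2 1/3
```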

### Problem 2: At least one is a boy. What is the probability both are boys?

Implementing the problem and finding the answer is easy:

In [14]:
def at_least_one_boy(outcome): return 'B' in outcome

P(two_boys, such_that(at_least_one_boy, S))

Fraction(1, 3)

Understanding the problem is tougher. Some people think the answer should be 1/2. Can we justify the answer 1/3? We can see that there are three equiprobable outcomes in which there is at least one boy:

In [15]:
such_that(at_least_one_boy, S)

{'BB', 'BG', 'GB'}

Of those three outcomes, only one has two boys, so the answer of 1/3 is indeed justified. 

But some people still think the answer should be 1/2. Their reasoning is _"If one child is a boy, then there are two equiprobable outcomes for the other child, so the probability that the other child is a boy, and thus that there are two boys, is 1/2"_.

When two methods of reasoning give two different answers, we have a [paradox](https://en.wikipedia.org/wiki/Paradox).
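One way to decide between the two lines of reasoning is to sample families and count. A minimal Monte Carlo sketch (not part of the original notebook), assuming each child is independently a boy or a girl with probability 1/2; the function name and its parameters are illustrative:

```python
import random

def simulate_two_child_families(n_families=100_000, seed=0):
    """Estimate P(two boys | at least one boy) by sampling random families.

    Assumes each child is independently 'B' or 'G' with probability 1/2.
    """
    rng = random.Random(seed)
    at_least_one_boy = 0
    two_boys = 0
    for _ in range(n_families):
        children = rng.choice('BG') + rng.choice('BG')  # older + younger
        if 'B' in children:
            at_least_one_boy += 1
            if children == 'BB':
                two_boys += 1
    return two_boys / at_least_one_boy

estimate = simulate_two_child_families()
print(estimate)  # roughly 1/3, not 1/2
```

The 1/2 argument goes wrong because "one child is a boy" does not single out a particular child: among families with at least one boy, BG and GB are two separate cases, while BB is only one.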

### Problem 3: One is a boy born on Tuesday. What's the probability both are boys?

When Gary Foshee posed this problem, most people could not imagine how the boy's day of birth could be relevant, and felt the answer should be the same as for Problem 2. But to tell for sure, we should clearly state what the experiment is, define the sample space, and calculate. First, the experiment:

* **Experiment 3a**: A parent is chosen at random from families with two children. She is asked if at least one of her children is a boy born on Tuesday. She replies "Yes".

Next we'll define a sample space. We'll use the notation "G1B3" to mean the older child is a girl born on the first day of the week (Sunday) and the younger a boy born on the third day of the week (Tuesday). We'll call the resulting sample space S3.

In [16]:
sexesdays = {sex + day 
             for sex in 'GB' 
             for day in '1234567'}
S3 = {older + younger
     for older in sexesdays
     for younger in sexesdays}

assert len(S3) == (2*7)**2 == 196
print(sorted(S3))

['B1B1', 'B1B2', 'B1B3', 'B1B4', 'B1B5', 'B1B6', 'B1B7', 'B1G1', 'B1G2', 'B1G3', 'B1G4', 'B1G5', 'B1G6', 'B1G7', 'B2B1', 'B2B2', 'B2B3', 'B2B4', 'B2B5', 'B2B6', 'B2B7', 'B2G1', 'B2G2', 'B2G3', 'B2G4', 'B2G5', 'B2G6', 'B2G7', 'B3B1', 'B3B2', 'B3B3', 'B3B4', 'B3B5', 'B3B6', 'B3B7', 'B3G1', 'B3G2', 'B3G3', 'B3G4', 'B3G5', 'B3G6', 'B3G7', 'B4B1', 'B4B2', 'B4B3', 'B4B4', 'B4B5', 'B4B6', 'B4B7', 'B4G1', 'B4G2', 'B4G3', 'B4G4', 'B4G5', 'B4G6', 'B4G7', 'B5B1', 'B5B2', 'B5B3', 'B5B4', 'B5B5', 'B5B6', 'B5B7', 'B5G1', 'B5G2', 'B5G3', 'B5G4', 'B5G5', 'B5G6', 'B5G7', 'B6B1', 'B6B2', 'B6B3', 'B6B4', 'B6B5', 'B6B6', 'B6B7', 'B6G1', 'B6G2', 'B6G3', 'B6G4', 'B6G5', 'B6G6', 'B6G7', 'B7B1', 'B7B2', 'B7B3', 'B7B4', 'B7B5', 'B7B6', 'B7B7', 'B7G1', 'B7G2', 'B7G3', 'B7G4', 'B7G5', 'B7G6', 'B7G7', 'G1B1', 'G1B2', 'G1B3', 'G1B4', 'G1B5', 'G1B6', 'G1B7', 'G1G1', 'G1G2', 'G1G3', 'G1G4', 'G1G5', 'G1G6', 'G1G7', 'G2B1', 'G2B2', 'G2B3', 'G2B4', 'G2B5', 'G2B6', 'G2B7', 'G2G1', 'G2G2', 'G2G3', 'G2G4', 'G2G5', 'G2G6',

We determine below that the probability of having at least one boy is 3/4, both in S3 and in S:

In [17]:
P(at_least_one_boy, S)

Fraction(3, 4)

In [18]:
P(at_least_one_boy, S3)

Fraction(3, 4)

The probability of two boys is 1/4 in either sample space:

In [19]:
P(two_boys, S)

Fraction(1, 4)

In [21]:
P(two_boys, S3)

Fraction(1, 4)

And the probability of two boys given at least one boy is 1/3 in either sample space:

In [22]:
P(two_boys, such_that(at_least_one_boy, S3))

Fraction(1, 3)

In [23]:
P(two_boys, such_that(at_least_one_boy, S))

Fraction(1, 3)
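The 1/3 also follows from the two unconditional probabilities computed above, because every two-boy outcome already contains at least one boy, so the intersection in the numerator is just the two-boy event:

$$
P(\text{two boys} \mid \text{at least one boy})
= \frac{P(\text{two boys} \cap \text{at least one boy})}{P(\text{at least one boy})}
= \frac{P(\text{two boys})}{P(\text{at least one boy})}
= \frac{1/4}{3/4} = \frac{1}{3}
$$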

We define a predicate for the event of at least one boy born on Tuesday:

In [25]:
def at_least_one_boy_tues(outcome): return 'B3' in outcome

We are now ready to answer Problem 3:

In [26]:
P(two_boys, such_that(at_least_one_boy_tues, S3))

Fraction(13, 27)
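Why 13/27 rather than 1/3? Counting the conditioning outcomes directly shows where the numbers come from. A self-contained sketch that rebuilds S3 with the same notation as above ('B3' is a boy born on day 3, Tuesday; the first pair is the older child):

```python
from itertools import product

sexesdays = [sex + day for sex in 'GB' for day in '1234567']
S3 = {older + younger for older, younger in product(sexesdays, repeat=2)}

# Outcomes with at least one boy born on Tuesday, and those among them with two boys.
given = {o for o in S3 if 'B3' in o}
both_boys = {o for o in given if o.count('B') == 2}

print(len(given), len(both_boys))  # 27 13
```

The 27 outcomes are the 14 with an older Tuesday boy plus the 14 with a younger Tuesday boy, minus B3B3, which was counted twice; the 13 two-boy outcomes are 7 + 7 - 1 by the same argument. The Tuesday condition is more likely to hold in a two-boy family (either boy can satisfy it), which pulls the answer from 1/3 toward 1/2.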

## The Sleeping Beauty Paradox

The [Sleeping Beauty Paradox](https://en.wikipedia.org/wiki/Sleeping_Beauty_problem) is another tricky one:

> Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. A fair coin will be tossed to determine which experimental procedure to undertake: if the coin comes up heads, Beauty will be awakened and interviewed on Monday only. If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday. In either case, she will be awakened on Wednesday without interview and the experiment ends. Any time Sleeping Beauty is awakened and interviewed, she is asked, "What is your belief now for the proposition that the coin landed heads?"

What should Sleeping Beauty say when she is interviewed? First, she should define the sample space. She could use the notation 'heads/Monday/interviewed' to mean the outcome where the coin flip was heads, it is Monday, and she is interviewed. So it seems there are four equiprobable outcomes:

In [27]:
B = {'heads/Monday/interviewed', 'heads/Tuesday/sleep',
     'tails/Monday/interviewed', 'tails/Tuesday/interviewed'}

We define a predicate-defining function:

In [4]:
def T(substring):
    """Return a predicate that is true of all outcomes
    that contain substring (named to avoid shadowing the builtin property)."""
    return lambda outcome: substring in outcome

Now we can get the answer:

In [29]:
heads = T("heads")
interviewed = T("interviewed")
P(heads, such_that(interviewed, B))

Fraction(1, 3)
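The answer of 1/3 treats each interview as an equally likely case. A frequency sketch (an illustration, not part of the original notebook) that repeats the experiment many times and measures what fraction of all interviews happen after a heads toss; the function name and parameters are illustrative:

```python
import random

def fraction_of_interviews_after_heads(n_experiments=100_000, seed=0):
    """Run the experiment many times and count interviews, not experiments.

    Heads: one interview (Monday). Tails: two interviews (Monday and Tuesday).
    """
    rng = random.Random(seed)
    interviews = 0
    heads_interviews = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5  # fair coin
        n_interviews = 1 if heads else 2
        interviews += n_interviews
        if heads:
            heads_interviews += n_interviews
    return heads_interviews / interviews

share = fraction_of_interviews_after_heads()
print(share)  # roughly 1/3
```

Counted per experiment, heads still comes up about half the time; the 1/3 comes from counting per interview, which is the modeling choice built into the sample space B.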

## The Monty Hall Paradox

[This](https://en.wikipedia.org/wiki/Monty_Hall_problem) is one of the most famous probability paradoxes. It can be stated as follows:

> Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

To solve this problem, all we have to do is be careful about how we define our sample space. The problem involves three actions:
* The car is put behind a door
* You choose a door
* The host opens one of the other doors, with a goat behind it

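The outcomes in a sample space must be equally likely. The car is behind each door with probability 1/3, but if the car is behind door 1 (the contestant's pick), the host can open either door 2 or door 3. To keep the outcomes equiprobable, imagine that the host always flips a coin and uses it to decide when he has a choice (Lo: open door 2, Hi: open door 3). Each of the following six outcomes then has probability 1/3 × 1/2 = 1/6:

In [13]:
# The contestant picks door 1; format: car position/door the host opens/host's coin flip.
M = {'Car1/Open2/Lo', 'Car1/Open3/Hi',   # car behind door 1: the coin decides which goat door to open
     'Car2/Open3/Lo', 'Car2/Open3/Hi',   # car behind door 2: the host must open door 3
     'Car3/Open2/Lo', 'Car3/Open2/Hi'}   # car behind door 3: the host must open door 2

Assume that the contestant picks door 1 and the host opens door 3. What is the probability that the car is behind door 1? Behind door 2?

In [14]:
such_that(T("Open3"), M)

{'Car1/Open3/Hi', 'Car2/Open3/Hi', 'Car2/Open3/Lo'}

In [7]:
P(T("Car1"), such_that(T("Open3"), M))

Fraction(1, 3)

In [46]:
P(T("Car2"), such_that(T("Open3"), M))

Fraction(2, 3)

Sticking with door 1 wins with probability 1/3, while switching to door 2 wins with probability 2/3, so it is to your advantage to switch.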
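As an independent check on any sample-space model of this game, here is a Monte Carlo sketch (not from the original notebook). It assumes the car is placed uniformly at random and that the host always opens a goat door other than the contestant's pick, choosing at random between two goat doors when he can; the function name and parameters are illustrative:

```python
import random

def monty_hall(n_games=100_000, seed=0):
    """Return estimated (P(win by staying), P(win by switching))."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        doors = [1, 2, 3]
        car = rng.choice(doors)
        pick = rng.choice(doors)
        # The host opens a door that is neither the contestant's pick nor the car.
        opened = rng.choice([d for d in doors if d != pick and d != car])
        # Switching means taking the one remaining closed door.
        switched = next(d for d in doors if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / n_games, switch_wins / n_games

stay, switch = monty_hall()
print(stay, switch)  # roughly 1/3 and 2/3
```

Because the host never opens the contestant's door or the car's door, switching wins exactly when the first pick was wrong, which happens 2/3 of the time.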