In [1]:
import numpy as np

## Recall

Recall that we have Bayes' Theorem, written in terms of some hypothesis $H$ and observations (data, experiments, etc.) $D$.

$$ P(H|D) = \frac{ P(H) P(D|H) }{ P(D) } $$

We think of this as updating our assumption, the prior estimate $P(H)$, to account for the results of an experiment.

### Example - Monty Hall

The classic example is the Monty Hall Problem. On the game show *Let's Make a Deal*, one of the games had a new car hidden behind one of 3 doors. The contestant chooses one of the doors. Monty then opens one of the other two doors to show that there is no car behind it, and asks the contestant: *would you like to change your guess?*

The question that appears in almost every Probability and Stats Class is:  What should the contestant do?
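One way to set this up as a Bayes update is sketched below. The labels are assumptions made for illustration: the contestant picks door 1, Monty opens door 3, Monty never opens the door hiding the car, and he picks at random when he has two doors to choose from.

```python
from fractions import Fraction

# Hypotheses: the car is behind door 1, 2, or 3 (equally likely a priori).
monty_prior = {door: Fraction(1, 3) for door in (1, 2, 3)}

# Assumed scenario: contestant picked door 1 and Monty opened door 3.
# Likelihood of "Monty opens door 3" under each hypothesis:
monty_likelihood = {
    1: Fraction(1, 2),  # car behind 1: Monty picks door 2 or 3 at random
    2: Fraction(1),     # car behind 2: Monty is forced to open door 3
    3: Fraction(0),     # car behind 3: Monty never reveals the car
}

# Numerators P(H) P(D|H), then normalize by P(D).
monty_numerators = {d: monty_prior[d] * monty_likelihood[d] for d in monty_prior}
monty_evidence = sum(monty_numerators.values())
monty_posterior = {d: monty_numerators[d] / monty_evidence for d in monty_prior}

for door, prob in monty_posterior.items():
    print(door, prob)
```

Comparing the posterior for door 1 (stay) with the posterior for door 2 (switch) answers the question under these assumptions.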

### Example - Dice

Suppose I have a box of dice that contains a 3 sided die, a 6 sided die, an 8 sided die, a 12 sided die, and a 20 sided die. I select a die from the box, roll it, and get a 6. Find the probability that the die I selected is each of the dice in the box.

You need to choose a representation for the hypotheses, a representation for the data, and then find the **prior**, **likelihood** and **posterior**.
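Here is one possible representation, offered as a sketch rather than the only way to do it: the hypotheses are the numbers of sides, the data is the roll, and the prior is assumed uniform because each die is taken to be equally likely to be drawn. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

dice = [3, 6, 8, 12, 20]   # hypotheses: number of sides on the selected die
roll = 6                   # data: the observed roll

# Assumed prior: each die is equally likely to be drawn from the box.
dice_prior = {sides: Fraction(1, len(dice)) for sides in dice}

# Likelihood: a fair die with `sides` faces shows `roll` with probability
# 1/sides, and cannot show it at all if it has fewer than `roll` faces.
dice_likelihood = {
    sides: Fraction(1, sides) if roll <= sides else Fraction(0)
    for sides in dice
}

dice_numerators = {s: dice_prior[s] * dice_likelihood[s] for s in dice}
dice_evidence = sum(dice_numerators.values())
dice_posterior = {s: dice_numerators[s] / dice_evidence for s in dice}

for sides, prob in dice_posterior.items():
    print(sides, prob, float(prob))
```

Note that the 3 sided die is ruled out entirely, since it can never produce a 6.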

## Example - The Locomotive Problem

A variation on a problem from Frederick Mosteller's *Fifty Challenging Problems in Probability with Solutions*. A Mathematics Department has N sections of College Algebra labeled 1 through N. You drop by their campus and are taken to a course with the section number 15. What do we expect N to be?

This one is different from the others in that we do not really know what the **prior** should be. Choose something and then proceed. Then we will need to ask what the consequence of our choice was.
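As a sketch of one possible choice, suppose the prior is uniform over N = 1, ..., `n_max`, where the upper bound `n_max` is an assumption we have to supply ourselves. The helper names `section_posterior` and `posterior_mean` are made up for this illustration. Running it with several bounds shows how much the answer depends on that choice.

```python
def section_posterior(observed, n_max):
    """Posterior over N given one observed section number.

    Assumes a uniform prior on N = 1..n_max and that the section we
    are taken to is equally likely to be any of the N sections.
    """
    hypotheses = range(1, n_max + 1)
    prior = 1.0 / n_max
    # P(see `observed` | N) is 1/N when N >= observed, otherwise 0.
    numerators = {
        n: prior * (1.0 / n if n >= observed else 0.0) for n in hypotheses
    }
    evidence = sum(numerators.values())
    return {n: numerators[n] / evidence for n in hypotheses}


def posterior_mean(posterior):
    """Expected value of N under the posterior."""
    return sum(n * prob for n, prob in posterior.items())


# Compare the posterior mean of N for several assumed upper bounds.
for n_max in (50, 100, 200):
    post = section_posterior(15, n_max)
    print(n_max, round(posterior_mean(post), 1))
```

The single most probable value is always N = 15 under this setup, but the posterior mean moves with `n_max`, which is exactly the consequence of the prior we need to ask about.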


## Discussion - The German Tank Problem



# Bayesian Inferences

Recall that we have Bayes' Theorem, written in terms of some hypothesis $H$ and observations (data, experiments, etc.) $D$.

$$ P(H|D) = \frac{ P(H) P(D|H) }{ P(D) } $$

We think of this as updating our assumption, the prior estimate $P(H)$, to account for the results of an experiment.

This gives a different way for us to think about the confidence intervals and hypothesis tests we have been doing in this class.

### Consider the following example:

RU-486 is claimed to be an effective “morning after” contraceptive pill, but is it really effective?

Data: A total of 40 women came to a health clinic asking for emergency contraception (usually to prevent pregnancy after unprotected sex). They were randomly assigned to RU-486 (treatment) or standard therapy (control), 20 in each group. In the treatment group, 4 out of 20 became pregnant. In the control group, 16 out of 20 became pregnant.

Do we have enough evidence to conclude that RU-486 is effective?


We consider just the 20 pregnancies in the trial. How likely is it that, of those 20 pregnancies, only 4 appeared in the RU-486 group? If the drug is ineffective, the proportion should be 0.5. So from this point of view our hypothesis is $H_0: p = 0.5$. The question is then: in a binomial trial with 20 pregnancies and $p=0.5$, how likely is it that 4 or fewer of them are successes (that is, fall in the RU-486 group)?

In [4]:
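def fact(n):
    # Factorial of a non-negative integer, computed recursively.
    if n > 1:
        return n*fact(n-1)
    else:
        return 1
    
def binom(n, m):
    # Binomial coefficient "n choose m"; integer division keeps the result exact.
    return fact(n)//( fact(m) * fact(n-m) )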



p = 0.5
sum( [ p**k * (1-p)**(20 - k) * binom(20, k) for k in range(5) ] )

0.005908966064453125

And we see that this is an outcome we would expect less than 1% of the time.

The precise philosophical statement we are making is that if you could repeat this experiment 100 times, you would expect to get an outcome this extreme less than once if the true proportion were 0.5.
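To make the "repeat the experiment" reading concrete, here is a small simulation sketch. The number of repetitions, the seed, and the helper name `count_treatment_pregnancies` are arbitrary choices for illustration. It repeats the 20-pregnancy trial many times with $p = 0.5$ and records how often 4 or fewer land in the RU-486 group, alongside the exact tail probability computed with `math.comb`.

```python
import random
from math import comb

rng = random.Random(0)   # fixed seed so the run is repeatable
n_trials = 50_000        # number of simulated repetitions (arbitrary)


def count_treatment_pregnancies(rng, n=20, p=0.5):
    # Each of the n pregnancies falls in the treatment group with probability p.
    return sum(rng.random() < p for _ in range(n))


# Fraction of simulated experiments with 4 or fewer treatment-group pregnancies.
extreme = sum(count_treatment_pregnancies(rng) <= 4 for _ in range(n_trials))
simulated_tail = extreme / n_trials

# Exact value of P(k <= 4) for a binomial with n = 20, p = 0.5.
exact_tail = sum(comb(20, k) for k in range(5)) / 2**20

print(simulated_tail, exact_tail)
```

The simulated fraction should land close to the exact value, with some run-to-run noise that shrinks as `n_trials` grows.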

## Example Bayes Approach

Okay, so now let's do the same problem but from the point of view of Bayes' theorem. What we will do is try to compute the probability p that a given pregnancy comes from the treatment (RU-486) group. If there is no difference between treatment and control, we should conclude that p = 0.5.

We will compute the Bayesian results for a range of p values (I'll explain why when we get to it). To make our lives easy we can just work with a set of discrete values, but you need to use enough of them to discern a difference (think about the significance level we were discussing before):

Let's take the hypotheses:  p = 0.1, 0.2, ..., 0.8, 0.9.

We need to make prior estimates for how likely these proportions are (without reference to the observation of k=4). Our belief prior to the observations, reflecting the null hypothesis, is that p=0.5 is most likely and that the other values are equally likely, each with non-zero probability. We could use 0.52 for p=0.5 and 0.06 for each of the other 8. Note that this is a choice, although there are some rules about how to proceed in general.


In [6]:
P_model = { 0.1:0.06, 0.2:0.06, 0.3:0.06, 0.4:0.06, 0.5:0.52, 0.6:0.06, 0.7:0.06, 0.8:0.06, 0.9:0.06}
P_model

{0.1: 0.06,
 0.2: 0.06,
 0.3: 0.06,
 0.4: 0.06,
 0.5: 0.52,
 0.6: 0.06,
 0.7: 0.06,
 0.8: 0.06,
 0.9: 0.06}

In [13]:
# The total probability for our models needs to be 1

sum( P_model.values() )

1.0000000000000002

We then compute the likelihoods:  $$ P(k=4 | n=20, p)$$

Note this is just a binomial probability:  $p^4 (1-p)^{16} \binom{20}{4} $

In [23]:
P_data_model = {}

for p in P_model:
    
    P_data_model[p] = p**4 * (1-p)**(20-4) * binom(20, 4)
    
P_data_model
    

{0.1: 0.08977882814987174,
 0.2: 0.21819940194610074,
 0.3: 0.1304209743738705,
 0.4: 0.03499079040415824,
 0.5: 0.004620552062988281,
 0.6: 0.0002696861504765954,
 0.7: 5.00755833151246e-06,
 0.8: 1.3005697843199956e-08,
 0.9: 3.1788044999999885e-13}

In [24]:
# Then we find the numerators of our Bayes probabilities

P_model_data_model = {}
for p in P_model: 
    
    P_model_data_model[p] = P_model[p] * P_data_model[p] 
    
P_model_data_model

{0.1: 0.005386729688992305,
 0.2: 0.013091964116766044,
 0.3: 0.007825258462432231,
 0.4: 0.0020994474242494944,
 0.5: 0.0024026870727539063,
 0.6: 1.6181169028595726e-05,
 0.7: 3.004534998907476e-07,
 0.8: 7.803418705919973e-10,
 0.9: 1.907282699999993e-14}

In [25]:
# P(data) is then found by summing these results:

P_data = sum( P_model_data_model.values() )
P_data

0.030822569168083413

In [27]:
# and that brings us to our final result the posterior estimates of P_model_data

P_model_data = {}
for p in P_model:
    
    P_model_data[p] = P_model_data_model[p] / P_data
    
P_model_data

{0.1: 0.1747657588054091,
 0.2: 0.42475252615614845,
 0.3: 0.2538807981826265,
 0.4: 0.06811396586704588,
 0.5: 0.07795219988481279,
 0.6: 0.0005249779452308353,
 0.7: 9.747840884135817e-06,
 0.8: 2.531722343898693e-08,
 0.9: 6.18794199016665e-13}

A couple of things:  While we started with a prior estimate of the distribution that favored the null hypothesis, the experiment gave us an updated (posterior) distribution with p = 0.2 now showing up as the most likely value, and p=0.5 now rated at just 8%. If we combine the probabilities for p < 0.5 we find:

In [28]:
P_model_data[0.1] + P_model_data[0.2] + P_model_data[0.3] + P_model_data[0.4]

0.9215130490112299

Meaning our posterior probabilities estimate a 92% chance that $p< 0.5$, and so we conclude that the drug is effective. Note that in this case it is explicitly a statement about how likely it is that p has a particular value, based on the educated estimates prior to the experiment and the update following the experiment.

Note also that this is not quite as strong a conclusion as we had in the frequentist approach. Computationally, this is due to our prior estimate (see below), but it also reflects our thinking about the problem.
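To see how much of that difference comes from the prior, here is a sketch that repeats the update with a flat prior instead (every p value equally likely, which is an alternative choice made only for comparison). It uses `math.comb` in place of the `binom` helper so that it runs on its own.

```python
from math import comb

p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Alternative prior for comparison: no extra weight on p = 0.5.
flat_prior = {p: 1 / len(p_values) for p in p_values}

# Same binomial likelihood as before: P(k=4 | n=20, p).
likelihood = {p: comb(20, 4) * p**4 * (1 - p)**16 for p in p_values}

flat_numerators = {p: flat_prior[p] * likelihood[p] for p in p_values}
flat_evidence = sum(flat_numerators.values())
flat_posterior = {p: flat_numerators[p] / flat_evidence for p in p_values}

# Posterior probability that p < 0.5 under the flat prior.
flat_mass_below_half = sum(flat_posterior[p] for p in p_values if p < 0.5)
print(flat_mass_below_half)
```

With the flat prior the posterior probability of $p < 0.5$ should come out close to 99%, compared with 92% when the prior put 0.52 on p = 0.5.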

### Effect of the Prior

A big concern about Bayesian methods is the effect of the prior estimate. Consider the previous computation, but suppose we double the sample size from n=20 to n=40 and the number of successes from k=4 to k=8.
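A sketch of that comparison, reusing the same prior as before and wrapping the update in a small helper (the name `posterior_for` is made up for this illustration):

```python
from math import comb

# Same prior as in the computation above: most weight on p = 0.5.
prior = {0.1: 0.06, 0.2: 0.06, 0.3: 0.06, 0.4: 0.06, 0.5: 0.52,
         0.6: 0.06, 0.7: 0.06, 0.8: 0.06, 0.9: 0.06}


def posterior_for(k, n, prior):
    """Bayes update for k successes in n binomial trials over a discrete prior."""
    numerators = {
        p: prior[p] * comb(n, k) * p**k * (1 - p)**(n - k) for p in prior
    }
    evidence = sum(numerators.values())
    return {p: numerators[p] / evidence for p in prior}


posterior_20 = posterior_for(4, 20, prior)   # original experiment
posterior_40 = posterior_for(8, 40, prior)   # doubled sample, same proportion

# Side-by-side posteriors for each hypothesized value of p.
for p in prior:
    print(p, round(posterior_20[p], 4), round(posterior_40[p], 4))
```

The observed proportion is the same in both cases (20%), so any change in the posterior comes from the larger sample carrying more weight against the prior.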


What you see happening is that any prior estimate we make will be dominated by the likelihoods, provided the sample sizes are large enough. The only caveat is that a prior estimate of 0 for one of our probabilities cannot be adjusted: the posterior for that value stays at 0 no matter what the data say.
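The caveat about zero priors can be checked directly. In this sketch the prior is a hypothetical variation on the one used above: the 0.06 originally given to p = 0.2 is moved onto p = 0.5, so p = 0.2 starts at exactly 0. The larger sample (k=40 out of n=200) is also made up for illustration.

```python
from math import comb

# Hypothetical prior that rules out p = 0.2 entirely.
zero_prior = {0.1: 0.06, 0.2: 0.0, 0.3: 0.06, 0.4: 0.06, 0.5: 0.58,
              0.6: 0.06, 0.7: 0.06, 0.8: 0.06, 0.9: 0.06}


def update(k, n, prior):
    """Bayes update for k successes in n binomial trials over a discrete prior."""
    numerators = {
        p: prior[p] * comb(n, k) * p**k * (1 - p)**(n - k) for p in prior
    }
    evidence = sum(numerators.values())
    return {p: numerators[p] / evidence for p in prior}


small_sample = update(4, 20, zero_prior)     # the original data
large_sample = update(40, 200, zero_prior)   # ten times the data, same 20% rate

print(small_sample[0.2], large_sample[0.2])
```

Even though the data point straight at p = 0.2, its posterior remains 0 in both runs, because the numerator $P(H) P(D|H)$ is 0 whenever $P(H) = 0$.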