In [1]:
import thinkplot
import thinkstats2
import pandas as pd
import numpy as np
import scipy.stats as ss
from fractions import Fraction

##Seaborn for fancy plots. 
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams["figure.figsize"] = (15,5)

# Bayes and Updates

We will solve the Monty Hall problem, but first we can work through some simpler examples and build out a method for Bayesian probability calculations. Remember Bayes's theorem from before:

$ P(A|B) = \frac{P(A) P(B|A)}{P(B)} $

Suppose you have two high school classes of 40 students - class A and class B. Each class has some failing students:
<ul>
<li> Class A has 10 failing students, 30 passing ones. 
<li> Class B has 20 failing students, 20 passing ones.
</ul>

<b>If we randomly select one failing student, what is the probability they are from Class A?</b>

We can work this out with a table. First, here is another way to think about Bayes's theorem.

## Diachronic Bayes

There is another way to think of Bayes's theorem: it gives us a way to
update the probability of a hypothesis, $H$, given some body of data, $D$.

This interpretation is "diachronic", which means "related to change over time"; in this case, the probability of the hypotheses changes as we see new data.

Rewriting Bayes's theorem with $H$ and $D$ yields:

$ P(H|D) = \frac{P(H)~P(D|H)}{P(D)} $

In this interpretation, each term has a name:

-  $P(H)$ is the probability of the hypothesis before we see the data, called the prior probability, or just **prior**.

-  $P(H|D)$ is the probability of the hypothesis after we see the data, called the **posterior**.

-  $P(D|H)$ is the probability of the data under the hypothesis, called the **likelihood**.

-  $P(D)$ is the **total probability of the data**, under any hypothesis.

Sometimes we can compute the prior based on background information. For example, the classroom problem specifies that we choose a student at random with equal probability.

In other cases the prior is subjective; that is, reasonable people might disagree, either because they use different background information or because they interpret the same information differently.
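
To make the four terms concrete, here is a minimal sketch that plugs the classroom numbers into the formula directly. The variable names (`prior_a`, `like_a`, and so on) are illustrative and not part of the notebook's own code; the values come from the class counts above. The table in the next section builds the same calculation step by step.

```python
from fractions import Fraction

# Prior: each class is equally likely before we learn the student is failing.
prior_a = Fraction(1, 2)
prior_b = Fraction(1, 2)

# Likelihood: probability of drawing a failing student from each class.
like_a = Fraction(10, 40)
like_b = Fraction(20, 40)

# Total probability of the data (a failing student), summed over both hypotheses.
p_data = prior_a * like_a + prior_b * like_b

# Posterior for Class A from Bayes's theorem.
posterior_a = prior_a * like_a / p_data
print(p_data, posterior_a)  # 3/8 1/3
```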

### Example - Passing and Failing

From above:
<ul>
<li> Class A has 10 failing students, 30 passing ones. 
<li> Class B has 20 failing students, 20 passing ones.
</ul>

First, we can build a table to house our work. 

In [2]:
table = pd.DataFrame(index=['Class A', 'Class B'])
table

Class A
Class B


#### Calculate Prior Probability

We can then add in the first part - the prior probability. 

We can also think of this as "the probability before we know anything else" - here we are just finding the probability of them being in Class A, without adding any other information. 

In [3]:
table['prior'] = 1/2, 1/2
table

Unnamed: 0,prior
Class A,0.5
Class B,0.5


#### Calculate Likelihoods

Add the likelihoods in...
For class A there is a 25% chance of a student failing.
For class B, it is 50%

We can also think of the likelihoods explicitly as conditional statements:
<ul>
<li> E.g. "If I choose from class B, what is the likelihood of getting a failure?"
<li> Or, given that the hypothesis in this row is true, what is the probability of the data?
</ul>

In other words, you assume the hypothesis is true (the student came from this class) and ask how likely the data (a failing student) would be under that assumption. 

<ul>
<li> If I choose A, 10 out of 40 are failing, so the chances are 1/4
<li> If I choose B, 20 out of 40 are failing, so the chances are 1/2
</ul>

In [4]:
table['likelihood'] = 1/4, 1/2
table

Unnamed: 0,prior,likelihood
Class A,0.5,0.25
Class B,0.5,0.5


#### Calculate Interim Probabilities

Next, multiply the two probabilities together:
<ul>
<li>E.g. There's a 50% chance of choosing class B, and if I do, there's a 50% chance of getting a fail. 
</ul>
We label this column `unnorm`, for unnormalized probabilities. Each value is a valid probability, but together they are not normalized: they do not sum to 1. If we look a little closer, they are also pieces of Bayes's theorem:
<ul>
<li>The numerator of Bayes is a probability multiplied by a conditional, which is the unnorm value.
<li>The denominator is the total probability - there are only 2 cases here, and one must be true, so it is the sum of the unnorms.
</ul>

We can also think of these probabilities as being "in terms of" the total possible outcomes - i.e. 12.5% of <i>all</i> students are failing and in Class A, while 25% of all students are failing and in Class B. The other 62.5% are passing, in either class, so they aren't really a consideration in the calculation we are doing. 

In [5]:
#Calculate unnormalized probabilities
table['unnorm'] = table['prior'] * table['likelihood']
table

Unnamed: 0,prior,likelihood,unnorm
Class A,0.5,0.25,0.125
Class B,0.5,0.5,0.25


As a check, we can demonstrate that last point is true:

- Calculate the total probability of getting a fail by summing the unnorms.

- Calculate the total probability of getting a fail by direct calculation. 

In [6]:
prob_data = table['unnorm'].sum()
print("Unnorms:", prob_data)
probDirect = (10+20)/(40+40) #The overall fail chances - 30 total failures, 80 total students. 
print("Direct:", probDirect)

Unnorms: 0.375
Direct: 0.375


#### Calculate Posterior Probabilities

Now we can normalize, that is, make the probabilities total 1. We just divide by the total probability. This gives us the posterior probabilities, answering our original question:

$ P(\text{Class A} | \text{Failing}) $

As well as giving us the other probabilities, for free. 

This step just shifts the probability that we calculated above to be "out of" the total we care about (failing students), rather than the total of all students. 

In [7]:
table['posterior'] = table['unnorm'] / prob_data
table

Unnamed: 0,prior,likelihood,unnorm,posterior
Class A,0.5,0.25,0.125,0.333333
Class B,0.5,0.5,0.25,0.666667
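
As a sanity check on the posterior, we can also count students directly: of the 30 failing students, 10 are in Class A. A small sketch (the dictionary `failing` is an illustrative name, not from the notebook):

```python
from fractions import Fraction

# Failing students per class, from the problem statement.
failing = {"Class A": 10, "Class B": 20}
total_failing = sum(failing.values())

# Share of all failing students that come from each class.
by_count = {name: Fraction(n, total_failing) for name, n in failing.items()}
print(by_count["Class A"], by_count["Class B"])  # 1/3 2/3
```

The counts agree with the table because the two classes are the same size, so picking a class at random and then a student is equivalent to picking among all 80 students.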


### Table Update Formula

We can wrap those last steps up into a function, since they are just calculations on the probabilities that we've provided.

In [8]:
def update(table):
    """Compute the posterior probabilities."""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data
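
As a quick usage sketch, here is the classroom table rebuilt and passed through `update`. The function is repeated so the snippet runs on its own; the table name `classes` is illustrative.

```python
import pandas as pd

def update(table):
    """Compute the posterior probabilities (same logic as the cell above)."""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

# Rebuild the classroom table and let update() fill in the remaining columns.
classes = pd.DataFrame(index=['Class A', 'Class B'])
classes['prior'] = 1/2, 1/2
classes['likelihood'] = 1/4, 1/2

prob_data = update(classes)
print(prob_data)                       # total probability of a failing student
print(classes['posterior'].tolist())   # posterior for Class A, then Class B
```

Note that `update` modifies the table in place (it adds the `unnorm` and `posterior` columns) and returns only the normalizing constant.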

## Dice problem

Suppose I have a box with a 6-sided die, an 8-sided die, and a 12-sided die. I choose one of the dice at random, roll it, and report that the outcome is a 1. What is the probability that I chose the 6-sided die?

<b>Note:</b> The `Fraction` class (from the `fractions` module) gives us exact fractions rather than decimals, which is helpful here. 

In [9]:
#Create the table
dice = pd.DataFrame(index=["Six Side", "Eight Side", "12 Side"])
dice["prior"] = Fraction(1,3)
dice

Unnamed: 0,prior
Six Side,1/3
Eight Side,1/3
12 Side,1/3


In [10]:
#The probability for each die being a 1, given I pick it.
dice["likelihood"] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)
dice

Unnamed: 0,prior,likelihood
Six Side,1/3,1/6
Eight Side,1/3,1/8
12 Side,1/3,1/12


In [11]:
#Update to finish
update(dice)
dice

Unnamed: 0,prior,likelihood,unnorm,posterior
Six Side,1/3,1/6,1/18,4/9
Eight Side,1/3,1/8,1/24,1/3
12 Side,1/3,1/12,1/36,2/9
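
If the table feels abstract, a simulation can back it up. The sketch below is an assumption-laden check, not part of the original notebook: it picks a die at random many times, rolls it, keeps only the trials where the roll is 1, and looks at which die produced those rolls. The seed and trial count are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

sides_options = [6, 8, 12]
counts = {6: 0, 8: 0, 12: 0}

for _ in range(200_000):
    sides = random.choice(sides_options)   # pick a die at random
    roll = random.randint(1, sides)        # roll it
    if roll == 1:                          # keep only trials that match the data
        counts[sides] += 1

total_ones = sum(counts.values())
estimates = {sides: n / total_ones for sides, n in counts.items()}
print(estimates)  # should land close to 4/9, 1/3 and 2/9
```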


### Exercise Scenario

Suppose you are placing a sports bet on your favorite team - The Bayes. You know a few things:
<ul>
<li>The Bayes have a 50% chance of winning a game. (Based on past performance)
<li>The Bayes have had a 10% chance of having rain in their games in Bayes Stadium.
<li>However, in games that The Bayes have won in Bayes Stadium, there's been an 11% chance of rain. 
<li>So...
<li>P(W) = 50%
<li>P(R) = 10%
<li>P(R|W) = 11%
</ul>

<b>What is the probability that The Bayes win if it rains? </b>

In [12]:
#Build Table
bet = pd.DataFrame(index=["Win", "Loss"])
bet["prior"] = Fraction(1,2)
bet 

Unnamed: 0,prior
Win,1/2
Loss,1/2


In [13]:
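#Add in the likelihoods.
#Given that we have a win/loss, how likely is rain. 
#P(R|W) = 0.11 is given. P(R|L) is not given directly, so derive it from
#the total probability: P(R) = P(R|W)P(W) + P(R|L)P(L)
#0.10 = 0.11*0.5 + P(R|L)*0.5, so P(R|L) = 0.09
bet["likelihood"] = .11, .09
bet

Unnamed: 0,prior,likelihood
Win,1/2,0.11
Loss,1/2,0.09


In [14]:
#Update to complete. 
update(bet)
bet

Unnamed: 0,prior,likelihood,unnorm,posterior
Win,1/2,0.11,0.055,0.55
Loss,1/2,0.09,0.045,0.45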
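
As a cross-check, the answer can be computed straight from Bayes's theorem using only the three stated numbers. The sketch also derives the one quantity the problem does not state, the chance of rain given a loss, from the law of total probability. Variable names are illustrative.

```python
p_win = 0.50             # P(W)
p_rain = 0.10            # P(R)
p_rain_given_win = 0.11  # P(R|W)

# Bayes's theorem: P(W|R) = P(W) * P(R|W) / P(R)
p_win_given_rain = p_win * p_rain_given_win / p_rain

# P(R|L) is not stated; total probability pins it down:
# P(R) = P(R|W) * P(W) + P(R|L) * P(L)
p_loss = 1 - p_win
p_rain_given_loss = (p_rain - p_rain_given_win * p_win) / p_loss

print(round(p_win_given_rain, 4), round(p_rain_given_loss, 4))  # 0.55 0.09
```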


## Multiple Feature Bayes

What is the probability of playing when the weather is sunny and the temperature is cool? To find out, we'll break the problem into two steps. Each update of our table incorporates one new piece of information, so we can run the process more than once; each pass is basically a "redo" of the same steps. 

We can think of this as updating our understanding, one variable at a time. We start with the prior probability, then proceed to "add" knowledge to our understanding, one feature at a time. 

In [15]:
dfw = pd.read_csv("data/weather.txt", sep="\t")
dfw

Unnamed: 0,Outlook,Temp,Humidity,Windy,Play
0,Rainy,Hot,High,f,no
1,Rainy,Hot,High,t,no
2,Overcast,Hot,High,f,yes
3,Sunny,Mild,High,f,yes
4,Sunny,Cool,Normal,f,yes
5,Sunny,Cool,Normal,t,no
6,Overcast,Cool,Normal,t,yes
7,Rainy,Mild,High,f,no
8,Rainy,Cool,Normal,f,yes
9,Sunny,Mild,Normal,f,yes


### Tables

We want to predict whether we are going to play, so we will set up our initial table with those two options and their prior probabilities. 

<b>Note:</b> I've also calculated some other totals that we'll need later here. We could also calculate them later, as needed. 

In [16]:
weather = pd.DataFrame(index=["Play", "Not Play"])

# Other calculations for later
# You can mostly ignore this for the moment
total = len(dfw)
dfPlay = dfw[dfw["Play"] == "yes"]
dfNoPlay = dfw[dfw["Play"] == "no"]
playTotal = Fraction(len(dfPlay))
pOutlook = Fraction(len(dfw[dfw["Outlook"] == "Sunny"]), total)
pTemp = Fraction(len(dfw[dfw["Temp"] == "Cool"]), total)
# end other stuff

pPlay = Fraction(len(dfw[dfw["Play"] == "yes"]), total)
weather["prior"] = pPlay, (1 - pPlay)
weather

Unnamed: 0,prior
Play,9/14
Not Play,5/14


#### Add Outlook

Our first update will add the information for the Outlook. This step isn't really any different than before. 

In [17]:
if_playOut = Fraction(len(dfPlay[dfPlay["Outlook"] == "Sunny"]), len(dfPlay))
if_notOut = Fraction(len(dfNoPlay[dfNoPlay["Outlook"] == "Sunny"]), len(dfNoPlay))
weather["likelihood"] = if_playOut, if_notOut
weather

Unnamed: 0,prior,likelihood
Play,9/14,1/3
Not Play,5/14,2/5


In [18]:
update(weather)
weather

Unnamed: 0,prior,likelihood,unnorm,posterior
Play,9/14,1/3,3/14,3/5
Not Play,5/14,2/5,1/7,2/5


### Priors with Multiple Variables

Now we'll do the second round of updates. Here, we need to "grab" the result of the previous update, because our starting point has changed. We are no longer starting from the simple initial probabilities; we are starting having already "learned" the Outlook information. 

#### Add Temp

We will take the outcome of the previous table as our priors. 

<b>Thought Experiment:</b> in the example below there are both the normalized and unnormalized probabilities. Try using either for the prior probs, what happens? 

In [19]:
w2 = weather[["posterior"]]
w2 = w2.rename(columns={"posterior": "prior"})
#w2 = weather[["unnorm"]]
#w2 = w2.rename(columns={"unnorm": "prior"})
w2

Unnamed: 0,prior
Play,3/5
Not Play,2/5


In [20]:
if_playTemp = Fraction(len(dfPlay[dfPlay["Temp"] == "Cool"]), len(dfPlay))
if_notTemp = Fraction(len(dfNoPlay[dfNoPlay["Temp"] == "Cool"]), len(dfNoPlay))
w2["likelihood"] = if_playTemp, if_notTemp
w2

Unnamed: 0,prior,likelihood
Play,3/5,1/3
Not Play,2/5,1/5


In [21]:
update(w2)
w2

Unnamed: 0,prior,likelihood,unnorm,posterior
Play,3/5,1/3,1/5,5/7
Not Play,2/5,1/5,2/25,2/7


In [22]:
5/7

0.7142857142857143

#### Results

We would predict "play", since the posterior probability of playing is about 71%. 
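
The two-step update can be condensed into a loop. The sketch below is one possible way to write it (the function `chain_updates` and its `normalize_each_step` flag are hypothetical names, not from the notebook). It uses the prior and likelihoods from the tables above, and it also speaks to the thought experiment: carrying forward normalized or unnormalized values gives the same final posterior, because normalizing only rescales both rows by the same constant.

```python
from fractions import Fraction

def chain_updates(prior, likelihoods, normalize_each_step=True):
    """Apply one Bayes update per feature and return the final posterior."""
    current = list(prior)
    for like in likelihoods:
        # Multiply each hypothesis by the likelihood of the new evidence.
        current = [p * l for p, l in zip(current, like)]
        if normalize_each_step:
            total = sum(current)
            current = [p / total for p in current]
    # Normalize once at the end so the result always sums to 1.
    total = sum(current)
    return [p / total for p in current]

prior = [Fraction(9, 14), Fraction(5, 14)]      # Play, Not Play
likelihoods = [
    [Fraction(1, 3), Fraction(2, 5)],           # Sunny given Play / Not Play
    [Fraction(1, 3), Fraction(1, 5)],           # Cool given Play / Not Play
]

with_norm = chain_updates(prior, likelihoods, normalize_each_step=True)
without_norm = chain_updates(prior, likelihoods, normalize_each_step=False)
print(with_norm == without_norm, with_norm[0], with_norm[1])  # True 5/7 2/7
```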

### Direct Calculation of Multiple Variable Bayes

We can also calculate this a bit more directly. The Bayes formula can be expanded into dealing with multiple features. 

![Naive Bayes](images/naive_bayes.png "Naive Bayes")

What do we have? The conditional probability, when conditioned on multiple values, is:
<ul>
<li> The product of all the "flipped" individual conditional probabilities, multiplied by the overall probability. Divided by...
<li> The product of the probabilities of the variables themselves.
</ul>

So for us it is:

$ P(\text{Golf} | \text{Sunny}, \text{Cool}) = \frac{P(\text{Sunny} | \text{Golf}) \cdot P(\text{Cool} | \text{Golf}) \cdot P(\text{Golf})}{P(\text{Sunny}) \cdot P(\text{Cool})} $

If we bust out the math...

In [23]:
# Conditionals - Probabilities if we golf
pOutlookPlay = len(dfPlay[dfPlay["Outlook"] == "Sunny"]) / playTotal
pTempPlay = len(dfPlay[dfPlay["Temp"] == "Cool"]) / playTotal

# Negatives - Probabilities if we don't
pOutlookNo = len(dfNoPlay[dfNoPlay["Outlook"] == "Sunny"]) / len(dfNoPlay)
pTempNo = len(dfNoPlay[dfNoPlay["Temp"] == "Cool"])/ len(dfNoPlay)

#Denominator (this doesn't change)
pDen = float(pOutlook * pTemp)

# Raw Likelihoods
like_play = float(pOutlookPlay * pTempPlay * pPlay)
like_noplay = float(pOutlookNo * pTempNo * (1-pPlay))

print(like_play, like_noplay)

0.07142857142857142 0.028571428571428577


Now we can normalize the likelihoods and get real probabilities. 

In [24]:
# normalize
nPlay = like_play / pDen
nNot = like_noplay / pDen
tot_prob = nPlay + nNot

print((nPlay/tot_prob), (nNot/tot_prob))

0.7142857142857143 0.28571428571428575


#### Results

Looks pretty similar. We are awesome. 

### Likelihoods

One note when using Bayes as a classifier: all we really care about is the raw likelihoods (the unnormalized numerators), not the final probability. We will predict whichever class has the higher raw likelihood, since the denominator is the same for every class. The final normalization can be ignored for prediction; it makes the result more readable, but doesn't change which class wins. 
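
A minimal sketch of that point, using the two raw likelihoods from the Sunny + Cool example (1/14 and 1/35, printed above as 0.0714 and 0.0286). The dictionary names are illustrative.

```python
# Unnormalized scores from the Sunny + Cool example above.
scores = {"Play": 1 / 14, "Not Play": 1 / 35}

# Prediction taken straight from the raw scores.
raw_prediction = max(scores, key=scores.get)

# Prediction after normalizing the scores into probabilities.
total = sum(scores.values())
probs = {label: s / total for label, s in scores.items()}
norm_prediction = max(probs, key=probs.get)

print(raw_prediction, norm_prediction)  # Play Play
```

Dividing every score by the same positive number cannot change their order, so the prediction is the same either way.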

## Big Example - Sunny, Hot, Normal, False

We can generate a "full" prediction as well - we'll use all 4 features to make a prediction. 

We start the same way as always - set up the prior probabilities. 

In [25]:
weatherBig = pd.DataFrame(index=["Play", "Not Play"])
weatherBig["prior"] = pPlay, (1 - pPlay)
weatherBig

Unnamed: 0,prior
Play,9/14
Not Play,5/14


##### Outlook

We will update the probabilities with the Outlook. 

In [26]:
if_playOut2 = Fraction(len(dfPlay[dfPlay["Outlook"] == "Sunny"]), len(dfPlay))
if_notOut2 = Fraction(len(dfNoPlay[dfNoPlay["Outlook"] == "Sunny"]), len(dfNoPlay))
print(Fraction(if_playOut2), Fraction(if_notOut2))

# Ignore - this is to check answers against a solution with slightly different data
#if_playOut2 = Fraction(2,9)
#if_notOut2 = Fraction(3,5)

weatherBig["likelihood"] = if_playOut2, if_notOut2
update(weatherBig)
weatherBig

1/3 2/5


Unnamed: 0,prior,likelihood,unnorm,posterior
Play,9/14,1/3,3/14,3/5
Not Play,5/14,2/5,1/7,2/5


##### Temperature

Take the existing probabilities as the priors, do another update. 

In [27]:
wB1 = weatherBig[["unnorm"]]
wB1 = wB1.rename(columns={"unnorm": "prior"})
wB1

Unnamed: 0,prior
Play,3/14
Not Play,1/7


In [28]:
if_playTemp2 = Fraction(len(dfPlay[dfPlay["Temp"] == "Hot"]), len(dfPlay))
if_notTemp2 = Fraction(len(dfNoPlay[dfNoPlay["Temp"] == "Hot"]), len(dfNoPlay))
print(Fraction(if_playTemp2), Fraction(if_notTemp2))
wB1["likelihood"] = if_playTemp2, if_notTemp2
update(wB1)
wB1

2/9 2/5


Unnamed: 0,prior,likelihood,unnorm,posterior
Play,3/14,2/9,1/21,5/11
Not Play,1/7,2/5,2/35,6/11


##### Humidity

Take the existing probabilities as the priors, do another update. 

In [29]:
wB2 = wB1[["unnorm"]]
wB2 = wB2.rename(columns={"unnorm": "prior"})
wB2

Unnamed: 0,prior
Play,1/21
Not Play,2/35


In [30]:
if_playHum2 = Fraction(len(dfPlay[dfPlay["Humidity"] == "Normal"]) , len(dfPlay))
if_notHum2 = Fraction(len(dfNoPlay[dfNoPlay["Humidity"] == "Normal"]) , len(dfNoPlay))
print(Fraction(if_playHum2), Fraction(if_notHum2))
wB2["likelihood"] = if_playHum2, if_notHum2
update(wB2)
wB2

2/3 1/5


Unnamed: 0,prior,likelihood,unnorm,posterior
Play,1/21,2/3,2/63,25/34
Not Play,2/35,1/5,2/175,9/34


##### Wind

Take the existing probabilities as the priors, do another update. 

In [31]:
wB3 = wB2[["unnorm"]]
wB3 = wB3.rename(columns={"unnorm": "prior"})
wB3

Unnamed: 0,prior
Play,2/63
Not Play,2/175


In [32]:
if_playWind2 = Fraction(len(dfPlay[dfPlay["Windy"] == "f"]) , len(dfPlay))
if_notWind2 = Fraction(len(dfNoPlay[dfNoPlay["Windy"] == "f"]) , len(dfNoPlay))
print(Fraction(if_playWind2), Fraction(if_notWind2))

wB3["likelihood"] = if_playWind2, if_notWind2
update(wB3)
wB3

2/3 2/5


Unnamed: 0,prior,likelihood,unnorm,posterior
Play,2/63,2/3,4/189,125/152
Not Play,2/175,2/5,4/875,27/152


In [33]:
125/152

0.8223684210526315

#### Check Direct Calculation

We can use the formula above - calculate each likelihood then normalize. 

In [34]:
# Conditionals
pOutlook = Fraction(len(dfw[dfw["Outlook"] == "Sunny"]), total)
pTemp = Fraction(len(dfw[dfw["Temp"] == "Hot"]), total)
pHum = Fraction(len(dfw[dfw["Humidity"] == "Normal"]), total)
pWind = Fraction(len(dfw[dfw["Windy"] == "f"]), total)

big_num = pPlay * if_playOut2 * if_playTemp2 * if_playHum2 * if_playWind2
big_not = (1-pPlay) * if_notOut2 * if_notTemp2 * if_notHum2 * if_notWind2
big_den = pOutlook * pTemp * pHum * pWind

# Normalize and show final probs
play_prob = float(big_num/big_den)
not_prob = float(big_not/big_den)
tot_prob = play_prob + not_prob
print("Yes",  big_num/big_den, play_prob, play_prob/tot_prob)
print("No",  big_not/big_den, not_prob, not_prob/tot_prob)

Yes 98/135 0.725925925925926 0.8223684210526316
No 98/625 0.1568 0.17763157894736842


## Results and Looking Forward

Going through this should set off a few bells in your mind - we are using a bunch of features to generate a prediction....

As you may guess, Bayes is the basis of a family of predictive models. Right now we are doing a version of Naive Bayes, a common and simple classification model that is often used for things like spam detection because it is very fast. 

Next time we'll build these concepts up into a full blown predictive model algorithm, from scratch! 

![Bayes](images/bayes.jpeg "Bayes")

## Exercise - Cars

What if we want to know the probability that a car is stolen, given that it is a BMW, it is black, and it is night? 

In [35]:
df_car = pd.read_csv("data/vehicle_stolen_dataset.csv", names=["ID", "Make", "Color", "Time", "Stolen"])
df_car.head(20)

Unnamed: 0,ID,Make,Color,Time,Stolen
0,N001,BMW,black,night,yes
1,N002,Audi,black,night,no
2,N003,NISSAN,black,night,yes
3,N004,VEGA,red,day,yes
4,N005,BMW,blue,day,no
5,N006,Audi,black,day,yes
6,N007,VEGA,red,night,no
7,N008,Audi,blue,day,yes
8,N009,VEGA,black,day,yes
9,N010,NISSAN,blue,day,no


In [36]:
car = pd.DataFrame(index=["Stolen", "Not Stolen"])
stole = df_car[df_car["Stolen"]=="yes"]
notStole = df_car[df_car["Stolen"] == "no"]

stolen = len(stole) / len(df_car)
car["prior"] = stolen, (1 - stolen)
car

Unnamed: 0,prior
Stolen,0.65
Not Stolen,0.35


In [37]:
if_stolen = len(stole[stole["Make"] == "BMW"]) / len(stole)
if_notS = len(notStole[notStole["Make"] == "BMW"]) / len(notStole)
car["likelihood"] = if_stolen, if_notS
update(car)
car

Unnamed: 0,prior,likelihood,unnorm,posterior
Stolen,0.65,0.307692,0.2,0.666667
Not Stolen,0.35,0.285714,0.1,0.333333


In [38]:
car2 = car[["unnorm"]]
car2 = car2.rename(columns={"unnorm": "prior"})
isblack = len(stole[stole["Color"] == "black"]) / len(stole)
isblack2 = len(notStole[notStole["Color"] == "black"]) / len(notStole)
car2["likelihood"] = isblack, isblack2
update(car2)
car2

Unnamed: 0,prior,likelihood,unnorm,posterior
Stolen,0.2,0.615385,0.123077,0.896
Not Stolen,0.1,0.142857,0.014286,0.104


In [39]:
car3 = car2[["unnorm"]]
car3 = car3.rename(columns={"unnorm": "prior"})
isnight = len(stole[stole["Time"] == "night"]) / len(stole)
isnight2 = len(notStole[notStole["Time"] == "night"]) / len(notStole)
car3["likelihood"] = isnight, isnight2
update(car3)
car3

Unnamed: 0,prior,likelihood,unnorm,posterior
Stolen,0.123077,0.461538,0.056805,0.932963
Not Stolen,0.014286,0.285714,0.004082,0.067037


#### Check

In [40]:
pBMW = len(df_car[df_car["Make"] == "BMW"]) / len(df_car)
pBlack = len(df_car[df_car["Color"] == "black"]) / len(df_car)
pNight = len(df_car[df_car["Time"] == "night"]) / len(df_car)

stolen_num = if_stolen * isblack * isnight * stolen
not_st_num = if_notS * isblack2 * isnight2 * (1-stolen)
denom_car = pBMW * pBlack * pNight

stol_prob = stolen_num / denom_car
notSt_prob = not_st_num / denom_car
tot_car_p = stol_prob + notSt_prob

print("Yes",  stol_prob, stol_prob/tot_car_p)
print("No",  notSt_prob, notSt_prob/tot_car_p)

Yes 1.0519395134779752 0.9329631098770329
No 0.07558578987150413 0.06703689012296704
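
The repeated steps above can be folded into one helper. This is a sketch under stated assumptions: `naive_bayes_posterior` is a hypothetical function name, and the data is only the ten rows shown in the preview above. The notebook's prior of 0.65 implies the full file has more rows, so the numbers here differ from the cells above.

```python
from fractions import Fraction
import pandas as pd

def naive_bayes_posterior(df, target, evidence):
    """Posterior over the values of `target`, given {column: value} evidence."""
    scores = {}
    for label, group in df.groupby(target):
        score = Fraction(len(group), len(df))                 # prior
        for column, value in evidence.items():
            matches = int((group[column] == value).sum())
            score *= Fraction(matches, len(group))            # likelihood
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Only the ten rows displayed in the preview above, not the full file.
rows = [
    ("BMW", "black", "night", "yes"),
    ("Audi", "black", "night", "no"),
    ("NISSAN", "black", "night", "yes"),
    ("VEGA", "red", "day", "yes"),
    ("BMW", "blue", "day", "no"),
    ("Audi", "black", "day", "yes"),
    ("VEGA", "red", "night", "no"),
    ("Audi", "blue", "day", "yes"),
    ("VEGA", "black", "day", "yes"),
    ("NISSAN", "blue", "day", "no"),
]
preview = pd.DataFrame(rows, columns=["Make", "Color", "Time", "Stolen"])

result = naive_bayes_posterior(
    preview, "Stolen", {"Make": "BMW", "Color": "black", "Time": "night"}
)
print(result["yes"], result["no"])  # 16/25 9/25
```

Using `Fraction` keeps the arithmetic exact; the helper does no smoothing, so it would divide by zero if no class matched the evidence at all.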


## Thought Intermission - The Monty Hall Problem

Next we'll use a Bayes table to solve one of the most contentious problems in probability.

The Monty Hall problem is based on a game show called *Let's Make a Deal*. If you are a contestant on the show, here's how the game works:
<ul>
<li> The host, Monty Hall, shows you three closed doors -- numbered 1, 2, and 3 -- and tells you that there is a prize behind each door.
<li> One prize is valuable (traditionally a car), the other two are less valuable (traditionally goats).
<li> The object of the game is to guess which door has the car. If you guess right, you get to keep the car.
</ul>

The key - after you pick a door, Monty will open another, revealing a goat. Then Monty offers you the option to stick with your original choice or switch to the remaining unopened door. To maximize your chance of winning the car, should you stick or switch?

To answer this question, we have to make some assumptions about the behavior of the host (these are parts of the general rules of the game):
<ul>
<li> Monty always opens a door and offers you the option to switch.
<li> He never opens the door you picked or the door with the car.
<li> If you choose the door with the car, he chooses one of the other doors at random.
</ul>

In [41]:
#Start off - initially the chances are equal for each door. 
#So the prior probabilities are all 1/3
monty = pd.DataFrame(index=["Door 1", "Door 2", "Door 3"])
monty["prior"] = Fraction(1,3)
monty

Unnamed: 0,prior
Door 1,1/3
Door 2,1/3
Door 3,1/3


Now we need to decide what door we want - let's get that whip. 

<b>We'll assume we pick door 1.</b>

Then Monty opens one of the other doors; from our point of view, which one he opens looks random. The door he opens is our data, and it gives us the likelihoods.

<b>We'll assume he opens door 3 - remember he always opens a goat door, not the car</b>

Now we know the car isn't behind Door 3 (it is open, and it is a goat). Remember, each row is a hypothetical: if the car is behind this door (the hypothesis), what is the probability that Monty would open Door 3 (the data)? 


We can think about this by carefully defining the problem: what is the probability that Monty opens Door 3, given that the car is behind Door X?
<ul>
<li>The likelihood he'd open Door 3 if the car is behind it is 0 - the rules say he never opens the door with the car (and we can see the goat). 
<li>The likelihood he'd open Door 3 if the car is behind Door 2 is 1 - you picked Door 1 and Door 2 has the car, so the rules force him to open Door 3.
<li>The likelihood he'd open Door 3 if the car is behind Door 1 is 1/2 - he picks randomly between Doors 2 and 3. 
</ul>

Key point - we don't know the probability that the car is behind Door X directly. We can calculate it from the probability that Monty opens Door 3 under each hypothesis, together with the rules of the game. 

In [42]:
#We can update the table
monty["likelihood"] = Fraction(1,2),1,0
monty

Unnamed: 0,prior,likelihood
Door 1,1/3,1/2
Door 2,1/3,1
Door 3,1/3,0


In [43]:
update(monty)
monty

Unnamed: 0,prior,likelihood,unnorm,posterior
Door 1,1/3,1/2,1/6,1/3
Door 2,1/3,1,1/3,2/3
Door 3,1/3,0,0,0


Showing it mathematically requires a bunch of derivation: https://en.wikipedia.org/wiki/Monty_Hall_problem

An alternate explanation, which I think is the clearest way to imagine it: initially there is a 1/3 chance of the car being behind each door. However, after you choose, the rules of the game change those odds:
<ul>
<li>The chance the car is behind your door is still 1/3.
<li>The chance it is not behind your door is 2/3.
<li>Opening a door sets the probability for that door to 0, so the whole 2/3 is concentrated in the one remaining door. 
</ul>
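
If neither argument is convincing, a simulation can settle it. The sketch below is illustrative (the function `play_round`, the seed, and the trial count are all assumptions made for this example). It follows the host rules listed earlier: Monty never opens your door or the car's door, and picks at random when he has a choice.

```python
import random

def play_round(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the pick nor the car.
    # If two doors qualify (the pick is the car), he chooses at random.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
stick_rate = sum(play_round(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play_round(True, rng) for _ in range(trials)) / trials
print(stick_rate, switch_rate)  # should be close to 1/3 and 2/3
```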

The entire point of this problem is to be unintuitive, so finding it confusing is normal. 
<hr>