# Bayes' Theorem: Applying Conditional Probabilities

## Objectives

- Contrast Bayesian and Frequentist philosophies of probability
- Use Bayes' Theorem to update a prior belief

## So What Do These Bayesians Say?

**Bayesian inference** calculates the probability of a hypothesis by combining _prior_ knowledge or belief with newly observed evidence.

## Bayes' Theorem

## $$P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}$$

## Terminology

- $P(A)$ : 
    - The probability of an event irrespective of the outcomes of other random variables is called the ***marginal probability***.
    - In reference to Bayes' Theorem, this is known as the ***prior probability***.

- $P(A|B)$ :
    - The probability of one (or more) event(s) given the occurrence of another event is called the ***conditional probability***.
    - In reference to Bayes' Theorem, this is known as the ***posterior probability***.

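- $P(B|A)$ : ***Likelihood***, the probability of observing $B$ if $A$ is true.

- $P(B)$ : ***Evidence***, the overall probability of observing $B$, whether or not $A$ is true.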

This allows us to restate the theorem as

$$
\textrm{Posterior} = \frac{\textrm{Likelihood}\cdot\textrm{Prior}}{\textrm{Evidence}}
$$

The numerator, $P(B\mid A) \cdot P(A)$, is a **joint probability**.


- A joint probability is the probability of two (or more) simultaneous events
    - $P(A,B)$ or $P(A \cap B) = P(A|B)\cdot P(B)$
    - So, in the theorem: $P(B,A)$ or $P(B \cap A) = P(B|A)\cdot P(A)$
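
Writing the joint probability both ways gives a short derivation of the theorem (assuming $P(B) > 0$, so that the division is allowed):

$$\begin{aligned}
P(A|B)\cdot P(B) &= P(A \cap B) = P(B|A)\cdot P(A) \\
P(A|B) &= \frac{P(B|A)\cdot P(A)}{P(B)}
\end{aligned}$$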

**Fun fact:** Bayes never published his theorem himself.

Richard Price (a friend of Bayes) went through Bayes' papers after his death to see if anything was worth publishing. That's when the work behind what we now call Bayes' Theorem was found and then published.

## The Example: Fan of a Series 

<img src="https://prod-ripcut-delivery.disney-plus.net/v1/variant/disney/F3020C5A6702EF51552E388CEF7EDFE46CC96A370C2E1C57F9533573DCEFA766/scale?width=1200&aspectRatio=1.78&format=jpeg" alt="obi-wan kenobi promotional image" width=600>

I'm a big fan of _Star Wars_ and I want to be able to share my love for the series! Based on my experience, I've found that about 1 in 5 people (or 20%) are big fans of the series, enough to watch all the side movies and spinoff TV shows.

When I met Rory, they told me they just watched the newest episode of _Obi-Wan Kenobi_ at a friend's house!

There are two possible scenarios here:

1. Rory is a big fan and watched the newest episode of _Kenobi_ on purpose. I estimate that if you're a big fan, the chance that you're actively following the TV show is about 60% (it would be more, but we fans get busy).

2. Rory isn't a fan and just watched the newest episode because it was on. I estimate that if you're not a big fan, there's only a 5% chance of you watching an episode the week it comes out (12x less likely).

Let's define two events:

* $A$ = _Star Wars_ fan (⭐ fan)
* $B$ = watched latest episode of _Kenobi_ (watched ⭐)

Reviewing Bayes' Theorem, we get:

$P(A) = P(\text{⭐fan}) = 0.20$

$P(B|A) = P(\text{watched ⭐, given you are a ⭐fan}) = 0.60$

$P(B|\neg A) = P(\text{watched ⭐, given you are NOT a ⭐ fan}) = 0.05$

$\begin{aligned}
P(B) &= P(\text{watched ⭐})  \\
     &= P(B|A)\cdot P(A) + P(B|\neg A)\cdot P(\neg A) \\
     &= 0.60\cdot 0.20 + 0.05\cdot 0.80 \\
     &= 0.16 
\end{aligned}$

So the chance that someone like Rory is a true _Star Wars_ fan is:

$$\begin{aligned}
P(A|B) &= P(\text{⭐ fan, given you watched ⭐}) \\ 
       &= \frac{P(B|A)P(A)}{P(B)} \\
       &= \frac{0.60 \cdot 0.20}{0.16} \\
       &= 0.75
\end{aligned}$$

So there's a $75\%$ chance Rory is a fan too!
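
The same arithmetic can be sketched in a few lines of Python. The variable names are illustrative choices made here; the numbers are the estimates from the example above.

```python
# Estimates from the example above
p_fan = 0.20                 # P(A): prior probability of being a fan
p_watched_given_fan = 0.60   # P(B|A): likelihood
p_watched_given_not = 0.05   # P(B|not A)

# The law of total probability gives the evidence, P(B)
p_watched = p_watched_given_fan * p_fan + p_watched_given_not * (1 - p_fan)

# Bayes' Theorem gives the posterior, P(A|B)
p_fan_given_watched = p_watched_given_fan * p_fan / p_watched

print(f"P(watched) = {p_watched:.2f}")
print(f"P(fan | watched) = {p_fan_given_watched:.2f}")
```

Changing any one of the three estimates and rerunning is a quick way to see how sensitive the posterior is to the prior.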

## Visualizing Probabilities & Bayes

We are very visual creatures, so it can help if we try to visualize this same scenario.

Imagine we have a population of $50$ people. We'd get:

-  $0.2\cdot 50 = 10$ are _Star Wars_ fans
    - 4 haven't watched: 😔
    - 6 have watched: ⭐
-  $0.8\cdot 50 = 40$ are not _Star Wars_ fans: 
    - 38 haven't watched: 🚫
    - 2 have watched: 👁

<pre>

⭐⭐👁👁🚫🚫🚫🚫🚫🚫
⭐⭐🚫🚫🚫🚫🚫🚫🚫🚫
⭐⭐🚫🚫🚫🚫🚫🚫🚫🚫
😔😔🚫🚫🚫🚫🚫🚫🚫🚫
😔😔🚫🚫🚫🚫🚫🚫🚫🚫

</pre>

But we only care about those who watched (⭐ & 👁). So given that Rory watched one of the Kenobi episodes, we only care about this subset:

<pre>

⭐⭐👁👁
⭐⭐
⭐⭐

</pre>

Thus there is a $6$ out of $8$ chance that Rory is also a _Star Wars_ fan, or simply $75\%$.
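
The counting argument can also be sketched in code. This is only an illustration of the picture above: the population of 50 and the emoji labels come from the example, while the variable names are made up here.

```python
population = 50

# Split the population using the estimates from the example
fans = round(0.20 * population)             # Star Wars fans
non_fans = population - fans                # everyone else
fans_watched = round(0.60 * fans)           # fans who watched
non_fans_watched = round(0.05 * non_fans)   # non-fans who watched

# Rebuild the contents of the emoji grid as a flat list
people = (
    ["⭐"] * fans_watched
    + ["😔"] * (fans - fans_watched)
    + ["👁"] * non_fans_watched
    + ["🚫"] * (non_fans - non_fans_watched)
)

# Condition on "watched": keep only ⭐ and 👁
watched = [p for p in people if p in ("⭐", "👁")]
share_fans = watched.count("⭐") / len(watched)
print(f"{watched.count('⭐')} of {len(watched)} watchers are fans: {share_fans:.0%}")
```

Filtering the list down to the watchers is the code equivalent of conditioning on $B$.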

!["Never tell me the odds without first establishing a Bayesian prior!" Han Solo image from the Count Bayesie blog](images/bayesianpriors.jpeg)

[Image Source: 'Han Solo and Bayesian Priors'](https://www.countbayesie.com/blog/2015/2/18/hans-solo-and-bayesian-priors)

## The Other Faction - Frequentists

![Star Wars Empire Symbol](images/star_wars_empire_symbol.jpg)


The **Frequentist** interpretation of probability: the limit of an event's relative frequency over many, many trials.

Basically, the Bayesians assign a probability to a hypothesis but Frequentists test a hypothesis and determine probability with repeated trials.

Some days you're a Frequentist, other days you're a Bayesian.
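
To make the contrast concrete, here is a minimal simulation sketch of the frequentist view applied to the _Star Wars_ example: generate many imaginary people and count how often a watcher turns out to be a fan. The seed, the number of trials, and the variable names are arbitrary choices for illustration.

```python
import random

random.seed(42)        # arbitrary seed so the run is repeatable
n_trials = 200_000     # arbitrary; more trials give a steadier estimate

watchers = 0
fan_watchers = 0
for _ in range(n_trials):
    is_fan = random.random() < 0.20          # 20% of people are fans
    p_watch = 0.60 if is_fan else 0.05       # chance of watching depends on fandom
    if random.random() < p_watch:
        watchers += 1
        fan_watchers += is_fan               # True counts as 1

freq_estimate = fan_watchers / watchers
print(f"Observed share of fans among watchers: {freq_estimate:.3f}")
```

With enough trials, the observed frequency should settle near the $75\%$ computed with Bayes' Theorem.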

## Example: 1984 Congressional Voting Data

Let's do an example. Here's the real theorem again for reference:

## $$P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}$$

Data source: [Congressional Quarterly Almanac, 98th Congress, 2nd session 1984](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)

A congressman voted no on providing aid to El Salvador in 1984. Given that 61% of the congress were Democrats, 74.9% of whom voted 'No' for providing aid to El Salvador, and only 4.8% of Republicans voted 'No' to the proposal, what is the conditional probability that this individual is a Democrat?

1. Which probability are we trying to find?

    - P(Democrat | 'no') --> P(A | B)
    
2. Based on that, what other pieces do we need?

    - P('no') --> P(B)
    - P(Democrat) --> P(A)
    - P('no' | Democrat) --> P(B|A)
    
3. Result?

    - P(Democrat) = .61
    - P('no' | Democrat) = .749
    - P('no') = P('no' | Dem) * P(Dem) + P('no' | Rep) * P(Rep) = (.749 * .61) + (.048 * .39) = .47561
 

In [4]:
p_no = (.749 * .61) + (.048 * .39)
p_no

0.47561

In [5]:
p_dem = .61

p_no_given_dem = .749

In [8]:
p_dem_given_no = (p_no_given_dem * p_dem)/p_no
p_dem_given_no

0.960640020184605

Since we have the actual data, we can do this even more exactly:

In [9]:
# Imports, then grab and explore the data
import pandas as pd

df = pd.read_csv("data/clean_house-votes-84.csv")

In [10]:
df.head()

Unnamed: 0,Class Name,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 435 entries, 0 to 434
Data columns (total 17 columns):
 #   Column                                  Non-Null Count  Dtype 
---  ------                                  --------------  ----- 
 0   Class Name                              435 non-null    object
 1   handicapped-infants                     435 non-null    object
 2   water-project-cost-sharing              435 non-null    object
 3   adoption-of-the-budget-resolution       435 non-null    object
 4   physician-fee-freeze                    435 non-null    object
 5   el-salvador-aid                         435 non-null    object
 6   religious-groups-in-schools             435 non-null    object
 7   anti-satellite-test-ban                 435 non-null    object
 8   aid-to-nicaraguan-contras               435 non-null    object
 9   mx-missile                              435 non-null    object
 10  immigration                             435 non-null    object
 11  synfue

Now: break it down so we can calculate $P(\text{Democrat } | \text{ voted 'no' on El Salvador aid})$

In [15]:
df['el-salvador-aid'].value_counts()['n']

208

In [16]:
p_no = df['el-salvador-aid'].value_counts()['n']/len(df)
p_no

0.4781609195402299

In [20]:
df['Class Name'].value_counts()['democrat']

267

In [21]:
p_dem = df['Class Name'].value_counts()['democrat'] / len(df)
p_dem

0.6137931034482759

In [22]:
dems = df.loc[df['Class Name'] == 'democrat']
dems

Unnamed: 0,Class Name,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
5,democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
6,democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
425,democrat,n,n,y,n,n,n,y,y,n,y,y,n,n,n,y,?
426,democrat,y,n,y,n,n,n,y,y,y,y,n,n,n,n,y,y
428,democrat,?,?,?,n,n,n,y,y,y,y,n,n,y,n,y,y
429,democrat,y,n,y,n,?,n,y,y,y,y,n,y,n,?,y,y


In [26]:
dems['el-salvador-aid'].value_counts()['n']

200

In [28]:
p_no_given_dem = dems['el-salvador-aid'].value_counts()['n'] / len(dems)
p_no_given_dem

0.7490636704119851

In [29]:
# GOAL: find P(Dem | 'no')
p_dem_given_no = (p_no_given_dem * p_dem) / p_no
p_dem_given_no

0.9615384615384617

In [30]:
# Checking without Bayes:
es_nos = df.loc[df['el-salvador-aid'] == 'n']

In [32]:
es_nos['Class Name'].value_counts(normalize=True)

democrat      0.961538
republican    0.038462
Name: Class Name, dtype: float64
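
The two approaches agree because the totals cancel. Plugging the counts from the cells above into Bayes' Theorem (writing $D$ for "Democrat", a shorthand introduced here):

$$\begin{aligned}
P(D \mid \text{no}) &= \frac{P(\text{no} \mid D)\cdot P(D)}{P(\text{no})} \\
&= \frac{\frac{200}{267}\cdot\frac{267}{435}}{\frac{208}{435}} \\
&= \frac{200}{208} \approx 0.9615
\end{aligned}$$

That is exactly the share of Democrats among the 208 'no' votes.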

_____

Yes, you do just need to remember which piece is which...

<center><img src='https://imgs.xkcd.com/comics/modified_bayes_theorem_2x.png' width=500></center>

[Image Source: XKCD](https://xkcd.com/2059/)

(for the record, $P(C)$ in this example is always very low)


## Sidebar: Bayes' Theorem with...  Legos?

Will Kurt, who writes the [Count Bayesie blog](https://www.countbayesie.com/) and is the author of [_Bayesian Statistics the Fun Way_](https://nostarch.com/learnbayes), uses legos to derive Bayes' Theorem. Take a look: https://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego

## Sidebar: Bayes' Original Thought Experiment

Good explanation & demo of Bayes' first thought experiment: https://www.youtube.com/watch?v=7GgLSnQ48os

- Sit facing away from a table
- Place an initial ball on the table
- Toss another ball, and an assistant says where it landed in relation to the first ball
- Repeat, continuously updating the belief about where the first ball is

## Sidebar: Why C-3PO is Wrong & Han Solo is a Badass

"Never tell me the odds!" https://www.countbayesie.com/blog/2015/2/18/hans-solo-and-bayesian-priors

Han is one great pilot, so he will likely get through the asteroid field even if others would fail (crash and blow up).

(Note: this is the image source for the above Han Solo image!)

# Level Up Exercises

## Diagnostic Testing 🤢

Pretend we test positive for a rare disease. What are the chances that we actually have the disease?

- The disease is fairly rare: only 2% of the population has it
- The test will be correct 99% of the time, whether or not you have the disease 

<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>

    # probability of sick & healthy
    p_sick = 0.02
    p_not_sick = 1 - p_sick

    # probability of test being correct (accuracy)
    p_positive_sick = 0.99
    p_positive_not_sick = 1 - p_positive_sick

    # probability of positive test (whether or not you are sick)
    p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

    # belief beforehand
    sick_prior = p_sick

    # how likely are we to test positive given that we are sick
    pos_sick_likelihood = p_positive_sick

    prob_youre_sick = sick_prior * pos_sick_likelihood / p_positive

    print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')
    
</details>

In [None]:
# Your work here


## Better Tests

How would the calculation in the Diagnostic Testing scenario change if the test were 99.9% correct?


<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>

    # probability of sick & healthy
    p_sick = 0.02
    p_not_sick = 1 - p_sick

    # probability of test being correct (accuracy)
    p_positive_sick = 0.999
    p_positive_not_sick = 1 - p_positive_sick

    # probability of positive test (whether or not you are sick)
    p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

    # belief beforehand
    sick_prior = p_sick

    # how likely are we to test positive given that we are sick
    pos_sick_likelihood = p_positive_sick 

    prob_youre_sick = sick_prior * pos_sick_likelihood / p_positive

    print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')
    
</details>

In [None]:
# Your work here


## Harder Exercise: Two Tests

In the original Diagnostic Testing scenario above, what are the chances that we have the disease if we take the test twice and it is positive both times? (Assume the two test results are independent given your true disease status.)


<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>

    # probability of sick & healthy
    p_sick = 0.02
    p_not_sick = 1 - p_sick

    # probability of 2 tests being correct
    p_positive_sick = 0.99
    p_2positive_sick = p_positive_sick**2
    
    # probability of 2 tests being incorrect
    p_positive_not_sick = 1 - p_positive_sick
    p_2positive_not_sick = p_positive_not_sick**2

    # probability of two positive tests (whether or not you are sick)
    p_2positive = p_sick*p_2positive_sick + p_not_sick*p_2positive_not_sick

    # belief beforehand
    sick_prior = p_sick

    # how likely are we to test positive twice given that we are sick
    pos_sick_likelihood = p_2positive_sick


    prob_youre_sick = sick_prior * pos_sick_likelihood / p_2positive

    print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')
    
</details>

In [None]:
# Your work here

### Extra Resources:

If these diagnosis-based examples were compelling, or if you want a better visual exploration of a diagnosis-based example, check out this page: https://arbital.com/p/bayes_rule/?l=1zq

(I recommend clicking "I'd like to read everything!" and then skimming the pages that come up!)

# Level Up: The Bayes Factor

Another method for using Bayesian reasoning relies on _odds_ rather than _probability_. This can feel more intuitive.

If we rearrange Bayes' Theorem, we can use the odds:

$$P(A|B) =  \frac{P(B|A)P(A)}{P(B)}$$

$$O(A|B) =  O(A)\frac{P(B|A)}{P(B|\neg A)}$$


We call $\frac{P(B|A)}{P(B|\neg A)}$ the **Bayes factor**; it shows how we update the prior odds.

To use this, we say that what we want is the _odds_ that $A$ is true given that $B$ is true (observed).
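
One way to see where the odds form comes from, taking $O(A) = \frac{P(A)}{P(\neg A)}$ as the definition of odds, is to write Bayes' Theorem for both $A$ and $\neg A$ and divide. The shared $P(B)$ cancels:

$$\begin{aligned}
O(A|B) &= \frac{P(A|B)}{P(\neg A|B)} \\
       &= \frac{P(B|A)P(A)/P(B)}{P(B|\neg A)P(\neg A)/P(B)} \\
       &= \frac{P(A)}{P(\neg A)}\cdot\frac{P(B|A)}{P(B|\neg A)} \\
       &= O(A)\frac{P(B|A)}{P(B|\neg A)}
\end{aligned}$$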

## Using the Previous Example

Going back to our _Star Wars_ example:

* $P(A) = 20\%$ (or $O(A)= \frac{1}{4}$, that is, $1:4$) of all people are _Star Wars_ fans 
* $P(B|A) = 60\%$ of all _Star Wars_ fans have seen one of the Kenobi episodes in the past week
* $P(B|\neg A) = 5\%$ of all non-_Star Wars_ fans have seen one of the Kenobi episodes in the past week

Well, then our odds form turns to:

$$\begin{aligned}
O(A|B) &= O(A)\frac{P(B|A)}{P(B|\neg A)} \\
\\
       &= \frac{1}{4} \cdot \frac{0.60}{0.05} \\
       &= \frac{1}{4} \cdot 12 \\
       &= \frac{12}{4} \\
       &= \frac{3}{1} \\
       & = 3:1 \text{ odds}
\end{aligned}$$

Or simply, $3$ out of every $4$ people who watched a Kenobi episode are fans. That's the same $75\%$ we found before!
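
As a rough sketch, the odds update can be written with two small helper functions. The names `prob_to_odds` and `odds_to_prob` are hypothetical helpers defined here, not functions from a library.

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor (0.2 becomes 0.25, i.e. 1:4)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)


prior_odds = prob_to_odds(0.20)       # O(A) = 1:4
bayes_factor = 0.60 / 0.05            # P(B|A) / P(B|not A)
posterior_odds = prior_odds * bayes_factor
posterior_prob = odds_to_prob(posterior_odds)

print(f"Posterior odds: {posterior_odds:.1f}:1")
print(f"Posterior probability: {posterior_prob:.2f}")
```

Note that the evidence term $P(B)$ never has to be computed in this form, which is part of why the odds version can feel simpler.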

## Going back to the visual example

Using a pretend population of $50$ random people we still get:

-  $0.2\cdot 50 = 10$ are _Star Wars_ fans
    - 4 haven't watched Kenobi in the past week: 😔
    - 6 have watched in the past week: ⭐
-  $0.8\cdot 50 = 40$ are not _Star Wars_ fans: 
    - 38 haven't watched in the past week: 🚫
    - 2 have watched in the past week: 👁

<pre>

⭐⭐👁👁🚫🚫🚫🚫🚫🚫
⭐⭐🚫🚫🚫🚫🚫🚫🚫🚫
⭐⭐🚫🚫🚫🚫🚫🚫🚫🚫
😔😔🚫🚫🚫🚫🚫🚫🚫🚫
😔😔🚫🚫🚫🚫🚫🚫🚫🚫

</pre>

## Interpreting further

- There are $1$:$4$ odds that you are a fan given no information
- But if you watched a Kenobi episode in the past week, the odds that you're a fan go up by the **Bayes factor** of 12 
- We update our _prior_ belief with that knowledge and find that we expect 3 times **more** fans who watched in the past week than non-fans who watched in the past week