# Sect 11B: Conditional Probability & Law of Total Probability


- date
- cohort

### QUESTIONS?
- 



## Learning Objectives

- We will review independent, dependent, and disjoint events before diving deeper into conditional probability.
- We will cover conditional probability and its theorems.
- We will discuss the Law of Total Probability.



- **Activity: We will extend our dinner party activity to a house party to explore conditional probability and the Law of Total Probability**
- **Demo: Teaser for Sect 12 (if there's time)**

# PROBABILITY REVIEW


## What is probability?

> **Probability is the likelihood of a specific outcome/event occurring out of all possible outcomes, expressed as a number between 0 and 1.**
<!---
Example Probability Qs:
- How likely is it to end up with heads when flipping a coin once? (the answer here is 50% - not very surprising)

- How likely is it to end up with exactly 2 x heads and 3 x tails when flipping a coin 5 times?

- How likely is it to throw tails first, then heads, then tails, then heads, then tails when flipping a coin 5 times?

- If you throw 5 dice, what is the probability of throwing a ["full house"](http://grail.sourceforge.net/demo/yahtzee/rules.html)?

- What is the probability of drawing 2 consecutive aces from a standard deck of cards?

> But how do we calculate it? ..._to be continued_...

--->

### Sample Space & Event Space

- **Sample space** ($\Omega$, written $S$ in the formulas below) - all possible outcomes
-   The **event space** ($E$) is a subset of the sample space.
    - It contains the **desired outcome(s)** of the experiment.


### Probability of an Event

$$\large P(E) = \frac{|E|}{|S|} $$


#### Law of relative frequency

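- As the number of trials $n$ grows toward infinity, the relative frequency of an event converges to a fixed number: its probability.
$$ \large P(E) = \lim_{n\to\infty}\frac{S(n)}{n}$$
    - $S(n)$ is the number of successful outcomes (trials in which event $E$ occurs) out of $n$ trials.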
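
A quick simulation can make this limit concrete. The sketch below flips a simulated fair coin using Python's `random` module and reports the relative frequency of heads for increasing numbers of trials. The function name `relative_frequency`, the seed, and the trial counts are illustrative choices, not part of the lesson's materials.

```python
import random

def relative_frequency(n_trials, p_success=0.5, seed=42):
    """Estimate P(E) as S(n)/n by simulating n_trials independent trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # S(n): count the trials in which the event occurred
    successes = sum(rng.random() < p_success for _ in range(n_trials))
    return successes / n_trials

# The estimate should settle near 0.5 as n grows
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With only 10 flips the estimate can land far from 0.5; with 100,000 flips it should sit very close to it.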
    

### Addition law of probability 

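-   The probability of the union of $A$ and $B$ is the sum of their individual probabilities minus the probability of their intersection (which would otherwise be counted twice).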

$$ \large P(A\cup B) = P(A) + P(B) - P(A \cap B)$$
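
As a small check of the addition law, the sketch below counts outcomes for one roll of a fair six-sided die. The events `A` (even roll) and `B` (roll greater than 3) and the helper `prob` are made up for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one roll of a fair six-sided die
A = {2, 4, 6}                     # event: roll is even
B = {4, 5, 6}                     # event: roll is greater than 3

def prob(event, space=sample_space):
    """P(E) = |E| / |S| for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

lhs = prob(A | B)                      # P(A ∪ B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)  # addition law
print(lhs, rhs)
```

Both sides come out to 2/3. Without subtracting $P(A \cap B)$, the outcomes 4 and 6 would be counted twice.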

# CONDITIONAL PROBABILITIES

## Types of Events

### Independent Events

**Events $A$ and $B$ are independent when the occurrence of $A$ has no effect on whether $B$ will occur (or not).**

- A and B are independent if:
    - $P(A \cap B) = P(A)\cdot P(B)$

 <img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_67_independent.png" width=30%>

- Probability of A or B occurring:
    - $P (A \cup B) = P(A) + P(B) - P(A \cap B)$
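
The product test for independence can be checked by brute force. The sketch below enumerates all 36 outcomes of rolling two fair dice and compares $P(A \cap B)$ with $P(A)\cdot P(B)$ for two pairs of events. The events and helper names are hypothetical examples, not taken from the lesson.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))  # all 36 (first, second) rolls

def prob(predicate):
    """Fraction of equally likely outcomes that satisfy the predicate."""
    return Fraction(sum(1 for o in space if predicate(o)), len(space))

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def sum_is_twelve(o):
    return o[0] + o[1] == 12

def are_independent(a, b):
    """True when P(A ∩ B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

print(are_independent(first_is_six, second_is_even))  # separate dice
print(are_independent(first_is_six, sum_is_twelve))   # sum depends on first die
```

The first pair passes the test because the two dice do not affect each other; the second pair fails because a sum of 12 requires the first die to be a 6.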




### Disjoint Events




**Events $A$ and $B$ are disjoint if $A$ occurring means that $B$ cannot occur.**

Disjoint events are **mutually exclusive**: $A \cap B$ is **empty**, so $P(A \cap B) = 0$.

<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_68Disjoint.png" width=30%>



### Dependent Events

**Events $A$ and $B$ are dependent when the occurrence of $A$ somehow has an effect on whether $B$ will occur (or not).**

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-conditional-probability-onl01-dtsc-ft-070620/master/images/Image_69_Marb.png" width=50%>



## Conditional Probability - Definition

**Conditional probability emerges when the outcome of a trial may influence the results of upcoming trials (that is, when we have dependent events).**


<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_71_TreeDiag.png" width = 500>

$$ P (A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$

$P(A \mid B)$ is the probability of $A$ **given** that $B$ has occurred. It is only defined when $P(B) > 0$.
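
To see the definition in action, the sketch below builds a standard 52-card deck and computes the probability that a card is a king given that it is a face card. The deck construction and variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['clubs', 'diamonds', 'hearts', 'spades']
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """P(E) = |E| / |S| with the full deck as the sample space."""
    return Fraction(len(event), len(deck))

kings = {card for card in deck if card[0] == 'K'}
faces = {card for card in deck if card[0] in {'J', 'Q', 'K'}}

# P(King | Face) = P(King ∩ Face) / P(Face)
p_king_given_face = prob(kings & faces) / prob(faces)
print(p_king_given_face)
```

Conditioning on "face card" shrinks the sample space from 52 cards to 12, and 4 of those 12 are kings, so the result is 1/3.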

## Laws & Theorems Based on Conditional Probability


#### Theorem 1: Product Rule

The intersection of events $A$ and $B$ can be given by

\begin{align}
    \large P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)
\end{align}
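
A worked example of the product rule, using a card-drawing scenario chosen here for illustration: the probability of drawing two aces in a row without replacement. The sketch computes it with the product rule and then verifies the result by counting every ordered pair of cards.

```python
from fractions import Fraction
from itertools import permutations

deck = ['A'] * 4 + ['x'] * 48   # only "ace or not" matters here

# Product rule: P(A1 ∩ A2) = P(A1) * P(A2 | A1)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card are gone
p_both = p_first_ace * p_second_ace_given_first

# Brute-force check over every ordered pair of distinct card positions
pairs = list(permutations(range(52), 2))
n_both_aces = sum(1 for i, j in pairs if deck[i] == 'A' and deck[j] == 'A')
p_both_counted = Fraction(n_both_aces, len(pairs))

print(p_both, p_both_counted)
```

Both approaches give 1/221.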




#### Theorem 2: Chain Rule AKA "General Product Rule"

- Allows calculation of any member of the joint distribution of a set of random variables using _only_ conditional probabilities.

- Builds upon the product rule: 
$$P(A \cap B) = P(A \mid B) P(B)$$

- When applied to 3 variables, becomes:


$$P(A\cap B \cap C) = P(A\cap( B \cap C))$$
$$ = P(A\mid B \cap C) P(B \cap C)$$
$$ = P(A \mid B \cap C) P(B \mid C) P(C)$$

And you can keep extending this to $n$ variables.
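
The chain rule translates naturally into a loop that multiplies one conditional probability per step. As a sketch, the hypothetical helper `prob_all_aces` below computes the probability that the first $n$ cards drawn without replacement are all aces; here $C$ is "first card is an ace", $B$ is "second card is an ace", and so on.

```python
from fractions import Fraction

def prob_all_aces(n_draws, n_aces=4, deck_size=52):
    """P(first n_draws cards are all aces), drawing without replacement."""
    p = Fraction(1)
    for k in range(n_draws):
        # P(draw k+1 is an ace | all k earlier draws were aces)
        p *= Fraction(max(n_aces - k, 0), deck_size - k)
    return p

print(prob_all_aces(3))  # (4/52) * (3/51) * (2/50)
```

Each pass through the loop contributes one factor of the chain. With five draws the probability is 0 because only four aces exist.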



#### Theorem 3 - Bayes Theorem



**Bayes' theorem** is where this section has been heading. Below is the formula, which we will dig deeper into in upcoming lessons.

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

- It follows from the product rule: $P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)$. 
- Note that, using Bayes' theorem, you **can compute conditional probabilities without explicitly needing to know $P(A \cap B)$!** 
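
A minimal sketch of the formula as a function, assuming the three inputs are already known. The helper name `bayes` and the die-roll numbers are hypothetical and only meant to show the mechanics.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A | B) from P(B | A), P(A), and P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs for one roll of a fair die:
# A = "roll is even", B = "roll is greater than 3"
p_a = Fraction(3, 6)          # {2, 4, 6}
p_b = Fraction(3, 6)          # {4, 5, 6}
p_b_given_a = Fraction(2, 3)  # of the evens {2, 4, 6}, two are > 3

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)
```

The result, 2/3, agrees with counting directly: of the rolls {4, 5, 6}, two are even. Note that $P(A \cap B)$ never appears as an input.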


#### Additional note: the complement of an event
- Basic complements:
$$P(A) + P(A') = 1$$
with A' being the complement of A.

- Conditional Probability Complements

$$P(A|B) + P(A'|B) = 1$$

# LAW OF TOTAL PROBABILITY


<img src="https://raw.githubusercontent.com/jirvingphd/dsc-law-of-total-probability-online-ds-ft-100719/master/images/Image_55_TotProb.png" width=50%>

$$\large P(B)= \sum_i P(B \cap A_i)= \sum_i P(B \mid A_i)P(A_i)$$


- This law allows us to calculate $P(B)$ from the conditional probabilities of $B$ within each subset $A_i$.
- Requires that the different $A_i$ that make up sample space $S$ be disjoint events.


$A_1, A_2, \dots, A_n$ partition sample space $S$ into disjoint regions whose union is all of $S$.
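
The formula can be sketched as a small function that takes the two sets of probabilities as dictionaries keyed by partition label. The function name `total_probability` and the two-urn numbers are hypothetical; the check that the $P(A_i)$ sum to 1 is one simple way to catch an incomplete partition.

```python
import math
from fractions import Fraction

def total_probability(p_b_given_a, p_a):
    """P(B) = sum over i of P(B | A_i) * P(A_i), for a partition {A_i}."""
    if set(p_b_given_a) != set(p_a):
        raise ValueError("both dicts need the same partition labels")
    if not math.isclose(sum(p_a.values()), 1):
        raise ValueError("P(A_i) must sum to 1 for a partition")
    return sum(p_b_given_a[label] * p_a[label] for label in p_a)

# Hypothetical example: pick one of two urns, then draw a marble
p_urn = {'urn_1': Fraction(1, 4), 'urn_2': Fraction(3, 4)}
p_red_given_urn = {'urn_1': Fraction(1, 2), 'urn_2': Fraction(1, 5)}

p_red = total_probability(p_red_given_urn, p_urn)
print(p_red)  # (1/2)(1/4) + (1/5)(3/4)
```

Here the urns play the role of $A_1$ and $A_2$, and drawing a red marble is event $B$.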




<img src="https://raw.githubusercontent.com/learn-co-students/dsc-law-of-total-probability-onl01-dtsc-ft-070620/master/images/Image_56_vent.png" width=50%>





# 🕹 Activity Pt 3: Dinner Party Playlist - Conditional Probabilities

- Loading in the data and adding the recommendations of a last-minute invitee (Carla)

In [1]:
# !pip install -U fsds
from fsds.imports import *
from math import factorial

fsds v0.2.22 loaded.  Read the docs: https://fs-ds.readthedocs.io/en/latest/ 


Handle,Package,Description
dp,IPython.display,Display modules with helpful display and clearing commands.
fs,fsds,Custom data science bootcamp student package
mpl,matplotlib,Matplotlib's base OOP module with formatting artists
plt,matplotlib.pyplot,Matplotlib's matlab-like plotting module
np,numpy,scientific computing with Python
pd,pandas,High performance data structures and tools
sns,seaborn,High-level data visualization library based on matplotlib


[i] Pandas .iplot() method activated.


In [2]:
import os,glob
datafolder = "../probability_playlists/" ## student filepath
rec_files = glob.glob(datafolder+"*.csv")

playlists = {}
for file in rec_files:
    key = os.path.basename(file).replace('_recs.csv','')
    playlists[key] = pd.read_csv(file)
playlists.keys()

dict_keys(['joe', 'james', 'anne', 'john', 'samantha'])

In [3]:
# New Attendee Carla's Recs:
carla_recs = [['artist','track'],
              ['Cartman (South Park)','Poker Face'], 
              ['Nicki Minaj','Right By My Side'],
              ['Kelly Clarkson',"Since You've Been Gone"],
              ['Nicki Minaj',"Marilyn Monroe"],
              ['Kelly Clarkson',"Never Again"],
              ['Green Day',"Minority"]]

carla_recs = pd.DataFrame(carla_recs[1:],columns=carla_recs[0])
carla_recs['Recommended By'] ='Carla'
playlists['carla'] = carla_recs


df = pd.concat(playlists).reset_index(drop=True)
df

Unnamed: 0,artist,track,Recommended By
0,Green Day,Time of your Life,Joe
1,B-52s,Rock Lobster,Joe
2,Lady GaGa,Poker Face,Joe
3,John Lennon,Imagine,Joe
4,Eve 6,Here's to the Night,James
5,Neutral Milk Hotel,Into the Aeroplane Over the Sea,James
6,Rilo Kiley,With Arms Outstretched,James
7,Red Hot Chili Peppers,Otherside,James
8,Smashing Pumpkins,"Tonight, Tonight",Anne
9,Black Eyed Peas,Let's Get it Started,Anne


## Basic Probability Revisited

$$P(E) = |E| / |S|  $$

#### **Q: What is the probability of hearing "Let's Get it Started"?**

In [4]:
## Get sample space
sample_space = df['track'].value_counts()
sample_space

Let's Get it Started               3
Poker Face                         3
Time of your Life                  2
Minority                           1
Marilyn Monroe                     1
Imagine                            1
Hallelujah                         1
With Arms Outstretched             1
Here's to the Night                1
Into the Aeroplane Over the Sea    1
Since You've Been Gone             1
Rock Lobster                       1
Just Dance                         1
Bad Romance                        1
Never Again                        1
Set Fire to the Rain               1
Otherside                          1
Right By My Side                   1
Tonight, Tonight                   1
Name: track, dtype: int64

In [5]:
## Get event space
E = sample_space["Let's Get it Started"]
E

3

In [6]:
S = sample_space.sum()
S

24

In [7]:
## Calculate P(E) = |E| / |S|
P_lets_get_it = E/S
P_lets_get_it

0.125

## Conditional Probabilities

$$ P (A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$


#### Q: What is the probability of hearing the song "Poker Face" given that the song is by Lady GaGa?

- **What would be the formula to calculate $P(\text{PokerFace}|\text{LadyGaga})$  ?**

A:

$$ \large P(\text{PokerFace} \mid \text{LadyGaga}) =\frac{P(\text{PokerFace} \cap \text{LadyGaga})}{ P(\text{LadyGaga})} $$

In [8]:
## Get Value Counts for all Lady GaGa tracks
sample_space = df.groupby('artist').get_group('Lady GaGa')['track'].value_counts()
sample_space

Poker Face     2
Bad Romance    1
Just Dance     1
Name: track, dtype: int64

In [9]:
E = sample_space['Poker Face']
E 

2

In [10]:
S = sample_space.sum()
S

4

In [11]:
E/S

0.5

#### Q: What is the probability that the song is by Lady GaGa given that it is "Poker Face"?

In [12]:
## Same approach as before: group by track, then count artists within that group
poker_face_space = df.groupby('track').get_group('Poker Face')
sample_space = poker_face_space['artist'].value_counts()

E = sample_space['Lady GaGa']
S = sample_space.sum()
E/S


0.6666666666666666
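
The two answers above (0.5 and about 0.667) are the same relationship viewed from opposite directions, and Bayes' theorem links them. As a sketch, the counts below are copied by hand from the cell outputs above (24 tracks in total, 3 titled "Poker Face", 4 by Lady GaGa, 2 of which are "Poker Face"); the variable names are made up for this check.

```python
from fractions import Fraction

total_tracks = 24     # size of the full sample space
n_poker_face = 3      # tracks titled "Poker Face" (any artist)
n_lady_gaga = 4       # tracks by Lady GaGa
n_pf_by_gaga = 2      # "Poker Face" tracks by Lady GaGa

p_pf = Fraction(n_poker_face, total_tracks)
p_gaga = Fraction(n_lady_gaga, total_tracks)
p_pf_given_gaga = Fraction(n_pf_by_gaga, n_lady_gaga)   # 0.5, as computed above

# Bayes: P(GaGa | PokerFace) = P(PokerFace | GaGa) * P(GaGa) / P(PokerFace)
p_gaga_given_pf = p_pf_given_gaga * p_gaga / p_pf
print(p_gaga_given_pf, float(p_gaga_given_pf))
```

This reproduces the 2/3 found with `groupby`, without counting the intersection a second time.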

# 🕹 Activity Pt 4: Law of Total House Party Playlists

- We've decided to be a little more adventurous and turn our dinner party into a larger house party.
    - The house party is spread across 4 rooms, and we assume people will split their time evenly among them:
        - living room
        - basement
        - back patio
        - kitchen
        
- We have separate play lists playing at each location that were constructed with our dinner party recommendations.


#### OUR HOUSE PARTY & LAW OF TOTAL PROB
- Our House Party = space $S$
- The 4 rooms in the house are $A_1, A_2, A_3, A_4$
- $B$ represents the event of hearing a specific song or artist as you wander the house.

<img src="https://raw.githubusercontent.com/jirvingphd/dsc-law-of-total-probability-online-ds-ft-100719/master/images/Image_55_TotProb.png" width=50%>

$$P(B)= \sum_i P(B \cap A_i)= \sum_i P(B \mid A_i)P(A_i)$$




In [13]:
import os
os.makedirs('../probability_playlists/house_party/',exist_ok=True)

house_party = dict(living_room = df.sample(12,random_state=12).reset_index(drop=True).copy(),
                   basement = df.sample(10,random_state=321).reset_index(drop=True).copy(), 
                   back_patio = df.sample(9,random_state=42).reset_index(drop=True).copy(),
                  kitchen=df.sample(8,random_state=3210).reset_index(drop=True).copy())

## Save for later
folder = "../probability_playlists/house_party/"
for room,room_df in house_party.items():
    room_df.to_csv(os.path.join(folder, f"{room}.csv"),index=False)
    

## Preview
for k,df_ in house_party.items():
    df_['Room'] = k 
    display(df_.style.set_caption(f"Playlist for {k}"))
    

Unnamed: 0,artist,track,Recommended By,Room
0,Cartman (South Park),Poker Face,Carla,living_room
1,Green Day,Minority,Carla,living_room
2,Red Hot Chili Peppers,Otherside,James,living_room
3,Lady GaGa,Just Dance,John,living_room
4,Green Day,Time of your Life,Anne,living_room
5,Lady GaGa,Bad Romance,John,living_room
6,Smashing Pumpkins,"Tonight, Tonight",Anne,living_room
7,Nicki Minaj,Marilyn Monroe,Carla,living_room
8,Green Day,Time of your Life,Joe,living_room
9,Black Eyed Peas,Let's Get it Started,Samantha,living_room


Unnamed: 0,artist,track,Recommended By,Room
0,Lady GaGa,Bad Romance,John,basement
1,Green Day,Time of your Life,Joe,basement
2,Kelly Clarkson,Never Again,Carla,basement
3,Panic at the Disco,Hallelujah,Samantha,basement
4,John Lennon,Imagine,Joe,basement
5,Lady GaGa,Poker Face,John,basement
6,Lady GaGa,Poker Face,Joe,basement
7,Green Day,Minority,Carla,basement
8,Red Hot Chili Peppers,Otherside,James,basement
9,Cartman (South Park),Poker Face,Carla,basement


Unnamed: 0,artist,track,Recommended By,Room
0,Smashing Pumpkins,"Tonight, Tonight",Anne,back_patio
1,Panic at the Disco,Hallelujah,Samantha,back_patio
2,Green Day,Time of your Life,Joe,back_patio
3,Cartman (South Park),Poker Face,Carla,back_patio
4,Black Eyed Peas,Let's Get it Started,John,back_patio
5,Black Eyed Peas,Let's Get it Started,Anne,back_patio
6,Lady GaGa,Bad Romance,John,back_patio
7,B-52s,Rock Lobster,Joe,back_patio
8,Nicki Minaj,Marilyn Monroe,Carla,back_patio


Unnamed: 0,artist,track,Recommended By,Room
0,Red Hot Chili Peppers,Otherside,James,kitchen
1,Cartman (South Park),Poker Face,Carla,kitchen
2,Nicki Minaj,Marilyn Monroe,Carla,kitchen
3,Nicki Minaj,Right By My Side,Carla,kitchen
4,Lady GaGa,Poker Face,John,kitchen
5,Rilo Kiley,With Arms Outstretched,James,kitchen
6,Lady GaGa,Poker Face,Joe,kitchen
7,Lady GaGa,Bad Romance,John,kitchen


## Q1: What is the probability of hearing a Green Day song at the house party at any given moment?

####  To Calculate $P(GD)$ Across All Rooms

$$ P(\text{Green Day})=\sum_i P(\text{Green Day} \mid \text{Room}_i)P(\text{Room}_i)$$

- Q: **With our 4 rooms, what would our equation look like?**
    - What is the probability of being in each room?

- A:
$$P(\text{Green Day})= P(GD|Room1)\times \frac{1}{4} + P(GD|Room2)\times \frac{1}{4} + P(GD|Room3)\times \frac{1}{4} + P(GD|Room4)\times \frac{1}{4} $$

In [14]:
## Make a dictionary of prob of being in each room
room_probs = {'living_room':0.25, 
              'basement':0.25,
              'back_patio':0.25,
              'kitchen':0.25}
room_probs

{'living_room': 0.25, 'basement': 0.25, 'back_patio': 0.25, 'kitchen': 0.25}

In [15]:
## Lets do 1 example room - living room
room_df = house_party['living_room']

## Get room sample space
room_space = room_df['artist'].value_counts()

## Get P_gd_given_room
p_gd_given_room = room_space.loc['Green Day']/room_space.sum()
p_gd_given_room

0.25

In [16]:
## Multiply cond prob by prob being in the room
p_gd_given_room * room_probs['living_room']

0.0625

#### Let's turn that process into a function

In [17]:
def prob_event_given_room(room, sample_space='artist', 
                          event='Green Day',party_dict=house_party,
                          verbose=False):
    
    ## Pull out current room_df from dict
    room_df = party_dict[room]
    
    ## Define the sample space
    room_space = room_df[sample_space].value_counts()
    
    try:
        ## Calculate probability if event exists
        P = room_space.loc[event]/room_space.sum()
    
    except KeyError:
        ## Set prob=0 if event does not exist.
        P = 0
 
    ## Print the Prob event given room (if verbose==True)
    if verbose:
        print(f"P({event} | {room}) = {round(P,3)}")
    return P

In [18]:
## Let's try out the function for Green Day on the back patio

prob_event_given_room('back_patio',sample_space='artist',event='Green Day',
              verbose=True);

P(Green Day | back_patio) = 0.111


In [19]:
## Calculate Total Probability using a for loop
TOTAL_PROB = []

## Get room
for room in house_party.keys():
    
    ## Get conditional prob for room
    p_gd_given_room = prob_event_given_room(room,sample_space='artist',
                                    event='Green Day',verbose=True)
    
    ## Get the prob of being in that room
    p_room = room_probs[room]
    
    ## Append p_gd_given_room * p_room
    TOTAL_PROB.append(p_gd_given_room * p_room)
print()
print(f"P(Green Day) = {sum(TOTAL_PROB)}")

P(Green Day | living_room) = 0.25
P(Green Day | basement) = 0.2
P(Green Day | back_patio) = 0.111
P(Green Day | kitchen) = 0

P(Green Day) = 0.14027777777777778


In [20]:
## Compare against pooled artist frequencies (all room playlists concatenated)
counts = pd.concat(house_party)['artist'].value_counts(normalize=True)
counts

Lady GaGa                0.230769
Green Day                0.153846
Cartman (South Park)     0.102564
Black Eyed Peas          0.102564
Nicki Minaj              0.102564
Red Hot Chili Peppers    0.076923
Panic at the Disco       0.051282
Smashing Pumpkins        0.051282
Kelly Clarkson           0.051282
B-52s                    0.025641
John Lennon              0.025641
Rilo Kiley               0.025641
Name: artist, dtype: float64
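
The pooled frequency for Green Day (0.153846) differs from the total probability computed in the loop (0.140278). One way to see why: pooling all playlists implicitly weights each room by its playlist length instead of giving every room a probability of 0.25. The sketch below reproduces both numbers. The room sizes come from the `df.sample(...)` calls above, and the Green Day counts are inferred from the printed conditional probabilities (for example, 0.25 of 12 tracks is 3); the helper name `total_prob` is made up here.

```python
from fractions import Fraction

room_sizes = {'living_room': 12, 'basement': 10, 'back_patio': 9, 'kitchen': 8}
green_day_counts = {'living_room': 3, 'basement': 2, 'back_patio': 1, 'kitchen': 0}

# P(Green Day | room) for each room
p_gd_given_room = {room: Fraction(green_day_counts[room], room_sizes[room])
                   for room in room_sizes}

def total_prob(p_room):
    """Law of total probability for a given set of room probabilities."""
    return sum(p_gd_given_room[room] * p_room[room] for room in p_room)

# Weighting 1: every room equally likely (the assumption used in the loop)
equal = {room: Fraction(1, 4) for room in room_sizes}

# Weighting 2: room probability proportional to playlist length (pooling)
total_tracks = sum(room_sizes.values())
by_size = {room: Fraction(n, total_tracks) for room, n in room_sizes.items()}

print(float(total_prob(equal)))     # matches the loop result
print(float(total_prob(by_size)))   # matches the pooled value_counts
```

Both numbers are valid total probabilities; they simply answer the question under different assumptions about $P(\text{Room}_i)$.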

### Q1B: But wait...what if we have unequal probabilities for being in each room?



- The true probability of people being in each room is determined by what is going on in that room.
    - The snacks are in the kitchen (prob=0.4)
    - The drinks/bar is on the back patio (prob=0.3).
    - The living room and basement have no special amenities. (prob=0.15 each)

In [21]:
## Update room_probs with the new room probabilities
room_probs['kitchen'] = 0.4
room_probs['back_patio'] = 0.3
room_probs['living_room'] =0.15
room_probs['basement'] =0.15
room_probs

{'living_room': 0.15, 'basement': 0.15, 'back_patio': 0.3, 'kitchen': 0.4}

## Q2: What is the probability of hearing a Lady GaGa song at the house party at any given moment with the new room probabilities?

> The **easiest way** to do this would be to copy the for loop above and paste it below, then modify all of the variables to match the new question...

> So INSTEAD of that, **let's do it a better, more programmatic way**: take the code we produced above to calculate TOTAL_PROB and let's turn it into a function.

In [22]:
## Paste your loop from Green Day question below for reference
TOTAL_PROB = []

for room in house_party.keys():
    
    p_gd_given_room = prob_event_given_room(room,sample_space='artist',
                                    event='Green Day',verbose=True)
    p_room = room_probs[room]

#     print(f"\tP({room})={p_room}")
    
    TOTAL_PROB.append(p_gd_given_room * p_room)
print()
print(f"P(Green Day) = {sum(TOTAL_PROB)}")

P(Green Day | living_room) = 0.25
P(Green Day | basement) = 0.2
P(Green Day | back_patio) = 0.111
P(Green Day | kitchen) = 0

P(Green Day) = 0.10083333333333333


In [23]:
## Now how can we make that process flexible?

def law_of_total_probability(room_probs, house_party, sample_space='artist',
                             event='Green Day', verbose=True):
    ## Collect P(event | room) * P(room) for every room
    TOTAL_PROB = []
    for room in house_party.keys():

        P_e_given_room = prob_event_given_room(room, sample_space=sample_space,
                                               event=event, party_dict=house_party,
                                               verbose=verbose)
        ## Use the room probabilities passed in as an argument (not the global)
        p_room = room_probs[room]
        
        TOTAL_PROB.append(P_e_given_room * p_room)
    print()
    print(f"The Total Probability of P({event}) = {round(sum(TOTAL_PROB),3)}")
    return sum(TOTAL_PROB)

In [24]:
law_of_total_probability(room_probs,house_party,'artist','Lady GaGa')

P(Lady GaGa | living_room) = 0.167
P(Lady GaGa | basement) = 0.3
P(Lady GaGa | back_patio) = 0.111
P(Lady GaGa | kitchen) = 0.375

The Total Probability of P(Lady GaGa) = 0.253


0.25333333333333335

### Q: What is the probability of hearing a song recommended by Anne?

$$ P(AnneRec)=\sum_i P(AnneRec \mid Room_i)P(Room_i)$$
- Since we made our function flexible, we can easily calculate this. 


In [25]:
law_of_total_probability(room_probs,house_party,
                         sample_space='Recommended By', event='Anne')

P(Anne | living_room) = 0.25
P(Anne | basement) = 0
P(Anne | back_patio) = 0.222
P(Anne | kitchen) = 0

The Total Probability of P(Anne) = 0.104


0.10416666666666666

In [26]:
## Compare that to getting val counts from whole df
counts = pd.concat(house_party)['Recommended By'].value_counts(normalize=True)
counts

Carla       0.307692
John        0.205128
Joe         0.179487
Anne        0.128205
James       0.102564
Samantha    0.076923
Name: Recommended By, dtype: float64

# APPENDIX

### Derivation of Conditional Probability
Understanding this formula may be easier if you look at two simple Venn Diagrams and use the multiplication rule. Here's how to derive this formula:
<img src="https://raw.githubusercontent.com/jirvingphd/dsc-conditional-probability-online-ds-ft-100719/master/images/Image_72_Cond4.png" width="300">



Step 1: Write out the multiplication rule:
* $P(A \cap B)= P(B) \cdot P(A\mid B)$

Step 2: Divide both sides of the equation by $P(B)$:
* $\dfrac{P(A \cap B)}{ P(B)} = \dfrac{P(B) \cdot P(A\mid B)}{P(B)}$

Step 3: Cancel P(B) on the right side of the equation:
* $\dfrac{P(A \cap B)}{P(B)} = P(A \mid B)$

Step 4: This is of course equal to:
* $ P(A \mid B)=\dfrac{P(A \cap B)}{P(B)} $

And this is our conditional probability formula. 

