## Conditional Probability
- *If I have two events that depend on each other, what's the probability that both will occur?*
- **Notation**: $P(A,B)$ is the probability of A and B both occurring
- $P(B | A)$: probability of B given that A has occurred
- We know: $P(B | A) = \dfrac{P(A,B)}{P(A)}$

## For example
- There are two tests. 60% of students passed both tests, but the first test was easier - 80% passed that one. What percentage of students who passed the first test also passed the second?
- A = passing the first test, B = passing the second test
- So we are asking for $P(B|A)$ - the probability of B given A
- $P(B|A) = P(A,B) \div P(A) = 0.6 \div 0.8 = 0.75$
- 75% of students who passed the first test passed the second
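
The arithmetic above can be checked in a couple of lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and the two probabilities are the ones stated in the example.

```python
# Values taken from the two-test example above.
p_a_and_b = 0.6  # P(A,B): fraction of students who passed both tests
p_a = 0.8        # P(A): fraction of students who passed the first test

# Definition of conditional probability: P(B|A) = P(A,B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(f"P(B|A) = {p_b_given_a:.2f}")
```

Rounded to two decimals, this gives the same 0.75 as the hand calculation.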

The code below simulates how much people purchase depending on their age range.
It generates 100,000 random "people" and randomly assigns each one to their 20s, 30s, 40s, 50s, 60s, or 70s.
It then gives younger people a lower probability of buying something: the purchase probability is the age decade divided by 100, so 20% for people in their 20s and 70% for people in their 70s.
In the end, we have two Python dictionaries:
- `totals` contains the total number of people in each age group.
- `purchases` contains the total number of things purchased by people in each age group.
---
The grand total of purchases is in totalPurchases, and we know the total number of people is 100,000:

In [5]:
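import numpy as np
from tqdm import tqdm  # progress bar only; not needed for the simulation itself

np.random.seed(0)  # fix the seed so the counts below are reproducible

# People per age decade, and purchases per age decade
totals = {20:0, 30:0, 40:0, 50:0, 60:0, 70:0}
purchases = {20:0, 30:0, 40:0, 50:0, 60:0, 70:0}
totalPurchases = 0
for _ in tqdm(range(100000)):
    # Pick an age decade uniformly at random
    ageDecade = np.random.choice([20, 30, 40, 50, 60, 70])
    # Purchase probability grows with age: 0.2 for the 20s up to 0.7 for the 70s
    purchaseProbability = float(ageDecade) / 100.0
    totals[ageDecade] += 1
    if np.random.random() < purchaseProbability:
        totalPurchases += 1
        purchases[ageDecade] += 1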



In [6]:
totals

{20: 16576, 30: 16619, 40: 16632, 50: 16805, 60: 16664, 70: 16704}

In [7]:
purchases

{20: 3392, 30: 4974, 40: 6670, 50: 8319, 60: 9944, 70: 11713}

In [8]:
totalPurchases

45012

#### First let's compute $P(E|F)$, where E is "purchase" and F is "you're in your 30s". The probability of someone in their 30s buying something is just the fraction of people in their 30s who bought something:

In [10]:
PEF = float(purchases[30])/float(totals[30])
print("P(purchase | 30s): %F" % PEF)

P(purchase | 30s): 0.299296
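
The same ratio can be computed for every age group at once. The sketch below is illustrative only: it hard-codes the counts printed earlier for `totals` and `purchases` so that it runs on its own, and the name `conditional` is an assumption, not something from the notebook.

```python
# Counts copied from the outputs of `totals` and `purchases` shown earlier.
totals = {20: 16576, 30: 16619, 40: 16632, 50: 16805, 60: 16664, 70: 16704}
purchases = {20: 3392, 30: 4974, 40: 6670, 50: 8319, 60: 9944, 70: 11713}

# P(purchase | decade) = purchases in that decade / people in that decade
conditional = {decade: purchases[decade] / totals[decade] for decade in totals}

for decade, p in conditional.items():
    print(f"P(purchase | {decade}s): {p:.3f}")
```

Each estimate should land close to the purchase probability built into the simulation (the age decade divided by 100), rising from roughly 0.2 for the 20s to roughly 0.7 for the 70s.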


__P(F) is just the probability of being in your 30s in this data set:__

In [12]:
PF = float(totals[30]) / 100000.0
print("P(30's): {0}".format(PF))

P(30's): 0.16619


##### And $P(E)$ is the overall probability of buying something, regardless of your age:

In [14]:
PE = float(totalPurchases) / 100000.0
print("P(Purchase):", PE)

P(Purchase): 0.45012


**If E and F were independent, then we would expect $P(E|F)$ to be about the same as $P(E)$. But they're not: $P(E)$ is about 0.45, while $P(E|F)$ is about 0.30. That tells us that E and F are dependent (which we know they are in this example).**
**What is $P(E)P(F)$?**

In [17]:
print("P(30's)P(Purchase) "+str(PE*PF))

P(30's)P(Purchase) 0.07480544280000001
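
For comparison, the joint probability $P(E,F)$ can be estimated directly from the counts. If E and F were independent, it would be close to $P(E)P(F)$. The sketch below is a hedged illustration: it hard-codes the counts from the outputs above, and its variable names are assumptions introduced here.

```python
# Counts copied from the outputs above.
n_people = 100000
people_30s = 16619       # totals[30]
purchases_30s = 4974     # purchases[30]
total_purchases = 45012  # totalPurchases

p_e = total_purchases / n_people      # P(E): overall purchase probability
p_f = people_30s / n_people           # P(F): probability of being in your 30s
p_e_and_f = purchases_30s / n_people  # P(E,F): in your 30s AND bought something

print(f"P(E,F):      {p_e_and_f:.4f}")
print(f"P(E)P(F):    {p_e * p_f:.4f}")

# Dividing the joint probability by P(F) recovers P(E|F), as in the definition
print(f"P(E,F)/P(F): {p_e_and_f / p_f:.6f}")
```

Here $P(E,F)$ comes out near 0.050 while $P(E)P(F)$ is near 0.075, which is consistent with E and F being dependent. The last line reproduces the $P(E|F)$ value computed earlier from `purchases[30]` and `totals[30]`.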
