# probability

## objectives
By the end of this lesson, you will be able to:
* Define probability
* Count outcomes (factorials, permutations, combinations)
* Set up & solve a probability problem using:
    * Law of total probability
    * Conditional probabilities
    * Bayes' rule

## Propositional Logic
(a.k.a. [statement logic](https://en.wikipedia.org/wiki/Propositional_calculus), sentential logic, zeroth-order logic)

Let's say I go to the deli and order "the special". I don't know what kind of sandwich it is, and it comes to me totally wrapped in paper.

Here are some statements I could make about this MYSTERY SANDWICH:

A = "it is a turkey sandwich"

B = "there is mustard on this sandwich"


Let's review some boolean logic operators (T means True and F means False):

|A  | B | A OR B | A AND B |A XOR B |
|-------|-------|:-------:|:-------:|:-------:|
|T|T|T|T|F|
|T|F|T|F|T|
|F|T|T|F|T|
|F|F|F|F|F|
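Python's boolean operators reproduce this table. A minimal sketch: Python has no `xor` keyword, so for booleans `a != b` is used here as XOR (it is True exactly when one input is True and the other is False; `a ^ b` would also work). The helper `tf` is only for display.

```python
from itertools import product

# Enumerate every combination of truth values for A and B, in the table's row order
rows = []
for a, b in product([True, False], repeat=2):
    # columns: A, B, A OR B, A AND B, A XOR B
    rows.append((a, b, a or b, a and b, a != b))

def tf(value):
    """Render a boolean as T or F, matching the table above."""
    return "T" if value else "F"

print("A B OR AND XOR")
for row in rows:
    print(" ".join(tf(v) for v in row))
```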


## Sets

let $A$ and $B$ be sets of elements.

$x \in A$ says that "x is an element of the set A"

$y \notin A$ says "y is not an element of the set A"


### symbology
* $\in$: in
* $\vee$: or
* $\wedge$: and
* $\neg$: not
* $\iff$: iff (if and only if)
* $\cap$: intersection
* $\cup$: union
* $|$: such that
* $\emptyset$: empty set
* $\forall$: for all
* $\therefore$: therefore

### operations
* union:
    * $A\cup B = \{x\ |\ x \in A \vee x \in B\}$
* intersection:
    * $A \cap B = \{x\ |\ x \in A \wedge x \in B \}$
* difference:
    * $A \setminus B = \{x\ |\ x \in A \wedge x \notin B \}$
* complement:
    * $A^C = \{x\ |\ x \notin A \}$
* disjoint:
    * $A \cap B = \emptyset$
* partition (of S):
    * a set of pairwise disjoint sets $\{A_i\}$ (that is, $A_i \cap A_j = \emptyset$ for $i \ne j$) whose union is $S$:
    * $S = \bigcup\limits_{i=1}^{N} A_{i}$
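Python's built-in `set` type supports these operations directly. A minimal sketch; the universe `S` and the sets `A`, `B` are arbitrary illustrative choices, and `is_partition` is a hypothetical helper, not a library function. The complement only makes sense relative to a chosen universe, so it is computed as `S - A`:

```python
# A small universe S and two subsets, chosen only for illustration
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {2, 4, 6}

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
difference = A - B       # A \ B
complement_A = S - A     # A^C, taken relative to the universe S

print(union, intersection, difference, complement_A)

# Disjoint: the intersection is empty
print(A.isdisjoint(complement_A))  # True

def is_partition(parts, universe):
    """True if the parts are pairwise disjoint and their union is the universe."""
    parts = list(parts)
    pairwise_disjoint = all(
        parts[i].isdisjoint(parts[j])
        for i in range(len(parts))
        for j in range(i + 1, len(parts))
    )
    covers = set().union(*parts) == universe
    return pairwise_disjoint and covers

print(is_partition([{1, 2}, {3, 4}, {5, 6}], S))  # True
print(is_partition([A, B], S))                    # False: A and B overlap at 2, and 5 is missing
```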

### DeMorgan's laws
* $ \neg (x \vee y) \iff \neg x \wedge \neg y $
* $ \neg (x \wedge y) \iff \neg x \vee \neg y $
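Because there are only four combinations of truth values, DeMorgan's laws can be checked exhaustively. The same laws hold for sets, with complement, union, and intersection playing the roles of $\neg$, $\vee$, $\wedge$. A quick sketch (the universe and subsets are arbitrary illustrative choices):

```python
from itertools import product

# Logical form: check every combination of truth values
for x, y in product([True, False], repeat=2):
    assert (not (x or y)) == ((not x) and (not y))
    assert (not (x and y)) == ((not x) or (not y))

# Set form: complement (relative to a universe S) swaps union and intersection
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
assert S - (A | B) == (S - A) & (S - B)
assert S - (A & B) == (S - A) | (S - B)

print("DeMorgan's laws hold")
```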

## The Laws of Probability

Mathematically, ***probability*** is simply the act of assigning numbers to propositions like $A$ and $B$ above (with some constraints). Philosophically, we can interpret these numbers as ***measures of our certainty about the outcome of an event***.

Imagine we have the ***set of all possible outcomes*** $S$ for some process (for example, all the sandwiches on the menu at the deli). What are the rules for assigning numbers (probabilities) to those outcomes?

Let $x_i$ be the event "I got the $i^{th}$ sandwich on the menu", and $P(x_i)$ be "the probability of getting the $i^{th}$ sandwich on the menu"

Consider the statements A and B above. They are also events that will either be true or false, so we can define $P(A)$ as "the probability that statement A is true"

Here are the ***[axioms of probability](https://en.wikipedia.org/wiki/Probability_axioms)***
- For any statement A,
    - $P(A) \ge 0$
- $P(S) = 1$
    - since $S$ is the set of all possible outcomes, the probability that _something_ in $S$ happens is 1
- For any statements A and B,
    -  $(A \cap B = \emptyset) \Rightarrow P(A \cup B) = P(A) + P(B)$
    
Turns out that's all you need! Some useful consequences:
- $P(A^C) = 1 - P(A)$
    - sometimes you'll see $A^C$ written as $\neg A$ or $\sim A$
- $P(A \cap B) = P(A) + P(B) - P(A \cup B)$
- let $\{A_i\}_{i=1\ldots N}$ be a set of events satisfying
    - $A_i \cap A_j = \emptyset$ for all $i \ne j$ (each pair of events is mutually exclusive)
    - $\bigcup\limits_{i=1}^{N} A_{i} = S$ (all of the events together are collectively exhaustive of S)
    - then we say that $\{A_i\}_{i=1\ldots N}$ _partitions_ $S$, and $\left(\sum_{i=1}^{N}P(A_i)\right) = 1$
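On a finite sample space where every outcome is equally likely, $P(A) = |A| / |S|$, and the consequences above can be checked exactly. A minimal sketch, assuming a fair six-sided die; the helper `P` and the events `A`, `B` are illustrative, and `Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely outcomes assumed)
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is at least 4"

# Complement rule: P(A^C) = 1 - P(A)
assert P(S - A) == 1 - P(A)

# Rearranged inclusion-exclusion: P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
assert P(A & B) == P(A) + P(B) - P(A | B)

# The probabilities of a partition of S sum to 1
partition = [{1, 2}, {3, 4}, {5, 6}]
assert sum(P(part) for part in partition) == 1

print(P(A & B))  # 1/3
```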

### conditional probability
  
a useful idea (sometimes framed as an axiom) is conditional probability: if we know $A$ to be true, what is the probability of $B$?

$P(B|A) = \frac{P(A \cap B)}{P(A)}$, defined whenever $P(A) > 0$
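With equally likely outcomes, conditioning on $A$ amounts to shrinking the sample space to $A$. A minimal sketch, again assuming a fair die; `P_given` is a hypothetical helper and the events are illustrative:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def P_given(B, A):
    """P(B | A) = P(A ∩ B) / P(A); undefined when P(A) = 0."""
    if P(A) == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return P(A & B) / P(A)

A = {2, 4, 6}   # "the roll is even"
B = {5, 6}      # "the roll is at least 5"

print(P_given(B, A))  # 1/3: of the even rolls {2, 4, 6}, only 6 is at least 5
print(P_given(A, B))  # 1/2: of the rolls {5, 6}, only 6 is even
```

Note that $P(B|A)$ and $P(A|B)$ generally differ.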

### independence 

if $P(A \cap B) = P(A)P(B)$ then we say that $A$ and $B$ are ***statistically independent***

sometimes written $ A \bot B $  

note that, when $P(A) > 0$ and $P(B) > 0$, this also implies that  
$P(A|B) = P(A)$ and $P(B|A) = P(B)$
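Independence can be checked by enumerating the sample space. A minimal sketch, assuming two fair dice rolled separately so that all 36 ordered pairs are equally likely; the events and the helper `independent` are illustrative:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die), all 36 equally likely
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {(d1, d2) for d1, d2 in S if d1 % 2 == 0}     # first die is even
B = {(d1, d2) for d1, d2 in S if d2 >= 5}         # second die is 5 or 6
C = {(d1, d2) for d1, d2 in S if d1 + d2 >= 10}   # total is at least 10

def independent(X, Y):
    """True when P(X ∩ Y) == P(X) P(Y)."""
    return P(X & Y) == P(X) * P(Y)

print(independent(A, B))  # True: the two dice do not influence each other
print(independent(A, C))  # False: knowing the total is at least 10 raises P(first die even) from 1/2 to 2/3
```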

### see sandwich example for more

## combinatorics (counting)

## factorial

* $n! = \prod\limits_{i=1}^{n} i = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$
* $0! = 1$ by definition
* how many ways can we shuffle a deck of cards?


```python
def factorial(n):
    """Return n! recursively; 0! = 1! = 1."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)

# iterative alternative:
#     result = 1
#     for i in range(1, n + 1):
#         result *= i
#     return result
```

```python
print(factorial(52))
```

```
80658175170943878571660636856403766975289505440883277824000000000000
```


```python
'{:.3e}'.format(factorial(52))
```

```
'8.066e+67'
```

```python
def display(n, code='.0f'):
    # build a format string such as '{:.3e}' from the format spec, then apply it
    outstring = '{{:{}}}'.format(code)
    print(outstring.format(n))
```

```python
display(factorial(52), '.3e')
```

```
8.066e+67
```


## permutations
number of ways of selecting subgroups when order matters
* select $k$ students in order from a class of $n$:
$$\frac{n!}{(n-k)!}$$
* example: set the batting order for 9 players from a baseball team of 25

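```python
def permute(n, k):
    # integer division keeps the result an exact int ((n-k)! always divides n!)
    return factorial(n) // factorial(n - k)
```

```python
display(permute(25, 9))
```

```
741354768000
```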


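## combinations

number of ways of selecting subgroups when order does **not** matter: each group of $k$ is counted $k!$ times among the ordered selections, so divide by $k!$

$${{n}\choose{k}} = \frac{n!}{(n-k)!\,k!}$$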

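```python
def comb(n, k):
    # divide out the k! orderings of each group; // keeps the result an exact int
    return permute(n, k) // factorial(k)
```

```python
display(comb(25, 9))
```

```
2042975
```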
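Since Python 3.8, the standard library's `math` module provides `factorial`, `perm`, and `comb`, which return exact integers and can be used to cross-check the hand-written functions above. A short sketch:

```python
import math

# Number of ways to shuffle (order) a 52-card deck
print(math.factorial(52))

# Batting orders: 9 players chosen in order from 25
print(math.perm(25, 9))   # 741354768000

# Unordered groups of 9 chosen from 25
print(math.comb(25, 9))   # 2042975

# Combinations are permutations with the k! orderings of each group divided out
assert math.comb(25, 9) == math.perm(25, 9) // math.factorial(9)
```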


## law of total probability

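Let $A$ be an event (a subset of the total sample space $S$).

If $\{B_i\}_{i=1\ldots N}$ is a partition of $S$,

then

$ P(A) = \sum_{i} P(A\cap B_i) $

or, using the definition of conditional probability,

$ P(A) = \sum_{i} P(A|B_i) P(B_i)$

Computed this way, $P(A)$ is called the ***marginal probability*** of $A$: we have summed (marginalized) over the $B_i$.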
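The sum is straightforward to compute once the conditional probabilities $P(A|B_i)$ and the weights $P(B_i)$ are known. A minimal sketch, assuming a fair die with the partition $B_1 = \{1,2,3\}$, $B_2 = \{4,5,6\}$ and the event $A$ = "the roll is even"; `total_probability` is a hypothetical helper:

```python
from fractions import Fraction

def total_probability(conditionals, priors):
    """P(A) = sum_i P(A | B_i) P(B_i), where the B_i partition the sample space."""
    # exact check, which works with Fraction; with floats you would allow a small tolerance
    if sum(priors) != 1:
        raise ValueError("the P(B_i) of a partition must sum to 1")
    return sum(p_a_given_b * p_b for p_a_given_b, p_b in zip(conditionals, priors))

# Partition of a fair die: B_1 = {1, 2, 3}, B_2 = {4, 5, 6}
priors = [Fraction(1, 2), Fraction(1, 2)]
# A = "roll is even": {2} is 1 of 3 outcomes in B_1, {4, 6} are 2 of 3 in B_2
conditionals = [Fraction(1, 3), Fraction(2, 3)]

print(total_probability(conditionals, priors))  # 1/2
```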

## bring it together

* we can reason about probability by carefully defining the sample space, and relevant subsets
* we can calculate probabilities by performing mathematical, often combinatoric, operations on these sets
* if you cannot properly determine the relevant space, it is not possible to define the probability
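For example, when every 5-card hand is equally likely, the probability of an event is (number of hands in the event) / (number of hands). A minimal sketch for the illustrative event "exactly two aces", assuming a fair deal from a standard 52-card deck:

```python
import math
from fractions import Fraction

# Sample space: all 5-card hands from a 52-card deck, each equally likely
total_hands = math.comb(52, 5)

# Event: choose 2 of the 4 aces and 3 of the 48 non-aces
favorable = math.comb(4, 2) * math.comb(48, 3)

p_two_aces = Fraction(favorable, total_hands)
print(total_hands, favorable)   # 2598960 103776
print(float(p_two_aces))        # about 0.0399
```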

## question
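three coins in a bag: {HH,HT,TT} (HH has heads on both sides, HT has one heads side and one tails side, TT has tails on both sides)  
you draw a coin at random and flip it, getting heads  
what is the probability of getting heads on a second flip of the same coin?  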

## solution


  
$P(X_2 = H | X_1 = H) = \frac{P(X_2 = H \cap X_1 = H)}{P(X_1 = H)}$
  
  
$P(X_2 = H \cap X_1 = H)$ is the probability that $X_1 = H$ **and** $X_2 = H$  
if you grab the HH coin, two heads in a row has probability 1  
if you grab the HT coin, two heads has probability $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$  
if you grab the TT coin, two heads is impossible (probability 0)  
each coin is drawn with probability $\frac{1}{3}$  
so $P(X_2 = H \cap X_1 = H) = \frac{1}{3} \cdot (1 + \frac{1}{4} + 0) = \frac{5}{12}$  
by the law of total probability, $P(X_1 = H) = \frac{1}{3} \cdot (1 + \frac{1}{2} + 0) = \frac{1}{2}$  
$\therefore P(X_2 = H | X_1 = H) = \frac{\frac{5}{12}}{\frac{1}{2}} = \frac{5}{6}$


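```python
import random

coins = ['HH', 'HT', 'TT']   # each string lists a coin's two faces
results = []                  # second-flip outcomes, kept only when the first flip was heads
for _ in range(100000):
    coin = random.choice(coins)                     # draw a coin from the bag
    if random.choice(coin) == 'H':                  # first flip: one face at random
        results.append(random.choice(coin) == 'H')  # second flip of the same coin
heads = sum(results) / len(results)   # estimate of P(X_2 = H | X_1 = H)
print(heads)
print(len(results))                   # trials where the first flip was heads
```

```
0.830954424769928
50093
```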


## question
what happens if the coins in the bag are {HH,HT}?

```python
coins = ['HH', 'HT']
results = []
for _ in range(100000):
    coin = random.choice(coins)
    if random.choice(coin) == 'H':
        results.append(random.choice(coin) == 'H')
heads = sum(results) / len(results)
print(heads)      # estimate of P(X_2 = H | X_1 = H)
print(1 - heads)  # estimate of P(X_2 = T | X_1 = H)
```

```
0.8336534357661118
0.16634656423388816
```
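Both simulations land near 0.833. An exact computation explains why: a TT coin can never produce the first heads, so once we condition on $X_1 = H$ it drops out. A minimal sketch that enumerates coins and faces with `Fraction`, under the same assumptions as the simulation (each coin equally likely, each flip shows either face with probability 1/2, independently); `p_second_heads` is a hypothetical helper:

```python
from fractions import Fraction
from itertools import product

def p_second_heads(coins):
    """Exact P(X_2 = H | X_1 = H) for a bag of two-faced coins written like 'HT'."""
    p_coin = Fraction(1, len(coins))   # each coin is drawn with equal probability
    p_face = Fraction(1, 2)            # each flip shows either face with probability 1/2
    p_first = Fraction(0)              # accumulates P(X_1 = H)
    p_both = Fraction(0)               # accumulates P(X_1 = H and X_2 = H)
    for coin in coins:
        # both flips use the same coin: 4 equally likely (face, face) pairs
        for first, second in product(coin, repeat=2):
            weight = p_coin * p_face * p_face
            if first == 'H':
                p_first += weight
                if second == 'H':
                    p_both += weight
    return p_both / p_first

print(p_second_heads(['HH', 'HT', 'TT']))  # 5/6
print(p_second_heads(['HH', 'HT']))        # 5/6
```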


## chain rule

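$ P\left(\bigcap\limits_{i=1}^{n} X_i\right) = \prod\limits_{i=1}^{n} P\left(X_i \,\Big|\, \bigcap\limits_{k=1}^{i-1} X_k\right) $, where for $i = 1$ the empty intersection is $S$, so the first factor is just $P(X_1)$  
$P(X_1,X_2,X_3) = ?$ (here $P(X_1,X_2,X_3)$ means $P(X_1 \cap X_2 \cap X_3)$)  

the events can be conditioned in any order; applying the formula with the indices reversed gives

$P(X_1|X_2,X_3)P(X_2|X_3)P(X_3)$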
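As a numeric check, let $X_i$ be the event "the $i$-th card dealt is an ace" when three cards are dealt without replacement from a standard 52-card deck (an illustrative choice). The chain rule gives $P(X_1)P(X_2|X_1)P(X_3|X_1,X_2)$, which should match a direct count of the 3-card hands that are all aces:

```python
import math
from fractions import Fraction

# Chain rule: P(X_1) * P(X_2 | X_1) * P(X_3 | X_1, X_2)
p_chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count: 3-ace hands out of all 3-card hands (order ignored)
p_count = Fraction(math.comb(4, 3), math.comb(52, 3))

print(p_chain, p_count)   # 1/5525 1/5525
assert p_chain == p_count
```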

### what if $X_1,X_2,X_3$ are independent?

## Bayes' rule
  
  
$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$

| quantity | probability |
| --------- | ----------- |
| $P(+ \mid doped)$ | .99 |
| $P(+ \mid clean)$ | .05 |
| $P(doped)$ | .005 |

where $+$ denotes a positive test result, what is $P(doped\ |\ +)$?


$P(doped\ |\ +) = \frac{P(+\ |\ doped)P(doped)}{P(+)}$  
but what is $P(+)$?

## law of total probability to the rescue

$P(+) = P(+\ |\ doped)P(doped) + P(+\ |\ clean)P(clean)$  
$P(clean) = 1 - P(doped)$  

$P(doped\ |\ +) = \frac{P(+\ |\ doped)P(doped)}{P(+\ |\ doped)P(doped) + P(+\ |\ clean)P(clean)}$ 


$P(doped\ |\ +) = \frac{0.99 \cdot 0.005}{0.99 \cdot 0.005 + 0.05 \cdot 0.995} = \frac{0.00495}{0.0547} \approx 0.090$
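The same computation as a small function; a sketch where `posterior` is a hypothetical helper, with the numbers taken from the table above:

```python
def posterior(p_pos_given_doped, p_pos_given_clean, p_doped):
    """P(doped | +) via Bayes' rule, with P(+) from the law of total probability."""
    p_clean = 1 - p_doped
    p_pos = p_pos_given_doped * p_doped + p_pos_given_clean * p_clean
    return p_pos_given_doped * p_doped / p_pos

p = posterior(0.99, 0.05, 0.005)
print(f"{p:.3f}")  # 0.090
```

Even with a 99% detection rate, fewer than 1 in 10 positive tests come from someone who is doped, because clean cases vastly outnumber doped ones ($P(doped) = 0.005$).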

## probability overview

* remember complements $P(A^C) = 1 - P(A)$
* are things independent?
* if conditional probabilities are involved, remember Bayes' rule
* probabilities cannot be > 1; check whether your reasoning leads to that happening
* the law of total probability is your friend
