## Objectives
* Define probability
* Count
* Set up & solve a probability problem using:
 * Law of total probability
 * Conditional probabilities
 * Bayes' rule

# Probability
credit to Moses Marsh

## Propositional Logic
(a.k.a. [Propositional calculus](https://en.wikipedia.org/wiki/Propositional_calculus), sentential logic, zeroth-order logic) 

It deals with propositions (which can be true or false) and argument flow. 

Let's say I go to the deli and order "the special." I don't know what kind of sandwich it is, and it comes to me totally wrapped in paper.

Here are some statements I could make about this MYSTERY SANDWICH

A = "it is a turkey sandwich"

B = "there is mustard on this sandwich"


Let's review some boolean logic operators (T: True and F: False) in the truth table

|A  | B | A OR B | A AND B |
|-------|-------|:-------:|:-------:|
|T|T|T|T|
|T|F|T|F|
|F|T|T|F|
|F|F|F|F|
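
The same table can be generated programmatically. This is a minimal sketch using Python's built-in `or` and `and`; the variable name `rows` is only for illustration.

```python
from itertools import product

# Enumerate every truth assignment for the two propositions A and B.
rows = [(a, b, a or b, a and b) for a, b in product([True, False], repeat=2)]

print("A     B     A OR B  A AND B")
for a, b, a_or_b, a_and_b in rows:
    print(f"{a!s:<5} {b!s:<5} {a_or_b!s:<7} {a_and_b!s}")
```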

## Sets

Let $A$ and $B$ be two sets.

$x \in A$ says that "x is an element of the set A" 
$y \notin A$ says "y is not an element of the set A"


### symbology
* $\in$: in
* $\vee$: or
* $\wedge$: and
* $\neg$: not
* $\iff$ iff (if and only if)
* $\cap$ intersection
* $\cup$ union
* $\mid$: "such that" in set notation, or "given" / "conditioned on" in probability
* $\emptyset$ empty set
* $\forall$ for all
* $\therefore$ therefore

### operations
[Venn Diagram](https://en.wikipedia.org/wiki/Venn_diagram)
* union:
    * $A\cup B = \{x\ |\ x \in A \vee x \in B\}$
* intersection:
    * $A \cap B = \{x\ |\ x \in A \wedge x \in B \}$
* difference:
    * $A \setminus B = \{x\ |\ x \in A \wedge x \notin B \}$
* complement:
    * $A^C = \{x\ |\ x \notin A \}$
* disjoint:
    * $A \cap B = \emptyset$
* partition (of $S$):
    * a set of pairwise disjoint sets whose union is $S$:
    * $\{A_i\}_{i=1\cdots N}$ with $A_i \cap A_j = \emptyset$ for $i \neq j$ and $S = \bigcup\limits_{i=1}^{N} A_{i}$
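
Python's built-in `set` type supports these operations directly. The sketch below uses small example sets chosen only for illustration; the universe `S` is an assumption needed to define the complement.

```python
# Hypothetical example sets, chosen only for illustration.
S = set(range(1, 11))          # the universe: integers 1..10
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

union = A | B                  # {x | x in A or x in B}
intersection = A & B           # {x | x in A and x in B}
difference = A - B             # {x | x in A and x not in B}
complement_A = S - A           # the complement is taken relative to S

# A and its complement are disjoint and together cover S,
# so {A, A^C} is a partition of S.
is_disjoint = A.isdisjoint(complement_A)
is_partition = is_disjoint and (A | complement_A == S)
print(union, intersection, difference, complement_A, is_partition)
```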

### DeMorgan's laws
* $ \neg (x \vee y) \iff \neg x \wedge \neg y $
* $ \neg (x \wedge y) \iff \neg x \vee \neg y $
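
Because there are only four combinations of truth values, both laws can be checked by brute force. The sketch below also checks the set version of the laws on a small hypothetical universe `S`.

```python
from itertools import product

# Check both laws on every combination of truth values.
for x, y in product([True, False], repeat=2):
    assert (not (x or y)) == ((not x) and (not y))
    assert (not (x and y)) == ((not x) or (not y))

# The same laws hold for sets, with complement taken relative to a universe S.
S = set(range(10))             # hypothetical universe
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}
law_union = (S - (A | B)) == ((S - A) & (S - B))
law_intersection = (S - (A & B)) == ((S - A) | (S - B))
print(law_union, law_intersection)
```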

## The Laws of Probability

Mathematically, ***probability*** is simply the act of assigning numbers to propositions like $A$ and $B$ above (with some constraints). Philosophically, we can interpret these numbers as ***measures of our certainty about the outcome of an event***.

Imagine we have the ***set of all possible outcomes*** $S$ for some process (for example, all the sandwiches on the menu at the deli). What are the rules for assigning numbers (probabilities) to those outcomes?

Let $x_i$ be the event "I got the $i$-th sandwich on the menu," and $P(x_i)$ be "the probability of getting the $i$-th sandwich on the menu"

Consider the statements A and B above. They are also events that will either be true or false, so we can define 

$P(A)$ as "the probability that statement A is true" or "event A happened".

Here are the ***[axioms of probability](https://en.wikipedia.org/wiki/Probability_axioms)***
- For any statement A, $P(A) \ge 0$
- $P(S) = 1$: since $S$ is the set of all possible outcomes, the probability that _something_ in $S$ happens is 1
- For any statements A and B,
    -  $(A \cap B = \emptyset) \Longrightarrow P(A \cup B) = P(A) + P(B)$
    
Turns out that's all you need! Some useful consequences:
- $P(A^C) = 1 - P(A)$, sometimes you'll see $A^C$ written as $\neg A$ or $\sim A$
- $P(A \cap B) = P(A) + P(B) - P(A \cup B)$
- let $\{A_i\}_{i=1\cdots N}$ be a set of events satisfying
    - $A_i \cap A_j = \emptyset$ for all $i \neq j$ (each pair of events is mutually exclusive)
    - $\bigcup\limits_{i=1}^{N} A_{i} = S$ (all of the events together are collectively exhaustive of $S$)
    - then we say that $\{A_i\}_{i=1\cdots N}$ _partitions_ $S$, and $\sum_{i=1}^{N}P(A_i) = 1$
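
These consequences can be checked on a small finite sample space. The sketch below assumes a single fair die with equally likely outcomes; the helper `prob` and the events `A` and `B` are illustrative names, not part of the notes above.

```python
from fractions import Fraction

S = frozenset(range(1, 7))     # sample space: faces of one fair die (an assumption)

def prob(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                  # "the roll is even"
B = {4, 5, 6}                  # "the roll is at least 4"

p_complement = prob(S - A)                      # P(A^C)
p_intersection = prob(A & B)                    # P(A and B)
partition = [{1, 2}, {3, 4}, {5, 6}]            # pairwise disjoint, union is S
total = sum(prob(part) for part in partition)
print(p_complement, p_intersection, total)
```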

### conditional probability
  
a useful idea is conditional probability: if we know $A$ to be true (and $P(A) > 0$), what is the probability of $B$?

$$P(B\mid A) = \frac{P(A \cap B)}{P(A)}$$
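
As a small illustration of the formula, the sketch below assumes one card drawn uniformly at random from a standard 52-card deck. The events (`A`: face card, `B`: king) and the helper `prob` are chosen only for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['spades', 'hearts', 'clubs', 'diamonds']
deck = set(product(ranks, suits))          # sample space: 52 equally likely cards

def prob(event):
    """Probability of an event (a set of cards) under a uniform draw."""
    return Fraction(len(event), len(deck))

A = {card for card in deck if card[0] in ('J', 'Q', 'K')}   # "the card is a face card"
B = {card for card in deck if card[0] == 'K'}               # "the card is a king"

p_B = prob(B)
p_B_given_A = prob(A & B) / prob(A)        # P(B | A) = P(A and B) / P(A)
print(p_B, p_B_given_A)
```

Knowing that the card is a face card raises the probability of a king from 1/13 to 1/3.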

### independence 

if $P(A \cap B) = P(A)P(B)$ then we say that $A$ and $B$ are ***statistically independent***; sometimes written $ A \perp B $  

note that, when $P(A) > 0$ and $P(B) > 0$, this also implies that 
$P(A\mid B) = P(A)$ and $P(B \mid A) = P(B)$

see sandwich example for more
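
Independence can be tested directly from the definition. The sketch below assumes two fair dice; the events and the helper `independent` are hypothetical names used only for this example.

```python
from fractions import Fraction
from itertools import product

rolls = set(product(range(1, 7), repeat=2))   # two fair dice, 36 equally likely outcomes

def prob(event):
    """Probability of an event (a set of rolls) under a uniform distribution."""
    return Fraction(len(event), len(rolls))

def independent(E, F):
    """True when P(E and F) equals P(E) * P(F)."""
    return prob(E & F) == prob(E) * prob(F)

first_even = {r for r in rolls if r[0] % 2 == 0}
sum_is_7 = {r for r in rolls if sum(r) == 7}
sum_is_8 = {r for r in rolls if sum(r) == 8}

print(independent(first_even, sum_is_7))
print(independent(first_even, sum_is_8))
```

Here "the first die is even" is independent of "the sum is 7" but not of "the sum is 8", which shows that independence has to be checked rather than guessed.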

# Combinatorics (counting)

### factorial

* $n! = \prod\limits_{i=1}^{n} i = 1 \times 2 \times 3 \times \ldots\ \times (n-1) \times n$
* $0! = 1$ by definition
* how many ways can we shuffle a deck of cards?


In [1]:
def factorial(n):
    # 0! = 1 by definition, so the recursion stops at n <= 1
    if n <= 1:
        return 1
    return n*factorial(n-1)
#     iterative alternative:
#     result = 1
#     for i in range(1,n+1):
#         result *= i
#     return result

In [2]:
print(factorial(52))

80658175170943878571660636856403766975289505440883277824000000000000


In [3]:
'{:.3e}'.format(factorial(52))

'8.066e+67'

In [4]:
def display(n,code='.0f'):
    outstring = '{{:{}}}'.format(code)
    print(outstring.format(n))

In [5]:
display(factorial(52),'.3e')

8.066e+67


## permutations
number of ways of selecting subgroups when **order matters**
* select $k$ students in order from class of $n$
$$n\times (n-1) \times (n-2) \times \cdots \times (n-k+1) = \frac{n!}{(n-k)!}$$
* example: set batting order for 9 players on baseball team of 25

In [6]:
def permute(n,k):
    # integer division keeps the result an exact int (the quotient is always whole)
    return factorial(n)//factorial(n-k)

In [7]:
display(permute(25,9))

741354768000


## combinations
 
A $k$-combination of elements of a set is an unordered selection of $k$ elements from the set
 
 $${{n}\choose{k}} = \frac{n!}{(n-k)!k!}$$

In [8]:
def comb(n,k):
    # integer division: the number of combinations is always a whole number
    return permute(n,k)//factorial(k)

In [9]:
display(comb(52,5))

2598960


### Exercise: possible poker hands
- 52 Cards in the deck 
- four suits: $\spadesuit$ (spades) <font color=red>$\heartsuit$ (hearts)</font> $\clubsuit$ (clubs) <font color=red>$\diamondsuit$ (diamonds)</font>
- 13 ranks: $\buildrel{Ace}\over{A}$, 2, 3, 4, 5, 6, 7, 8, 9, 10, $\buildrel{Jack}\over{J}$, $\buildrel{Queen}\over{Q}$, $\buildrel{King}\over{K}$
- there are 5 cards in one _hand_

<details>
<summary>1. Number of possible hands with a Four-of-a-Kind?</summary>
    $$13  \cdot 12 \cdot {4 \choose 1} = 624 $$
</details>


<details>
<summary>2. Hands with a Full House </summary>
 $$13\cdot {4\choose 3} \cdot 12\cdot{4\choose 2} = 3744 $$
</details>

<details>
<summary>3. Hands with Two Pairs </summary>
$$13\cdot {4\choose 2} \cdot 12 \cdot {4\choose 2} \cdot 11 \cdot 4 \div 2 = 123552$$
</details>

<details>
<summary>4. Hands with Every Suit (rainbow) </summary>
$${4\choose 1}{13\choose 2}13^3 = \frac{13^4\cdot 4\cdot 12}{2} = 685464$$
</details>
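
Counting arguments like these can be checked by brute force: enumerate every hand and classify it. The sketch below is one possible way to do that; the function `count_hands` and its parameters are hypothetical names, and it is run on a reduced 6-rank deck so that it finishes quickly.

```python
from collections import Counter
from itertools import combinations, product

def count_hands(n_ranks=13, n_suits=4, hand_size=5):
    """Brute-force count of hand types by enumerating every possible hand."""
    deck = list(product(range(n_ranks), range(n_suits)))
    counts = Counter()
    for hand in combinations(deck, hand_size):
        # Sorted multiplicities of the ranks, e.g. [1, 4] for four of a kind.
        shape = sorted(Counter(rank for rank, _ in hand).values())
        if shape == [1, 4]:
            counts['four of a kind'] += 1
        elif shape == [2, 3]:
            counts['full house'] += 1
        elif shape == [1, 2, 2]:
            counts['two pairs'] += 1
        # A hand has every suit when the number of distinct suits equals n_suits.
        if len({suit for _, suit in hand}) == n_suits:
            counts['every suit'] += 1
        counts['total'] += 1
    return counts

# A reduced 6-rank deck keeps the enumeration fast (42,504 hands).
small = count_hands(n_ranks=6)
print(small)
```

Calling `count_hands()` with the defaults walks all 2,598,960 five-card hands, which takes noticeably longer, and its counts should match your answers to the exercise.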



## Law of total probability

Let $A$ be an event, i.e., a subset of the sample space $S$ ($A \subseteq S$).

If $\{B_i\}_{i= 1\cdots N}$ is a partition of $S$, then 

$$ P(A) = \sum_{i=1}^{N} P(A\cap B_i) = \sum_{i=1}^{N} P(A\mid B_i) P(B_i)$$

$P(A)$ is called the marginal probability of $A$: it is obtained by summing (marginalizing) over the $B_i$

## bring it together

* we can reason about probability by carefully defining the sample space, and relevant subsets
* we can calculate probabilities by performing mathematical, often combinatoric, operations on these sets
* if you cannot properly determine the relevant space, it is not possible to define the probability

### question
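Consider three coins in a bag: one with heads on both sides (`HH`), one fair coin (`HT`), and one with tails on both sides (`TT`). You draw a coin at random and flip it, and it lands heads. What is the probability of getting heads on a second flip of the same coin?  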

### solution

$$P(X_2 = H \mid X_1 = H) = \frac{P(X_2 = H \cap X_1 = H)}{P(X_1 = H)}$$
  
$P(X_2 = H \cap X_1 = H)$ is the probability that $X_1 = H$ **and** $X_2 = H$  
if you grab the `HH` coin, two heads in a row has probability $1$  
if you grab the `HT` coin, two heads in a row has probability $\frac{1}{2} \times\frac{1}{2} = \frac{1}{4}$  
if you grab the `TT` coin, two heads in a row has probability $0$  
each coin is drawn with probability $\frac{1}{3}$  
so $P(X_2 = H \cap X_1 = H) = \frac{1}{3} \times(1 + \frac{1}{4} + 0) = \frac{5}{12}$  
by the same reasoning, $P(X_1 = H) = \frac{1}{3} \times(1 + \frac{1}{2} + 0) = \frac{1}{2}$  
$\therefore P(X_2 = H \mid X_1 = H) = \frac{\frac{5}{12}}{\frac{1}{2}} = \frac{5}{6}$
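
The same calculation can be done exactly with fractions, as a sketch of the law of total probability in code. The dictionary `p_heads` and the other variable names are illustrative; the partition is "which coin was drawn".

```python
from fractions import Fraction

# P(heads on one flip | coin), for the two-headed, fair, and two-tailed coins.
p_heads = {'HH': Fraction(1), 'HT': Fraction(1, 2), 'TT': Fraction(0)}
p_coin = Fraction(1, len(p_heads))   # each coin is equally likely to be drawn

# Law of total probability: sum over the partition "which coin was drawn".
p_first_heads = sum(p * p_coin for p in p_heads.values())
# Given the coin, the two flips are independent, so P(H, H | coin) = p**2.
p_both_heads = sum(p**2 * p_coin for p in p_heads.values())

p_second_given_first = p_both_heads / p_first_heads
print(p_first_heads, p_both_heads, p_second_given_first)
```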


In [10]:
import random
coins = ['HH', 'HT','TT']
results = []
for i in range(10**5):
    coin = random.choice(coins)
    if random.choice(coin) == 'H': # first flip 
        results.append(random.choice(coin) == 'H') # second flip
heads = 1.*sum(results)/len(results)
print(heads)
print(len(results))

0.8326783494405903
49874


### question
what happens if the coins in the bag are {`HH`, `HT`}?

In [11]:
coins = ['HH', 'HT']
results = []
for i in range(10**5):
    coin = random.choice(coins)
    if random.choice(coin) == 'H':
        results.append(random.choice(coin) == 'H')
heads = 1.*sum(results)/len(results)
print(heads)
print(1 - heads)

0.8327854014987156
0.1672145985012844
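
Working through the same steps as in the three-coin solution gives the exact value that the simulation approximates (a worked sketch, with each of the two coins drawn with probability $\frac{1}{2}$):

$$P(X_1 = H) = \frac{1}{2}\left(1 + \frac{1}{2}\right) = \frac{3}{4}, \qquad P(X_2 = H \cap X_1 = H) = \frac{1}{2}\left(1 + \frac{1}{4}\right) = \frac{5}{8}$$

$$P(X_2 = H \mid X_1 = H) = \frac{\frac{5}{8}}{\frac{3}{4}} = \frac{5}{6}$$

The answer is again $\frac{5}{6}$: the `TT` coin can never produce a first head, so removing it from the bag does not change the conditional probability.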


## chain rule

$ P(\bigcap\limits_{i=1}^n X_i) = \prod\limits_{i=1}^n P(X_i \mid \bigcap\limits_{k=1}^{i-1} X_k) $; for example, when $n = 3$, $P(X_1,X_2,X_3) = P(X_1)P(X_2 \mid X_1)P(X_3 \mid X_1,X_2)$  

what if $X_1,X_2,X_3$ are independent?
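
A concrete check of the chain rule, as a sketch: let $X_i$ be "the $i$-th card drawn without replacement is an ace" (an example chosen only for illustration). The product of conditional probabilities should agree with a direct count of three-ace hands.

```python
import math
from fractions import Fraction

# X_i = "the i-th card drawn (without replacement) is an ace".
p_x1 = Fraction(4, 52)                 # P(X_1)
p_x2_given_x1 = Fraction(3, 51)        # P(X_2 | X_1)
p_x3_given_x1_x2 = Fraction(2, 50)     # P(X_3 | X_1, X_2)
p_chain = p_x1 * p_x2_given_x1 * p_x3_given_x1_x2

# Direct count: 3-card hands made only of aces, over all 3-card hands.
p_count = Fraction(math.comb(4, 3), math.comb(52, 3))
print(p_chain, p_count)
```

`math.comb` requires Python 3.8 or later.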

## Bayes' theorem
$$P(B\mid A) = \frac{P(A\mid B)P(B)}{P(A)}$$

[Medical Tests and Bayes' Theorem](https://www.math.hmc.edu/funfacts/ffiles/30002.6.shtml)

Suppose that you are worried that you might have a rare disease. You decide to get tested, and suppose that the testing methods for this disease are correct 99% of the time. In other words, if you have the disease, the test shows that you do (+) with 99% probability, and if you don't have the disease, it shows that you do not (-) with 99% probability. Suppose this disease is actually quite rare, occurring randomly in the general population in only one of every 10,000 people.

If your test results come back positive, what are your chances that you actually have the disease?
<details>
<summary>Do you think it is approximately: (a) .99, (b) .90, (c) .10, or (d) .01?</summary>
 the answer is (d)
</details>

Knowing 

|conditional events | probability |
| --------- | ----------- |
| $ P(+ \mid \text{diseased})$ | .99 |
| $ P(+ \mid \text{healthy})$ | .01 |
| $P(\text{diseased})$ | .0001 |

what is $ P(\text{diseased}\mid +) $?

Applying Bayes' theorem, 
$$P(\text{diseased}\mid +) = \frac{P(+ \mid \text{diseased})P(\text{diseased})}{P(+)}$$  
but what is $P(+)$?

#### law of total probability to the rescue
\begin{align}
P(+) & = P(+ \mid \text{diseased})P(\text{diseased}) + P(+ \mid \text{healthy})P(\text{healthy})\\ 
P(\text{healthy}) & = 1 - P(\text{diseased})\\
P(\text{diseased}\mid +) & = \frac{P(+ \mid \text{diseased})P(\text{diseased})}{P(+\mid \text{diseased})P(\text{diseased}) + P(+\mid \text{healthy})P(\text{healthy})}\\
& = \frac{0.99\times0.0001}{0.99\times0.0001 + 0.01\times0.9999} \approx 0.0098
\end{align}
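
The calculation is short enough to wrap in a function, which makes it easy to see how strongly the answer depends on the prior. This is a minimal sketch; the function name `posterior` and its parameter names are hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(diseased | +) from Bayes' theorem and the law of total probability."""
    # P(+) = P(+ | diseased) P(diseased) + P(+ | healthy) P(healthy)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.0001, sensitivity=0.99, false_positive_rate=0.01)
print(round(p, 4))

# The same test applied to populations where the disease is more common.
for prior in (0.0001, 0.01, 0.1):
    print(prior, round(posterior(prior, 0.99, 0.01), 4))
```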

## probability overview

* remember complements $P(A^C) = 1 - P(A)$
* are things independent?
* if conditional probabilities are involved, remember Bayes' rule
* probabilities cannot be > 1, so check whether your reasoning leads to that happening
* the law of total probability is your friend
