# Probability

### Natalie Hunt
#### March 19, 2018

## Objectives

* Explain the difference between probability and statistics.
* Use permutations and combinations to solve probability problems.
* Use basic laws of probability.
* Understand what a random variable is and derive common properties.
* Recognize common probability distributions, including Bernoulli, binomial, geometric, Poisson, uniform, normal, and exponential.

## Agenda

Morning

 * What is probability?
 * Review sets
 * Discuss permutations and combinations
 * Discuss laws of probability
 
Afternoon

 * Discuss random variables
 * Discuss common distributions

```python
import scipy.stats as scs
import numpy as np
import math

import matplotlib.pyplot as plt
# IPython/Jupyter magic: render matplotlib figures inline in the notebook
%matplotlib inline
```

# Morning Lecture – Introduction to Probability

## Probability and Statistics

Probability and statistics are closely related disciplines, though in some ways they work in opposite directions.

In **probability**, we have a model of how some part of the world behaves, and we use that model to determine how probable it is that certain events occur.

In **statistics**, we observe the events that actually occurred and use them to try to infer a model.

### Some definitions and notation...

* The sample space $S$ is the set of all possible outcomes; an event is a subset of $S$
* Union: $A \cup B = \{ x: x \in A ~\mathtt{ or} ~x \in B\}$
* Intersection: $A \cap B = \{x: x \in A ~\mathtt{and} ~x \in B\}$
* Complement: $A^\complement = \{ x: x \notin A \}$
* Disjoint: $A \cap B = \emptyset$
* Partition: a collection of pairwise disjoint sets $\{A_j\}$ such that $\bigcup_{j=1}^{\infty} A_j = S$
* $\left|A \right| \equiv$ number of elements in $A$

### ...and some laws

* DeMorgan's laws: $(A \cup B)^\complement = A^\complement \cap B^\complement$ and  $(A \cap B)^\complement = A^\complement \cup B^\complement$
* Commutative laws
  * $A \cup B = B \cup A$
  * $A \cap B = B \cap A$
* Associative laws
  * $A \cup (B \cup C) = (A \cup B) \cup C$
  * $A \cap (B \cap C) = (A \cap B) \cap C$
* Distributive laws
  * $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
  * $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
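These laws can be spot-checked with Python's built-in `set` type. This is a minimal sketch: the sample space `S` and the events `A`, `B`, `C` are arbitrary, made-up examples, and a check on one finite example illustrates the identities rather than proving them.

```python
# Arbitrary small sample space and three events (subsets of S), chosen for illustration
S = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7, 9}

def complement(X):
    """Complement of X relative to the sample space S."""
    return S - X

# DeMorgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Commutative laws
assert A | B == B | A
assert A & B == B & A

# Associative laws
assert A | (B | C) == (A | B) | C
assert A & (B & C) == (A & B) & C

# Distributive laws
assert A | (B & C) == (A | B) & (A | C)
assert A & (B | C) == (A & B) | (A & C)

print("All set identities hold for this example.")
```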

## Permutations and Combinations

In general, there are $n!$ ways to order $n$ objects, since any of the $n$ can come first, any of the remaining $n-1$ can come second, and so on. So we can line up 16 students in $16!$ ways.

```python
math.factorial(16)   # → 20922789888000
```

Suppose we choose 5 of you at random from the class of 16 students. How many different ways could we do that?

If the order matters, it's a **permutation**. If the order doesn't, it's a **combination**.

There are $16$ ways we can choose the first student, $16 \cdot 15$ ways we can choose the first two, and so on, so there are $$16\cdot15\cdot14\cdot13\cdot12 = \frac{16!}{11!} = {_{16}P_{5}}$$ ways we can choose five students when the order matters. In general,

$$_nP_k = \frac{n!}{(n-k)!}$$

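```python
def permutations(n, k):
    """Number of ordered selections of k items from n: n! / (n - k)!"""
    # Integer division (//) keeps the result an exact int; / would return a float
    return math.factorial(n) // math.factorial(n - k)

permutations(16, 5)   # → 524160
```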

Each group of five students can be ordered in $5!$ different ways, so the number of combinations is the number of permutations divided by $5!$. We write this as $${16 \choose 5} = \frac{16!}{11! \cdot 5!}$$

In general,

$${n \choose k} = {_nC_k} = \frac{n!}{(n-k)!\cdot k!}$$

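```python
def combinations(n, k):
    """Number of unordered selections of k items from n: n! / ((n - k)! * k!)"""
    # Integer division (//) keeps the result an exact int
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

combinations(5, 2)   # → 10
```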

## Multinomial

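Combinations count the number of ways to divide $n$ items into two categories (chosen and not chosen). When dividing into $k$ categories of sizes $n_1, n_2, \ldots, n_k$, where $n_1 + n_2 + \cdots + n_k = n$, use

$${n \choose {n_1, n_2, \ldots, n_k}} = \frac{n!}{n_1!\, n_2! \cdots n_k!}$$

which reduces to the binomial coefficient above when there are only two categories.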
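A minimal sketch of the multinomial coefficient in Python; the helper name `multinomial` is hypothetical, and the comparison uses the standard library's `math.comb` (available since Python 3.8) to confirm the two-category case matches ${16 \choose 5}$. The 8/5/3 team split is only an illustrative example.

```python
import math

def multinomial(*counts):
    """Ways to split n = sum(counts) items into groups of the given sizes (hypothetical helper)."""
    n = sum(counts)
    result = math.factorial(n)
    for c in counts:
        # Each intermediate quotient is an integer, so // is exact here
        result //= math.factorial(c)
    return result

# Two groups (5 chosen, 11 not chosen) matches the binomial coefficient
print(multinomial(11, 5), math.comb(16, 5))   # 4368 4368

# Three groups: e.g. split 16 students into teams of 8, 5, and 3
print(multinomial(8, 5, 3))                   # 720720
```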

## Definition of probability

Given a sample space $S$, a *probability function* $P$ defined on sets of outcomes (events) has three properties:

* $P(S) = 1$
* $P(A) \ge 0 \; \forall \; A \subset S$
* For a set of pairwise disjoint sets $\{A_j\}$, $P(\cup_j A_j) = \sum_j P(A_j)$

In the counting problems that use permutations and combinations, **every outcome is assumed to be equally likely**, so the probability of an event is the number of ways it can happen divided by the number of possible outcomes: $P(A) = \frac{\left|A\right|}{\left|S\right|}$.

### Tea-drinking problem

There's a classic problem in which a woman claims she can tell whether tea or milk is added to the cup first. The famous statistician R.A. Fisher proposed a test: he would prepare eight cups of tea, four each way, and she would select which was which.

Assuming the null hypothesis (that she is guessing randomly), what's the probability she identifies all eight cups correctly?
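One way to compute this, as a sketch: assume she knows there are exactly four cups of each kind, so under the null hypothesis she picks which four cups had milk first uniformly at random from the ${8 \choose 4}$ possible choices, and exactly one choice is fully correct.

```python
from fractions import Fraction
from math import comb

# Under the null hypothesis every choice of 4 cups (out of 8) is equally likely,
# and exactly one of those choices matches the true "milk first" cups
n_ways = comb(8, 4)
p_all_correct = Fraction(1, n_ways)

print(n_ways)                # 70
print(p_all_correct)         # 1/70
print(float(p_all_correct))  # roughly 0.0143
```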

## Independence

Two events $A$ and $B$ are said to be *independent* iff 

$$ P(A \cap B) = P(A) P(B)$$

or equivalently, when $P(A) > 0$,

$$ P(B \mid A) = P(B)$$

so knowledge of $A$ provides no information about $B$. Independence is often written $A \perp B$.
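The definition can be checked by brute-force enumeration. This sketch assumes two fair six-sided dice, so all 36 ordered outcomes are equally likely; the events `A`, `B`, and `C` are illustrative choices, not from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair six-sided dice: 36 equally likely ordered pairs
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = |event| / |S|, assuming every outcome is equally likely."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

def A(o):
    return o[0] == 1            # first die shows 1

def B(o):
    return o[1] % 2 == 0        # second die is even

def C(o):
    return o[0] + o[1] >= 10    # total is at least 10

# Independent: P(A and B) equals P(A) * P(B)
print(prob(lambda o: A(o) and B(o)), prob(A) * prob(B))   # 1/12 1/12

# Not independent: A rules out C entirely, so P(A and C) = 0 but P(A) * P(C) > 0
print(prob(lambda o: A(o) and C(o)), prob(A) * prob(C))   # 0 1/36
```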

### Example: dice

The probability of rolling a 1 on a single fair 6-sided die is $\frac{1}{6}$.

What's the probability of two dice having a total value of 3?
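A sketch of one solution, assuming two fair, independent dice: enumerate all 36 ordered outcomes and count those summing to 3, then cross-check using independence.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes for two fair dice
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 3]

p_total_3 = Fraction(len(favorable), len(outcomes))
print(favorable)   # [(1, 2), (2, 1)]
print(p_total_3)   # 1/18

# Same answer via independence: P(1 then 2) + P(2 then 1) = 2 * (1/6 * 1/6)
assert p_total_3 == 2 * Fraction(1, 6) * Fraction(1, 6)
```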

## Bayes' theorem

Bayes' theorem says that

$$P(A\mid B) = \frac{P(B\mid A) P(A)}{P(B)}$$
where $A$ and $B$ are events and $P(B) > 0$.

To prove it, start from the definition of conditional probability, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, which gives

$$\begin{equation}
\begin{aligned}
P(A\mid B) P(B) & = P(A \cap B) \\
            & = P(B \cap A) \\
            & = P(B\mid A) P(A) \\
\end{aligned}
\end{equation}
$$

so dividing both sides by $P(B)$ gives the above theorem.

Here we usually think of $A$ as our hypothesis and $B$ as our observed data, so

$$ P(\text{hypothesis} \mid \text{data}) = \frac{P(\text{data} \mid \text{hypothesis}) P(\text{hypothesis})}{P(\text{data})}$$

where

* $P(\text{data} \mid \text{hypothesis})$ is the likelihood
* $P(\text{hypothesis})$ is the prior probability
* $P(\text{hypothesis} \mid \text{data})$ is the posterior probability
* $P(\text{data})$ is the normalizing constant

The normalizing constant is usually the hardest term to calculate. In general, we compute it using the **Law of Total Probability**.

### Law of Total Probability

If $\{A_i\}$ is a partition of the sample space, then

$$\begin{align}
P(B) & = \sum_i P(B \cap A_i) \\
     & = \sum_i P(B \mid A_i) \cdot P(A_i)
\end{align}
$$

So if $B$ is the observed data and $A_i$ are all the possible hypotheses, we can use this to calculate the normalizing constant.
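Combining the two results, a minimal sketch of a Bayesian update over a finite set of hypotheses. The function name `posterior` and its dict-based interface are hypothetical choices for illustration, and the example numbers are made up.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over a partition of hypotheses (hypothetical helper).

    priors[h]      = P(h)
    likelihoods[h] = P(data | h)
    Returns a dict mapping each hypothesis h to P(h | data).
    """
    # Law of Total Probability: P(data) = sum over h of P(data | h) * P(h)
    p_data = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / p_data for h in priors}

# Made-up numbers: two hypotheses with equal priors
post = posterior({"H1": Fraction(1, 2), "H2": Fraction(1, 2)},
                 {"H1": Fraction(3, 4), "H2": Fraction(1, 4)})
print(post)   # {'H1': Fraction(3, 4), 'H2': Fraction(1, 4)}
```

Using `Fraction` keeps the arithmetic exact, so the posteriors sum to exactly 1.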

### Example: the cookie problem

Bowl A has 30 vanilla cookies and 10 chocolate cookies; bowl B has 20 of each. You pick a bowl at random and draw a cookie. Given that the cookie is vanilla, what's the probability it came from bowl A?
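A worked sketch, assuming "at random" means each bowl is chosen with probability 1/2 and each cookie in the chosen bowl is equally likely to be drawn.

```python
from fractions import Fraction

# Priors: each bowl is picked with probability 1/2
p_A, p_B = Fraction(1, 2), Fraction(1, 2)

# Likelihoods: P(vanilla | bowl)
p_van_A = Fraction(30, 40)   # bowl A: 30 vanilla out of 40
p_van_B = Fraction(20, 40)   # bowl B: 20 vanilla out of 40

# Law of Total Probability gives the normalizing constant P(vanilla)
p_van = p_van_A * p_A + p_van_B * p_B

# Bayes' theorem
p_A_given_van = p_van_A * p_A / p_van

print(p_van)          # 5/8
print(p_A_given_van)  # 3/5
```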

### Example: two-sided coins

There are three coins in a bag: one with two heads, one with two tails, and one with a head and a tail. You pick one at random and flip it, getting a head. What's the probability of getting a head on the next flip of the same coin?
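A sketch of one solution, assuming each coin is drawn with probability 1/3 and the second flip uses the same coin: update the posterior over coins after the first head, then average each coin's chance of heads under that posterior.

```python
from fractions import Fraction

# Hypotheses: which coin was drawn, mapped to that coin's P(heads)
coins = {"HH": Fraction(1), "TT": Fraction(0), "HT": Fraction(1, 2)}
prior = Fraction(1, 3)

# P(first flip is heads), by the Law of Total Probability
p_first_head = sum(p_h * prior for p_h in coins.values())

# Posterior over coins after seeing one head (Bayes' theorem)
post = {c: p_h * prior / p_first_head for c, p_h in coins.items()}

# Probability that the next flip of the same coin is heads
p_next_head = sum(post[c] * p_h for c, p_h in coins.items())

print(p_first_head)  # 1/2
print(post)          # {'HH': Fraction(2, 3), 'TT': Fraction(0, 1), 'HT': Fraction(1, 3)}
print(p_next_head)   # 5/6
```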

## Probability chain rule


$$\begin{align}
P(A_n, A_{n-1}, \ldots, A_1) & = P(A_n \mid A_{n-1}, \ldots, A_1) \cdot P(A_{n-1}, \ldots, A_1) \\
 & = P(A_n \mid A_{n-1}, \ldots, A_1) \cdot P(A_{n-1} \mid A_{n-2}, \ldots, A_1) \cdot P(A_{n-2}, \ldots, A_1) \\
 & = \prod_{j=1}^n P(A_j \mid A_{j-1}, \ldots, A_1)
\end{align}
$$

where the $j = 1$ factor is simply $P(A_1)$.
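As an illustration (the card example is not from the lecture), a sketch applying the chain rule to drawing three cards without replacement from a standard 52-card deck, where $A_j$ is the event that the $j$-th card is an ace. The result is cross-checked by counting with `math.comb`.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(A1, A2, A3) = P(A1) * P(A2 | A1) * P(A3 | A2, A1)
# A_j = "the j-th card drawn (without replacement) is an ace"
p_chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Cross-check by counting: 3-ace hands divided by all 3-card hands
p_count = Fraction(comb(4, 3), comb(52, 3))

print(p_chain)             # 1/5525
print(p_chain == p_count)  # True
```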