# Topic 11 Combinatorics and Probability

## Probability (Topic_11_1)

- Chance that a certain event will happen (how "likely" it is for something to happen)

## Introduction to Sets (Topic_11_2)

- **SET** - well-defined collection of distinct objects
    - "well-defined" - unambiguous, clear if given item is element of set
    - "distinct objects" - 0 or more unique objects
    - **Order does not matter**

- Set Definitions and Notations
    - **Semantic Definition** - define $S$ using a worded description
    - **Enumeration** $S = \{x,y,z\}$ - define by listing the elements inside curly braces
    
### Membership

Because order doesn't matter and an element can appear only once in a set, the main question about a set is whether or not a given element belongs to it.
- $x \in S$ - x is in S
- $x \notin S$ - x is not in S

In python:
- `x in S`
- `x not in S`

### Universal and Empty Sets

Use Omega ($\Omega$) when a set represents all possible outcomes.
- Example: A universal set of all dice outcomes
    - $\Omega = \{1, 2, 3, 4, 5, 6\}$
- Universal set can have infinite number of elements

Empty sets are denoted by `{}` or $\emptyset$

### Working with Multiple Sets

#### Subsets

- Mathematical notation is $ X \subseteq Y $
- If you have two identical sets, they are subsets of each other
- Can check (and return true/false) in python using:
    - `x.issubset(y)`
    - `y.issubset(z)`
    - Note that the `.issubset` only works to find regular subsets, not proper subsets
- Alternative in Python
    - `x <= y`

#### Proper Subsets

- Proper subsets mean set X's values are all in Y's, **AND** Y has at least one additional element not in X
- Notation
    - $X \subset Y$
- Python asking if one set is a subset of another (will return T or F)
    - `X < Y`
    - `Y < Z`
- *All proper subsets are subsets, but not all subsets are proper subsets*

Pythonic notation helps clarify since `<=` implies the two sets can be equal for a regular subset but the `=` is omitted for a proper subset and is just `<`.
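A minimal sketch of the subset checks above. The sets `X`, `Y`, and `Z` are hypothetical values chosen only to show the difference between `<=` and `<`.

```python
# Hypothetical example sets, chosen only for illustration.
X = {1, 2}
Y = {1, 2, 3}
Z = {1, 2, 3}

# Regular subset: every element of X is also in Y.
print(X <= Y, X.issubset(Y))

# Proper subset: X is a subset of Y and Y has at least one extra element.
print(X < Y)

# Identical sets are subsets of each other, but neither is a proper subset.
print(Y <= Z, Z <= Y, Y < Z)
```

The last line shows why `.issubset` alone cannot confirm a proper subset: it returns `True` for equal sets as well.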

#### Superset

Supersets are the inverse of a subset, with a proper superset being when the two are not equal

- Mathematical Notation
    - $ X \supseteq Y $ - superset
    - $ Y \supset Z $ - proper superset
- Python
    - `X >= Y` or `.issuperset` (note `.issuperset` only works to find a regular superset, not a proper superset)
    - `Y > Z` - proper superset

### Core Set Operations

Let's assume the following:

- $S = \{3, 6, 9, 12\}$
- $T = \{2, 4, 6, 8\}$
- $\Omega = \{2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20\}$ - Universal Set for these exercises

#### Unions

Set of elements in S, T or both

- Mathematical notation
    - $S \cup T = \{2, 3, 4, 6, 8, 9, 12\}$
- Pythonic
    - `S | T`
    - `S.union(T)`

#### Intersection of two sets
    
Set of elements that belong to both S and T

- Mathematical notation
    - $S \cap T = \{6\}$
- Pythonic
    - `S & T`
    - `S.intersection(T)`

#### Relative Complement or the Difference

- In general, the **complement** of a set is the elements not in that set.  The **relative complement** is the items of one set that are not in another set
- Relative Complement of S in T 
    - Mathematical notation
        - $T \backslash S = \{2, 4, 8\}$
        - $S \backslash T = \{3, 9, 12\}$
    - Pythonic
        - `T - S` or `T.difference(S)`
        - `S - T` or `S.difference(T)`

#### Absolute Complement

- All elements of the universal set that are not in the given set
- Absolute complement of S with respect to $\Omega$ is collection of objects in $\Omega$ that aren't in S
- Mathematical Notation 
    - $S'$ or $S^c$ = {2, 4, 8, 10, 14, 15, 16, 18, 20} (all elements in $\Omega$ not in $S$)
- Pythonic (same as relative complement)
    - `omega - S`
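The core operations can be checked directly against the example sets defined at the start of this section. This is a small sketch; the variable names (`union`, `s_complement`, and so on) are only illustrative.

```python
S = {3, 6, 9, 12}
T = {2, 4, 6, 8}
omega = {2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20}

union = S | T              # elements in S, T, or both
intersection = S & T       # elements in both S and T
t_minus_s = T - S          # relative complement of S in T
s_minus_t = S - T          # relative complement of T in S
s_complement = omega - S   # absolute complement of S with respect to omega

for name, value in [
    ("S | T", union),
    ("S & T", intersection),
    ("T - S", t_minus_s),
    ("S - T", s_minus_t),
    ("omega - S", s_complement),
]:
    # sorted() is used only so the printed order is predictable
    print(name, sorted(value))
```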

### Additional Set Attributes

#### Cardinality
- Cardinality of a set = number of elements
- Mathematical notation
    - $\mid S \mid$ = 4
- Pythonic
    - `len(S)`
    
#### Inclusion Exclusion Principle
- For two finite sets, method for counting number of elements in the union is:
    - $\mid S \cup T \mid = \mid S \mid + \mid T \mid - \mid S \cap T \mid$
    - Cardinality of S, plus the cardinality of T, minus the cardinality of the intersection between S and T
    
This also works for multiple sets:

$\mid S \cup T\cup R \mid = \mid S \mid + \mid T \mid + \mid R \mid - \mid S \cap T \mid  -\mid S \cap R \mid - \mid R \cap T \mid  + \mid S \cap T \cap R \mid $
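As a quick check of both formulas, the sketch below reuses $S$ and $T$ from the core set operations and adds a third set `R`, which is a hypothetical set made up for this example.

```python
S = {3, 6, 9, 12}
T = {2, 4, 6, 8}
R = {6, 12, 18}  # hypothetical third set, not from the notes

# Two sets: |S| + |T| - |S n T|
two_set_count = len(S) + len(T) - len(S & T)

# Three sets: add singles, subtract pairwise overlaps, add back the triple overlap
three_set_count = (
    len(S) + len(T) + len(R)
    - len(S & T) - len(S & R) - len(R & T)
    + len(S & T & R)
)

print(two_set_count, len(S | T))
print(three_set_count, len(S | T | R))
```

Each printed pair should match, since the formula and the direct count of the union measure the same thing.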


<img src="images/Topic_11_02_new_venn_diagram.png" width="350"/>

### Sets in Python

- Sets are unordered collections of unique elements
- Sets are iterable
- Sets are collection of lower level python objects (just like lists or dictionaries)
- Some sets that can be represented with mathematical notation cannot be represented in Python

#### Set Operations in Python
|Operation                          |	Equivalent |	Result|
| ------                            | ------       | ------   |
|s.update(t)                        | 	$s \mid= t$ 	   |return set s with elements added from t|
|s.intersection_update(t)           | 	s &= t     |	return set s keeping only elements also found in t|
|s.difference_update(t)             |	s -= t 	   |return set s after removing elements found in t|
|s.symmetric_difference_update(t)   |	s ^= t 	   |return set s with elements from s or t but not both|
|s.add(x)                           |	           |	add element x to set s|
|s.remove(x)                        |	           |	remove x from set s|
|s.discard(x)                       |	           |	removes x from set s if present|
|s.pop()                            | 	           |	remove and return an arbitrary element from s|
|s.clear()            	            |  	           |remove all elements from set s|



### Sets and Set Operations Example

Set $A$ with all restaurants that serve Italian food.

Set $B$ with all the restaurants that serve burgers

**universal set**, $U$, contains all the restaurants in the world

#### Implications

- The union of these sets, $C$, contains the restaurants that serve Italian food, burgers, or both.  $A$ and $B$ are both **subsets** of $C$

- The **intersection** of $A$ and $B$ contains the restaurants that *serve both Italian food and burgers*

- The cardinality of $C$ is the number of restaurants that serve Italian food ($A$) plus the number of restaurants that serve burgers ($B$) minus the cardinality of the intersection (restaurants that serve both burgers and Italian food).

- The **relative complement** of $A$ in $B$ is the restaurants that serve burgers but _not_ Italian food

- The **absolute complement** of $A$ is all the restaurants in the world that don't serve Italian food (regardless of whether they serve burgers).

## Introduction to Probability (Topic_11_03)

### Terminology

Using throwing a 6 sided die for the following examples

#### Experiments and Outcomes

Throwing a die is the **random experiment** and the result of the experiment is the **outcome**.

#### Event

A particular outcome (or set of outcomes) of a random experiment.  "Rolling the die and getting a 5" is an **event**

#### Sample Spaces

The universe of all possible outcomes is the **sample space**.  For a die roll that would be 1, 2, 3, 4, 5, and 6.

#### Event Spaces

Subset of the sample space that we "care about" is the **event space**.  If we are "rolling a number higher than a 4" our event space is 5 and 6.

### Sets and Probability

#### Sample Spaces as Sets

For the die roll, $S = \{1,2,3,4,5,6\}$

- $S$ defines all possible outcomes
- $S$ is universal set $\Omega$

Other Sample Space Examples

##### Text Messages in a Day Sample Space
- Each outcome $x$ in $S$ is a non-negative integer
- Mathematically, $x \in \mathbb{Z}$, where $\mathbb{Z}$ is a special set representing all integers
- $x$ being non-negative is represented as $x \geq 0$
- We need $x$ to match both of these criteria, which we represent with **set builder** notation.  The vertical bar | means "such that", with conditions separated by commas
- $S = \{x \mid x \in \mathbb{Z}, x \geq 0\}$

##### TV Hours Sample Space
- $x$ is a **real number between 0 and 24**
- Mathematically, $x$ being a real number is $x \in \mathbb{R}$, where $\mathbb{R}$ is a special set of all real numbers
- $x$ being between 0 and 24 looks like $0 \leq x \leq 24$
- $S = \{x \mid x \in \mathbb{R}, 0 \leq x \leq 24\}$ - $S$ contains all instances of $x$ such that $x$ is a real number and $x$ is between 0 and 24
    
#### Event Spaces as Sets

- We will define event space as E.  $E \subseteq S$, E is a subset of S
- Example could be rolling a number higher than 4, $E = \{5,6\}$
- Another example if rolling an odd number, $E = \{1,3,5\}$
- E happens if actual outcome belongs to pre-defined event space $E$

Other examples of E based on our previous examples above:

##### Text Messages Event Space (Low number sent)
- Define as 20 or fewer text messages
- Event Space $E$ is, $E = \{x \mid x \in \mathbb{Z}, 0 \leq x \leq 20\}$

##### TV Hours (binge watch) Event Space
- Define as 6 or more hours watched
    - Event Space $E$ is, $E = \{x \mid x \in \mathbb{R}, 6 \leq x \leq 24\}$

### Introduction to Probability

#### Law of Relative Frequency

If an experiment is repeated endlessly, the relative frequency of an event converges to a fixed number.

- Say we have an event $E$
- The probability of event $E$ is $P(E)$
- $n$ is the number of experiments we conduct
- $S(n)$ is the number of successful experiments (i.e., the number of times that $E$ happened)
- Represent Relative Frequency for this by:

$$ P(E) = \lim_{n\rightarrow\infty} \dfrac{S{(n)}}{n} $$

- Basis of a frequentist statistical interpretation: probability is the ratio of positive trials to the total number of trials as we repeat the process infinitely.
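The limit can be illustrated with a small simulation. This is only a sketch: the event (rolling a 5), the helper name `relative_frequency`, the trial counts, and the fixed seed are all assumptions made for this example.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Estimate P(rolling a 5) as S(n) / n over n_trials simulated die rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    successes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 5)
    return successes / n_trials


# As n grows, S(n) / n should settle near 1/6.
for n in (100, 10_000, 200_000):
    print(n, relative_frequency(n))
```

With a finite number of trials the estimate only approximates $P(E)$; the definition itself refers to the limit as $n$ goes to infinity.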

#### Probability Axioms

Early 1900s, Kolmogorov and Von Mises came up with three Axioms that expand on idea of probability:

##### 1. Positivity

Probability is never negative and never greater than 1, or $0 \leq P(E) \leq 1$

##### 2. Probability of a Certain Event

If the event space equals the sample space (i.e., $E = S$), outcome is a certain event, or $P(E) = 1$

##### 3. Additivity

Probability of the union of two mutually exclusive events equals the sum of the probabilities of the individual events

[Inclusion-exclusion principle](#Inclusion-Exclusion-Principle) states (remember that $\mid S \mid$ stands for cardinality or count):

$ \mid S \cup T \mid = \mid S \mid + \mid T \mid\ - \mid S \cap T \mid$

If $S \cap T$ is known to be $\emptyset$, then we can skip that and it is just $\mid S \mid + \mid T \mid$

This same logic will work for probability of events so long as the events are exclusive.

If $A \cap B = \emptyset$ then $P(A \cup B) = P(A) + P(B)$

#### Addition Law of Probability

The additivity axiom only works if events are exclusive.  If the events are not exclusive then we need to use the **Addition Law of Probability** which subtracts the intersection:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

In words, the probability of A or B is the probability of A happening plus the probability of B happening, minus the probability that both A and B happen.

### Examples

#### Additivity of Exclusive Events

- Event M is throwing a 6
- Event N is an odd number
- What is the chance of throwing a 6 or odd number
- Use additivity rule since these events are exclusive
- $P(M \cup N) = P(M) + P(N)$
- $P(M \cup N) = \frac{1}{6} + \frac{3}{6} = \frac{4}{6} = \frac{2}{3}$

#### Addition Law of Probability

- Same Event N {1,3,5}
- New Event Q {4,5,6}

$P(N \cup Q) = P(N) + P(Q) - P(N \cap Q)$

$P(N \cup Q) = 3/6 + 3/6 - 1/6 = 5/6$
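Both examples can be verified by treating events as Python sets. This is a sketch that assumes a fair six-sided die; the helper `prob` is a hypothetical name that simply counts outcomes.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space for one fair die


def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event & omega), len(omega))


M = {6}        # throwing a 6
N = {1, 3, 5}  # throwing an odd number
Q = {4, 5, 6}  # throwing a 4, 5, or 6

# M and N are exclusive, so the additivity axiom applies directly.
exclusive = prob(M) + prob(N)

# N and Q overlap (both contain 5), so the intersection must be subtracted.
overlapping = prob(N) + prob(Q) - prob(N & Q)

print(exclusive, prob(M | N))
print(overlapping, prob(N | Q))
```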

## Permutations and Factorials (Topic_11_4)

### Permutations

Number of permutations with n distinct objects is $n!$, or otherwise the factorial of $n$.

### Permutations of a Subset

8 options from which you must pick three:

8 options for the first, 7 options for the 2nd, 6 options for the third
8 * 7 * 6 = 336

$P_{k}^{n} = \dfrac{n!}{(n-k)!}$ for the permutation of selecting $k$ options from $n$ objects.
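A short check of the "pick 3 from 8" example, comparing the formula against a brute-force count. The variable names are illustrative only.

```python
import math
from itertools import permutations

n, k = 8, 3

# Formula: n! / (n - k)!
by_formula = math.factorial(n) // math.factorial(n - k)

# Brute force: count every ordered selection of k items out of n
by_enumeration = sum(1 for _ in permutations(range(n), k))

print(by_formula, by_enumeration)
```

On Python 3.8 or newer, `math.perm(n, k)` computes the same quantity.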

### Permutations with Repetition

What if some items are repeated?  For example, making words out of the letters in TENNESSEE (swapping the Ns, Ss, or Es with each other does not produce a different word).

Take the factorial of all the options and then divide by the factorials of the number of times things are repeated:

9 total letters, E repeats 4 times, N twice and S twice:

$\dfrac{9!}{4!2!2!} = 3780$

General Formula:

$\dfrac{n!}{n_1!n_2!...n_k!}$

where $n_j$ is the number of identical objects of type $j$ (for $j = 1, \ldots, k$), and $n_1 + n_2 + \ldots + n_k = n$.
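The general formula can be sketched as a small function. The name `distinct_arrangements` is a hypothetical helper; it counts letter repeats with `collections.Counter` and divides $n!$ by each repeat's factorial.

```python
import math
from collections import Counter


def distinct_arrangements(word):
    """Number of distinct orderings of the letters in word: n! / (n_1! n_2! ... n_k!)."""
    result = math.factorial(len(word))
    for repeat_count in Counter(word).values():
        result //= math.factorial(repeat_count)
    return result


print(Counter("TENNESSEE"))
print(distinct_arrangements("TENNESSEE"))
```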

### Recursion

$$ n! = n * (n-1)! = n * (n-1) * (n-2)! = ... = n * (n-1) * (n-2) * \ldots * 2! = n* (n-1) * (n-2) * \ldots * 2 * 1! $$ 

Recursive functions are functions that can call themselves until a condition is met.  
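A minimal recursive factorial, following the expansion above. The stopping condition used here ($0! = 1! = 1$) is the standard convention rather than something stated in these notes.

```python
def factorial(n):
    """Compute n! recursively: n! = n * (n - 1)!, stopping at 0! = 1! = 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n <= 1:
        return 1  # base case: the condition that ends the recursion
    return n * factorial(n - 1)  # the function calls itself on a smaller input


print(factorial(5))
```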

## Combinations (Topic_11_05)

Combinations are used when the order is not important.

As permutations, the 3 letters A, B, and C can be arranged as: ABC, ACB, BCA, BAC, CAB, CBA

In a combination all of those would be considered the same, so you must divide the permutation formula $P^n_k = \dfrac{n!}{(n-k)!}$ by the factorial of $k$ ($k!$).  Therefore the combination formula is:

$$\binom{n}{k} = \dfrac{P^n_k}{k!} = \dfrac{\dfrac{n!}{(n-k)!}}{k!}$$

The number of combinations is never more than the number of permutations for the same $n$ and $k$, since we're dividing by $k!$ (the two are equal only when $k$ is 0 or 1).
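The relationship between the two counts can be seen with `itertools`. This sketch uses the three letters from the example plus the earlier "pick 3 from 8" numbers; `math.comb` and `math.perm` assume Python 3.8 or newer.

```python
import math
from itertools import combinations, permutations

letters = "ABC"

# Order matters: ABC, ACB, BAC, BCA, CAB, CBA are all different permutations.
n_permutations = len(list(permutations(letters, 3)))

# Order does not matter: they all collapse into the single combination {A, B, C}.
n_combinations = len(list(combinations(letters, 3)))

print(n_permutations, n_combinations)

# Same idea for choosing 3 from 8: divide the permutation count by 3!
print(math.perm(8, 3) // math.factorial(3), math.comb(8, 3))
```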

## Conditional Probability (Topic_11_06)

### Events and Sample Space

- **Event** is the outcome of an experiment (e.g., getting a 3 when rolling a dice)
    - Can be a **compound event** (e.g., getting a 3 twice when rolling a dice twice)
- **Sample Space** is every possible outcome of a trial (typically noted by $\Omega$)

### Independent Events

- Two events are **independent** when the outcome of event A has no impact on the outcome of event B.
- Examples
    - Getting heads while flipping a coin and getting a 5 when rolling a dice
    - Choosing a marble from a container and getting a heads when flipping a coin
    
#### Two Independent Events

A and B are independent if:

- $P(A\cap B) = P(A)P(B)$

Probability of A or B happening is represented by the **addition rule of probability**:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Thus by substitution, for two independent events:

$$P(A \cup B) = P(A) + P(B) - P(A)P(B)$$

#### Three Independent Events

A, B, and C are independent events if:

$P(A \cap B) = P(A)P(B)$

$P(A \cap C) = P(A)P(C)$

$P(B \cap C) = P(B)P(C)$

$P(A \cap B \cap C) = P(A)P(B)P(C)$

Independence must be both **pairwise** and **Three-Way**.

### Disjoint Events

A and B are disjoint events if A occurring means B can never occur (and vice-versa)

Disjoint events are **Mutually Exclusive**

$A \cap B$ is **empty**, so $P(A \cap B) = 0$

### Dependent Events

If the occurrence of A has an effect on the likelihood of the occurrence of B, the events are said to be **Dependent**

There are some really good images in the lecture for this (11_06).

Example:

- Suppose 5 marbles in a jar, 2 purple and 3 orange
- Event A is taking an orange or purple marble out of a jar
    - P(Purple) = 2/5 --> 1 Purple, 3 Orange left
    - P(Orange) = 3/5 --> 2 Purple, 2 Orange left
- Event B is taking another marble
    - P(Purple) if A(Purple) = 1/4
    - P(Orange) if A(Purple) = 3/4
    - P(Purple) if A(Orange) = 2/4
    - P(Orange) if A(Orange) = 2/4

**P(B) is conditional on the outcome of A**
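The four second-draw probabilities in the marble example can be reproduced by listing every ordered pair of draws. This is a counting sketch; the helper name `conditional` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

jar = ["purple", "purple", "orange", "orange", "orange"]

# Every ordered way to draw two different marbles (20 equally likely pairs).
draws = list(permutations(jar, 2))


def conditional(second, first):
    """P(second draw is `second` | first draw is `first`), by counting pairs."""
    given = [d for d in draws if d[0] == first]
    hits = [d for d in given if d[1] == second]
    return Fraction(len(hits), len(given))


print(conditional("purple", "purple"))
print(conditional("orange", "purple"))
print(conditional("purple", "orange"))
print(conditional("orange", "orange"))
```

`Fraction` reduces automatically, so 2/4 is reported as 1/2.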

### Conditional Probability

**When the outcome of a trial will influence the results of upcoming trials**

While calculating P(B), we say that P(B) relies on the occurrence of event A.

Examples
- Drawing a second ace out of a deck after the first card drawn is an Ace
- Probability that someone likes "The Matrix" knowing that they like science fiction

Say we are interested in **P(A)** and it depends on an **event B** that has happened.

The conditional probability (or Probability of A given B) is written:

$$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$

#### Theorem 1 - Product Rule

Intersection of events A and B is given by:

$$P(A \cap B) = P(B)P(A \mid B) = P(A)P(B \mid A)$$

Remember that if A and B are **independent** then conditioning on B means nothing (and same for A), so:

$P(A \mid B) = P(A)$, therefore $P(A \cap B) = P(A)P(B)$

#### Theorem 2 - Chain Rule

Also known as the **General Product Rule**

If we start with the product rule:

$$P(A \cap B) = P(A \mid B)*P(B)$$

We can expand it to 3 variables

$$P (A \cap B \cap C) = P(A \cap (B \cap C)) = P(A \mid B \cap C)* P(B \cap C) = P(A \mid B \cap C)*P(B \mid C)*P(C)$$

And it could continue to be expanded to $n$ variables:

$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1 \mid A_2 \cap \ldots\cap A_n) P(A_2 \mid A_3  \cap \ldots \cap A_n) \cdots P(A_{n-1} \mid A_n) P(A_n)$$

If on the other hand you have disjoint events $C_1, C_2, \ldots, C_m$ such that $C_1\cup C_2\cup \cdots \cup  C_m = \Omega$, the probability of any event can be decomposed as:

\begin{align}
P(A) = P(A \mid C_1)P(C_1) + P(A \mid C_2)P(C_2) + \ldots + P(A \mid C_m)P(C_m)
\end{align}

#### Theorem 3 - Bayes Theorem

$$P(A \mid B) = \dfrac{P(B \mid A)P(A)}{P(B)}$$

Bayes Theorem follows from Theorem 1 above.

#### The Complement of an Event

The complement of an event is also applicable to conditional probabilities:

$P(A) + P(A') = 1$

This is true if $A'$ is the complement of A.

$P(A \mid B) + P(A' \mid B) = 1$

### Example


| &nbsp;    | Sunny weather  | Cloudy weather|
|-----------|----------------|---------------|
| Good mood | 14             | 11            |
| Bad mood  | 2              | 23            |

#### $P(G)$ If he picked a random day what is probability he was in a good mood?

- Event Space Good mood = 14 + 11 = 25
- Sample Space = 50
- Probability = .5

#### $P(S)$ What is the probability the day chosen was Sunny?

- Event Space Sunny = 14 + 2 = 16
- Sample Space = 50
- Probability = .32

#### $P(G \mid S)$ What is the probability of having a good day if it's a sunny day?

$P(G \mid S) = \dfrac{P(G \cap S)}{P(S)}$

- $P(G \cap S) = 14/50 = .28$
- $P(S) = .32$
- $P(G \mid S) = \dfrac{.28}{.32} = .875$

#### $P(S \mid G)$ What is the probability of it being a sunny day if in a good mood?

$P(S \mid G) = \dfrac{P(S \cap G)}{P(G)}$

- $P(S \cap G) = 14/50 = .28$
- $P(G) = .5$
- $P(S \mid G) = \dfrac{.28}{.5} = .56$
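The whole worked example can be recomputed from the table counts. This is a sketch with illustrative variable names; the final line also checks the result against Bayes Theorem (Theorem 3).

```python
from fractions import Fraction

# Day counts from the mood/weather table
counts = {
    ("good", "sunny"): 14,
    ("good", "cloudy"): 11,
    ("bad", "sunny"): 2,
    ("bad", "cloudy"): 23,
}
total = sum(counts.values())  # sample space: 50 days

p_good = Fraction(counts[("good", "sunny")] + counts[("good", "cloudy")], total)
p_sunny = Fraction(counts[("good", "sunny")] + counts[("bad", "sunny")], total)
p_good_and_sunny = Fraction(counts[("good", "sunny")], total)

# Conditional probability: P(A | B) = P(A n B) / P(B)
p_good_given_sunny = p_good_and_sunny / p_sunny
p_sunny_given_good = p_good_and_sunny / p_good

print(float(p_good), float(p_sunny))
print(float(p_good_given_sunny), float(p_sunny_given_good))

# Bayes Theorem should give the same P(S | G)
print(p_good_given_sunny * p_sunny / p_good == p_sunny_given_good)
```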

### Additional Reading

[Conditional probability, Independence and Bayes rule](https://www.dropbox.com/sh/qkjgd4fbo2pwph6/AABOTSgNJHe7753i3bPCJUMqa/327_02_cond_probability.pdf?dl=0) - A deeper mathematical explanation around Independence and theorems we have seen above (and some we shall cover in upcoming lessons). If you are having trouble accessing that link, try [this one](https://web.archive.org/web/20191020193152/http://faculty.arts.ubc.ca/vmarmer/econ327/327_02_cond_probability.pdf) instead.

[Tree Diagrams](https://www.mathsisfun.com/data/probability-tree-diagrams.html) - Drawing tree diagrams to calculate conditional probability

[Conditional Probability, Examples and simple exercises](https://www.mathgoodies.com/lessons/vol6/conditional) - Practice with probability calculations

[Conditional probability: A visual explanation](http://setosa.io/conditional/) - A great little interactive animation to explain how conditional probability works



## Partitioning and the Law of Total Probability (Topic_11_07)

**Law of Total Probability** is a fundamental rule relating marginal probability to conditional probability.

### Partitioning a Sample Space

![](images/Image_55_TotProb.png)

B is a random event, but we cannot easily calculate the probability of B.  However, we can split up S into disjoint events $A_1$ through $A_4$ whose union is S.

- $P(B) = P(B \cap A_1) + P(B \cap A_2) + P(B \cap A_3) + P(B \cap A_4)$

- $P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + P(B \mid A_3)P(A_3) + P(B \mid A_4)P(A_4)$

This uses our product rule (first theorem in 11_06) to find the combined probabilities. 

#### Example

- 4 distinct provinces (disjoint regions) $A_1, A_2, A_3$, and $A_4$
- We want the total forest area, $B$, in the country
- We know the forest area in $A_1, A_2, A_3$, and $A_4$ is 100, 50, 150, and 0 respectively
- Total forest area = 100 + 50 + 150 + 0 = 300

Translating that to the law of total probability: to find $P(B)$, we can simply add up the probability of $B$ that falls within each subdivided region of $S$, $A_1$ through $A_4$.

#### Two Events

- For any two events A and B:
    - $P(A) = P(A \cap B) + P(A \cap B')$

- Conditional probability allows us to rewrite this as:
    - $P(A) = P(A \mid B)P(B) + P(A \mid B')P(B')$

- Law of Total Probability is a general version of this

### Law of Total Probability

If $B_1, B_2, \ldots, B_n$ is a partition of $S$, then for an event $A$:

$$P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i)P(B_i)$$

We use this when we can't easily determine the probability of event $A$ directly, but we can determine the probability of $A$ given events $B_i$, where the $B_i$ form a partition of the sample space.
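A small numeric sketch of the formula. The partition probabilities and conditional probabilities below are hypothetical numbers invented for illustration, not values from the lesson.

```python
from fractions import Fraction

# Hypothetical partition of the sample space: P(B_1), P(B_2), P(B_3)
p_b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

# Hypothetical conditional probabilities: P(A | B_1), P(A | B_2), P(A | B_3)
p_a_given_b = [Fraction(1, 10), Fraction(2, 5), Fraction(1, 2)]

# A partition must cover the whole sample space exactly once.
print(sum(p_b) == 1)

# Law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_a = sum(cond * prior for cond, prior in zip(p_a_given_b, p_b))
print(p_a, float(p_a))
```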

# Topic 12 Statistical Distributions

## 12_01 - Introduction to Sampling 

We need sampling to be able to make statistical inferences about a population when the entire population cannot be observed.

### Census vs. Sample

- **Population** is the entire set in question.
- **Census** is observing every data point (complete enumeration) - like the US Census
- **Sample** is a subset of the population that we can calculate point estimates for and then extrapolate to make inferences about the entire population - like exit polling.

### Connection to Previous Concepts

#### Set Theory concepts we previously learned:
- **Population** is the **Universal Set** $\Omega$ of all possible things with a defined specification
- **Sample** is typically a proper subset ($\subset$, not $\subseteq$) meaning all elements of the sample are members of the population but the sample does not equal the population.

#### Descriptive statistics we previously learned:
- Measures of **Central Tendency** (mean, median, mode) and measures of **Spread** (variance, std deviation) are used to describe distribution of given data points - histograms and box plots are used to show the shape of distribution
- These can be applied to populations or samples
    - If applied to populations - **population statistics**
    - If applied to sample - **point estimates**
- Use different symbols to represent the same statistic for population vs. sample:
    - Number
        - Pop - $N$
        - Samp - $n$
    - Mean
        - Pop - $\mu$ ("mu")
        - Samp - $\bar x$ ("x bar")
    - Standard Deviation
        - Pop - $\sigma$ ("sigma")
        - Samp - $s$ ("s")

### Sample with Titanic Dataset

- In general, the population mean $\mu$ of age does not equal the sample mean $\bar x$ of age.
- We could quantify the quality of our estimate with the percentage of error.
- Then we could take 5 samples in a for loop and return a list of the sample means.
- Then use a list comprehension to create a list of the error percentages.

- As we increase the number of samples, we can see that the sample means center around the actual mean, and if we compare the mean of the 1,000 sample means to the population mean we can see that it's extremely close.
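The steps above can be sketched as follows. The Titanic data is not reproduced here, so this example assumes a synthetic population of ages drawn from a uniform range; the population size, the sample size of 50, and the seed are also assumptions.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the run is repeatable

# Synthetic stand-in for the population (NOT the Titanic ages)
population = [rng.uniform(1, 80) for _ in range(1000)]
mu = statistics.mean(population)


def sample_mean(n=50):
    """Mean of one random sample of size n drawn without replacement."""
    return statistics.mean(rng.sample(population, n))


# Five sample means from a for loop, then error percentages via a list comprehension
five_means = []
for _ in range(5):
    five_means.append(sample_mean())
error_pcts = [abs(x_bar - mu) / mu * 100 for x_bar in five_means]
print(error_pcts)

# With 1,000 samples, the mean of the sample means sits very close to mu
many_means = [sample_mean() for _ in range(1000)]
mean_of_means = statistics.mean(many_means)
print(mu, mean_of_means)
```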

## 12_02 - Statistical Distributions and Their Use Cases

Statistical Distribution - Representation of **frequencies of potential events**

**Probability Distribution** represents the following:
- Probabilities of sets of variables $X$ and sets of events $E$ such that $X \in E$ (fully written is $P(X \in E)$)
- For a given x value, the probability that x belongs to the event.

Rules applying to probabilities also apply to probability distributions:

### 1. $P(X \in E) \in \mathbb {R}, P(X \in E) \geq 0$

- Probability that X belongs to E is a non-negative Real Number
- This corresponds to the positivity axiom from probability theory.

### 2. $P(X \in E) \leq 1, P(\Omega) = 1$

- Probability that X belongs to a certain E is less than or equal to 1, and the probability of some event within the sample space occurring is 1
- This corresponds to the certain-event axiom (probability of a certain event)

### 3. $P(X \in \bigsqcup_i E_i) = \sum_i P(X \in E_i)$ for any disjoint family of sets

- Probability that X belongs to the union of these disjoint (mutually exclusive) events is equal to the sum of the probabilities that X belongs to each event.
- This corresponds to the additivity axiom (for mutually exclusive events). **In a probability distribution all events must be mutually exclusive**

### Discussion of Dice Distribution

- Each outcome is 1/6
- **Discrete distribution** is where the possible outcomes form a countable set of distinct values (for the dice, a finite set of six values).

### Discussion of Temperature Distribution

- Is representation of a **continuous distribution**
- This distribution has continuous values (i.e., temp can be 80, 80.5, 80.0034)

### Common Distributions

Horizontal axis represents the set of possible numeric outcomes.  Vertical axis represents the probability of the respective outcome.

![](images/dists.png)

### Discrete vs. Continuous Distributions

- Discrete Distributions - use a **probability mass function (PMF)** (like our dice example)
- Continuous Distributions - use a **probability density function (PDF)** (like our temperature example)

Often distributions are described using their statistical mean (expected value) and their variance, but not always

#### Examples of Discrete Distributions

##### Bernoulli Distribution

Represents the probability of success of a certain experiment.  There are only two possible values, _success or not_.  A coin toss is a classic Bernoulli experiment where the probability of success is .5 or 50%.  A Bernoulli experiment can have any probability of success between 0 and 1.

##### The Poisson Distribution

Probability of $n$ events occurring in a certain time period when the rate of occurrence is constant.  Typical examples are pieces of mail, visitors to a website, customers arriving at a store, or clients waiting to be served in a queue.

##### The Uniform Distribution

When all possible outcomes are equally likely. Our dice example from before is a uniform distribution.  The dice example is a discrete uniform distribution; however, continuous uniform distributions exist as well.

#### Examples of Continuous Distributions

##### The Normal or Gaussian Distribution

Follows a bell shape. It is the fundamental distribution for many models and theories in data science.  This distribution turns up a lot when dealing with real world data like heights, weights, errors in some measurement, or grades on a test.





## 12_03 - The Probability Mass Function (PMF)

**Probability Mass Function** is a way to represent discrete distributions.  

### What is a Probability Mass Function (PMF)?

- Associates probabilities with discrete random variables (think coin flips and dice rolls). The **discrete** part comes from there being **a known number of possible outcomes**

- Based upon a dice roll we can create a PMF showing the probabilities of each value between 1 and 6 occurring.

Formally:
> The Probability Mass Function (PMF) maps a probability ($P$) of observing an outcome $x$ of our discrete random variable $X$ in a way that this function takes the form $f(x) = P(X = x)$.

If $X$ is a discrete random variable we can say that $R_x$ is a countable set of all the values of $X$.

$R_x = \{x_1, x_2, x_3, \dots\}$ where the interior of the set is all of the possibilities of x.

We may be interested in quantifying the probability that $X$ is equal to $x_3$.  We want to know $P(x_3)$.  As an example, we are interested in the probability of getting a 3 on a die roll.  In that case it would be $P(x_3) = \dfrac{1}{6}$.

Think of the event $A$, such that  $A = \{ X = x_k \}$ is defined as the set of outcomes $s$ in the sample space $S$ for which the corresponding value of $X$ is equal to $x_k$.  This can be written as:

$$\large A = \{ s \in S \mid X(s) = x_k \}$$

(Remember that $s \in S$ is mathematical notation for "$s$ belongs to $S$" or "$s$ is in $S$"). 

### PMF Intuition

Steps to turn a variable's frequency into a probability:

1. Get the frequency of every possible value in the dataset (could use `collections.Counter`) (_Note: You can read more about the `collections` library [here](https://docs.python.org/3.6/library/collections.html)._)
2. Divide the frequency of each value by the total number of values (length of dataset)
3. Get the probability for each value
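The three steps can be sketched with `collections.Counter`. The list `data` is a hypothetical set of ten die rolls made up for this example.

```python
from collections import Counter

data = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]  # hypothetical observations

# Step 1: frequency of every value in the dataset
counts = Counter(data)

# Steps 2 and 3: divide each frequency by the dataset length to get a probability
pmf = {value: count / len(data) for value, count in sorted(counts.items())}

for value, probability in pmf.items():
    print(value, probability)

# The probabilities of a PMF add up to 1 (up to floating point rounding)
print(sum(pmf.values()))
```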

### Plotting PMF

- Looks very much like a histogram if we plot with a bar chart
- Normalized histogram (0-1 range on the y axis, means the counts have been divided by total number)
- If generating a histogram against a raw dataset you can normalize it by passing argument of `density=True`
- Histograms are typically used with continuous data to show frequencies, but we don't need to do this (can just use bar charts) since PMFs are discrete (categorical) data

### Measures of Central Tendency and Spread

Two descriptive quantities we'll likely be interested in:

- Mean (expected value)
- Variance (spread)

#### Expected Value

For discrete distributions, the expected value of a discrete random variable $X$ is given by:

$$E(X) = \mu = \sum_i p(x_i)x_i$$

Similar to how we would normally calculate the mean.  Take \[2,5,5\] for example.  We would normally add 2 + 5 + 5 and divide by 3 to get 4.

Instead we calculate the probability first:
- $p(2) = \dfrac{1}{3}$
- $p(5) = \dfrac{2}{3}$

Now, multiply each probability by the value:

- $p(2) \cdot 2 = \dfrac{1}{3} \cdot 2 = \dfrac{2}{3}$
- $p(5) \cdot 5 = \dfrac{2}{3} \cdot 5 = \dfrac{10}{3}$

Then sum them:

$\dfrac{2}{3} + \dfrac{10}{3} = \dfrac{12}{3} = 4$

#### Variance

Variance is given by:

$$E((X-\mu)^2) = \sigma^2 = \sum_i p(x_i)(x_i - \mu)^2$$

Variance is sum of probabilities of each $x_i$ times the squared difference between that $x_i$ and the expected value ($\mu$). 

Just a minor re-ordering of the variance formula we have previously seen
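Both formulas can be applied to the \[2,5,5\] example. This sketch builds the PMF with exact fractions and compares the results with the standard library's `statistics.mean` and `statistics.pvariance`; the variable names are illustrative.

```python
import statistics
from collections import Counter
from fractions import Fraction

data = [2, 5, 5]

# PMF: p(2) = 1/3, p(5) = 2/3
pmf = {x: Fraction(count, len(data)) for x, count in Counter(data).items()}

# Expected value: sum of p(x_i) * x_i
mu = sum(p * x for x, p in pmf.items())

# Variance: sum of p(x_i) * (x_i - mu)^2
variance = sum(p * (x - mu) ** 2 for x, p in pmf.items())

print(mu, variance)

# The usual mean and population variance give the same answers
print(statistics.mean(data), statistics.pvariance(data))
```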

### Biasing the PMF

Weight probability by the number of people who observe it. Then we will see the perceived expected value.



Questions for Chat/Office Hours
- What is the difference between using np.sum(arr) vs. arr.sum()???

## The Probability Density Function (PDF) (Topic_12_04)

## The Cumulative Distribution Function (Topic_12_05)

## Bernoulli and Binomial Distribution (Topic_12_06)

## Statistical Distribution Lecture (Topic_12_07)

## The Normal Distribution (Topic_12_08)

## The Standard Normal Distribution (Topic_12_09)

## Skewness and Kurtosis (Topic_12_10)

## Statistical Distribution Lecture 2 (Topic_12_11)

[Latex cheat sheet](http://wch.github.io/latexsheet/latexsheet-1.png)

[Medium article on writing equations](https://medium.com/analytics-vidhya/writing-math-equations-in-jupyter-notebook-a-naive-introduction-a5ce87b9a214)

[Latex and Tex Primer](https://www.tug.org/begin.html)

- Superscript - $ ^{x} $ - ^{x} 
- Subscript - $ _{x} $ - \_{x}
- Fraction - $ \frac{x}{y} $ - \frac{x}{y}
- Square Root - $ \sqrt[n]{y} $ - \sqrt[n]{y}
- Sum - $ \sum_{i=1}^n $ - \\sum_{i=1}^n
- Product  - $ \prod_{i=1}^n $ - \prod_{i=1}^n
- Limit - $ \lim_{x\rightarrow\infty} $ - \lim_{x\rightarrow\infty}

- Less than or equal to - $ \leq $ - \leq 
- Greater than or equal to - $ \geq $ - \geq 
- Not equal to - $ \neq $ - \neq 
- Approximately - $ \approx $ - \approx 
- Multiplication Symbol - $ \times $ - \times 
- Division Symbol - $ \div $ - \div 
- Plus/minus - $ \pm $ - \pm 


- Cdot - $ \cdot $ - \cdot 
- Cdots - $ \cdots $ - \cdots 
- Superscript Circle - $ ^{\circ} $ - ^{\circ} 
- Circle - $ \circ $ - \circ 
- Prime - $ \prime $ - \prime 
- Infinity - $ \infty $ - \infty 
- Negative - $ \neg $ - \neg 
- Wedge - $ \wedge $ - \wedge 
- Vee - $ \vee $ - \vee 
- $ \rightarrow $ - \rightarrow - Thin right arrow
- $ \Rightarrow $ - \Rightarrow - Thick right arrow
- $ \leftrightarrow $ - \leftrightarrow - Thin leftright arrow


- Proper Superset - $ \supset $ - \supset
- Proper Subset - $ \subset $ - \subset
- Superset - $ \supseteq $ - \supseteq
- Subset - $ \subseteq $ - \subseteq
- Empty set - $ \emptyset $ - \emptyset
- Universal Set - $ \Omega $ - \Omega
- Cup - $ \cup $ - \cup
- Cap - $ \cap $ - \cap
- For all - $ \forall $ - \forall
- Exists - $ \exists $ - \exists
- In - $ \in $ - \in
- Not in - $ \notin $ - \notin
- Mathbb Z - $ \mathbb{Z} $ - \mathbb{Z}
- Backslash (relative complement) - $ \backslash $ - \backslash
- Mid - $ \mid $ - \mid 


- Dot a (or any other letter) - $ \dot a $ - \dot a 
- Hat a - $ \hat a $ - \hat a
- Bar a - $ \bar a $ - \bar a 
- Tilde a - $ \tilde a $ - \tilde a 











