# Intro to Probability
https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573

## Chap 4. Expectation

### Terms/Highlights

4 new distributions were introduced: 
- Geometric distribution
- First success distribution
- Negative binomial distribution
- Poisson distribution

The Poisson PMF resembles the terms of the Taylor series of $e^{\lambda}$, i.e. $\sum_{k=0}^{\infty} \lambda^k / k! = e^{\lambda}$. Keep this in mind in proofs whenever such sums appear.

This chapter introduced 3 ways to compute the expectation of an r.v.:
- By **LOTUS**: sometimes requires complicated algebraic manipulation.
- Via **indicator r.v.s**: This technique is based on
  - The *fundamental bridge* (between probability and expectation)
  - The *linearity of expectation*.
- Via the **survival function** (the complement of the CDF)

Connections between Poisson, Binomial and Hypergeometric distributions
- Taking limits: $HGeom \to Bin \to Pois$
- Conditioning: $Pois \to Bin \to HGeom$

P.S. What impressed me the most is the intermediate representation of a distribution. It helps break a problem into small pieces, like *divide & conquer*. In particular, indicator r.v.s look trivial, but through the fundamental bridge they make it possible to tackle problems with much less effort.

### More details
- **Geometric distribution**
  - Counts the number of failures before the first success in independent trials, each with success probability $p$.
  - $X \sim Geom(p)$
  - *PMF*: $P(X=k) = q^k p$, where $q = 1-p$ and $k = 0, 1, 2, \dots$
- **First success distribution**
  - Like the geometric distribution, but the count includes the success.
  - $X \sim FS(p)$. From the story $\implies X-1 \sim Geom(p)$
- **Negative Binomial distribution**
  - Counts the number of failures before the $r^{\text{th}}$ success.
  - $X \sim NBin(r, p)$
  - *PMF*: $P(X=k) = {k+r-1 \choose r-1} p^r q^k$, for $k = 0, 1, 2, \dots$
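
As a quick sanity check on the three PMFs above, here is a minimal Python sketch. The function names (`geom_pmf`, `fs_pmf`, `nbin_pmf`) and the parameter values are my own choices for illustration, and the infinite sums are truncated at 500 terms, so the totals are only approximately 1.

```python
from math import comb

def geom_pmf(k, p):
    """P(X = k) for X ~ Geom(p): k failures before the first success."""
    return (1 - p) ** k * p

def fs_pmf(k, p):
    """P(X = k) for X ~ FS(p): k trials up to and including the first success."""
    return (1 - p) ** (k - 1) * p

def nbin_pmf(k, r, p):
    """P(X = k) for X ~ NBin(r, p): k failures before the r-th success."""
    return comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

p = 0.3  # illustrative success probability

# Truncated sums over the support; both should be very close to 1.
geom_total = sum(geom_pmf(k, p) for k in range(500))
nbin_total = sum(nbin_pmf(k, 3, p) for k in range(500))
print(geom_total, nbin_total)

# FS is Geom shifted by one, and NBin with r = 1 is Geom.
print(fs_pmf(5, p), geom_pmf(4, p), nbin_pmf(4, 1, p))
```

The last line prints three values that should agree, reflecting $X - 1 \sim Geom(p)$ for $X \sim FS(p)$ and $NBin(1, p) = Geom(p)$.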

|   | With replacement | Without replacement |
| --- | --- | --- |
| **Fixed number of trials** (n) | Binomial | Hypergeometric |
| **Fixed number of successes** (r) | Negative binomial | Negative hypergeometric |

- **Expectation**: Interchangeable terms: *expected value*, *expectation*, *mean*
  - $E(X) = \sum_{j=1}^{\infty} x_j P(X = x_j)$
- **Linearity** of expectation:
  - $E(X+Y) = E(X) + E(Y)$
  - Note: independence is not needed

- $NBin \leftrightarrow Geom$:
  - $X \sim NBin(r,p)$. Then $X = X_1 + ... + X_r$ where the $X_j$ are i.i.d. $Geom(p)$.
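
The sum-of-Geometrics representation can be checked numerically. The sketch below convolves the $Geom(p)$ PMF with itself $r$ times (the PMF of a sum of independent r.v.s is the convolution of their PMFs) and compares the result with the Negative Binomial PMF. The helper `convolve`, the cutoff `N`, and the parameter values are assumptions for illustration only.

```python
from math import comb

def convolve(a, b):
    """PMF of the sum of two independent nonnegative integer r.v.s.

    Entry k only needs a[0..k] and b[0..k], so the first min(len(a), len(b))
    entries are exact even though the inputs are truncated.
    """
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

p, r, N = 0.4, 3, 30  # illustrative values; N is the truncation point
q = 1 - p

geom = [q ** k * p for k in range(N)]

# PMF of X_1 + ... + X_r with X_j i.i.d. Geom(p)
total = geom
for _ in range(r - 1):
    total = convolve(total, geom)

# Negative Binomial PMF: k failures before the r-th success
nbin = [comb(k + r - 1, r - 1) * p ** r * q ** k for k in range(N)]

max_gap = max(abs(a - b) for a, b in zip(total, nbin))
print(max_gap)  # should be at floating-point noise level
```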

- **Indicator r.v.s** are pretty helpful since they provide the **fundamental bridge** (a link between probability and expectation):
  - $P(A) = E(I_A)$
- Expectation via **survival function**
  - Survival function: $G(x) = 1-F(x) = P(X > x)$
  - If $X$ is a nonnegative integer-valued r.v., then $E(X) = \sum_{n=0}^{\infty} G(n)$
- **LOTUS** (Law of the unconscious statistician)
  - $E(g(X)) = \sum_x g(x) P(X=x)$
  - "Unconscious" in the name refers to simply replacing $x$ with $g(x)$ in the definition of $E(X)$, without first finding the distribution of $g(X)$ :).
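
To see the different routes to an expectation agree, here is a small sketch using $X \sim Bin(n, p)$, chosen because every sum is finite. The values `n = 10`, `p = 0.3` and the variable names are illustrative assumptions, not taken from the book.

```python
from math import comb

n, p = 10, 0.3  # illustrative parameters
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# 1) Definition: E(X) = sum_k k * P(X = k)
mean_def = sum(k * pmf[k] for k in range(n + 1))

# 2) Indicators: X = I_1 + ... + I_n, where I_j indicates success on trial j.
#    Fundamental bridge: E(I_j) = P(success on trial j) = p.
#    Linearity then gives E(X) = p + ... + p.
mean_ind = sum(p for _ in range(n))

# 3) Survival function: E(X) = sum_{k >= 0} P(X > k); terms vanish for k >= n.
mean_surv = sum(sum(pmf[k + 1:]) for k in range(n))

# LOTUS with g(x) = x^2: E(X^2) = sum_k k^2 * P(X = k)
second_moment = sum(k ** 2 * pmf[k] for k in range(n + 1))

print(mean_def, mean_ind, mean_surv, second_moment)
```

All three means should come out close to $np = 3$, up to floating-point rounding.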

- **Poisson distribution**
  - Models the number of successes in a particular region or interval of time when there are a large number of trials, each with a small probability of success.
  - $X \sim Pois(\lambda)$
  - *PMF*: $P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$
  - $\lambda$ is interpreted as the *rate of occurrence*.
- **Poisson paradigm** (or the *law of rare events*)
  - Let $A_1, ..., A_n$ be events with probabilities $p_1, ..., p_n$. The number of the $A_j$ that occur is approximately $Pois(\lambda)$, where $\lambda = \sum_{j=1}^n p_j$. Conditions for this approximation:
    - $n$ is large, the $p_j$ are small.
    - The $A_j$ are independent or weakly dependent. (How weak? There are ways to measure this.)
- Sum of independent Poissons:
  - $X$ is indep. of $Y$, and $X \sim Pois(\lambda_1), Y \sim Pois(\lambda_2) \implies X+Y \sim Pois(\lambda_1+\lambda_2)$
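
The sum-of-Poissons fact can be checked by convolving the two PMFs: $P(X+Y=k) = \sum_{i=0}^{k} P(X=i) P(Y=k-i)$ for independent $X$ and $Y$. A minimal sketch, where `pois_pmf` and the rates are illustrative assumptions:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

lam1, lam2 = 1.5, 2.5  # illustrative rates

def sum_pmf(k):
    """P(X + Y = k) by convolution, using independence of X and Y."""
    return sum(pois_pmf(i, lam1) * pois_pmf(k - i, lam2) for i in range(k + 1))

# Compare with the Pois(lam1 + lam2) PMF on k = 0, ..., 19
max_gap = max(abs(sum_pmf(k) - pois_pmf(k, lam1 + lam2)) for k in range(20))
print(max_gap)  # should be at floating-point noise level
```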

Connections between Poissons and Binomials:
- Poisson approximation to Binomial:
  - If $X \sim Bin(n,p)$ and $n \to \infty, p \to 0$ s.t. $np$ converges to a constant $\lambda$, then the PMF of $X$ converges to $Pois(\lambda)$ PMF.
- Poisson given a sum of Poissons:
  - $X$ is indep. of $Y$. If $X \sim Pois(\lambda_1), Y \sim Pois(\lambda_2)$, then the conditional distribution of $X$ given $X+Y=n$ is $Bin(n, \lambda_1/(\lambda_1+\lambda_2))$
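
Both connections can be illustrated numerically. The sketch below first measures how far the $Bin(n, \lambda/n)$ PMF is from the $Pois(\lambda)$ PMF as $n$ grows, then computes the conditional PMF of $X$ given $X+Y=n$ directly from the definition of conditional probability. Function names and parameter values are my own illustrative choices.

```python
from math import comb, exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def bin_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson approximation to Binomial: n grows, p = lam / n shrinks, np stays lam.
lam = 2.0
gaps = []
for n in (10, 100, 1000):
    p = lam / n
    gap = max(abs(bin_pmf(k, n, p) - pois_pmf(k, lam)) for k in range(11))
    gaps.append(gap)
print(gaps)  # the largest pointwise gap should shrink as n grows

# Conditioning: P(X = k | X + Y = total) = P(X = k) P(Y = total - k) / P(X + Y = total),
# using independence and X + Y ~ Pois(lam1 + lam2).
lam1, lam2, total = 1.5, 2.5, 6
cond = [
    pois_pmf(k, lam1) * pois_pmf(total - k, lam2) / pois_pmf(total, lam1 + lam2)
    for k in range(total + 1)
]
target = [bin_pmf(k, total, lam1 / (lam1 + lam2)) for k in range(total + 1)]
cond_gap = max(abs(a - b) for a, b in zip(cond, target))
print(cond_gap)  # should be at floating-point noise level
```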



- **Variance**
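  - Measures how spread out the distribution is around its mean.
  - $Var(X) = E(X-EX)^2 = E(X^2) - (EX)^2$
  - Note that variance is not linear: $Var(cX) = c^2 Var(X)$, and $Var(X+Y) = Var(X) + Var(Y)$ holds when $X$ and $Y$ are independent, but not in general.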
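
A small exact computation makes the variance facts concrete. This sketch uses a fair six-sided die (my own example, not from the book) and `fractions.Fraction` to avoid rounding. It compares the two variance formulas, then contrasts the sum of two independent dice with $X + X$, which is a sum of fully dependent r.v.s.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)    # a fair six-sided die (illustrative example)
prob = Fraction(1, 6)  # probability of each face

def expect(g):
    """E(g(X)) by LOTUS for the die roll X."""
    return sum(g(x) * prob for x in faces)

mean = expect(lambda x: x)

# Two equivalent formulas for the variance
var_def = expect(lambda x: (x - mean) ** 2)
var_short = expect(lambda x: x * x) - mean ** 2

# Sum of two independent dice: the variances add.
mean_sum = 2 * mean
var_sum = sum(
    (x + y - mean_sum) ** 2 * prob * prob for x, y in product(faces, faces)
)

# X + X = 2X is a sum of dependent r.v.s: the variance is 4 Var(X), not 2 Var(X).
var_double = expect(lambda x: (2 * x - 2 * mean) ** 2)

print(var_def, var_short, var_sum, var_double)
```

Here `var_sum` equals `2 * var_def` while `var_double` equals `4 * var_def`, which is why additivity of variance needs more than linearity of expectation.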

  