# Statistics

Table of Contents

- [Empirical distribution](#empirical)
- [Confidence interval](#conf)
- [Frequentist statistics](#freq)
  - [Maximum likelihood estimation](#mle)
  - [Expectation and maximization algorithm](#em)
- [Bayesian statistics](#bayesian)
- [Bootstrap](#bootstrap)
- [Summary](#summary)
- [References](#references)

## Empirical distribution <a name='empirical'></a>
**Empirical distribution**   
Definition  
Let $X_1,\ldots,X_n$ be i.i.d. random variables with cumulative distribution function $F(t) = p(X_i\leq t)$. The empirical distribution function (empirical cumulative distribution function) is defined as
$$\hat{F}_n(t) = \frac{1}{n}\sum_{i=1}^n 1_{(X_i\leq t)}$$
where $1_{(\cdot)}$ is the indicator function. Note that for a fixed $t$, $F(t)$ is a constant, whereas $\hat{F}_n(t)$ is a random variable, being a function of $X_1,\ldots,X_n$.  
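
A minimal sketch of $\hat{F}_n$ in plain Python, assuming only the standard library; the name `ecdf` is a hypothetical helper. Sorting the sample once lets each evaluation count $\#\{i : X_i \leq t\}$ by binary search.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_hat_n as a function: t -> (1/n) * #{i : X_i <= t}."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(t):
        # bisect_right gives the number of sorted values that are <= t
        return bisect_right(xs, t) / n

    return F_hat


F_hat = ecdf([3.0, 1.0, 2.0, 2.0])
print(F_hat(0.5), F_hat(2.0), F_hat(3.0))  # 0.0 0.75 1.0
```

The repeated value 2.0 makes $\hat{F}_n$ jump by $2/n$ at $t=2$, which is why $\hat{F}_n(2) = 0.75$.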

- Pointwise convergence (for fixed $t$)  
  - Hoeffding's inequality implies that for any $\epsilon > 0$, 
  $$p(\left|\hat{F}_n(t) - F(t)\right| \geq \epsilon) \leq 2e^{-2n\epsilon^2},$$ 
  so $$\hat{F}_n(t) \buildrel p \over \rightarrow F(t).$$
  - Since $\mathbb{E}[|1_{(X_i\leq t)}|]<\infty$ and $\mathbb{E}[1_{(X_i\leq t)}] = F(t)$, the strong law of large numbers implies that 
  $$\hat{F}_n(t) \buildrel a.s. \over \rightarrow F(t).$$
- Uniform convergence
  - Glivenko-Cantelli theorem. The empirical distribution function converges uniformly to $F$, namely 
  $$\sup_{t \in \mathbb{R}} \left|\hat{F}_n(t) - F(t)\right| \buildrel a.s. \over \rightarrow 0.$$
  - Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. For any $\epsilon > 0$ and any $n \geq 1$,
  $$p(\sup_{t\in \mathbb{R}}\left|\hat{F}_n(t) - F(t)\right|\geq \epsilon) \leq 2e^{-2n\epsilon^2}.$$
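
Setting the DKW bound equal to $\alpha$ and solving gives $\epsilon_n = \sqrt{\ln(2/\alpha)/(2n)}$, so $\left[\max(\hat{F}_n(t)-\epsilon_n, 0),\ \min(\hat{F}_n(t)+\epsilon_n, 1)\right]$ is a simultaneous $(1-\alpha)$ confidence band for $F$. The sketch below assumes only Python's standard library; `dkw_epsilon`, `sup_deviation` and `dkw_band` are hypothetical helper names. It relies on the fact that, for a continuous $F$, the supremum $\sup_t|\hat{F}_n(t)-F(t)|$ is attained at the sample points, just before or just after a jump of $\hat{F}_n$.

```python
import math
import random
from bisect import bisect_right


def dkw_epsilon(n, alpha):
    # Solve 2 * exp(-2 * n * eps**2) = alpha for eps
    return math.sqrt(math.log(2 / alpha) / (2 * n))


def sup_deviation(sample, F):
    # For continuous F, compare F with the ECDF just after ((i + 1) / n)
    # and just before (i / n) each jump at the sorted sample points
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - F(x), F(x) - i / n) for i, x in enumerate(xs))


def dkw_band(sample, alpha=0.05):
    # Simultaneous (1 - alpha) band: [max(F_hat - eps, 0), min(F_hat + eps, 1)]
    xs = sorted(sample)
    n = len(xs)
    eps = dkw_epsilon(n, alpha)

    def band(t):
        f = bisect_right(xs, t) / n
        return max(f - eps, 0.0), min(f + eps, 1.0)

    return band


rng = random.Random(0)
sample = [rng.random() for _ in range(1000)]  # Uniform(0, 1), so F(t) = t on [0, 1]
eps = dkw_epsilon(len(sample), 0.05)
dev = sup_deviation(sample, lambda t: min(max(t, 0.0), 1.0))
band = dkw_band(sample, alpha=0.05)
print(round(eps, 4), round(dev, 4))  # eps = 0.0429; dev <= eps with probability >= 0.95
```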
  
**Empirical measure** (probability)  
The empirical measure $\hat{p}_n$ is defined by
$$\hat{p}_{n}(A)=\frac{1}{n}\sum_{i=1}^{n} I_{(X_i\in A)}=\frac{1}{n}\sum _{i=1}^{n}\delta_{X_i}(A)$$
where $I$ is the indicator function and $\delta_{X_i}$ is the Dirac measure at $X_i$.  
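
A minimal sketch of $\hat{p}_n(A)$, assuming the set $A$ is passed as a predicate (a representation chosen for illustration); `empirical_measure` is a hypothetical name. Taking $A=(-\infty,t]$ recovers $\hat{F}_n(t)$.

```python
def empirical_measure(sample, in_A):
    """p_hat_n(A) = (1/n) * #{i : X_i in A}, with the set A given as a predicate."""
    return sum(1 for x in sample if in_A(x)) / len(sample)


sample = [1, 2, 2, 3, 5]
print(empirical_measure(sample, lambda x: x <= 2))      # A = (-inf, 2]: 0.6, i.e. F_hat_n(2)
print(empirical_measure(sample, lambda x: 2 < x <= 4))  # A = (2, 4]: 0.2
```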

Since a random sample may contain repeated values, we can also express the empirical measure 
through **relative frequencies**: if the $i$-th distinct value appears $n_i$ times in a sample of size $N=\sum_j n_j$, it receives mass
$$p_{i}={\frac{n_i}{N}}={\frac{n_i}{\sum_{j}n_j}}.$$
- It can be shown that, for an i.i.d. sample, the relative frequencies are a sufficient statistic for the true distribution.
That is, all the information about the true distribution contained in the sample is also contained in the relative frequencies. 
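
A short sketch, assuming the Python standard library, that builds the relative frequencies with `collections.Counter`; `relative_frequencies` is a hypothetical helper. The second part only illustrates (it does not prove) the sufficiency remark: an order-invariant statistic such as the plug-in mean can be computed from the frequencies alone.

```python
from collections import Counter


def relative_frequencies(sample):
    """Map each distinct value to p_i = n_i / N, where N = sum_j n_j is the sample size."""
    counts = Counter(sample)
    N = sum(counts.values())
    return {value: n_i / N for value, n_i in counts.items()}


print(relative_frequencies(["a", "b", "a", "c", "a", "b"]))
# {'a': 0.5, 'b': 0.3333333333333333, 'c': 0.16666666666666666}

# The plug-in mean is recovered from the frequencies without the original ordering
data = [2, 3, 2, 5]
mean_from_freqs = sum(x * p for x, p in relative_frequencies(data).items())
print(mean_from_freqs, sum(data) / len(data))  # 3.0 3.0
```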

**Plug-in estimator**  
The plug-in estimate of a parameter $\theta = T(p)$ is defined to be $\hat{\theta}_n = T(\hat{p}_n)$.  
Examples  
Given a random sample $x_1,\ldots, x_n$,
- The plug-in estimator of mean $\mu$,
$$\hat{\mu} = \mathbb{E}_{\hat{p}_n}[X] = \frac{1}{n}\sum_{i=1}^n x_i.$$
It is unbiased and consistent.
- The plug-in estimator of variance $\sigma^2$,
$$\hat{\sigma}^2 = \mathbb{E}_{\hat{p}_n}[(X-\mathbb{E}_{\hat{p}_n}[X])^2]=\frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2.$$
It is biased, since $\mathbb{E}[\hat{\sigma}^2]=\frac{n-1}{n}\sigma^2$, but consistent. The sample variance
$$\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$$
is unbiased.
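
A small numerical check of the plug-in estimators, assuming only the Python standard library; the function names and the data are illustrative. The Monte Carlo part estimates $\mathbb{E}[\hat{\sigma}^2]$ for $n=5$ standard normal draws, which should be close to $\frac{n-1}{n}=0.8$.

```python
import random


def plug_in_mean(xs):
    return sum(xs) / len(xs)


def plug_in_variance(xs):
    # Variance under the empirical measure: divides by n (biased)
    m = plug_in_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    # Bessel-corrected version: divides by n - 1 (unbiased)
    n = len(xs)
    return plug_in_variance(xs) * n / (n - 1)


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(plug_in_mean(xs), plug_in_variance(xs), sample_variance(xs))
# 5.0 4.0 4.571428571428571

# Monte Carlo check of the bias with n = 5 draws from N(0, 1)
rng = random.Random(1)
n, reps = 5, 20000
avg = sum(plug_in_variance([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)) / reps
print(round(avg, 2))  # close to 0.8
```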

## Confidence interval <a name='conf'></a>
**Definition**     
Let $X_1,\ldots,X_n$ be a random sample from $X\sim f(x;\theta)$, $\theta\in\Omega$.
Let $L=L(X_1,\ldots,X_n)$ and $U=U(X_1,\ldots,X_n)$ be two statistics and $0<\alpha<1$.
The interval $(L,U)$ is a $(1-\alpha)100\%$ confidence interval for $\theta$ if, for every $\theta\in\Omega$,
$$1-\alpha=p_{\theta}(\theta\in(L,U)).$$
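
The probability in the definition is over repeated samples: $(L,U)$ is random and $\theta$ is fixed. A minimal Monte Carlo sketch, assuming normal data with known $\sigma$ and using the known-variance interval from the examples below, estimates $p_\theta(\theta\in(L,U))$ as the fraction of simulated intervals that contain $\mu$. The function names and the parameter values (`mu=1.0`, `sigma=2.0`, `n=10`) are illustrative choices; `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$.

```python
import random
from statistics import NormalDist


def z_interval(xs, sigma, alpha):
    # (L, U) = x_bar -/+ z_{alpha/2} * sigma / sqrt(n); sigma is assumed known
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = sum(xs) / n
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half


def coverage(mu=1.0, sigma=2.0, n=10, alpha=0.05, reps=20000, seed=0):
    # Fraction of simulated samples whose interval (L, U) contains the true mu
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        L, U = z_interval(xs, sigma, alpha)
        hits += L < mu < U
    return hits / reps


print(coverage())  # close to 0.95
```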

Some examples  
Let the sample mean be $\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and the
sample variance be $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X}_n)^2$. 
- Confidence interval under normality  
$X_1,\ldots,X_n$ i.i.d. $X_i\sim N(\mu,\sigma^2)$,
  - Confidence interval for $\mu$ when $\sigma^2$ is known  
  We know $T = \frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$. Writing $z_{\alpha/2}$ for the upper $\alpha/2$ quantile of $N(0,1)$,
  $$ 1-\alpha = p\left(\overline{X}_n - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \overline{X}_n + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right).$$
  - Confidence interval for $\mu$ when $\sigma^2$ is unknown   
  We know $T = \frac{\overline{X}_n-\mu}{S_n/\sqrt{n}} \sim t(n-1)$. Writing $t_{\alpha/2,n-1}$ for the upper $\alpha/2$ quantile of $t(n-1)$,
  $$1-\alpha = p\left(\overline{X}_n-t_{\alpha/2,n-1} \frac{S_n}{\sqrt{n}}<\mu<\overline{X}_n+t_{\alpha/2,n-1} \frac{S_n}{\sqrt{n}}\right).$$

- Large sample confidence interval
  - Large sample confidence interval for $\mu$  
  $X_1,\ldots,X_n$ i.i.d., $X_i$ has mean $\mu$ and variance $\sigma^2<\infty$. By the CLT and Slutsky's theorem, $\frac{\overline{X}_n-\mu}{S_n/\sqrt{n}} \buildrel D\over\rightarrow Z\sim N(0,1)$, so
$$1-\alpha \approx p\left(\overline{X}_n-z_{\alpha/2} \frac{S_n}{\sqrt{n}}<\mu<
\overline{X}_n+z_{\alpha/2} \frac{S_n}{\sqrt{n}}\right).$$
  - Large sample confidence interval for $p$ of $\text{ber}(p)$  
$X_1,\ldots,X_n$ i.i.d., $X_i\sim \text{ber}(p)$. Note that $\mathbb{E}[X_i]=p$ and $\text{Var}[X_i]=p(1-p)$.
Let $\hat{p}_n=\overline{X}_n$. By the CLT,
$$\frac{\hat{p}_n-p}{\sqrt{p(1-p)/n}} \buildrel D\over\rightarrow Z\sim N(0,1).$$ 
Since $\hat{p}_n \buildrel p\over\rightarrow p$ (weak law of large numbers), Slutsky's theorem lets us replace $p$ with $\hat{p}_n$ in the denominator,
$$\frac{\hat{p}_n-p}{\sqrt{\hat{p}_n(1-\hat{p}_n)/n}} \buildrel D\over\rightarrow Z\sim N(0,1),$$ 
and then
$$1-\alpha\approx p\left(\hat{p}_n-z_{\alpha/2}\sqrt{\hat{p}_n(1-\hat{p}_n)/n}<p<
\hat{p}_n+z_{\alpha/2}\sqrt{\hat{p}_n(1-\hat{p}_n)/n}\right).$$
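
A sketch of the two large-sample intervals above, assuming only the Python standard library (`statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$; the $t$ quantile for the small-sample interval is not in the standard library). The function names, the 42-out-of-100 example and the simulated Exponential(1) data are illustrative assumptions. Because these intervals rely only on the CLT, their coverage for finite $n$ is only approximately $1-\alpha$, especially for skewed data.

```python
import math
import random
from statistics import NormalDist


def z_quantile(alpha):
    # Upper alpha/2 quantile z_{alpha/2} of N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)


def large_sample_mean_ci(xs, alpha=0.05):
    # x_bar -/+ z_{alpha/2} * S_n / sqrt(n), with S_n the (n - 1)-denominator sample sd
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    half = z_quantile(alpha) * s / math.sqrt(n)
    return xbar - half, xbar + half


def wald_bernoulli_ci(xs, alpha=0.05):
    # p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)
    n = len(xs)
    p_hat = sum(xs) / n
    half = z_quantile(alpha) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


# 42 successes in 100 Bernoulli trials
L, U = wald_bernoulli_ci([1] * 42 + [0] * 58)
print(round(L, 3), round(U, 3))  # 0.323 0.517

# Coverage for skewed Exponential(1) data (true mean 1) is only approximately 0.95
rng = random.Random(0)
reps, n = 4000, 200
hits = 0
for _ in range(reps):
    lo, hi = large_sample_mean_ci([rng.expovariate(1.0) for _ in range(n)])
    hits += lo < 1.0 < hi
print(hits / reps)
```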

## Frequentist statistics <a name='freq'></a>
### Maximum likelihood estimation <a name='mle'></a>

### Expectation and maximization algorithm <a name='em'></a>

## Bayesian statistics <a name='bayesian'></a>

## Bootstrap <a name='bootstrap'></a>
**Bootstrap** is a data-based simulation method for statistical inference.
The bootstrap provides a **general** method for assigning measures of accuracy to sample estimates. 
- It is not tied to a particular measure: accuracy may be measured by bias, variance, confidence intervals, prediction error or other such quantities.
- It requires few assumptions beyond the sample itself; in its basic form it only treats the observed data as an i.i.d. sample from the unknown distribution.

**Basic procedure of the bootstrap**  
Given a random sample $X = (x_1, \dots, x_n)$,
- draw a bootstrap sample $X^*_b = (x^*_1, \ldots, x^*_n)$ of size $n$ with replacement from $X$ (equivalently, draw $n$ i.i.d. values from the empirical measure $\hat{p}_n$), and repeat this many times, say $B$ times ($b = 1, \ldots, B$);
- do statistical inference with these bootstrap samples $(X^*_1, \ldots, X^*_B)$, typically by computing the statistic of interest on each of them.
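
The two steps above can be sketched in Python with the standard library's `random.Random.choices`, which samples with replacement; `bootstrap_samples` and `bootstrap_replicates` are hypothetical helper names, and the data are made up for illustration.

```python
import random
import statistics


def bootstrap_samples(xs, B, rng):
    # Each X*_b has size n and is drawn with replacement from xs,
    # i.e. n i.i.d. draws from the empirical measure p_hat_n
    n = len(xs)
    return [rng.choices(xs, k=n) for _ in range(B)]


def bootstrap_replicates(xs, statistic, B=1000, seed=0):
    # Compute the statistic T on every bootstrap sample X*_1, ..., X*_B
    rng = random.Random(seed)
    return [statistic(sample) for sample in bootstrap_samples(xs, B, rng)]


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
medians = bootstrap_replicates(data, statistics.median, B=500)
print(len(medians), min(medians), max(medians))
```

Passing the statistic as a function keeps the resampling step independent of what is being estimated, which matches the "general method" view of the bootstrap.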

### Estimate of standard error
We want to evaluate the quality of a statistic $\hat{\theta} = T(x_1, \ldots, x_n)$ of a random sample by its standard error.
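
The usual bootstrap estimate of the standard error is the sample standard deviation of the bootstrap replicates $\hat{\theta}^*_b = T(X^*_b)$:
$$\widehat{\mathrm{se}}_B=\sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^*_b-\bar{\theta}^*\right)^2},\qquad \bar{\theta}^*=\frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^*_b.$$
A minimal sketch, assuming only the Python standard library; `bootstrap_se`, the simulated normal data and `B=2000` are illustrative choices. For the mean, the result can be compared with the textbook $S_n/\sqrt{n}$; for the median no closed form is needed.

```python
import random
import statistics


def bootstrap_se(xs, statistic, B=2000, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    # theta*_b = T(X*_b) for b = 1..B
    reps = [statistic(rng.choices(xs, k=n)) for _ in range(B)]
    # statistics.stdev uses the B - 1 denominator, matching the formula above
    return statistics.stdev(reps)


gen = random.Random(42)
data = [gen.gauss(0.0, 1.0) for _ in range(100)]

se_mean_boot = bootstrap_se(data, statistics.fmean)
se_mean_formula = statistics.stdev(data) / len(data) ** 0.5  # textbook S_n / sqrt(n)
se_median_boot = bootstrap_se(data, statistics.median)
print(round(se_mean_boot, 3), round(se_mean_formula, 3), round(se_median_boot, 3))
```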

### Estimate of bias

### Confidence intervals

### Prediction error

## Summary <a name='summary'></a>