# Maximum Likelihood Estimation

## Mean of Random Variables
Let's think about a new random variable $Y$:
\begin{align}
    Y & = \frac{1}{n}\sum_i X_i
\end{align}
where the $X_i$ are independent and identically distributed (iid) random variables with mean $\mu$ and standard deviation (std) $\sigma$.

Its expectation, variance, and standard deviation are:
\begin{align}
    \mathbb{E}[Y] & = \frac{1}{n}\sum_i \mathbb{E}[X_i] = \mu\\
    \mathbb{V}[Y] & = \mathbb{V}\bigg[\frac{1}{n}\sum_i X_i\bigg] \\
                  & = \frac{1}{n^2}\mathbb{V}\bigg[\sum_i X_i\bigg] \\
                  & = \frac{1}{n^2}\sum_i\mathbb{V}\big[X_i\big] \quad\mbox{(independent, hence uncorrelated)}\\
                  & = \frac{1}{n^2}\sum_i \sigma^2 \\
                  & = \frac{1}{n} \sigma^2 \\
    \mathrm{Std}(Y) &= \frac{\sigma}{\sqrt{n}}   \quad \mbox{called 'standard error'}
\end{align}
- Implications:
    - $Y$ is an estimator of the mean: a function of the samples that estimates the mean of the distribution of the r.v. $X$.
    - In practice, measure as many $X_i$ as possible and compute their mean $Y$.
    - The expectation of $Y$ equals the true mean $\mu$ for every $n$, so $Y$ is an unbiased estimator.
    - The standard deviation of $Y$ approaches 0 as $n \to \infty$, so $Y$ concentrates around $\mu$.
    - The std of $Y$ measures the error of the mean estimator $Y$, so it is called the **standard error** (SE).
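
The $\sigma/\sqrt{n}$ scaling can be checked with a small simulation. The sketch below assumes Gaussian $X_i$ with made-up values for $\mu$ and $\sigma$; the helper name `sample_mean_std` is hypothetical and only serves this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0      # assumed true mean and standard deviation
n_trials = 20_000         # number of independent sample means drawn per n

def sample_mean_std(n):
    """Empirical std of Y, the mean of n iid N(mu, sigma) draws."""
    x = rng.normal(mu, sigma, size=(n_trials, n))
    return x.mean(axis=1).std(ddof=1)

# Compare the empirical std of Y with the theoretical standard error.
for n in (4, 16, 64):
    empirical = sample_mean_std(n)
    theory = sigma / np.sqrt(n)
    print(f"n={n:3d}  empirical={empirical:.4f}  sigma/sqrt(n)={theory:.4f}")
```

Quadrupling $n$ should roughly halve the empirical standard deviation of $Y$, in line with the formula above.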

## Observation Model
Let's consider the case where we want to estimate a constant value $m$, such as a temperature, from $n$ noisy observations.

$$
    y_i = m + e_i, \quad \mbox{for} \quad i = 1, ..., n
$$

- Assumptions:
    - Each $y_i$ is independent of the other observations.
    - The $e_i$ are iid.
    - The random noise $e_i$ is modeled as $\mathcal{N}(0,\sigma)$, where the second parameter is the standard deviation.
    - $\mathbb{E}(Y_i) = m + \mathbb{E}(E_i) = m$, $\mathbb{V}(Y_i) = \mathbb{V}(E_i) = \sigma^2$
    - That is,

\begin{align}
    y_i &\sim \mathcal{N}(m, \sigma) \\
    p(y_i) &= p(y_i | m, \sigma) = \mathcal{N}(y_i | m, \sigma)
\end{align}

- Because the observations are independent, the joint distribution factorizes: $p(y_1, \ldots, y_n|m, \sigma) = p(y_1|m,\sigma) \cdots p(y_n|m,\sigma)$

\begin{align}
    likelihood(m, \sigma) 
        &= p(y_1|m,\sigma) \cdots p(y_n|m,\sigma) \\
        & = \prod_{i=1}^n p(y_i|m,\sigma) = \prod_i \mathcal{N}(y_i|m, \sigma) \\
    log\_likelihood(m, \sigma) &= \log \prod_i \mathcal{N}(y_i|m, \sigma) \\
        & = \sum_i \log\big\{\mathcal{N}(y_i|m, \sigma) \big\} \\
        & = \sum_i \bigg[
                        \log\big\{\frac{1}{\sigma\sqrt{2\pi}}\big\}
                        -
                        \frac{1}{2} \big( \frac{y_i - m}{\sigma} \big)^2
                    \bigg]
\end{align}

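- The likelihood is a function of the parameters $m$ and $\sigma$, with the observations $y_1, \ldots, y_n$ held fixed.
- It is not a probability distribution over $(m, \sigma)$: it need not integrate to 1 over the parameters.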
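
As a minimal sketch, the Gaussian log-likelihood can be evaluated directly with NumPy. The observations `y` are made-up temperature readings, and the function name `log_likelihood` and the fixed value of `sigma` are assumptions chosen for illustration.

```python
import numpy as np

def log_likelihood(y, m, sigma):
    """Sum of log N(y_i | m, sigma) over all observations."""
    y = np.asarray(y, dtype=float)
    return np.sum(-np.log(sigma * np.sqrt(2.0 * np.pi))
                  - 0.5 * ((y - m) / sigma) ** 2)

y = np.array([20.1, 19.8, 20.4, 20.0, 19.7])   # made-up temperature readings
sigma = 0.3                                     # assumed noise std

# A value of m near the data scores much higher than a far-off one.
print(log_likelihood(y, m=20.0, sigma=sigma))
print(log_likelihood(y, m=25.0, sigma=sigma))

# Integrate the likelihood over m (Riemann sum) with sigma held fixed:
# the area is generally not 1, so this is not a density in m.
m_grid = np.linspace(15.0, 25.0, 100_001)
dm = m_grid[1] - m_grid[0]
area = np.sum([np.exp(log_likelihood(y, m, sigma)) for m in m_grid]) * dm
print(area)
```

The same data give different likelihood values for different parameter choices, which is what makes the likelihood usable as an objective for estimation.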

## ML and LS estimates
- ML estimate:
    - The values $\hat m$ and $\hat\sigma$ that maximize the likelihood are called the ML estimates.

$$ 
        \hat m, \hat\sigma = \mathrm{argmax}_{m,\sigma} \ \ likelihood(m, \sigma)
$$
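
For the Gaussian observation model, the maximization can be worked out in closed form. The derivation below is a sketch: it assumes the maximum lies at a stationary point, and it uses the fact that the logarithm is monotone, so maximizing the log-likelihood also maximizes the likelihood. Setting the partial derivatives to zero gives

\begin{align}
    \frac{\partial}{\partial m} \sum_i \log \mathcal{N}(y_i|m, \sigma)
        &= \frac{1}{\sigma^2} \sum_i (y_i - m) = 0
        \quad\Rightarrow\quad \hat m = \frac{1}{n}\sum_i y_i \\
    \frac{\partial}{\partial \sigma} \sum_i \log \mathcal{N}(y_i|m, \sigma)
        &= -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_i (y_i - m)^2 = 0
        \quad\Rightarrow\quad \hat\sigma^2 = \frac{1}{n}\sum_i (y_i - \hat m)^2
\end{align}

So $\hat m$ is the sample mean, and $\hat\sigma^2$ is the sample variance with divisor $n$ (not $n-1$).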

- Least-squares (error) estimate:

$$
    \hat m = \mathrm{argmax}_{m} \bigg\{ -\frac{1}{2} \sum_i \big( \frac{y_i - m}{\sigma} \big)^2 \bigg\} = \mathrm{argmin}_{m} \sum_i (y_i - m)^2
$$

- Normally, in LS estimation $\sigma$ is either ignored or assumed to be known, since it only scales the objective and does not change the maximizing $m$.
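
A quick numerical check illustrates why $\sigma$ can be left out of the LS problem. This is a sketch using made-up observations and a brute-force grid search over $m$; the function name `ls_objective` is hypothetical, and the objective sums the squared, scaled residuals over all observations.

```python
import numpy as np

y = np.array([20.1, 19.8, 20.4, 20.0, 19.7])   # made-up observations
m_grid = np.linspace(19.0, 21.0, 2001)          # candidate values of m, step 0.001

def ls_objective(m, sigma):
    """-1/2 * sum_i ((y_i - m) / sigma)^2, evaluated for every m in the grid."""
    residuals = (y[:, None] - m[None, :]) / sigma
    return -0.5 * np.sum(residuals ** 2, axis=0)

# Maximize over the grid for two very different assumed noise levels.
m_hat_small = m_grid[np.argmax(ls_objective(m_grid, sigma=0.1))]
m_hat_large = m_grid[np.argmax(ls_objective(m_grid, sigma=10.0))]

print(m_hat_small, m_hat_large, y.mean())
```

Both grid searches should land on the same $\hat m$, close to the sample mean up to the grid resolution, regardless of the value assumed for $\sigma$.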