# Maximum Likelihood Estimator for Gaussian

Suppose we have a dataset of observations $\boldsymbol{\mathsf{x}} = (x_1,\ldots,x_N)^T$, representing $N$ observations of the scalar variable $x$. Note that we are using the typeface $\boldsymbol{\mathsf{x}}$ to distinguish this from a single observation of the vector-valued variable $(x_1,\ldots,x_D)^T$, which we denote by $\mathbf{x}$.

We shall suppose that the observations are drawn independently from a Gaussian distribution whose mean $\mu$ and variance $\sigma^2$ are unknown, and we would like to determine these parameters from the dataset.

Data points that are drawn independently from the same distribution are said to be *independent and identically distributed* (i.i.d.). We have seen that the joint probability of two independent events is given by the product of the marginal probabilities for each event separately. Because our dataset $\boldsymbol{\mathsf{x}}$ is i.i.d., we can therefore write the probability of the dataset, given $\mu$ and $\sigma^2$, in the form

$$
p(\boldsymbol{\mathsf{x}}|\mu,\sigma^2) = \prod_{n=1}^N\mathcal{N}(x_n|\mu, \sigma^2)
$$

When viewed as a function of $\mu$ and $\sigma^2$, this is the **likelihood function** for the Gaussian.

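For the moment, we shall determine values for the unknown parameters $\mu$ and $\sigma^2$ in the Gaussian by maximizing the likelihood function. Because the logarithm is a monotonically increasing function, this is equivalent to maximizing the log-likelihood function (or minimizing the negative log-likelihood).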

The log-likelihood function can be written in the form

$$
\mathrm{ln}p(\boldsymbol{\mathsf{x}}|\mu,\sigma^2) = -\frac{1}{2\sigma^2}\sum_{n=1}^N(x_n-\mu)^2 - \frac{N}{2}\mathrm{ln}\sigma^2 - \frac{N}{2}\mathrm{ln}(2\pi)
$$
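As a quick numerical check of the closed form above, the following sketch compares it with the sum of the individual log densities $\ln\mathcal{N}(x_n|\mu,\sigma^2)$. The data values, the parameter values, and the function names `gaussian_log_pdf` and `log_likelihood` are made up for illustration and are not part of the original notes.

```python
import math

def gaussian_log_pdf(x, mu, sigma2):
    """Log of the univariate Gaussian density N(x | mu, sigma2)."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

def log_likelihood(xs, mu, sigma2):
    """Closed-form log-likelihood of i.i.d. Gaussian observations xs."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return (-squared_error / (2 * sigma2)
            - 0.5 * n * math.log(sigma2)
            - 0.5 * n * math.log(2 * math.pi))

xs = [2.1, 1.7, 3.0, 2.4, 1.9]   # illustrative data (assumed, not from the text)
mu, sigma2 = 2.0, 0.5            # arbitrary parameter values to evaluate at

summed = sum(gaussian_log_pdf(x, mu, sigma2) for x in xs)
closed = log_likelihood(xs, mu, sigma2)
print(summed, closed)            # the two values should agree up to rounding
```

The product of densities turns into a sum of logs, which is why the two computations agree.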

Taking the derivative with respect to $\mu$ and setting it to zero, we have

$$
\sum_{n=1}^N(x_n - \mu_{ML}) = 0 \implies \mu_{ML} = \frac{1}{N}\sum_{n=1}^N x_n,
$$

where $\mu_{ML}$ is the maximum likelihood solution, which is the *sample mean*.

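Similarly, taking the derivative with respect to $\sigma^2$, setting it to zero, and substituting $\mu = \mu_{ML}$ gives the maximum likelihood solution for the variance, which is the *sample variance* measured about the sample mean:

$$
\begin{aligned}
\frac{1}{\sigma_{ML}^2}\sum_{n=1}^N(x_n-\mu_{ML})^2 - N &= 0 \\
\sigma_{ML}^2 &= \frac{1}{N}\sum_{n=1}^N(x_n-\mu_{ML})^2
\end{aligned}
$$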
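The two maximum likelihood solutions can be sketched in a few lines of Python. The data values and the helper names `gaussian_mle` and `log_likelihood` are hypothetical, chosen only for illustration. The last lines check that moving either parameter away from its estimate lowers the log-likelihood on this data.

```python
import math
import statistics

def gaussian_mle(xs):
    """Return (mu_ML, sigma2_ML) for a list of scalar observations."""
    n = len(xs)
    mu_ml = sum(xs) / n                                # sample mean
    sigma2_ml = sum((x - mu_ml) ** 2 for x in xs) / n  # divides by N, not N - 1
    return mu_ml, sigma2_ml

def log_likelihood(xs, mu, sigma2):
    """Gaussian log-likelihood of i.i.d. observations xs."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -squared_error / (2 * sigma2) - 0.5 * n * math.log(2 * math.pi * sigma2)

xs = [2.1, 1.7, 3.0, 2.4, 1.9]   # illustrative data (assumed, not from the text)
mu_ml, sigma2_ml = gaussian_mle(xs)

# The standard library computes the same quantities:
# statistics.mean is the sample mean, statistics.pvariance divides by N.
print(mu_ml, statistics.mean(xs))
print(sigma2_ml, statistics.pvariance(xs))

# Perturbing either parameter should not increase the log-likelihood.
best = log_likelihood(xs, mu_ml, sigma2_ml)
print(best >= log_likelihood(xs, mu_ml + 0.1, sigma2_ml))
print(best >= log_likelihood(xs, mu_ml, sigma2_ml * 1.2))
```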

### The MLE approach systematically underestimates the variance of the distribution

First, we note that the maximum likelihood solutions $\mu_{ML}$ and $\sigma^2_{ML}$ are functions of the dataset values $x_1,\ldots,x_N$. Consider the expectations of these quantities with respect to the dataset values, which themselves come from a Gaussian distribution with parameters $\mu$ and $\sigma^2$.

For the mean, linearity of expectation gives

$$
\mathbb{E}[\mu_{ML}] = \frac{1}{N}\sum_{n=1}^N\mathbb{E}[x_n] = \mu
$$

## Exercise 1.12 
Using the results (1.49) and (1.50), show that

$$
\mathbb{E}[x_n x_m] = \mu^2 + I_{nm}\sigma^2 \tag{1.130}
$$

where $x_n$ and $x_m$ denote data points sampled from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $I_{nm}$ satisfies $I_{nm} = 1$ if $n = m$ and $I_{nm} = 0$ otherwise.
Hence prove the results (1.57) and (1.58).

(a) If $n \ne m$, then $x_n$ and $x_m$ are independent, so the expectation of the product factorizes: $\mathbb{E}[x_n x_m] = \mathbb{E}[x_n]\mathbb{E}[x_m] = \mu^2$.

(b) If $n = m$, then (1.50) shows that $\mathbb{E}[x_n^2]  = \mu^2 + \sigma^2$. Combining these together gives the required result.
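Both cases can be checked empirically with a small Monte Carlo sketch. The true parameters, the number of trials, and the seed below are arbitrary assumptions; the averages only approximate the expectations, so small deviations are expected.

```python
import random

rng = random.Random(0)     # fixed seed so the run is repeatable
mu, sigma = 1.0, 2.0       # assumed true parameters (sigma^2 = 4)
trials = 200_000

sum_cross = 0.0            # accumulates x_n * x_m for n != m
sum_square = 0.0           # accumulates x_n * x_n (the n == m case)
for _ in range(trials):
    x_n = rng.gauss(mu, sigma)
    x_m = rng.gauss(mu, sigma)   # drawn independently of x_n
    sum_cross += x_n * x_m
    sum_square += x_n * x_n

cross_mean = sum_cross / trials
square_mean = sum_square / trials
print(cross_mean, "vs", mu ** 2)                 # case n != m: mu^2
print(square_mean, "vs", mu ** 2 + sigma ** 2)   # case n == m: mu^2 + sigma^2
```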

(c) Prove that 

$$
\mathbb{E}[\sigma_{ML}^2] = \left(\frac{N-1}{N}\right)\sigma^2
$$

Substituting the definition of $\mu_{ML}$, expanding the square, and applying (1.130) to each expectation gives

$$
\begin{aligned}
\mathbb{E}[\sigma_{ML}^2] & = \mathbb{E}\left[\frac{1}{N}\sum_{n=1}^N\left(x_n - \frac{1}{N}\sum_{m=1}^Nx_m\right)^2\right]\\
& = \frac{1}{N}\sum_{n=1}^N\mathbb{E}\left[x_n^2 - \frac{2}{N}x_n\sum_{m=1}^Nx_m + \frac{1}{N^2}\sum_{m=1}^N\sum_{l=1}^Nx_m x_l\right] \\
& = \left\{\mu^2 + \sigma^2 -2 \left(\mu^2+\frac{1}{N}\sigma^2\right)+\mu^2+\frac{1}{N}\sigma^2\right\}\\
& = \left(\frac{N-1}{N}\right)\sigma^2
\end{aligned}
$$
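The result can also be illustrated by simulation: draw many datasets of size $N$, compute $\sigma^2_{ML}$ for each, and average. This is a minimal sketch under assumed values ($\mu = 1$, $\sigma^2 = 4$, $N = 5$, a fixed seed); the helper name `sigma2_ml` is hypothetical.

```python
import random

def sigma2_ml(xs):
    """Maximum likelihood variance estimate (divides by N)."""
    n = len(xs)
    mu_ml = sum(xs) / n
    return sum((x - mu_ml) ** 2 for x in xs) / n

rng = random.Random(0)     # fixed seed so the run is repeatable
mu, sigma = 1.0, 2.0       # assumed true parameters (sigma^2 = 4)
N = 5                      # small N makes the bias easy to see
trials = 100_000

total_mu = 0.0
total_var = 0.0
for _ in range(trials):
    xs = [rng.gauss(mu, sigma) for _ in range(N)]
    total_mu += sum(xs) / N
    total_var += sigma2_ml(xs)

avg_mu_ml = total_mu / trials
avg_sigma2_ml = total_var / trials
expected = (N - 1) / N * sigma ** 2   # predicted expectation of sigma2_ML

print(avg_mu_ml, "vs true mean", mu)
print(avg_sigma2_ml, "vs predicted", expected, "and true variance", sigma ** 2)
```

With a small $N$ the gap between the averaged estimate and the true variance is large; it shrinks as $N$ grows, since $(N-1)/N \to 1$.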

### Interpretation
What this means is that on average, the maximum likelihood estimate will obtain the correct mean but will underestimate the true variance by a factor $(N-1)/N$. 

It follows that the following estimate of the variance parameter is unbiased:

$$
\tilde{\sigma}^2 = \frac{N}{N-1}\sigma_{ML}^2 = \frac{1}{N-1}\sum_{n=1}^N(x_n - \mu_{ML})^2.
$$
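In Python's standard library the two estimators correspond to `statistics.pvariance` (divides by $N$) and `statistics.variance` (divides by $N-1$). The sketch below applies the correction factor $N/(N-1)$ by hand on made-up data and compares the result with the library value.

```python
import math
import statistics

xs = [2.1, 1.7, 3.0, 2.4, 1.9]   # illustrative data (assumed, not from the text)
N = len(xs)

sigma2_ml = statistics.pvariance(xs)        # ML estimate: divides by N
sigma2_unbiased = N / (N - 1) * sigma2_ml   # rescaled by N / (N - 1)

print(sigma2_ml, sigma2_unbiased)
# statistics.variance divides by N - 1, so it should match the rescaled value.
print(math.isclose(sigma2_unbiased, statistics.variance(xs)))
```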