# Lecture 2

## The variance of an estimate (e.g. the case of the sample mean)

This example will demonstrate the formula for the Expected Mean Squared Error of a model on a test sample.

$$E[y_0 - \hat f(x_0)]^2 = \text{Var}(\hat f(x_0)) + [\text{Bias}(\hat f(x_0))]^2 +\text{Var}(\epsilon)$$

The expected $MSE_{test}$ is the average squared prediction error we would obtain if we could fit the model to a very large number of independent training samples and then test each fit on new test data.

We will consider a simple model, the sample mean of draws from the standard normal distribution (i.e. $\mu=0$, $\sigma=1$).

$$y_i = \mu+ \epsilon_i$$

We want to estimate the population mean $\mu$ with the sample mean. Our model is therefore

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n y_i $$

The MSE is therefore:

$$E[y_0 - \hat \mu]^2 = \text{Var}(\hat \mu) + [\text{Bias}(\hat \mu)]^2 +\text{Var}(\epsilon)$$
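To see where the terms come from in this simple case, write a new observation as $y_0 = \mu + \epsilon_0$. The sketch below assumes (this is not stated above, but is standard) that $\epsilon_0$ is independent of the training sample and has mean zero, so the cross term vanishes. As in the equation above, $E[\cdot]^2$ denotes the expectation of the squared quantity.

$$
\begin{aligned}
E[y_0 - \hat \mu]^2 &= E[\epsilon_0 - (\hat \mu - \mu)]^2 \\
&= \text{Var}(\epsilon) + E[\hat \mu - \mu]^2 \\
&= \text{Var}(\epsilon) + \text{Var}(\hat \mu) + [\text{Bias}(\hat \mu)]^2
\end{aligned}
$$

The last step uses $E[\hat \mu - \mu]^2 = \text{Var}(\hat \mu) + (E[\hat \mu] - \mu)^2$, where $E[\hat \mu] - \mu$ is the bias.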

For explanatory purposes, we will only draw $S<\infty$ samples of $n<\infty$ observations. Though this isn't ideal, we can't simulate infinite datasets, so it's the best we can do with a computer!

You can change $S$ and $n$ to see how each one affects the results.

In [1]:
S <- 1000
n <- 1000

In [2]:
y <- replicate(S,rnorm(n))


Each column of `y` is one sample, so we take the mean of each column.


In [3]:
samplemeans <- colMeans(y)
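As a quick sanity check on the layout that `colMeans` relies on, the sketch below repeats the same steps on a tiny matrix. The names `S_small`, `n_small`, `y_small` and `means_small`, and the seed, are illustrative assumptions and are not part of the lecture code.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
S_small <- 4                     # number of samples (columns)
n_small <- 5                     # observations per sample (rows)

# replicate() returns an n_small x S_small matrix: one column per sample
y_small <- replicate(S_small, rnorm(n_small))
dim(y_small)

# colMeans() averages down each column, giving one mean per sample
means_small <- colMeans(y_small)

# Same result as averaging each column explicitly
all.equal(means_small, apply(y_small, 2, mean))
```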

Notice that each sample mean is different. This is due to the randomness in each of the training samples.

In [None]:
hist(samplemeans)

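The sample mean is also "unbiased": its expected value equals the true mean for any sample size $n$. If we average the sample means over more and more samples (large $S$), that average approaches the true mean $\mu=0$. Increasing the number of observations per sample (try, say, $n=1000000$) additionally pulls each individual sample mean closer to $\mu$.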

So $\text{Bias}[\hat \mu]=0$.
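A minimal sketch of how the bias could be checked numerically: average the simulated sample means and subtract the true mean. The names `S_chk`, `n_chk`, `means_chk`, `mu_true` and `bias_hat`, and the seed, are assumptions made for this illustration.

```r
set.seed(42)                     # assumption: fixed seed for reproducibility
S_chk <- 1000                    # number of samples
n_chk <- 1000                    # observations per sample
mu_true <- 0                     # true mean of the standard normal

# One sample mean per simulated sample
means_chk <- colMeans(replicate(S_chk, rnorm(n_chk)))

# Estimated bias: average estimate minus the true value
bias_hat <- mean(means_chk) - mu_true
bias_hat                         # should be close to 0, up to simulation noise
```

The estimate will not be exactly zero in any finite simulation, but it should shrink toward zero as `S_chk` grows.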

We can calculate the sample variance of all of our sample means. This will approach $\text{Var}(\hat\mu)$ as the number of samples gets large.

In [4]:
vsm <- var(samplemeans)
vsm
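For independent draws with variance $\sigma^2$, the variance of the sample mean is $\sigma^2/n$, which is $1/n$ here. The sketch below compares the simulated variance with that value for a few sample sizes. The grid `n_values`, the value of `S_chk` and the seed are assumptions chosen for illustration.

```r
set.seed(123)                    # assumption: fixed seed for reproducibility
S_chk <- 2000                    # number of samples per setting
n_values <- c(10, 100, 1000)     # sample sizes to compare

# Simulated variance of the sample mean for each sample size
sim_var <- sapply(n_values, function(n_i) {
  var(colMeans(replicate(S_chk, rnorm(n_i))))
})

# Theoretical variance sigma^2 / n with sigma = 1
theory_var <- 1 / n_values

data.frame(n = n_values, simulated = sim_var, theory = theory_var)
```

The two columns should agree more closely as `S_chk` increases, since the sample variance of the means is itself an estimate.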

We can then use each sample mean to predict a new observation, $y_0$. We draw $S$ new observations, one for each sample mean,

In [5]:
y0 <- rnorm(S)

and calculate $MSE_{test}$

In [6]:
mean((y0-samplemeans)^2)

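Notice that, in expectation, this exceeds $\text{Var}(\epsilon)=1$ (the irreducible prediction error) by $\text{Var}(\hat\mu)$. In a single run with $n=1000$ the gap is small and can be hidden by simulation noise. As the number of samples $S$ becomes large, the simulation will verify the Expected Mean Squared Error equation, with the bias term equal to zero: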

$$E[y_0 - \hat \mu]^2 = \text{Var}(\hat \mu) +\text{Var}(\epsilon)$$
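The equation above can be checked in one self-contained run by estimating each term separately and comparing the two sides. This is only a sketch: the variable names, the seed, and the choice of a smaller $n$ with a larger $S$ (so that $\text{Var}(\hat\mu)$ is visible above the simulation noise) are assumptions made for illustration.

```r
set.seed(2024)                   # assumption: fixed seed for reproducibility
S_chk <- 20000                   # many samples to reduce simulation noise
n_chk <- 50                      # small n so Var(mu-hat) = 1/50 is noticeable

# Fit the "model" (the sample mean) on each training sample
means_chk <- colMeans(replicate(S_chk, rnorm(n_chk)))

# One new test observation per fitted sample mean
y0_chk <- rnorm(S_chk)

# Left-hand side: test MSE as the mean squared prediction error
mse_test <- mean((y0_chk - means_chk)^2)

# Right-hand side: variance + squared bias + irreducible error
var_hat <- var(means_chk)        # estimate of Var(mu-hat)
bias_sq <- mean(means_chk)^2     # squared estimated bias (true mean is 0)
var_eps <- 1                     # known Var(epsilon) for the standard normal
rhs <- var_hat + bias_sq + var_eps

c(mse_test = mse_test, decomposition = rhs)
```

The two numbers should be close to each other and to $1 + 1/n$, with the remaining difference due to the finite number of samples.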