In [2]:
import numpy as np

# Point Estimators

Spring Break provides a nice breaking point in our course. Our plan for the rest of the semester is to apply what we have learned so far to answering some questions. Specifically, we would like to develop techniques for drawing inferences and conclusions from statistical data. 

To that end, we will work through a series of examples that illustrate what we want to do:

## How much to bid

A construction company is putting in a bid on a new building project. Comparable projects have cost the company 535, 540, 545, 546, and 600 dollars per square foot. To compute the bid, we need an estimate of the cost per square foot for this project, and we decide to base that estimate on the mean cost per square foot of the comparable projects. 



In [4]:
data = [535, 540, 545, 546, 600]
Ybar = np.mean(data)
Ybar

553.2

The obvious estimate is, of course, the sample mean of the data we have: 553.20 dollars per square foot. 

The first concern is:  What do we expect the error of this estimate to be?

### Unbiased Estimator

Let $\hat{\theta}$ be an estimator for a parameter $\theta$. A good estimator should have the property that its expected value is the parameter we are trying to find:  $$ E(\hat{\theta}) = \theta$$  For example, $\bar{Y}$ as an estimator for $\mu$ has this property. 

This is called an unbiased estimator.

#### Example

The 553.20 above is our unbiased estimate of the mean cost for similar jobs. 

### Biased Estimators

Of course, there are situations (like the one above) where what we actually want is a biased estimator. 

**Example** The construction company may want to protect itself against losing money on a contract it wins. Inflating the cost estimate so that its expected value is above the mean cost might be a worthwhile hedge against an unexpectedly expensive job.

The *Bias* of an estimator is defined to be:  $$ B(\hat{\theta}) = E(\hat{\theta}) - \theta$$

A simple way to bias our estimator upward is to add a term to $\bar{Y}$ such as $S/\sqrt{n}$.

In [6]:
# Sample standard deviation S (dividing by n - 1 = 4)

S = np.sqrt(sum( [ (data[i] - Ybar)**2 for i in range(5)])/4)
S

26.527344382730814

In [38]:
# Upward-biased estimate: Ybar + S/sqrt(n) with n = 5

theta_hat = Ybar + S/np.sqrt(5)
theta_hat

565.0633890604668

The price of introducing a bias (at least in this example) is an increase in the overall error. The point is that we accept more of the errors we consider acceptable (overestimates) in exchange for fewer of the errors we consider less acceptable (underestimates).
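To see this trade-off concretely, here is a small Monte Carlo sketch. It assumes a hypothetical normal population with mean 550 and standard deviation 25 (values chosen only to resemble the cost data, not taken from the company), draws many samples of size 5, and records how often each estimator falls below the true mean.

```python
import numpy as np

# Hypothetical population: normal with mean 550 and standard deviation 25.
# These values are assumptions chosen only to resemble the cost data above.
mu, sigma, n = 550.0, 25.0, 5
rng = np.random.default_rng(0)

# Draw many samples of size n and compute both estimators for each sample.
samples = rng.normal(mu, sigma, size=(100_000, n))
ybar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)        # sample standard deviation S
biased = ybar + s / np.sqrt(n)         # the upward-biased estimator

for name, est in [("Ybar", ybar), ("Ybar + S/sqrt(n)", biased)]:
    bias = est.mean() - mu             # Monte Carlo estimate of B(theta_hat)
    under = np.mean(est < mu)          # fraction of samples that underestimate mu
    print(f"{name:>16}: bias = {bias:6.2f}, P(underestimate) = {under:.3f}")
```

Under these assumptions $\bar{Y}$ underestimates about half the time, while $\bar{Y} + S/\sqrt{n}$ underestimates only when $(\bar{Y}-\mu)/(S/\sqrt{n}) < -1$, an event with probability about 0.19 for a t-distribution with 4 degrees of freedom.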

### Mean Square Error

Our other concern is how large we expect the error between $\hat{\theta}$ and $\theta$ to be. The *Mean Square Error* is defined (in the obvious way) by:

$$ \mbox{MSE}(\hat{\theta}) = E\left[ (\hat{\theta} - \theta)^2 \right] $$

The mean square error of an estimator depends on both its variance (how much we expect the estimator to vary if we repeat the experiment) and its bias (how much we expect the estimator's expected value to differ from the parameter): 

$$\mbox{MSE}(\hat{\theta}) = V(\hat{\theta}) + B(\hat{\theta})^2 $$

Interestingly, written this way, all of the explicit dependence on $\theta$ is contained in the bias; the variance measures spread around $E(\hat{\theta})$ and never refers to $\theta$ directly.
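The decomposition follows by adding and subtracting $E(\hat{\theta})$ inside the square; the cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$:

$$
\begin{aligned}
\mbox{MSE}(\hat{\theta}) &= E\left[ \left( (\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta) \right)^2 \right] \\
&= E\left[ (\hat{\theta} - E(\hat{\theta}))^2 \right] + 2 \left( E(\hat{\theta}) - \theta \right) E\left[ \hat{\theta} - E(\hat{\theta}) \right] + \left( E(\hat{\theta}) - \theta \right)^2 \\
&= V(\hat{\theta}) + 0 + B(\hat{\theta})^2
\end{aligned}
$$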

#### Example - $\bar{Y}$

So for the unbiased estimator of the mean (with $n = 5$) we have:

$$ \mbox{MSE}(\bar{Y}) = V(\bar{Y}) = \sigma^2 / n = \sigma^2 / 5 $$ 

if we know $\sigma$. We do not know $\sigma$, so our estimate replaces it with $S$. The fact that $S$ is not $\sigma$ will show up when we compute the probabilities for error bounds and intervals on Thursday (i.e., we will use a t-distribution).

In [41]:
MSE = S**2 / 5
MSE

140.74

#### Example - $\bar{Y} + S/\sqrt{n}$

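Here we compute the mean square error by adding the squared bias, $(S/\sqrt{5})^2 = S^2/5$, to the variance $S^2/5$. (This treats the bias term as fixed, which is an approximation, since $S$ itself varies from sample to sample.)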


In [39]:
# variance S^2/5 plus squared bias (S/sqrt(5))^2 = S^2/5
MSE_bias = S**2 / 5 + S**2 / 5
MSE_bias

281.48

In [42]:
np.sqrt(MSE), np.sqrt(MSE_bias)

(11.863389060466659, 16.77736570502056)

## Point Estimators for $\sigma$

So far we have seen our first example: $\bar{Y}$ is an unbiased point estimator for $\mu$. If the population from which we are sampling is normal, then it is in fact the best (lowest MSE) unbiased estimator for $\mu$. Note, however, that if the population is, for example, uniformly distributed on an interval, there are better unbiased estimators of $\mu$.
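A quick simulation sketch illustrates the uniform case. It assumes samples of size $n = 5$ from a uniform distribution on $(0, 1)$, so $\mu = 0.5$, and compares $\bar{Y}$ with the midrange $(Y_{(1)} + Y_{(n)})/2$, which is also unbiased for $\mu$ here (the midrange is our choice of comparison, not something derived in these notes).

```python
import numpy as np

# Assumed setup: samples of size n = 5 from a uniform distribution on (0, 1),
# so the population mean is mu = 0.5.
rng = np.random.default_rng(2)
n, reps, mu = 5, 100_000, 0.5
samples = rng.uniform(0.0, 1.0, size=(reps, n))

mean_est = samples.mean(axis=1)                                   # Ybar
midrange_est = (samples.min(axis=1) + samples.max(axis=1)) / 2    # (Y_(1) + Y_(n)) / 2

mse_mean = np.mean((mean_est - mu) ** 2)
mse_mid = np.mean((midrange_est - mu) ** 2)
print("average estimates:", mean_est.mean(), midrange_est.mean())  # both near 0.5
print("MSE of Ybar:      ", mse_mean)
print("MSE of midrange:  ", mse_mid)
```

For this population the midrange has variance $1/(2(n+1)(n+2)) = 1/84$, versus $1/(12n) = 1/60$ for $\bar{Y}$, and the simulated MSEs should land near those values.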

Now let's consider estimating the population variance (or standard deviation). 

Someone who was just joining our class today might suspect that:

$$ S'^2 = \frac{1}{n} \sum (Y_i - \bar{Y})^2 $$

would be a good estimator. However, it turns out that this estimator is biased. To see why, start from the identity

$$ \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n \bar{Y}^2 $$
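The identity follows by expanding the square and using $\sum Y_i = n\bar{Y}$:

$$
\begin{aligned}
\sum (Y_i - \bar{Y})^2 &= \sum Y_i^2 - 2\bar{Y} \sum Y_i + n\bar{Y}^2 \\
&= \sum Y_i^2 - 2n\bar{Y}^2 + n\bar{Y}^2 = \sum Y_i^2 - n\bar{Y}^2
\end{aligned}
$$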

So that

$$ E( S'^2 ) = \frac{1}{n} \left[ \sum E(Y_i^2) - n E(\bar{Y}^2) \right] $$

Note that $E(Y_i^2)$ is the same for all $i$, namely $\sigma^2 + \mu^2$ (since $V(Y_i) = E(Y_i^2) - \mu^2$), and that likewise $E(\bar{Y}^2) = V(\bar{Y}) + \mu^2 = \sigma^2/n + \mu^2$, giving us:

$$ E( S'^2 )=\frac{1}{n} \left[ n (\sigma^2 + \mu^2) - n (\sigma^2/n + \mu^2)\right] = \frac{(n-1)}{n} \sigma^2$$

For every fixed $n$ this is strictly smaller than $\sigma^2$, since $(n-1)/n < 1$. That is, $S'^2$ is biased downward: on average it underestimates $\sigma^2$. 

This explains where the $\frac{1}{n-1}$ in our sample variance formula comes from: it gives us an unbiased estimator of $\sigma^2$ for every $n \geq 2$.

Note that as $n$ becomes large this issue shrinks: the bias of $S'^2$ is $-\sigma^2/n$, which goes to zero.
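A simulation sketch makes the $(n-1)/n$ factor visible. It assumes a normal population with $\sigma^2 = 4$ and samples of size $n = 5$ (arbitrary choices); NumPy's `ddof` argument sets the divisor to $n - \text{ddof}$, so `ddof=0` gives $S'^2$ and `ddof=1` gives $S^2$.

```python
import numpy as np

# Assumed setup: normal population with variance sigma^2 = 4 and small
# samples of size n = 5, where the (n - 1)/n factor is most noticeable.
rng = np.random.default_rng(3)
sigma2, n, reps = 4.0, 5, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

s_prime2 = samples.var(axis=1, ddof=0)   # divide by n:     S'^2
s2 = samples.var(axis=1, ddof=1)         # divide by n - 1: S^2

print("average S'^2:", s_prime2.mean(), " theory:", (n - 1) / n * sigma2)
print("average S^2: ", s2.mean(), " theory:", sigma2)
```

With these assumptions the average of $S'^2$ should sit near $\frac{4}{5} \cdot 4 = 3.2$ and the average of $S^2$ near 4.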

### Example - Proportions

Colorado is trying to understand how many people have been vaccinated for COVID-19 so far. A random phone poll of 120 Coloradans finds 33 who have received at least one shot of vaccine for COVID-19. 

Our estimate for the proportion of Coloradans who have received at least one shot of vaccine is then:

In [44]:
phat = 33/120
phat

0.275

This is an unbiased estimate of the true value of $p$. The Mean Square Error for proportion estimates is

$$ \mbox{MSE}(\hat{p}) = V(\hat{p}) = \frac{p q}{n}, \qquad q = 1 - p $$

and so our estimate of the standard error (the square root of the MSE, with $\hat{p}$ in place of $p$) is:

In [45]:
MSE = phat*(1-phat)/120
np.sqrt(MSE)

0.04076099033798533

Recall that the area under the normal density within two standard deviations of the mean contains about 95% of the probability:

In [46]:
from scipy.stats import norm

In [47]:
norm.cdf(2) - norm.cdf(-2)

0.9544997361036416

Recall also that the binomial distribution can be approximated by a normal distribution for large $n$. So a 95.45% error bound on our estimate $\hat{p} = 0.275$ is given by:  

In [48]:
phat - np.sqrt(MSE)*2, phat + np.sqrt(MSE)*2

(0.19347801932402936, 0.3565219806759707)

In [49]:
np.sqrt(MSE)*2

0.08152198067597066

Or we might phrase this as: the proportion is estimated to be $0.275 \pm 0.0815$ (with about 95% confidence).
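The steps above can be packaged into a small helper. The name `proportion_bound` is hypothetical; it uses the normal approximation and the estimated standard error $\sqrt{\hat{p}(1-\hat{p})/n}$, with `z = 2` matching the two-standard-deviation bound used here.

```python
import numpy as np

def proportion_bound(successes, n, z=2.0):
    """Return phat and z estimated standard errors (hypothetical helper).

    Uses the normal approximation to the binomial, with the standard error
    of phat estimated by sqrt(phat * (1 - phat) / n).
    """
    phat = successes / n
    se = np.sqrt(phat * (1 - phat) / n)
    return phat, z * se

phat, bound = proportion_bound(33, 120)
print(f"{phat:.3f} +/- {bound:.4f}")  # 0.275 +/- 0.0815
```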

### Example - How big of a sample

Note the change if we double the sample size:


In [50]:
MSE = phat*(1-phat)/240
np.sqrt(MSE)

0.02882237267586877

In [51]:
phat - np.sqrt(MSE)*2, phat + np.sqrt(MSE)*2

(0.2173552546482625, 0.33264474535173755)

In [52]:
np.sqrt(MSE)*2

0.05764474535173754

Of course, it's more subtle than this, because a new, larger sample will also change our $\hat{p}$.
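Turning the question around, we can ask how large a sample is needed for a given bound $B$. Solving $2\sqrt{pq/n} \le B$ for $n$ gives $n \ge 4pq/B^2$. The sketch below assumes a planning guess for $p$ (we reuse $\hat{p} = 0.275$, and also try the conservative worst case $p = 0.5$, where $pq$ is largest); the helper name is hypothetical.

```python
import math

def sample_size_for_bound(p_guess, bound):
    """Smallest n with 2 * sqrt(p * (1 - p) / n) <= bound.

    Hypothetical helper based on the normal approximation; p_guess is a
    planning value for p, since the true p is unknown before sampling.
    """
    return math.ceil(4 * p_guess * (1 - p_guess) / bound ** 2)

# Bound of 0.05 (five percentage points) using the poll's phat as a guess,
# and using the conservative worst case p = 0.5.
print(sample_size_for_bound(0.275, 0.05))
print(sample_size_for_bound(0.5, 0.05))
```

Under these assumptions, roughly 320 people would be needed for a bound of 0.05 at $\hat{p} = 0.275$, and about 400 in the worst case.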

## Example - Difference of Two Proportions

This result relies on the unsurprising fact that if $Y_1$ and $Y_2$ are independent normal random variables with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then $U = Y_1 - Y_2$ is normal with mean 

$$ E(Y_1 - Y_2) = \mu_1 - \mu_2 $$

and variance 

$$ V(Y_1 - Y_2 ) = \sigma_1^2 + \sigma_2^2 $$

Together with the normal approximation to the binomial, this is all we need.
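A simulation sketch of this fact, with made-up parameters $\mu_1 = 3$, $\sigma_1 = 2$, $\mu_2 = 1$, $\sigma_2 = 1.5$. Note that the variances add even though we subtract the variables; this relies on $Y_1$ and $Y_2$ being independent.

```python
import numpy as np

# Made-up parameters; y1 and y2 are drawn independently.
rng = np.random.default_rng(4)
mu1, sd1 = 3.0, 2.0
mu2, sd2 = 1.0, 1.5
y1 = rng.normal(mu1, sd1, size=200_000)
y2 = rng.normal(mu2, sd2, size=200_000)
u = y1 - y2

print("mean of U:", u.mean(), " theory:", mu1 - mu2)
print("var of U: ", u.var(), " theory:", sd1**2 + sd2**2)
```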

### Vaccine Rates

Based on surveys of 500 people in each state, Colorado has vaccinated 13.3% of its 5.8 million population and Arkansas has vaccinated 18.9% of its 0.7 million population. What do we expect the difference in proportions to be, and what is the error range at 95% confidence?


In [53]:
# difference phat_1 - phat_2 

phat1 = 0.133
phat2 = 0.189

phatdiff = phat1 - phat2
phatdiff

-0.055999999999999994

In [54]:
# The samples are independent, so the MSEs (variances) of each add:

MSEdiff = phat1*(1-phat1)/(500) + phat2*(1-phat2)/(500)
MSEdiff

0.0005371800000000001

In [55]:
2 * np.sqrt(MSEdiff)

0.04635428782755701

So we estimate the difference between the states' proportions to be -5.6 percentage points, with an error bound of about 4.6 percentage points to either side (roughly 95% confidence). 
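The same computation as a hypothetical helper, assuming independent samples so that the variances add, and using the normal approximation with `z = 2`:

```python
import numpy as np

def diff_proportions_bound(p1, n1, p2, n2, z=2.0):
    """Return p1 - p2 and z estimated standard errors (hypothetical helper).

    Assumes the two samples are independent, so the estimated variances
    p1 q1 / n1 and p2 q2 / n2 add.
    """
    diff = p1 - p2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, z * se

diff, bound = diff_proportions_bound(0.133, 500, 0.189, 500)
print(f"{diff:.3f} +/- {bound:.3f}")  # -0.056 +/- 0.046
```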

## Other Point Estimators

Generally, point estimators exist for any parameter relevant to a distribution. The following example is a favorite of mine because it is so different from the normal-based ones above, and because it produces a correction similar to the one in the sample variance formula, showing again why the work we are doing here matters.

Let $Y_1, Y_2, \dots, Y_n$ be a sample from a uniform distribution on the interval $(0, \theta)$. What is an estimate for $\theta$?

### Obvious approach

We found the distribution of $\max(Y_1, \dots, Y_n) = Y_{(n)}$ previously. It is given by the density function:

$$ g_{(n)}(y) = n ( F_Y(y) )^{n-1} f(y) = \left\{ \begin{matrix} n \left(\frac{y}{\theta} \right)^{n-1} \frac{1}{\theta} & 0 \leq y \leq \theta \\ 0 & \mbox{otherwise} \end{matrix} \right. $$

and $Y_{(n)}$ is a natural candidate for estimating $\theta$. 

#### Is it biased?

Let's compute $E(Y_{(n)})$. Before we do, what do you think? Will it be biased?

In [33]:
import sympy as sp

y = sp.Symbol('y')
n = 5
# fix n = 5: with a symbolic n, sympy returns a Piecewise result for these integrals
theta = sp.Symbol('theta')

g = n* y**(n-1) /theta**n
g

5*y**4/theta**5

In [35]:
sp.integrate(y* g, (y, 0, theta))

5*theta/6

And indeed we see it is biased: $E(Y_{(5)}) = \frac{5}{6}\theta < \theta$ (in general $E(Y_{(n)}) = \frac{n}{n+1}\theta$). Play with the code and convince yourself that 

$$ \hat{\theta} = \frac{n+1}{n} Y_{(n)} $$ 

is an unbiased estimator of $\theta$. Note that this is essentially a correction taking into account that we have a limited sample and that none of our sample can be larger than $\theta$.
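One way to "play with the code" is to loop over several fixed sample sizes and check the claim symbolically. This sketch reuses the density $g_{(n)}$ from above and declares `y` and `theta` positive so sympy can simplify; it checks particular values of $n$, not the general formula.

```python
import sympy as sp

y = sp.Symbol('y', positive=True)
theta = sp.Symbol('theta', positive=True)

# For each fixed n, integrate (n+1)/n * y against the density of Y_(n),
# g_(n)(y) = n y^(n-1) / theta^n on [0, theta].
for n in range(1, 9):
    g = n * y**(n - 1) / theta**n
    expectation = sp.integrate(sp.Rational(n + 1, n) * y * g, (y, 0, theta))
    print(n, sp.simplify(expectation))   # theta for each n
```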

#### Variance of $\frac{n+1}{n} Y_{(n)} $

To compute the MSE of our unbiased estimator we need to compute the variance:


In [37]:
# Since thetahat is unbiased, V(thetahat) = E[(thetahat - theta)^2]

thetahat = (n+1)*y/n
sp.integrate( (thetahat - theta)**2 * g, (y, 0, theta) )

theta**2/35

We will be able to show:  $$ V\left( \frac{n+1}{n} Y_{(n)} \right)= \frac{\theta^2}{n (n+2) } $$ which for $n = 5$ gives $\theta^2/35$, matching the computation above.
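Until we prove it, we can at least check the formula for several fixed $n$ with the same sympy approach (a sketch; it verifies individual cases, not the general result):

```python
import sympy as sp

y = sp.Symbol('y', positive=True)
theta = sp.Symbol('theta', positive=True)

# Since (n+1)/n * Y_(n) is unbiased, its variance equals E[(thetahat - theta)^2].
for n in range(1, 9):
    g = n * y**(n - 1) / theta**n
    thetahat = sp.Rational(n + 1, n) * y
    variance = sp.integrate((thetahat - theta)**2 * g, (y, 0, theta))
    claimed = theta**2 / (n * (n + 2))
    print(n, sp.simplify(variance), sp.simplify(variance - claimed) == 0)
```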

### Another Approach

Another idea is to recall that the expected value of $\bar{Y}$ is $\theta/2$, so another candidate estimator is:

$$ \hat{\theta}_2 = 2 \bar{Y}$$

#### Is it biased?

We can quickly compute $$ E(2 \bar{Y}) = 2 \cdot \frac{\theta}{2} = \theta, $$ so $\hat{\theta}_2$ is unbiased.

#### MSE 

Its mean square error comes from its variance: $$ V(2 \bar{Y}) = 4 V(\bar{Y}) = \frac{4}{n} V(Y_1) = \frac{\theta^2}{3 n} $$
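The last step uses $V(Y_1) = \theta^2/12$ for a uniform distribution on $(0, \theta)$, which follows from the first two moments:

$$ E(Y_1) = \int_0^\theta \frac{y}{\theta}\,dy = \frac{\theta}{2}, \qquad E(Y_1^2) = \int_0^\theta \frac{y^2}{\theta}\,dy = \frac{\theta^2}{3} $$

$$ V(Y_1) = \frac{\theta^2}{3} - \frac{\theta^2}{4} = \frac{\theta^2}{12}, \qquad \mbox{so} \qquad V(2\bar{Y}) = \frac{4}{n} \cdot \frac{\theta^2}{12} = \frac{\theta^2}{3n} $$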

## Comparing two unbiased estimators

When given two unbiased estimators, the ratio of their variances (which equal their MSEs, since neither has a bias) gives a measure we call *relative efficiency*; the idea is that the estimator with less error is producing better results:

$$ \mbox{eff}( \frac{n+1}{n} Y_{(n)}, 2 \bar{Y} ) = \frac{ V( 2\bar{Y}) }{ V( \frac{n+1}{n} Y_{(n)} ) } = \frac{n (n+2)}{3n} = \frac{n+2}{3} $$

If the efficiency is bigger than 1, then the first estimator is considered better; if the efficiency is less than 1, then the second is considered better.

In this case we see that for $n > 1$ the first estimator is more efficient (for $n = 1$ the two estimators coincide). Does this make sense?
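To check that it makes sense, here is a Monte Carlo sketch under assumed values $\theta = 10$ and $n = 5$: both estimators should average about $\theta$, with variances near $\theta^2/35$ and $\theta^2/15$, so their ratio should be close to $(n+2)/3 = 7/3$.

```python
import numpy as np

# Assumed values for illustration: theta = 10, samples of size n = 5.
rng = np.random.default_rng(5)
theta, n, reps = 10.0, 5, 200_000
samples = rng.uniform(0.0, theta, size=(reps, n))

est_max = (n + 1) / n * samples.max(axis=1)   # (n+1)/n * Y_(n)
est_mean = 2 * samples.mean(axis=1)           # 2 * Ybar

print("means:     ", est_max.mean(), est_mean.mean())
print("variances: ", est_max.var(), est_mean.var())
print("efficiency:", est_mean.var() / est_max.var(), " theory:", (n + 2) / 3)
```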

### Efficiency is a bad word here

Of course, there are other problems with efficiency. A new problem confronting statisticians and data scientists is having so much data that computing an estimator from it becomes inefficient. In the example above, both estimators can be computed in O(n) operations, so they are roughly equivalent computationally. But sorting costs O(n log n) operations and can be expensive for very large data sets, so you can imagine estimators based on other order statistics $Y_{(k)}$ that would become computationally costly.

# Discussion

Finding new estimators is a fun business. In most cases, for known distributions and their standard parameters, best unbiased estimators are already known (Google and Wikipedia can help you find them). Where the real work lies, looking back at our initial problem, is when a specific application calls for a biased estimate of one kind or another. Finding such estimators means giving up on "best" in the MSE sense and instead trying to minimize something else. 

You can write a PhD thesis about solving one of these problems!