# Quiz 3
---
Initialize R parameters for graphing and some style choices:

## Question 1
---
*Suppose that you have six data points (1, 2, 5, 3, 2, 1) where the density is linearly decreasing on the interval $[0,\theta]$, given by:*
$$f(x)=\begin{cases} \frac{2(\theta-x)}{\theta^2} & \text{if } x\in [0,\theta],\\ 0 & \text{otherwise.} \end{cases} $$

*What is the standard error of the maximum-likelihood estimator of $\theta$? (To 3 decimal places, calculated using the score!)*

### Answer:
We already know that the maximum-likelihood estimate is $\hat{\theta}=6.318$. So we need to estimate $\mathbb{E}\,s(\theta)^2$ (though with only six observations!).

The log likelihood for a data draw X is given by: $$l(\theta)=\log(2)+\log(\theta-X)-2\log(\theta)$$
so the derivative of this with respect to $\theta$ is $$s(\theta)= \frac{1}{\theta-X}-\frac{2}{\theta}.$$
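The value $\hat{\theta}=6.318$ can be recovered by setting the summed score to zero, $\sum_i \big(\tfrac{1}{\theta-x_i}-\tfrac{2}{\theta}\big)=0$, and solving numerically for $\theta>\max_i x_i$. This is a minimal sketch using base R's `uniroot`; the helper name `total.score` and the search interval are illustrative choices, not part of the original analysis.

```r
# Observed data from Question 1
x <- c(1, 2, 5, 3, 2, 1)

# Summed score over the sample: sum_i [1/(theta - x_i)] - 2n/theta
total.score <- function(theta, x) sum(1 / (theta - x)) - 2 * length(x) / theta

# theta must exceed max(x). Just above max(x) the summed score is large and
# positive; at 100 * max(x) it is negative, so this interval brackets the root.
theta.mle <- uniroot(total.score,
                     interval = c(max(x) + 1e-8, 100 * max(x)),
                     x = x, tol = 1e-10)$root
round(theta.mle, 3)
```

Multiplying the summed score by $\theta$ gives $\sum_i \theta/(\theta-x_i) - 2n$, which is decreasing in $\theta$ on $(\max_i x_i,\infty)$, so there is only one root to find.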

We need an approximation for the expected value of $s(\theta)^2$ with respect to $X$, but we don't know the true value of $\theta$. The standard approach is to plug in $\hat{\theta}$ and average over the sample:
$$\tfrac{1}{N}\sum_{i=1}^N s(\hat{\theta};x_i)^2 $$

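```r
# MLE of theta from the earlier analysis
theta.hat <- 6.31806653
# Per-observation score s(theta.hat; x_i) = 1/(theta.hat - x_i) - 2/theta.hat
score.1a.x <- function(x.i) 1 / (theta.hat - x.i) - 2 / theta.hat
score.vector <- sapply(c(1, 2, 5, 3, 2, 1), score.1a.x)
# Sample average of the squared scores estimates E[s(theta)^2]
Inf.matrix.score <- mean(score.vector^2)
```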

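Under the usual regularity conditions, $$ \sqrt{n}(\hat{\theta}-\theta)\xrightarrow{D} \mathcal{N}(0,I^{-1}), $$ where $I=\mathbb{E}\,s(\theta)^2$ is the information per observation. Replacing $I$ with our estimate $\hat{I}$ and using $n=6$, our standard error is $\sqrt{\hat{I}^{-1}/n}$: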

```r
# Standard error: sqrt(I.hat^{-1} / n) with n = 6 observations
se.theta <- sqrt((1 / Inf.matrix.score) / 6)
c(se = round(se.theta, 3))
```
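For comparison only (the question asks for the score-based version), the information can also be estimated from the second derivative of the log-likelihood, $\partial s/\partial\theta = -\tfrac{1}{(\theta-X)^2}+\tfrac{2}{\theta^2}$. In a well-behaved model the two estimates should roughly agree, and a large gap is a warning sign. Here they differ noticeably, which foreshadows the concerns in Question 2. A sketch, with `score`, `hess`, and `se.compare` as names introduced here for illustration:

```r
x <- c(1, 2, 5, 3, 2, 1)
theta.hat <- 6.31806653   # MLE quoted in the text
n <- length(x)

# Per-observation score: s(theta; x) = 1/(theta - x) - 2/theta
score <- function(theta, x) 1 / (theta - x) - 2 / theta
# Per-observation second derivative of the log-likelihood
hess <- function(theta, x) -1 / (theta - x)^2 + 2 / theta^2

I.score <- mean(score(theta.hat, x)^2)   # score-based (outer-product) estimate
I.hess  <- -mean(hess(theta.hat, x))     # second-derivative (observed) estimate

se.compare <- c(score   = sqrt(1 / (n * I.score)),
                hessian = sqrt(1 / (n * I.hess)))
round(se.compare, 3)
```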

## Question 2
---
*Why would you be wary about using your standard error to construct a 95 percent confidence interval for the above?*

### Answer:
Using this standard error to construct the confidence interval, we would get:

```r
# Normal-approximation 95% interval: theta.hat +/- z_{0.975} * se
round(c(lower.conf = theta.hat - qnorm(0.975) * se.theta,
        upper.conf = theta.hat + qnorm(0.975) * se.theta), 3)
```

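But from our previous analysis, we know that $\theta$ cannot be smaller than the largest observation: the density is zero outside $[0,\theta]$ and we observed a 5, so we are 100% certain that $\theta\geq 5$. The lower end of the interval above falls below 5. More fundamentally, the support depends on $\theta$, so the regularity conditions behind the normal approximation are not guaranteed. In fact $\mathbb{E}\big[(\theta-X)^{-2}\big]=\tfrac{2}{\theta^2}\int_0^\theta \tfrac{du}{u}$ diverges, so the true $\mathbb{E}\,s(\theta)^2$ is infinite, and with only six observations the approximation would be rough in any case. So we'd probably want to recalculate the confidence interval as $[5, \hat{\theta}+c]$; reporting $[5,\hat{\theta}+1.96\cdot\hat{\sigma}]$ is likely conservative, though I'd go out of my way to footnote this assumption in the analysis notes. Alternatively, I might run a simulation here to get a sense of what is going on.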
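One way to run that simulation is a parametric bootstrap: draw many samples of size 6 from $f$ with $\theta$ set to $\hat{\theta}$, re-estimate $\theta$ on each, and look at the spread. This is a rough sketch, assuming inverse-CDF sampling via $F(x)=1-(1-x/\theta)^2$, so that $X=\theta(1-\sqrt{U})$ for $U\sim\text{Uniform}(0,1)$. The seed, `B`, and the helper names `r.tri` and `mle.theta` are arbitrary illustrative choices.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

theta.hat <- 6.31806653   # MLE from Question 1
n <- 6                    # sample size in Question 1
B <- 5000                 # number of simulated data sets (arbitrary choice)

# Sample from f(x) = 2(theta - x)/theta^2 on [0, theta] by inverting
# F(x) = 1 - (1 - x/theta)^2; 1 - U has the same law as U, so use sqrt(U).
r.tri <- function(n, theta) theta * (1 - sqrt(runif(n)))

# MLE: the single root of sum 1/(theta - x_i) - 2n/theta on (max(x), Inf)
mle.theta <- function(x) {
  g <- function(theta) sum(1 / (theta - x)) - 2 * length(x) / theta
  uniroot(g, interval = c(max(x) + 1e-8, 100 * max(x)), tol = 1e-10)$root
}

theta.star <- replicate(B, mle.theta(r.tri(n, theta.hat)))

# Spread of the simulated MLEs, and a simple percentile interval
sd(theta.star)
quantile(theta.star, c(0.025, 0.975))
```

Because every simulated MLE exceeds its own sample maximum, the simulated distribution respects the constraint that the normal interval ignores; it is still only as good as the assumption that $\theta=\hat{\theta}$.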

## Question 3
---

*The geometric distribution is a discrete distribution that counts the number of independent draws until a specified event with probability  $p$ occurs (flipping a coin until a head occurs, rolling a die until you get a six, etc.). The probability mass function for a geometric random variable $X$ is given by:*
$$ \Pr\left\{X=x\right\}=(1-p)^{x-1}\cdot p\text{    for }x=1,2,\ldots $$

*Suppose you have 5 draws from a geometric distribution with parameter $p$  given by  (9, 16, 12, 3, 10). What is the standard error for the maximum likelihood estimate of the parameter $p$ ?  (Use the second derivative of the log-likelihood to estimate $I_p$  here, and report to 3 dp.)*

### Answer:
Again, we'll use the estimate from the last quiz, $\hat{p}=1/\bar{x}=\tfrac{1}{10}$ (the five draws sum to 50, so $\bar{x}=10$).

```r
pHat <- 1/10  # MLE 1/x-bar, with x-bar = 50/5 = 10
```

The log-likelihood of each draw $X$ is given by: $$\log(p)+(X-1)\log(1-p) ,$$ so the derivative of this is:
$$\frac{\partial l(p)}{\partial p}=s(p)=\frac{1}{p}-\frac{X-1}{1-p}.$$
We want the second derivative, though, which we get by differentiating the score:
$$\frac{\partial^2 l(p)}{\partial p^2}=-\frac{1}{p^2}-\frac{X-1}{(1-p)^2}.$$

Evaluating this at $\hat{p}=\tfrac{1}{10}$ gives a second derivative of $-\tfrac{100}{81}(X+80)$ for each draw. The information (a scalar here) is given by:
$$I_p=-\mathbb{E}\left(\frac{\partial^2 l(p)}{\partial p^2}\right).$$
To approximate it, we average minus the second derivative across our draws of $X$ (whose sample mean is 10) to get:
$$\hat{I}_p=\tfrac{100}{81}(\bar{x}+80)=\tfrac{100}{81}\cdot(90)=\tfrac{1000}{9}.$$
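The arithmetic can be checked directly in R. This sketch evaluates the analytic second derivative at $\hat p$ and, as an independent check, a central finite-difference approximation to the curvature of the average log-likelihood. The step size `h` and the helper names `d2.loglik` and `avg.loglik` are illustrative assumptions.

```r
x <- c(9, 16, 12, 3, 10)   # draws from Question 3
p.hat <- 1 / mean(x)       # MLE 1/x-bar = 0.1

# Second derivative of the per-draw log-likelihood log(p) + (x - 1) log(1 - p)
d2.loglik <- function(p, x) -1 / p^2 - (x - 1) / (1 - p)^2

# Averaging minus the second derivative over the draws estimates I_p
I.hat <- -mean(d2.loglik(p.hat, x))

# Central finite-difference check on the average log-likelihood
avg.loglik <- function(p) mean(log(p) + (x - 1) * log(1 - p))
h <- 1e-4
I.fd <- -(avg.loglik(p.hat + h) - 2 * avg.loglik(p.hat) + avg.loglik(p.hat - h)) / h^2

c(I.hat = I.hat, I.fd = I.fd, target = 1000 / 9)
```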

Using the same asymptotic result, $$ \sqrt{n}(\hat{p}-p)\xrightarrow{D} \mathcal{N}(0,I_p^{-1}), $$ with $I_p$ replaced by $\hat{I}_p$ and $n=5$, our standard error is $\sqrt{\hat{I}_p^{-1}/n}$:

```r
# I.hat = 1000/9, so I.hat^{-1} = 9/1000; divide by n = 5 draws
se.p <- sqrt((9/1000) / 5)
c(se = round(se.p, 3))
```
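The constant $9/1000$ above is $\hat{I}_p^{-1}$. At $\hat p=1/\bar x$ we have $\bar x-1=(1-\hat p)/\hat p$, so the estimate collapses to a closed form that makes the number easy to check:

$$
\begin{aligned}
\hat{I}_p &= \frac{1}{\hat p^2} + \frac{\bar x - 1}{(1-\hat p)^2}
= \frac{1}{\hat p^2} + \frac{1}{\hat p(1-\hat p)}
= \frac{1}{\hat p^2(1-\hat p)} = \frac{1}{(0.01)(0.9)} = \frac{1000}{9},\\
\widehat{\text{se}}(\hat p) &= \sqrt{\frac{\hat p^2(1-\hat p)}{n}} = \sqrt{\frac{9/1000}{5}}.
\end{aligned}
$$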

## Questions 4 and 5
---

Here the question asks for a 90 percent confidence interval for $p$, so we need the normal critical value that leaves 5 percent in each tail:

```r
crit.value <- qnorm(0.95)  # 5% in the upper tail; two-sided, so 90% in total
crit.value
```

Then we simply add and subtract this many multiples of the standard error:

```r
# 90% interval: pHat +/- z_{0.95} * se
round(c(lower.bound = pHat - crit.value * se.p,
        upper.bound = pHat + crit.value * se.p), 3)
```
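With only five draws, it is worth asking how close this interval's actual coverage is to 90 percent. This is a hedged Monte Carlo sketch, assuming for illustration that the true $p$ equals our estimate of $0.1$. It repeatedly simulates five geometric draws, rebuilds the interval, and records whether it contains $p$. It uses the variance estimate $\hat p^2(1-\hat p)/n$, which is what $\hat{I}_p^{-1}/n$ reduces to at $\hat p=1/\bar x$. R's `rgeom` counts failures before the first success, so 1 is added to match the trial-count definition used above.

```r
set.seed(2)   # arbitrary seed

p.true <- 0.1   # assumed true p (set equal to the estimate, for illustration)
n <- 5
B <- 10000      # number of simulated samples (arbitrary choice)
z <- qnorm(0.95)

covered <- replicate(B, {
  x <- rgeom(n, p.true) + 1            # convert failures to trial counts
  p.hat <- 1 / mean(x)
  se <- sqrt(p.hat^2 * (1 - p.hat) / n)
  (p.hat - z * se <= p.true) && (p.true <= p.hat + z * se)
})

mean(covered)   # Monte Carlo estimate of the actual coverage
```

The result depends on the assumed $p$ and on the seed; it is a diagnostic, not a correction.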