# TP Estimation (Classical)

### author: Anastasios Giovanidis, 2018-2019

This TP (practical session) covers frequentist estimation. We will need to import the following libraries.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import preprocessing
import math
```

## Exercise 1 (Sample Mean, Standard Error)

**Multimedia file-size:** We investigate the distribution of sizes (in bytes) of multimedia files found online. We have collected a random sample of $N$ files. Suppose that the sample is drawn i.i.d. from:

- a Uniform distribution $[0,\theta]$. (for $E[X]=10^9$, choose $\theta=2 \cdot10^{9}$)

- a log-Normal $logN(\mu,\sigma)$ distribution. (for $E[X]=10^9$, choose $\mu=5\ln(10)$, $\sigma=\sqrt{8\ln(10)}$)

- an Exponential($\lambda$) distribution. (for $E[X]=10^9$, choose $\lambda=10^{-9}$, or $\beta=\lambda^{-1}$ for Python)
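
As a quick check, the suggested parameters follow from the standard mean formulas of the three distributions:

$$
\begin{aligned}
\text{Uniform}[0,\theta]: \quad & E[X] = \frac{\theta}{2} = \frac{2\cdot 10^{9}}{2} = 10^{9}, \\
\text{log-Normal}(\mu,\sigma): \quad & E[X] = e^{\mu + \sigma^2/2} = e^{5\ln(10) + 4\ln(10)} = 10^{9}, \\
\text{Exponential}(\lambda): \quad & E[X] = \frac{1}{\lambda} = 10^{9}.
\end{aligned}
$$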

(A) For each case, calculate the Sample Mean, and Sample Standard Deviation. Use sample size $N=10^3$ and $N=10^8$.

What do you observe?

(B) For all distributions show that the CLT holds for the Sample Mean.

**Solution A**
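
One possible starting point, not a reference solution: the sketch below draws a sample from each of the three distributions with the parameters given in the statement and prints the Sample Mean and Sample Standard Deviation. The helper name `sample_stats`, the `samplers` dictionary and the fixed seed are illustrative assumptions. The loop uses $N=10^6$ as a lighter stand-in for $N=10^8$, because a float64 array of $10^8$ entries takes about 0.8 GB; replace it with `10**8` if memory allows.

```python
import numpy as np

def sample_stats(draw, n, rng):
    """Return (sample mean, sample standard deviation) of n i.i.d. draws."""
    x = draw(rng, n)
    return x.mean(), x.std(ddof=1)  # ddof=1: divide by n-1 before the square root

# Parameter choices from the exercise statement; each gives E[X] = 1e9.
samplers = {
    "uniform": lambda rng, n: rng.uniform(0.0, 2e9, size=n),
    "lognormal": lambda rng, n: rng.lognormal(
        mean=5 * np.log(10), sigma=np.sqrt(8 * np.log(10)), size=n
    ),
    "exponential": lambda rng, n: rng.exponential(scale=1e9, size=n),
}

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
for n in (10**3, 10**6):        # 10**6 stands in for 10**8 to limit memory
    for name, draw in samplers.items():
        mean, std = sample_stats(draw, n, rng)
        print(f"N={n:>8d}  {name:<12s} mean={mean:.3e}  std={std:.3e}")
```

Comparing the printed values across the two sample sizes, and across distributions, is a direct way to approach the "What do you observe?" question.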

**Solution B**
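
A minimal sketch of one way to check the CLT numerically, offered as an illustration rather than a reference solution: repeat the experiment $M$ times, standardise each Sample Mean with the true mean and standard deviation, and compare the result with a standard Normal. The function name `standardized_means`, the values of `n` and `m`, and the seed are assumptions. The true standard deviations used are the textbook ones: $\theta/\sqrt{12}$ for the Uniform, $1/\lambda$ for the Exponential, and $\sqrt{e^{\sigma^2}-1}\,e^{\mu+\sigma^2/2}$ for the log-Normal.

```python
import numpy as np

def standardized_means(draw, mu, sigma, n, m, rng):
    """Return m values of sqrt(n) * (sample mean - mu) / sigma, each from n draws."""
    z = np.empty(m)
    for i in range(m):
        z[i] = np.sqrt(n) * (draw(rng, n).mean() - mu) / sigma
    return z

rng = np.random.default_rng(0)
n, m = 1000, 2000  # sample size and number of repetitions (illustrative values)

s2 = 8 * np.log(10)  # sigma^2 of the log-Normal from the statement
cases = {
    # name: (sampler, true mean, true standard deviation)
    "uniform": (lambda r, k: r.uniform(0.0, 2e9, size=k), 1e9, 2e9 / np.sqrt(12)),
    "exponential": (lambda r, k: r.exponential(scale=1e9, size=k), 1e9, 1e9),
    "lognormal": (
        lambda r, k: r.lognormal(mean=5 * np.log(10), sigma=np.sqrt(s2), size=k),
        1e9,
        np.sqrt(np.expm1(s2)) * 1e9,
    ),
}

for name, (draw, mu, sigma) in cases.items():
    z = standardized_means(draw, mu, sigma, n, m, rng)
    # If the Normal approximation is good, z has mean near 0, std near 1,
    # and about 95% of its values fall inside [-1.96, 1.96].
    inside = np.mean(np.abs(z) <= 1.96)
    print(f"{name:<12s} mean={z.mean():+.3f} std={z.std(ddof=1):.3f} "
          f"P(|z|<=1.96)={inside:.3f}")
```

A histogram of `z` (for example with `plt.hist(z, bins=50, density=True)`) gives a visual check. The log-Normal has finite variance, so the CLT still applies, but with $\sigma^2 = 8\ln(10)$ its tail is so heavy that the Normal approximation can remain poor at this sample size.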

## Exercise 2 (Bias)

Consider a network of sensors that measures the temperature in an area. We collect $N$ such measurements ($N=10^3$ or $N=10^7$):

- Each measurement $X_i$ is drawn i.i.d. from a distribution $F_X(x)$ that we do not know in advance, but let us suppose it is Exponential with mean $20^\circ\mathrm{C}$.
- Each measurement is corrupted by random noise due to measurement imprecision. This noise is $W_i$, drawn i.i.d. from the Normal distribution $N(0,\sigma_W^2)$. Choose $\sigma_W \in \left\{1, 50\right\}$.

The collected set of measurements is $(Y_1,\ldots,Y_N)$, where:

$Y_i = X_i+ W_i$


We would like to use the available data set to estimate the true mean and variance of $X$, by using the Sample Mean and the Sample Variance, both of which are unbiased when computed on $(X_1,\ldots,X_N)$.

Q1: What is the true value of Standard Deviation for $X$ that we want to estimate?

Q2: Is the Sample Mean of $(Y_1,\ldots,Y_N)$ unbiased?

Q3: Is the Sample Variance of $(Y_1,\ldots,Y_N)$ unbiased? 

Explain your answers.

**Solution**
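
A small simulation can support the reasoning, sketched here under the assumption that $X_i$ and $W_i$ are independent of each other (the statement only says each is i.i.d.). In that case $E[Y]=E[X]$ and $Var(Y)=Var(X)+\sigma_W^2$, and the code compares the Sample Mean and Sample Variance of $Y$ with these values. The function name `noisy_sample_stats` and the seed are illustrative.

```python
import numpy as np

def noisy_sample_stats(n, sigma_w, rng, mean_x=20.0):
    """Simulate Y_i = X_i + W_i and return (sample mean, sample variance) of Y."""
    x = rng.exponential(scale=mean_x, size=n)        # X ~ Exponential with mean 20
    w = rng.normal(loc=0.0, scale=sigma_w, size=n)   # W ~ N(0, sigma_w^2), independent of X
    y = x + w
    return y.mean(), y.var(ddof=1)

rng = np.random.default_rng(0)
for n in (10**3, 10**7):
    for sigma_w in (1.0, 50.0):
        mean_y, var_y = noisy_sample_stats(n, sigma_w, rng)
        # Targets for X: mean 20 and variance 20^2 = 400 (Exponential with mean 20).
        # Under independence, Var(Y) = 400 + sigma_w^2.
        print(f"N={n:.0e} sigma_W={sigma_w:4.0f} mean(Y)={mean_y:7.3f} "
              f"var(Y)={var_y:9.2f}  400+sigma_W^2={400 + sigma_w**2:9.2f}")
```

Setting the printed `var(Y)` against the target 400 for each value of $\sigma_W$ shows how much the noise matters for Q3.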

## Exercise 3 (Bias and MSE)

Consider again a random sample of size $N$. The entries correspond to multimedia file-sizes. We assume that entries are drawn i.i.d. from a Uniform distribution $[0,\theta]$. (choose $\theta=2 \cdot10^{9}$).

We define the estimator of $\theta$ (upper bound) as

$\hat{\Theta}_N = \max\left\{X_1,\ldots,X_N\right\}$.

- Find the "average" bias of $\hat{\Theta}_N$ (i.e. use $M=10000$ iterations to derive the average difference $\hat{\Theta}_N-\theta$)

- Find the "average" $MSE(\hat{\Theta}_N)$

- Is $\hat{\Theta}_N$ a consistent estimator? (use $N=100$, $N=10^3$ and $N=10^4$)

**Solution**
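
One way to set up the Monte Carlo experiment is sketched below; it is an illustration under stated assumptions, not a reference solution. Each of the $M$ iterations draws a fresh sample, records the error of the maximum with respect to $\theta$, and the errors are then averaged (bias) or squared and averaged (MSE). The function name `max_estimator_bias_mse` and the seed are assumptions. For comparison, the code prints the known closed-form bias $-\theta/(N+1)$, which follows from $E[\max_i X_i] = \frac{N}{N+1}\theta$ for a Uniform $[0,\theta]$ sample.

```python
import numpy as np

def max_estimator_bias_mse(n, theta, m, rng):
    """Monte Carlo bias and MSE of max(X_1..X_n), X_i ~ Uniform[0, theta], over m runs."""
    errors = np.empty(m)
    for i in range(m):
        x = rng.uniform(0.0, theta, size=n)
        errors[i] = x.max() - theta  # estimation error in this run (never positive)
    return errors.mean(), np.mean(errors**2)

theta = 2e9
rng = np.random.default_rng(0)
for n in (100, 10**3, 10**4):
    bias, mse = max_estimator_bias_mse(n, theta, m=10_000, rng=rng)
    print(f"N={n:>6d} bias={bias:.3e} (closed form {-theta / (n + 1):.3e}) "
          f"MSE={mse:.3e}")
```

Looping over the iterations keeps memory use at one sample of size $N$ at a time; drawing an $M \times N$ array in one call would be faster but needs about 0.8 GB for $N=10^4$.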

## Exercise 4 (MLE)

Consider a random sample $(X_1,\ldots,X_N)$ of size $N$ (choose $N=100$). Assume that the samples are drawn i.i.d. from:

- a Binomial distribution $(m,\theta)$. (for $E[X]=10$, choose $m=50, \theta=0.2$)

- a Normal $N(\mu,\sigma)$ distribution. (for $E[X]=10$, choose $\mu=10$, $\sigma=2$)

- an Exponential($\lambda$) distribution. (for $E[X]=10$, choose $\lambda=0.1$, or $\beta=\lambda^{-1}=10$ for Python)

Q1: Use the Maximum Likelihood Estimators seen in class to estimate:

- $\theta$ for the Binomial.

- $(\mu,\sigma^2)$ for the Normal.

- $\lambda$ for the Exponential.

Q2: Are these estimators consistent? (choose $N=10$, $N=10^3$, $N=10^5$)
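
The sketch below is one possible way to run both questions numerically. It assumes that the estimators from class are the standard closed-form MLEs: $\hat{\theta} = \bar{X}/m$ for the Binomial with $m$ known, $(\hat{\mu},\hat{\sigma}^2) = (\bar{X}, \frac{1}{N}\sum_i (X_i-\bar{X})^2)$ for the Normal, and $\hat{\lambda} = 1/\bar{X}$ for the Exponential. The function names and the seed are illustrative, and the Exponential is sampled with scale $\beta = 10$ so that $E[X]=10$.

```python
import numpy as np

def mle_binomial_theta(x, m):
    """MLE of theta for Binomial(m, theta) with m known: sample mean divided by m."""
    return np.mean(x) / m

def mle_normal(x):
    """MLE of (mu, sigma^2) for a Normal sample: mean and variance normalised by N."""
    return np.mean(x), np.var(x)  # np.var defaults to ddof=0, i.e. division by N

def mle_exponential_rate(x):
    """MLE of the rate lambda for an Exponential sample: 1 / sample mean."""
    return 1.0 / np.mean(x)

rng = np.random.default_rng(0)
for n in (10, 10**3, 10**5):
    theta_hat = mle_binomial_theta(rng.binomial(50, 0.2, size=n), m=50)
    mu_hat, var_hat = mle_normal(rng.normal(10.0, 2.0, size=n))
    # scale = beta = 1/lambda = 10 gives E[X] = 10, i.e. a true rate of 0.1
    lam_hat = mle_exponential_rate(rng.exponential(scale=10.0, size=n))
    print(f"N={n:>6d} theta={theta_hat:.4f} mu={mu_hat:.4f} "
          f"sigma^2={var_hat:.4f} lambda={lam_hat:.4f}")
```

Watching how close each estimate gets to its true value (0.2, 10, 4 and 0.1 respectively) as $N$ grows gives empirical evidence for Q2, although a single run per $N$ is only indicative.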