# TP: Confidence Intervals & Hypothesis Tests

### author: Anastasios Giovanidis, 2018-2019

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
import random
```

## Exercise 1 (interval)

We wish to measure a quantity $\theta$, but there is a random error in each measurement (noise). 

Then, measurement $i$ is

$X_i = \theta+W_i$,

$W_i$ being the error in the $i$-th measurement. All $W_i$s are i.i.d.

We take $n$ measurements $(X_1,\ldots, X_n)$ and report their average $\overline{X}$ as the estimate of $\theta$. The $W_i$s are drawn from $\mathrm{Normal}(0,4^2)$, whose variance is **known**. The parameter $\theta$ is **unknown** to the estimator; its true value, used only to generate the data, is $\theta=1$.

   a) Given a sample of size $n=10$, provide a confidence interval for $\theta$ at confidence level $1-\alpha=90\%$.

   b) Draw $T=2000$ independent samples of size $n=10$ each and compute a new confidence interval for each draw $t=1,\ldots,T$. Mark the draw with $1$ if the true value $\theta=1$ falls inside the interval, and with $0$ otherwise. In what percentage of the draws does $\theta$ fall inside the estimated interval?

   c) After having written the code, repeat (a)-(b) for unknown variance, using the sample standard deviation and approximate confidence intervals. What do you observe? Why?

**Answers**
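
A minimal sketch of one possible approach, not a reference solution. With known variance, $\overline{X}\sim \mathrm{Normal}(\theta,\sigma^2/n)$, so a two-sided interval at level $90\%$ is $\overline{X}\pm z_{0.95}\,\sigma/\sqrt{n}$. For part (c) the sketch assumes that "approximate" means keeping the same normal quantile and plugging in the sample standard deviation. The function names (`draw_sample`, `conf_interval`, `coverage`) are hypothetical, and only the Python standard library is used so that the block runs on its own.

```python
import math
import random
from statistics import NormalDist, mean, stdev

THETA = 1.0   # true parameter, used only to generate the data
SIGMA = 4.0   # known standard deviation of the noise W_i
N = 10        # sample size
LEVEL = 0.90  # confidence level

def draw_sample(rng, n=N):
    """Draw n measurements X_i = theta + W_i with W_i ~ Normal(0, SIGMA^2)."""
    return [THETA + rng.gauss(0.0, SIGMA) for _ in range(n)]

def conf_interval(sample, level=LEVEL, sigma=None):
    """Two-sided interval for the mean based on the normal quantile.

    If sigma is given, it is treated as the known standard deviation.
    If sigma is None, the sample standard deviation is plugged in
    (an approximation, assumed here to be what part (c) asks for).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    s = stdev(sample) if sigma is None else sigma
    half_width = z * s / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(T=2000, sigma=None, seed=0):
    """Fraction of T independent intervals that contain the true THETA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(T):
        low, high = conf_interval(draw_sample(rng), sigma=sigma)
        hits += low <= THETA <= high
    return hits / T

# (a) one interval from one sample, known variance
print("interval:", conf_interval(draw_sample(random.Random(0)), sigma=SIGMA))
# (b) empirical coverage, known variance
print("coverage, known sigma:  ", coverage(sigma=SIGMA))
# (c) empirical coverage, sample standard deviation plugged in
print("coverage, unknown sigma:", coverage(sigma=None))
```

With the known variance, the empirical coverage should be close to the nominal $90\%$. With the plug-in standard deviation and $n=10$, one would expect it to fall somewhat below $90\%$, because the normal quantile ignores the extra variability of the estimated standard deviation; a Student-$t$ quantile with $n-1$ degrees of freedom would account for it.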

## Exercise 2 (hypothesis test)

We will study in more detail the Neyman-Pearson Test, which leads to the Likelihood Ratio Test (LRT) we saw during the course. It can be shown that this test has the following property:

**Theory:** The LRT minimises the Type II error, under the requirement that the Type I error is bounded by $\alpha\leq2^{-\lambda n}$ for a given $\lambda>0$, where $n$ is the number of measurements.

**Application in Wireless Networks:** We can use a hypothesis test to determine anomalies in the normal operation of a cellular network. Consider an LTE network which serves mobile users, and let us focus on some specific period every Monday. Specifically, assume that the network consists of just two base stations ($S_1$ and $S_2$), on neighbouring cells.

During this period, every Monday, each of these base stations has a load $Y_{i}$, $i\in\left\{1,2\right\}$, which is a random variable drawn from a Normal distribution with mean $\rho$ and standard deviation $\sigma$, both known. This knowledge comes from systematic measurements that the stations constantly perform and send to a control center.

If an anomaly occurs at station $S_2$, the second base station is deactivated. As a result, all users that were served by this station migrate to the neighbouring $S_1$, and the load of the remaining station becomes $2\rho$ in mean value. This information also reaches the control center gradually, through the load measurements.

Consider the hypotheses:

- $H_0:$ the system of two stations is operating normally, versus

- $H_1:$ there is an anomaly in base station $S_2$.
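
One way to make the two hypotheses concrete is sketched here, assuming the control center tests $n$ i.i.d. load measurements of station $S_1$ with the same standard deviation $\sigma$ under both hypotheses. The notation $Y^{(j)}$, $f_0$, $f_1$ and $\overline{Y}$ is introduced only for this illustration and does not appear in the original text.

$$
\begin{aligned}
H_0 &: Y^{(j)} \sim \mathrm{Normal}(\rho,\sigma^2), \qquad
H_1 : Y^{(j)} \sim \mathrm{Normal}(2\rho,\sigma^2), \qquad j=1,\ldots,n,\\
\log\prod_{j=1}^{n}\frac{f_1\big(Y^{(j)}\big)}{f_0\big(Y^{(j)}\big)}
&= \sum_{j=1}^{n}\frac{\big(Y^{(j)}-\rho\big)^2-\big(Y^{(j)}-2\rho\big)^2}{2\sigma^2}
= \frac{n\rho}{\sigma^2}\left(\overline{Y}-\frac{3\rho}{2}\right),
\qquad \overline{Y}=\frac{1}{n}\sum_{j=1}^{n}Y^{(j)}.
\end{aligned}
$$

Under these assumptions the log-likelihood ratio is increasing in $\overline{Y}$, so the LRT amounts to comparing the sample mean with a threshold, and the threshold is fixed by the required Type I error.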

**Questions**

(A) Find (analytically) the decision criterion that guarantees a false-alarm probability of $1\%$.

(B) The designer wishes to achieve a false-alarm probability of $1\%$ within $10$ measurements.
Draw $T=20000$ sets of size $N=10$ under $H_0$, and verify by simulation that the false-alarm probability is indeed $1\%$.

(C) Suppose that at the beginning of the measurements everything works well, but at time $M$ the station breaks down. We do not know the instant at which the anomaly begins. How many additional measurements after $M$ are necessary to detect the anomaly? Use simulations to find out! (Again run $T=20000$ simulations and report the average.)

Use the values $M=10, 50, 100, 200$ and evaluate how the average delay changes.

Values: $\rho = 50$ [Mbps], and $\sigma = 5$ [Mbps].

**Answers**
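
A minimal sketch of one possible approach, not a reference solution. It assumes that the test uses i.i.d. load measurements of $S_1$, which are $\mathrm{Normal}(\rho,\sigma^2)$ under $H_0$ and $\mathrm{Normal}(2\rho,\sigma^2)$ under $H_1$. For Gaussian data with a common known variance the LRT reduces to a threshold on the sample mean, and requiring $P(\overline{Y}>\gamma_n \mid H_0)=\alpha$ gives $\gamma_n=\rho+z_{1-\alpha}\,\sigma/\sqrt{n}$. For (C) the detection rule is an assumption of this sketch: at every time $n$ the mean of all $n$ measurements so far is compared with $\gamma_n$, and alarms raised before $M$ are ignored. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist

RHO = 50.0    # mean load under H0 [Mbps]
SIGMA = 5.0   # standard deviation of the load [Mbps]
ALPHA = 0.01  # target false-alarm probability
Z = NormalDist().inv_cdf(1.0 - ALPHA)  # standard normal quantile z_{1-alpha}

def threshold(n):
    """(A) Threshold gamma_n on the mean of n measurements."""
    return RHO + Z * SIGMA / math.sqrt(n)

def false_alarm_rate(T=20000, n=10, seed=0):
    """(B) Fraction of T samples drawn under H0 whose mean exceeds gamma_n."""
    rng = random.Random(seed)
    gamma = threshold(n)
    alarms = 0
    for _ in range(T):
        ybar = sum(rng.gauss(RHO, SIGMA) for _ in range(n)) / n
        alarms += ybar > gamma
    return alarms / T

def mean_detection_delay(M, T=20000, seed=0):
    """(C) Average number of measurements after M until the alarm is raised.

    Assumed rule: at time n, compare the mean of all n measurements so far
    with threshold(n). Alarms before time M are ignored in this sketch.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(T):
        # The sum of the first M normal-operation measurements is
        # Normal(M * RHO, M * SIGMA^2), so it is drawn in a single step.
        running_sum = rng.gauss(M * RHO, SIGMA * math.sqrt(M))
        k = 0
        while True:
            k += 1
            running_sum += rng.gauss(2.0 * RHO, SIGMA)  # load after breakdown
            if running_sum / (M + k) > threshold(M + k):
                break
        total += k
    return total / T

print("threshold for n=10:", threshold(10))
print("empirical false-alarm rate:", false_alarm_rate())
for M in (10, 50, 100, 200):
    print("M =", M, "average delay:", mean_detection_delay(M))
```

Under this cumulative-mean rule one would expect the average delay to grow with $M$, since a longer history of normal measurements dilutes the effect of the new ones. A rule based on a sliding window of the last $N$ measurements would behave differently, so the choice of rule should be stated along with the results.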