# Week 3 stats exercises

## Problem 1.


(Computer exercise) Assume you have **N = 100** observations of a random variable $X_i \sim \text{Bernoulli}(p)$ with **k = 23 successes** (i.e. $X_i = 1$ in 23 out of the 100 repeats).

###  1.1. Find the **MLE estimator** $\hat{p}$ for $p$.

The PMF of the Bernoulli distribution is $f(x; p)=p^x(1-p)^{1-x}$ for $x \in \{0,1\}$.

The unknown parameter is $p$. Then the likelihood is

$\mathcal{L}_n(p) = \prod_{i=1}^n f(X_i; p)$

$\mathcal{L}_n(p) = p^{x_1}(1-p)^{1-x_1} \cdot ... \cdot p^{x_n}(1-p)^{1-x_n}$

$\mathcal{L}_n(p) = p^{x_1 + ... + x_n} \cdot (1-p)^{n-(x_1 + ... + x_n)}$

Writing $S = x_1 + ... + x_n$ for the number of successes:

$\mathcal{L}_n(p) = p^{S} \cdot (1-p)^{n-S}$

Taking the log:

$\mathcal{l}_n(p) = \log( p^{S} \cdot (1-p)^{n-S})$

$\mathcal{l}_n(p) = S\cdot \log(p) + (n-S)\log(1-p) $


Log derivative rule:

$\frac{d}{dx}\log(x) = \frac{1}{x}$


Take the derivative of $\mathcal{l}_n(p)$:

$\mathcal{l}_n(p) = S\cdot \log(p) + (n-S)\log(1-p) $

$\frac{d}{dp} \mathcal{l}_n(p) = S\frac{1}{p} - (n-S)\frac{1}{1-p} $

$\frac{d}{dp} \mathcal{l}_n(p) =\frac{S}{p} - \frac{n-S}{1-p} $

Set derivative to 0:

$\frac{S}{p} - \frac{n-S}{1-p} = 0$

$S(1-p) - p(n-S) = 0$

$S - Sp + Sp  -np = 0$

$S -np = 0$

$S  = np$

$\hat p=\frac{S}{n} $

$\hat p=\frac{23}{100} = 0.23$
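As a quick sanity check, the closed-form result can be compared against a brute-force maximization of the log-likelihood on a grid. This is only a minimal sketch: the grid resolution and the names `log_lik`, `grid` and `p_grid` are illustrative choices, not part of the exercise.

```python
import numpy as np

n, S = 100, 23  # sample size and number of successes from the exercise

def log_lik(p):
    """Bernoulli log-likelihood l_n(p) = S*log(p) + (n - S)*log(1 - p)."""
    return S * np.log(p) + (n - S) * np.log(1 - p)

# Evaluate on a fine grid strictly inside (0, 1) to avoid log(0)
grid = np.linspace(0.001, 0.999, 999)
p_grid = grid[np.argmax(log_lik(grid))]

print(f"closed form: {S / n:.3f}, grid maximizer: {p_grid:.3f}")
```

The grid maximizer should agree with $S/n$ up to the grid spacing.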


### 1.2. Compute the **95% confidence interval** for $\hat{p}$ using the normal approximation together with the Fisher information matrix (following the example given in the 6th lecture).

$f(x; p) = p^x(1-p)^{1-x}$

Taking the log of $f(x;p)$

$\log f(x; p) = x\log p + (1-x)\log(1-p)$

Taking the derivative:

$\frac{\partial \log f(x;p)}{\partial p} = \frac{x}{p} - \frac{1-x}{1-p}  $

$\frac{\partial^2 \log f(x;p)}{\partial p^2} = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}  $

Using the definition of the Fisher information:

$\mathcal{I}(p) = \mathbb{E}_p\left(-\frac{\partial^2 \log f(x;p)}{\partial p^2}\right) $

Insert the second derivative computed above:

$\mathcal{I}(p) = -\mathbb{E}_p\left[-\frac{x}{p^2} - \frac{1-x}{(1-p)^2}\right] $

Since $\mathbb{E}_p[x] = p$ (the same substitution as in lecture 6 material p.23):

$\mathcal{I}(p) = \frac{p}{p^2} + \frac{1-p}{(1-p)^2} $

$\mathcal{I}(p) = \frac{1}{p} + \frac{1}{1-p} $

$\mathcal{I}(p) = \frac{1-p}{p(1-p)} + \frac{p}{p(1-p)} $

$\mathcal{I}(p) = \frac{1}{p(1-p)}$

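The estimated standard error uses the Fisher information of the whole sample, $\mathcal I_n(p) = n\,\mathcal I(p)$, evaluated at the MLE:

$\hat{se} = \frac{1}{\sqrt{\mathcal I_n (\hat p_n) }} $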

$\hat{se} = \frac{1}{\sqrt{n\mathcal I (\hat p_n) }} $

$\hat{se} = \frac{1}{\sqrt{n \frac{1}{\hat p(1-\hat p)} }} $

$\hat{se} = \sqrt{\frac{\hat p(1- \hat p)}{n}} $
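The expectation in the Fisher information can also be checked numerically by summing over the two possible outcomes $x \in \{0, 1\}$. The sketch below assumes nothing beyond the formulas above; the helper name `fisher_information` is hypothetical.

```python
from math import sqrt

def fisher_information(p):
    """E[-d^2/dp^2 log f(X; p)] for X ~ Bernoulli(p), summed over x in {0, 1}."""
    total = 0.0
    for x in (0, 1):
        prob = p**x * (1 - p)**(1 - x)                # f(x; p)
        neg_second = x / p**2 + (1 - x) / (1 - p)**2  # -d^2/dp^2 log f(x; p)
        total += prob * neg_second
    return total

n, p_hat = 100, 0.23
info = fisher_information(p_hat)   # should equal 1 / (p_hat * (1 - p_hat))
se = 1 / sqrt(n * info)            # should equal sqrt(p_hat * (1 - p_hat) / n)

print(f"I(p_hat) = {info:.4f}, se = {se:.4f}")
```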


An approximate 95% confidence interval (with $\alpha = 0.05$) is:

$\hat p_n \pm z_{1-\alpha/2}\sqrt{\frac{\hat p(1- \hat p)}{n}} $

In [5]:
from math import sqrt
from scipy.stats import norm

n = 100
x = 23
p_n = x / n
z = norm.ppf(0.975)
se = sqrt(p_n * (1 - p_n) / n)
lower = p_n - z * se
upper = p_n + z * se

print(f"95% confidence interval: [{lower:.4f}, {upper:.4f}]")

95% confidence interval: [0.1475, 0.3125]


###   1.3. Compute the same **95% confidence interval** using **percentile** and **pivotal bootstrap**.

In [3]:
import numpy as np
n = 100
x = 23
sample = np.array([1]*x + [0]*(n - x))

p_hat = x / n

B = 10000
boot_means = []

for _ in range(B):
    new_sample = np.random.choice(sample, size=n, replace=True)
    boot_means.append(np.mean(new_sample))

boot_means = np.array(boot_means)

ci_percentile = np.percentile(boot_means, [2.5, 97.5])

# Pivotal interval: reflect the bootstrap quantiles of (boot_mean - p_hat) around p_hat
errors = boot_means - p_hat
ci_pivotal = [p_hat - np.percentile(errors, 97.5),
              p_hat - np.percentile(errors, 2.5)]

print(f"95% Percentile bootstrap CI: [{ci_percentile[0]:.4f}, {ci_percentile[1]:.4f}]")
print(f"95% Pivotal bootstrap CI:    [{ci_pivotal[0]:.4f}, {ci_pivotal[1]:.4f}]")


95% Percentile bootstrap CI: [0.1500, 0.3100]
95% Pivotal bootstrap CI:    [0.1500, 0.3100]
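For 0/1 data the resampling loop can be replaced by a direct draw: resampling $n$ values with replacement from 23 ones and 77 zeros yields a $\text{Binomial}(n, \hat p)$ number of ones. The following is a sketch of that shortcut, not the method used above; the seed and the variable names `q_lo` and `q_hi` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n, x, B = 100, 23, 10_000
p_hat = x / n

# Each bootstrap mean is (number of ones in the resample) / n,
# and that count is Binomial(n, p_hat) distributed.
boot_means = rng.binomial(n, p_hat, size=B) / n

q_lo, q_hi = np.percentile(boot_means, [2.5, 97.5])
ci_percentile = (q_lo, q_hi)
# Pivotal interval: reflect the bootstrap quantiles around p_hat
ci_pivotal = (2 * p_hat - q_hi, 2 * p_hat - q_lo)

print("percentile:", ci_percentile)
print("pivotal:   ", ci_pivotal)
```

The two intervals coincide whenever the bootstrap quantiles are symmetric around $\hat p$, which is consistent with the identical intervals printed above.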


## Problem 2.

Let $ X_1, X_2, X_3, X_4 \sim \text{Uniform}(a, b) $, where $a$ and $b$ are unknown parameters and $a < b$.

### 2.1. Find the MLE $ \hat{a} $ and $ \hat{b} $.

The PDF is:

$$
f(x; a, b) = 
\begin{cases}
\frac{1}{b - a} & \text{if } a \le x \le b \\
0 & \text{otherwise}
\end{cases}
$$
If $b < X_i$ for some $i\in \{1,2,3,4\}$, then $f(X_i; a, b) = 0$,

and hence $\mathcal{L}_n(a, b) = 0$ for $b < X_{(n)} = \max \{X_1, \dots, X_4\}$.

Similarly, 

if $a > X_i$ for some $i\in \{1,2,3,4\}$, then $f(X_i; a, b) = 0$,

and hence $\mathcal{L}_n(a, b) = 0$ for $a > X_{(1)} = \min \{X_1, \dots, X_4\}$.

Solve for $\hat b$ first. For $b \geq X_{(n)}$ (and $a \leq X_{(1)}$):

$$
\mathcal{L}_n(a, b) = \prod_{i=1}^n f(X_i; a,  b) = \left(\frac{1}{b-a}\right)^n
$$

Since this is strictly decreasing in $b$, the likelihood is maximized by the smallest admissible value, $\hat b = X_{(n)}$.

Next, solve for $\hat a$. For $a \leq X_{(1)}$:

$$
\mathcal{L}_n(a, b) = \prod_{i=1}^n f(X_i; a,  b) = \left(\frac{1}{b-a}\right)^n
$$

To maximize the likelihood, $a$ should be as large as possible, since $\left(\frac{1}{b-a}\right)^n$ grows as $a$ approaches $b$.

But if $a > X_{(1)}$, then $f(X_{(1)}; a, b) = 0$ and the likelihood vanishes. Therefore the likelihood is maximized at $\hat a = X_{(1)}$.
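The argument can be illustrated numerically: at $(X_{(1)}, X_{(n)})$ the likelihood is positive, widening the interval lowers it, and shrinking the interval makes it zero. This is a sketch on simulated data; the true values $a = 2$, $b = 5$, the seed, and the helper `likelihood` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(2.0, 5.0, size=4)  # hypothetical true values a=2, b=5

def likelihood(a, b, data):
    """Uniform(a, b) likelihood: (1/(b-a))^n if all points lie in [a, b], else 0."""
    if a >= b or data.min() < a or data.max() > b:
        return 0.0
    return (1.0 / (b - a)) ** len(data)

a_hat, b_hat = x.min(), x.max()   # MLE: sample minimum and maximum
L_mle = likelihood(a_hat, b_hat, x)

# Widening the interval lowers the likelihood; shrinking it excludes a point.
L_wider = likelihood(a_hat - 0.1, b_hat + 0.1, x)
L_narrower = likelihood(a_hat + 0.1, b_hat, x)

print(a_hat, b_hat, L_mle, L_wider, L_narrower)
```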



### 2.2. 

Let $ \tau = \int x f(x) \, dx $, where $ f(x)$ is the PDF of the $\text{Uniform}(a, b)$ distribution. Find the MLE of $ \tau $.

The PDF is:

$$
f(x; a, b) = 
\begin{cases}
\frac{1}{b - a} & \text{if } a \le x \le b \\
0 & \text{otherwise}
\end{cases}
$$

$ \tau = \int x f(x) \, dx $

$ \tau = \int_a^b  x \cdot \frac{1}{b-a} \, dx $

$ \tau = \frac{1}{b-a} \int_a^b  x \, dx $

$ \tau = \frac{1}{b-a} \cdot (\frac{b^2}{2} - \frac{a^2}{2}) $

$ \tau = \frac{1}{b-a}\frac{b^2-a^2}{2} $

$ \tau = \frac{b^2-a^2}{2(b-a)} $

$ \tau = \frac{(b+a)(b-a)}{2(b-a)} $

$ \tau = \frac{b+a}{2} $

By the equivariance of the MLE, the maximum likelihood estimator $\hat \tau$ is obtained by plugging in $\hat a$ and $\hat b$:

$$
\hat \tau = \frac{\hat b + \hat a}{2}
$$

$$
\hat \tau = \frac{X_{(n)} + X_{(1)} }{2}
$$
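A small worked example of the plug-in estimator, with a numerical check that $\int x f(x)\,dx$ over $[\hat a, \hat b]$ is indeed the midpoint. The four sample values are made up for illustration.

```python
import numpy as np
from scipy.integrate import quad

x = np.array([2.7, 4.1, 3.3, 4.8])  # hypothetical sample of size 4
a_hat, b_hat = x.min(), x.max()     # MLEs of a and b
tau_hat = (a_hat + b_hat) / 2       # plug-in MLE of tau

# Numerically integrate x * f(x; a_hat, b_hat) over [a_hat, b_hat]
tau_numeric, _ = quad(lambda t: t / (b_hat - a_hat), a_hat, b_hat)

print(tau_hat, tau_numeric)
```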

## Problem 3

Let $X_1, ..., X_n \sim \text{Poisson}(\lambda)$. Find the method of moments estimator for $\lambda$.

Following the approach from lecture 5 material page 11:

If $X \sim \text{Poisson}(\lambda) $, then the first moment is $\alpha_1 = \mathbb{E}[X] = \lambda$

and the first sample moment is

$\hat \alpha_1 = \frac{1}{n} \sum_{i=1}^n X_i$

Equating the two gives the estimator

$\hat \lambda_n = \frac{1}{n} \sum_{i=1}^n X_i$
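A short simulation sketch of the estimator: the sample mean of Poisson draws should land close to the rate used to generate them. The rate `lam_true`, the sample size, and the seed are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true = 3.5                      # hypothetical true rate
sample = rng.poisson(lam_true, size=10_000)

# Method of moments: match the first sample moment to E[X] = lambda
lam_mom = sample.mean()

print(f"method of moments estimate: {lam_mom:.3f}")
```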

## Problem 4

Let $X_1, ..., X_n \sim \text{Poisson}(\lambda)$. Find the maximum likelihood estimator for $\lambda$.

Likelihood function:

$$
\mathcal{L}_n(\lambda) = \prod_{i=1}^n f(X_i; \lambda)
$$

And the log-likelihood:

$$
\mathcal{l}_n(\lambda) = \log \mathcal{L}_n(\lambda) 
$$

Poisson PMF:

$$
f(k, \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}
$$

So likelihood is:          

\begin{aligned}
\mathcal{L}_n(\lambda) &= \prod_{i=1}^n  \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}\\
\mathcal{l}_n(\lambda) &= \log \mathcal{L}_n(\lambda) \\
\mathcal{l}_n(\lambda) &= \sum_{i=1}^n \log\left( \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}\right)\\
\mathcal{l}_n(\lambda) &= \sum_{i=1}^n \left(\log(\lambda^{X_i}) + \log(e^{-\lambda}) -\log(X_i!)\right)\\
\mathcal{l}_n(\lambda) &= \sum_{i=1}^n \left(X_i\log(\lambda) -\lambda -\log(X_i!)\right)\\
\mathcal{l}_n(\lambda) &= -n\lambda - \sum_{i=1}^n \log(X_i!) + \log(\lambda)\sum_{i=1}^n X_i\\
\end{aligned}

Taking the derivative (the $\log(X_i!)$ terms do not depend on $\lambda$ and drop out):
\begin{aligned}
\frac{\partial \mathcal{l}_n(\lambda)}{\partial \lambda} &= -n + \sum_{i=1}^n X_i\frac{d}{d\lambda}\log(\lambda)\\
\frac{\partial \mathcal{l}_n(\lambda)}{\partial \lambda} &= -n + \sum_{i=1}^n \frac{X_i}{\lambda}\\
\end{aligned}

Setting the derivative to zero and solving for $\lambda$:

\begin{aligned}
-n + \sum_{i=1}^n \frac{X_i}{\lambda} &= 0\\
\sum_{i=1}^n X_i &= n \lambda\\
\lambda &= \frac{1}{n} \sum_{i=1}^n X_i 
\end{aligned}

This means the MLE is $\hat \lambda = \frac{1}{n} \sum_{i=1}^n X_i = \bar X$, the same estimator as the method of moments estimator in Problem 3.
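The closed form can be compared with a direct numerical maximization of the Poisson log-likelihood. This is a sketch on simulated data; the rate 4.0, the sample size, the seed, and the search interval passed to `minimize_scalar` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(3)
sample = rng.poisson(4.0, size=500)  # hypothetical data

def neg_log_lik(lam):
    # -l_n(lambda) = n*lambda - log(lambda)*sum(X_i) + sum(log(X_i!)),
    # using log(k!) = gammaln(k + 1)
    return (len(sample) * lam
            - np.log(lam) * sample.sum()
            + gammaln(sample + 1).sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")

print(f"numerical MLE: {res.x:.4f}, sample mean: {sample.mean():.4f}")
```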


## Problem 5
The data set week3_exercise5.txt comprises samples from the inverse-Gamma distribution, which has the PDF

$$
f(x) = \frac{b^a}{\Gamma(a)} x^{-a-1} e^{-\frac{b}{x}},
$$

where $0 < a, b$, and $\Gamma(a)$ denotes the gamma-function.

Find the MLE for the parameters $a$ and $b$ using numerical optimization tools.

Deriving the log-likelihood:

Likelihood function:

$$
\mathcal{L}_n(a, b) = \prod_{i=1}^n f(X_i, a, b)
$$

And the log-likelihood:

\begin{aligned}
\mathcal{l}_n(a, b) &= \log \mathcal{L}_n(a, b) \\
\mathcal{l}_n(a, b) &=  \log (\prod_{i=1}^n f(X_i, a, b))\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n  \log  f(X_i, a, b)\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n \log \left( \frac{b^a}{\Gamma(a)} X_i^{-a-1} e^{-\frac{b}{X_i}}\right)\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n \left( \log \left( \frac{b^a}{\Gamma(a)} X_i^{-a-1}\right) + \log(e^{-\frac{b}{X_i}})\right)\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n \left( \log  (b^a X_i^{-a-1}) - \log(\Gamma(a)) + \log(e^{-\frac{b}{X_i}})\right)\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n \left( \log  (b^a) + \log (X_i^{-a-1}) - \log(\Gamma(a)) + \log(e^{-\frac{b}{X_i}})\right)\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n \left( \log  (b^a) + \log (X_i^{-a-1}) - \log(\Gamma(a)) - \frac{b}{X_i}\right)\\
\mathcal{l}_n(a, b) &= \sum_{i=1}^n \left( a \log  (b) + (-a-1)\log X_i - \log(\Gamma(a)) - \frac{b}{X_i}\right)\\
\mathcal{l}_n(a, b) &= n a \log (b) - n \log(\Gamma(a))  + \sum_{i=1}^n \left((-a-1)\log X_i - \frac{b}{X_i}\right)\\
- \mathcal{l}_n(a, b) &= -\left( n a \log (b) - n \log(\Gamma(a))  + \sum_{i=1}^n \left((-a-1)\log X_i - \frac{b}{X_i}\right)\right)\\
\end{aligned}

In [4]:
import numpy as np
from scipy.special import gammaln 
from scipy.optimize import minimize
def get_data():

    array =[]
    with open("week3_exercise5.txt", "r") as f:
        for line in f:
            if "X" in line:
                continue
            array.append(float(line.strip()))
    return np.array(array)

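def inverse_gamma_function(params, data):
    # Negative log-likelihood of the inverse-Gamma distribution (objective to minimize)
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf  # outside the parameter space a, b > 0
    n = len(data)
    return -(n * a * np.log(b) - n*gammaln(a) + np.sum(np.log(data))*(-a-1) - np.sum(b/data))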

data = get_data()

result = minimize(
    inverse_gamma_function,
    [0.5, 10],
    args=(data,),
    method="L-BFGS-B",
    bounds=[(0, None), (0, None)]  # one (lower, upper) pair per parameter: a and b
)

print("Success:", result.success)
print("Estimated a and b:", result.x[0], result.x[1])

Success: True
Estimated a and b: 1.0048346829701857 2.0152455483516474
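One way to cross-check a hand-written negative log-likelihood is to compare it with `scipy.stats.invgamma.fit`, whose shape parameter corresponds to $a$ and whose `scale` corresponds to $b$ when the location is fixed at 0. Since the exercise file is not reproduced here, the sketch below uses synthetic data with hypothetical parameters $a = 3$, $b = 2$; the function name `neg_log_lik`, the starting point, and the small positive lower bounds are likewise assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import invgamma

# Synthetic data with hypothetical parameters a=3, b=2 (not the exercise data)
data = invgamma.rvs(a=3.0, scale=2.0, size=5000, random_state=0)

def neg_log_lik(params, x):
    """Negative inverse-Gamma log-likelihood, as derived above."""
    a, b = params
    n = len(x)
    return -(n * a * np.log(b) - n * gammaln(a)
             - (a + 1) * np.sum(np.log(x)) - b * np.sum(1.0 / x))

res = minimize(neg_log_lik, x0=[1.0, 1.0], args=(data,),
               method="L-BFGS-B", bounds=[(1e-6, None), (1e-6, None)])

# Reference fit: shape a, location fixed at 0, scale b
a_ref, _, b_ref = invgamma.fit(data, floc=0)

print("custom NLL:  ", res.x)
print("invgamma.fit:", (a_ref, b_ref))
```

If the two fits agree on synthetic data, the same objective can be trusted more readily on the real data set.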
