# Exercise 8: Bootstrap
## 1.
Let $X_1, \ldots, X_n$ be independent and identically distributed random variables having unknown mean $\mu$. For given constants $a<b$, we are interested in estimating $p=P\left\{a<\sum_{i=1}^n X_i / n-\mu<b\right\}$.

(a) Explain how we can use the bootstrap approach to estimate $p$.

**Solution:**

The distribution of $\bar{X}-\mu$ is unknown, so the bootstrap approximates it by resampling from the empirical distribution of the data, with $\bar{X}$ playing the role of $\mu$:

- Compute the sample mean $\bar{X}=\frac{1}{n} \sum_{i=1}^n X_i$.
- Define the statistic of interest $T=\bar{X}-\mu$.
- Generate $B$ bootstrap samples, each of size $n$, by resampling from the original data with replacement.
- For each bootstrap sample $j=1,\ldots,B$, compute the bootstrap sample mean $\bar{X}^{(j)}$.
- Calculate the bootstrap statistic $T^{(j)}=\bar{X}^{(j)}-\bar{X}$.
- Estimate $p$ as the proportion of bootstrap replicates for which $a<T^{(j)}<b$.

The resulting estimate is
$$
\hat{p}=\frac{1}{B} \sum_{j=1}^B I\left(a<T^{(j)}<b\right),
$$
where $I(\cdot)$ is the indicator function.

(b) Estimate $p$ if $n=10$ and the values of the $X_i$ are 56, 101, 78, 67, 93, 87, 64, 72, 80 and 69. Take $a=-5$, $b=5$.

In [67]:
import numpy as np

# Parameters
n = 10
a, b = -5, 5
X = [56, 101, 78, 67, 93, 87, 64, 72, 80, 69]

# Bootstrap settings
B = 1000
T = np.zeros(B)

x_mean = np.mean(X)

for i in range(B):
    b_sample = np.random.choice(X, size=n, replace=True)
    x_mean_b = np.mean(b_sample)
    T[i] = x_mean_b - x_mean

# Estimate the probability p
p_hat = np.mean((a < T) & (T < b))

print(f"Bootstrap estimate of p: {p_hat}")


Bootstrap estimate of p: 0.764


## 2.
If $n=15$ and the data are 
$$5,4,9,6,21,17,11,20,7,10,21,15,13,16,8$$
approximate (by a simulation) the bootstrap estimate of $\operatorname{Var}(S^2)$.

In [84]:
import numpy as np

# Parameters
n = 15
X = [5, 4, 9, 6, 21, 17, 11, 20, 7, 10, 21, 15, 13, 16, 8]

# Bootstrap settings
B = 1000
s2_boot = np.zeros(B)  # sample variance S^2 of each bootstrap sample

for i in range(B):
    b_sample = np.random.choice(X, size=n, replace=True)
    s2_boot[i] = np.var(b_sample, ddof=1)

# Estimate the variance of the sample variance
var_S2 = np.var(s2_boot, ddof=1)

print(f"Estimate of Var(S2): {var_S2}")


Estimate of Var(S2): 59.82977529692959
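
As a cross-check on the simulation, the quantity being approximated can also be computed directly. The sketch below assumes the standard identity $\operatorname{Var}(S^2)=\frac{\mu_4}{n}-\frac{(n-3)\,\sigma^4}{n(n-1)}$ for an i.i.d. sample, applied to the empirical distribution of the fifteen values in the problem statement; the variable names are illustrative.

```python
import numpy as np

# The fifteen observations from the problem statement
X = np.array([5, 4, 9, 6, 21, 17, 11, 20, 7, 10, 21, 15, 13, 16, 8], dtype=float)
n = X.size

# Central moments of the empirical distribution (divide by n, not n - 1),
# because bootstrap samples are i.i.d. draws from that distribution
centered = X - X.mean()
sigma2 = np.mean(centered**2)
mu4 = np.mean(centered**4)

# Variance of the unbiased sample variance S^2 for n i.i.d. draws
exact_var_S2 = mu4 / n - sigma2**2 * (n - 3) / (n * (n - 1))

print(f"Exact bootstrap Var(S2): {exact_var_S2}")
```

A simulated estimate with $B=1000$ should land near this value, with run-to-run differences that shrink as $B$ grows.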


## 3.
Write a subroutine that takes as input a “data” vector of observed values, and which outputs the median as well as the
bootstrap estimate of the variance of the median, based on $r = 100$ bootstrap replicates. Simulate $N = 200$ Pareto distributed random variates with $\beta = 1$ and $k = 1.05$.

(a) Compute the mean and the median of the sample.

In [149]:
# Parameters
N = 200
beta = 1
k = 1.05

# Generate uniform random variables
U = np.random.uniform(0, 1, N)

# Inverse transform sampling for Pareto distribution
X = beta * (1 / (1 - U))**(1 / k)

print(f"The mean of the sample is {np.mean(X)}")
print(f"The median of the sample is {np.median(X)}")


The mean of the sample is 3.8515042698602087
The median of the sample is 1.7903619746827986
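
The gap between the two values follows from the distribution itself. As a short derivation, assuming the parametrization implied by the sampling code above, $F(x)=1-(\beta/x)^k$ for $x \geq \beta$:

$$
U=F(X) \;\Longrightarrow\; X=\beta(1-U)^{-1/k}, \qquad E[X]=\frac{k\beta}{k-1}=21, \qquad \operatorname{median}(X)=\beta\, 2^{1/k} \approx 1.94 .
$$

With $k=1.05$ the tail is so heavy that the variance is infinite, so a sample mean well below 21 is not surprising for $N=200$, while the sample median is already close to its theoretical value.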


(b) Compute the bootstrap estimate of the variance of the sample mean.

In [211]:
# Bootstrap settings
r = 100
means = np.zeros(r)

for i in range(r):
    b_sample = np.random.choice(X, size=N, replace=True)
    means[i] = np.mean(b_sample)

var_means = np.var(means, ddof=1)
print(f"The bootstrap estimate of the variance of the sample mean is {var_means}")

The bootstrap estimate of the variance of the sample mean is 0.23131915599656994


(c) Compute the bootstrap estimate of the variance of the sample median.

In [213]:
# Bootstrap settings
r = 100
medians = np.zeros(r)

for i in range(r):
    b_sample = np.random.choice(X, size=N, replace=True)
    medians[i] = np.median(b_sample)  # median, not mean, of each resample

print(f"The bootstrap estimate of the variance of the sample median is {np.var(medians, ddof=1)}")


(d) Compare the precision of the estimated median with the precision of the estimated mean.

**Solution:**

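For a Pareto distribution with $k=1.05$ the variance is infinite (it is finite only for $k>2$), so the sample mean is dominated by a few extreme observations and is an imprecise estimator. The sample median depends only on the central order statistics and is unaffected by the size of the largest values. The median is therefore the more precise estimate here: its bootstrap variance, computed from the medians of the resamples, should be far smaller than the value found for the mean in (b).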
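
The exercise also asks for a subroutine that returns the median together with the bootstrap estimate of its variance. A minimal sketch of one possible version follows; the function name `median_with_bootstrap_var` and the `rng` argument are illustrative choices, not part of the exercise, and the fixed seed is only there to make the run repeatable.

```python
import numpy as np

def median_with_bootstrap_var(data, r=100, rng=None):
    """Return the sample median and a bootstrap estimate of its variance."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    n = data.size
    # Each row is one bootstrap sample of size n, drawn with replacement
    resamples = rng.choice(data, size=(r, n), replace=True)
    boot_medians = np.median(resamples, axis=1)
    return np.median(data), np.var(boot_medians, ddof=1)

# Usage on a Pareto sample generated as in part (a)
rng = np.random.default_rng(0)
N, beta, k = 200, 1.0, 1.05
U = rng.uniform(0, 1, N)
X = beta * (1 - U) ** (-1 / k)

med, var_med = median_with_bootstrap_var(X, r=100, rng=rng)
print(f"Median: {med}, bootstrap variance of the median: {var_med}")
```

Drawing all $r$ resamples as one $r \times N$ array replaces the explicit loop without changing the estimator.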