In [57]:
import pandas as pd

boston = pd.read_csv("../../data/Boston.csv").rename({"Unnamed: 0": "Id"}, axis=1).set_index("Id")
u = boston["medv"].mean()
u

np.float64(22.532806324110677)

In [58]:
import numpy as np
se_medv = boston['medv'].std(ddof=1) / np.sqrt(len(boston))
se_medv

np.float64(0.40886114749753505)

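With an estimate of 22.53 and a standard error of about 0.41, the sample mean is estimated precisely: the standard error is under 2% of the estimate. A rough 95% range of two standard errors either side puts the population mean between about 21.7 and 23.4.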

In [59]:
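def boot_fn(data: pd.DataFrame, idx, mean_med = 1):
    """Return a statistic of medv for the rows selected by idx.

    mean_med: 1 -> mean, 0 -> median, any other value -> 10th percentile.
    """
    # idx holds index labels (Id values), so rows are selected by label with .loc
    df = data.loc[idx]
    if mean_med == 1:
        return df["medv"].mean()
    elif mean_med == 0:
        return df["medv"].median()
    else:
        return np.percentile(df["medv"], 10)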

In [60]:
B = 1000
n = boston.shape[0]
rng = np.random.default_rng(0)

means = np.zeros(B)

for i in range(B):
    idx = rng.choice(boston.index, size=n, replace=True)
    means[i] = boot_fn(boston, idx)

se = means.std(ddof=0)

print("Standard Error (bootstrap):" + str(se))

Standard Error (bootstrap):0.4125347675099613


The bootstrap estimate (0.4125) is slightly higher than the formula-based standard error (0.4089), by about 0.004.
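The comparison between the formula and the bootstrap can also be reproduced without an explicit loop. The following is a minimal sketch, not the notebook's method: it uses a synthetic, right-skewed sample as a stand-in for `boston["medv"]` (the gamma parameters and the sample size of 500 are arbitrary assumptions) and draws all bootstrap samples at once with NumPy indexing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for boston["medv"]; NOT the real data (assumed parameters).
x = rng.gamma(shape=6.0, scale=3.75, size=500)

n = x.size
B = 1000

# Formula-based standard error of the mean: s / sqrt(n).
se_formula = x.std(ddof=1) / np.sqrt(n)

# Draw all B bootstrap samples at once: each row holds n positions
# sampled with replacement from 0..n-1.
idx = rng.integers(0, n, size=(B, n))
boot_means = x[idx].mean(axis=1)

# Bootstrap standard error: spread of the B resampled means.
se_boot = boot_means.std(ddof=1)

print(f"formula SE:   {se_formula:.4f}")
print(f"bootstrap SE: {se_boot:.4f}")
```

The two numbers should agree to within a few percent, which mirrors the small gap seen above. Sampling integer positions rather than index labels avoids the label lookup in `.loc` and is usually faster, at the cost of holding a `B` by `n` array in memory.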

In [61]:
mean_medv = boston['medv'].mean()

ci_lower = mean_medv - 2 * se
ci_upper = mean_medv + 2 * se

(ci_lower, ci_upper)

(np.float64(21.707736789090752), np.float64(23.3578758591306))

In [62]:
ci_formula_lower = mean_medv - 2 * se_medv
ci_formula_upper = mean_medv + 2 * se_medv

(ci_formula_lower, ci_formula_upper)


(np.float64(21.715084029115605), np.float64(23.35052861910575))

The two intervals are very close, which suggests that the bootstrap gives a reliable standard error here. Both put the population mean of the median housing value (medv) between roughly 21.7 and 23.4, at an approximate 95% confidence level. The endpoints differ by only about 0.007, which is expected given the randomness in resampling.
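Both intervals above use the "estimate plus or minus two standard errors" rule. A common alternative, shown here only as a hedged sketch, is the percentile interval, which reads the 2.5th and 97.5th percentiles directly off the bootstrap distribution. The data below are synthetic (an assumed gamma sample, not the Boston data), so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for boston["medv"]; NOT the real data (assumed parameters).
x = rng.gamma(shape=6.0, scale=3.75, size=500)

n, B = x.size, 2000

# Bootstrap distribution of the sample mean.
boot_means = x[rng.integers(0, n, size=(B, n))].mean(axis=1)

# Interval 1: estimate +/- 2 * bootstrap standard error.
se = boot_means.std(ddof=1)
ci_normal = (x.mean() - 2 * se, x.mean() + 2 * se)

# Interval 2: percentile interval taken from the bootstrap distribution itself.
lower, upper = np.percentile(boot_means, [2.5, 97.5])
ci_percentile = (lower, upper)

print("2-SE interval:      ", ci_normal)
print("percentile interval:", ci_percentile)
```

For a statistic such as the mean, whose bootstrap distribution is close to normal, the two intervals should nearly coincide. The percentile interval can differ more for skewed statistics.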

In [63]:
mu_med = boston["medv"].median()
mu_med

np.float64(21.2)

In [64]:
B = 1000
n = boston.shape[0]
rng = np.random.default_rng(0)

meds = np.zeros(B)

for i in range(B):
    idx = rng.choice(boston.index, size=n, replace=True)
    meds[i] = boot_fn(boston, idx, 0)

se = meds.std(ddof=0)

print("Median standard error (bootstrap):" + str(se))

Median standard error (bootstrap):0.3694462207141924


The bootstrap standard error of the median (about 0.37) is slightly smaller than that of the mean (about 0.41), so the median is estimated with similar precision. The median is less sensitive to outliers than the mean, but it still varies from sample to sample. Because the median has no simple standard error formula, the bootstrap is a practical way to measure the uncertainty around the median value of medv in the Boston housing data set.
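The robustness of the median to outliers can be seen with a toy example. The values below are hypothetical and chosen only for illustration: adding one extreme observation moves the mean a long way but barely moves the median.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the point.
values = np.array([18.0, 20.0, 21.0, 22.0, 24.0])
with_outlier = np.append(values, 500.0)  # one extreme observation added

mean_shift = np.mean(with_outlier) - np.mean(values)
median_shift = np.median(with_outlier) - np.median(values)

print(f"mean shift:   {mean_shift:.2f}")    # 79.83
print(f"median shift: {median_shift:.2f}")  # 0.50
```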

In [65]:
mu_0_1 = np.percentile(boston["medv"], 10)
mu_0_1

np.float64(12.75)

In [66]:
B = 1000
n = boston.shape[0]
rng = np.random.default_rng(0)

perc = np.zeros(B)

for i in range(B):
    idx = rng.choice(boston.index, size=n, replace=True)
    perc[i] = boot_fn(boston, idx, 2)

se = perc.std(ddof=0)

print("tenth percentile standard error (bootstrap):" + str(se))

tenth percentile standard error (bootstrap):0.5034541091301172


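The standard error (about 0.50) is larger than those of the mean and the median, as expected for a quantile in the tail of the distribution, where fewer observations lie nearby. It is still small relative to the estimate of 12.75 (about 4%), so the 10th percentile is estimated with reasonable precision.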
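The three bootstrap loops above differ only in the statistic being computed. One possible refactoring, sketched here under assumptions, passes the statistic in as a function instead of selecting it with an integer flag. The helper name `bootstrap_se` is hypothetical, and the data are synthetic rather than the Boston data.

```python
import numpy as np

def bootstrap_se(values, statistic, B=1000, seed=0):
    """Bootstrap standard error of statistic(values) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = values.size
    estimates = np.empty(B)
    for b in range(B):
        # Resample n observations with replacement and recompute the statistic.
        sample = values[rng.integers(0, n, size=n)]
        estimates[b] = statistic(sample)
    return estimates.std(ddof=1)

# Synthetic stand-in for boston["medv"]; NOT the real data (assumed parameters).
x = np.random.default_rng(2).gamma(shape=6.0, scale=3.75, size=500)

statistics = {
    "mean": np.mean,
    "median": np.median,
    "10th percentile": lambda v: np.percentile(v, 10),
}

results = {name: bootstrap_se(x, fn) for name, fn in statistics.items()}

for name, fn in statistics.items():
    est = fn(x)
    se = results[name]
    # Relative standard error puts estimates of different sizes on one scale.
    print(f"{name:>16}: estimate={est:.2f}  SE={se:.3f}  SE/estimate={se / est:.1%}")
```

Reporting the standard error relative to the estimate makes it easier to compare precision across statistics of different magnitudes, such as a mean near the centre and a percentile in the lower tail.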