# Exercise 1: E‑commerce Delivery Times

An online retailer promises 3‑day shipping. Occasionally, storms delay a few packages. Given these occasional delays, can you confidently continue advertising “3‑day delivery”? To find out, complete three tasks:

1) Run the cell below to simulate the data.

2) Compute a maximum likelihood estimate, assuming delivery times $\sim N(\mu, \sigma^2)$ with known $\sigma = 0.5$. Find the $\mu$ that maximizes the likelihood (in practice we minimize the negative log-likelihood, which gives the same $\mu$).

3) Calculate a 95% bootstrap CI: resample the 100 observations with replacement 2000 times, compute $\hat{\mu}$ for each bootstrap sample, and take the 2.5th and 97.5th percentiles.
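
As a sanity check on the grid search in task 2, the maximizer can also be worked out by hand. The sketch below assumes independent observations $x_1, \dots, x_n$ and the known $\sigma$ stated above; under those assumptions the MLE is the sample mean $\bar{x}$:

$$
\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,
\qquad
\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\;\Longrightarrow\;
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.
$$

The grid search should therefore land on the grid point closest to $\bar{x}$.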



In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
# 100 on‑time deliveries ~ N(3 days, 0.5**2)
delivery_times = np.random.normal(3.0, 0.5, size=100)
# 3 extreme delays
delivery_times[:3] += np.array([5.0, 7.0, 4.0])

plt.hist(delivery_times)


### Helpful code:

In [None]:
from scipy import stats

# 1. Define neg‐log‐likelihood correctly 
def neg_log_lik(mu, data):
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=0.5))

# 2. Define your grid
mu_values = np.linspace(0.0, 6.0, 601)

# 3. Compute NLL on full data
nlls = [ neg_log_lik(mu, delivery_times) for mu in mu_values ]

# your task: find the best value from nlls
best_mu = mu_values[np.argmin(nlls)]
print("Best mu (MLE):", best_mu)

# 4. Bootstrap
boots = []
for _ in range(2000):
    sample = np.random.choice(delivery_times, size=100, replace=True)
    # Reuse neg_log_lik instead of repeating the log-pdf expression inline
    nlls_bs = [ neg_log_lik(mu, sample) for mu in mu_values ]
    boots.append(mu_values[np.argmin(nlls_bs)])

# your task: get the 2.5 and 97.5 %ile from boots
ci_lower = np.percentile(boots, 2.5)
ci_upper = np.percentile(boots, 97.5)
print("95% CI for mu from bootstrap:", (ci_lower, ci_upper))



## Questions

1) What is $\hat{\mu}$? How far is it from 3.0 days, and how did the 3 outliers pull it?

2) Does the 95 % CI include 3.0? What does that imply for your “3‑day promise”?

3) If you removed the 3 delays, how would $\hat{\mu}$ and the CI change?

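1. $\hat{\mu} = 3.17$, about 0.17 days above the promised 3.0. The 3 outliers add 5 + 7 + 4 = 16 days in total, which spread over 100 observations raises the mean by 0.16 days, so they account for almost all of the gap.
2. The 95% CI includes 3.0, so the data are consistent with a mean delivery time of 3 days and the 3‑day promise remains defensible on average.
3. If the three delays were removed, $\hat{\mu}$ would decrease toward 3.0 and the CI would narrow, because the outliers inflate both the sample mean and its variability across bootstrap samples.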
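
The answer to question 3 can be checked numerically. The following is a minimal sketch, not part of the original helper code: it assumes the same seed and outliers as the simulation cell, uses the fact that the normal MLE of $\mu$ with known $\sigma$ equals the sample mean, and introduces a hypothetical helper `bootstrap_mean_ci` with an arbitrarily chosen bootstrap seed.

```python
import numpy as np

# Rebuild the exercise data with the same seed and outliers as the simulation cell.
np.random.seed(123)
delivery_times = np.random.normal(3.0, 0.5, size=100)
delivery_times[:3] += np.array([5.0, 7.0, 4.0])

# "Clean" data set for comparison: drop the three delayed packages.
clean_times = delivery_times[3:]

def bootstrap_mean_ci(data, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the mean (hypothetical helper for this sketch)."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice
    # Draw all resamples at once: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    boot_means = data[idx].mean(axis=1)
    return np.percentile(boot_means, [2.5, 97.5])

# With known sigma, the normal MLE of mu is the sample mean.
mu_all, mu_clean = delivery_times.mean(), clean_times.mean()
ci_all = bootstrap_mean_ci(delivery_times)
ci_clean = bootstrap_mean_ci(clean_times)

print("mu_hat with delays:   ", round(mu_all, 3), "CI:", ci_all)
print("mu_hat without delays:", round(mu_clean, 3), "CI:", ci_clean)
```

Because the resampling is vectorized, this runs much faster than a grid search per bootstrap sample, at the cost of relying on the closed-form estimate.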

# Exercise 2: Call‑Center Inter‑arrival Times

A support center sees calls arriving with a mean gap of 5 min. Downtime causes a few very long gaps. Do you currently have enough call agents to keep the average hold‑times below 5 min? To find out, complete three tasks:

1) Run the cell below to simulate the data.

2) Compute a maximum likelihood estimate, assuming call gaps $\sim \text{Exp}(\lambda)$. Find the $\lambda$ that maximizes the likelihood (in practice we minimize the negative log-likelihood, which gives the same $\lambda$).

3) Calculate a 95% bootstrap CI: resample the 200 observations with replacement 2000 times, compute $\hat{\lambda}$ for each bootstrap sample, and take the 2.5th and 97.5th percentiles.
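
For reference, the exponential MLE also has a closed form. The sketch below assumes independent gaps $x_1, \dots, x_n$ with density $\lambda e^{-\lambda x}$; under that assumption the estimated rate is the reciprocal of the mean gap:

$$
\ell(\lambda) = n\log\lambda - \lambda\sum_{i=1}^{n} x_i,
\qquad
\frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\;\Longrightarrow\;
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}.
$$

Any numerical search for $\hat{\lambda}$ should agree with $1/\bar{x}$ up to the resolution of the grid or the tolerance of the optimizer.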




In [None]:
# Helper code: MLE for the exponential rate λ, then a bootstrap CI
import numpy as np
from scipy import stats, optimize

# Reconstruct the data
np.random.seed(42)
gaps = np.random.exponential(scale=5.0, size=200)
gaps[:5] += np.array([30, 45, 60, 25, 50])
# plt.hist(gaps, bins=100)

# 1. MLE for λ (pause here: what is the function below doing?)
# Note: scipy parameterizes the exponential by scale = 1/λ, i.e. the mean gap.
def neg_log_lik(λ, data):
    return -np.sum(stats.expon.logpdf(data, scale=1/λ))

# your task: create a grid of values to "check" (i.e., linspace like the last problem). Then, find the best value from nlls
# λ is a rate (calls per minute), so the grid covers plausible rates in steps of 0.001
# rather than values up to 2 * gaps.mean(), which is in minutes.
λ_values = np.linspace(0.01, 1.0, 991)
nlls = [ neg_log_lik(λ, gaps) for λ in λ_values ]
best_λ = λ_values[np.argmin(nlls)]
print("Best λ:", best_λ)


# 2. Bootstrap CI
boots = []
for _ in range(2000):
    samp = np.random.choice(gaps, size=200, replace=True)
    r = optimize.minimize(lambda l: neg_log_lik(l, samp), x0=0.2, bounds=[(1e-6, None)])
    boots.append(r.x[0])

# your task: get the 2.5 and 97.5 %ile from boots
ci_lower = np.percentile(boots, 2.5)
ci_upper = np.percentile(boots, 97.5)
print("95% CI for λ:", (ci_lower, ci_upper))

print("mean:", 1/best_λ)

## Questions

1) What is the estimated rate $\hat{\lambda}$? How does it compare to the true 0.2?

2) How robust is $\hat{\lambda}$ to the 5 downtimes (i.e., do the 5 downtimes shift the mean at all)?

3) Does the CI cover 0.2? What does this mean for staffing predictions?







1. The estimated $\hat{\lambda}$ is approximately 0.174, which is slightly less than 0.2.
2. Without the 5 downtimes, the mean is roughly 4.86. Including the downtimes, the mean is 5.74, so $\hat\lambda$ is not particularly robust to downtimes.
3. The CI covers 0.2, so the data are consistent with the planned arrival rate; current staffing should generally be enough except during rare downtimes.
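
The robustness question can be explored with the closed-form estimate $\hat{\lambda} = 1/\bar{x}$. The following is a minimal sketch under that assumption, using the same seed and downtimes as the helper code; "without downtimes" here means dropping the five affected observations, and the bootstrap seed is an arbitrary choice.

```python
import numpy as np

# Rebuild the exercise data with the same seed and downtimes as the helper code.
np.random.seed(42)
gaps = np.random.exponential(scale=5.0, size=200)
gaps[:5] += np.array([30, 45, 60, 25, 50])

# Closed-form exponential MLE: rate = 1 / mean gap.
lam_with = 1.0 / gaps.mean()
lam_without = 1.0 / gaps[5:].mean()  # drop the five downtime-affected gaps

# Percentile bootstrap CI for the rate, using the closed form on each resample.
rng = np.random.default_rng(0)  # arbitrary seed for this sketch
idx = rng.integers(0, len(gaps), size=(2000, len(gaps)))
boot_rates = 1.0 / gaps[idx].mean(axis=1)
ci_lower, ci_upper = np.percentile(boot_rates, [2.5, 97.5])

print("lambda_hat with downtimes:   ", lam_with)
print("lambda_hat without downtimes:", lam_without)
print("95% CI for lambda (with downtimes):", (ci_lower, ci_upper))
```

Since a handful of very long gaps raises the mean directly, the estimated rate drops when they are included, which is why the mean-based estimate is sensitive to downtimes.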

# Exercise 3: Household Electricity Usage


Hourly household consumption (kWh) is log‑normally distributed (median ~ 20 kWh); rare equipment failures spike usage. Find the most likely value of the typical electricity usage and then construct a confidence interval for it. You've only budgeted to use 20 kWh per hour.

In [None]:
import numpy as np
from scipy import stats
np.random.seed(7)
usage = np.random.lognormal(mean=np.log(20), sigma=0.3, size=150)
usage[:4] *= np.array([5, 4, 6, 3])

# plt.hist(usage)

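# 1. MLE for mu (the mean of log-usage); sigma = 0.3 is treated as known.
# scipy's lognorm uses s = sigma and scale = exp(mu), so exp(mu) is the median.
def neg_log_lik(mu, data):
    return -np.sum(stats.lognorm.logpdf(data, s=0.3, scale=np.exp(mu)))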

mu_values = np.linspace(2.0, 4.0, 401)
nlls = [ neg_log_lik(mu, usage) for mu in mu_values ]
best_mu = mu_values[np.argmin(nlls)]
print("Best mu:", best_mu)
print("Mean:", np.exp(best_mu + (0.3 ** 2) / 2))
print("Median:", np.exp(best_mu))

# 2. Bootstrap CI
boots = []
for _ in range(2000):
    samp = np.random.choice(usage, size=150, replace=True)
    nlls_bs = [ neg_log_lik(mu, samp) for mu in mu_values ]
    boots.append(mu_values[np.argmin(nlls_bs)])

# your task: get the 2.5 and 97.5 %ile from boots
ci_lower = np.percentile(boots, 2.5)
ci_upper = np.percentile(boots, 97.5)
print("95% CI for mu:", (ci_lower, ci_upper))
print("Lower mean:", np.exp(ci_lower + (0.3 ** 2) / 2))
print("Upper mean:", np.exp(ci_upper + (0.3 ** 2) / 2))


## Questions

1) What is the estimated median usage?

2) Would you budget 20 kWh per hour per home confidently or should you budget more?

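1. The estimated median usage is approximately 20.49 kWh, i.e. $\exp(\hat{\mu})$ with $\hat{\mu} \approx 3.02$.
2. It is probably better to budget slightly more than 20 kWh per hour per home: the estimated median (about 20.49 kWh) already exceeds 20, and the implied mean $\exp(\hat{\mu} + 0.3^2/2) \approx 21.4$ kWh is higher still.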
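
The grid search can be cross-checked against a closed form. The following is a minimal sketch, assuming $\sigma = 0.3$ is known as in the exercise: for a log-normal model the MLE of $\mu$ is then the mean of the log data. To stay NumPy-only, the negative log-likelihood is written up to an additive constant that does not depend on $\mu$; the "without spikes" comparison drops the four modified observations.

```python
import numpy as np

# Rebuild the exercise data with the same seed and spikes as the code cell.
np.random.seed(7)
usage = np.random.lognormal(mean=np.log(20), sigma=0.3, size=150)
usage[:4] *= np.array([5, 4, 6, 3])

sigma = 0.3  # treated as known, as in the exercise
log_usage = np.log(usage)

# Closed-form MLE of mu when sigma is known: the mean of the log data.
mu_closed = log_usage.mean()

# Grid search on the same grid as the exercise. The NLL is written up to a
# constant that does not depend on mu, so the argmin is unchanged.
mu_values = np.linspace(2.0, 4.0, 401)
nll = np.array([0.5 * np.sum((log_usage - mu) ** 2) / sigma**2 for mu in mu_values])
mu_grid = mu_values[np.argmin(nll)]

# Same estimate after dropping the four spiked observations.
mu_clean = log_usage[4:].mean()

print("mu (closed form):", mu_closed, "| mu (grid):", mu_grid)
print("median with spikes:   ", np.exp(mu_closed))
print("median without spikes:", np.exp(mu_clean))
print("mean with spikes:     ", np.exp(mu_closed + sigma**2 / 2))
```

The grid estimate should differ from the closed form by at most half a grid step (0.0025 here), which is a quick way to confirm that the grid is fine enough.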