# Week 4 Day 3: Confidence Intervals (In-Class Lab)
**Course:** Google Advanced Data Analytics – The Power of Statistics  
**Module 4:** Confidence Intervals

In this lab, you will compute and visualize confidence intervals using analytical and bootstrap methods.


> ## Working with pandas Series (Quick Primer)

In this lab, you are working with **pandas Series**, which you can think of as a single column of data.

### Indexing a Series
You can access values in a Series using **square brackets**:

- By position (with the default 0, 1, 2, ... index, the label `0` is also the first row):
```python
df['delivery_time'][0]
```
- By slicing:
```python
df['delivery_time'][0:5]
```

This works similarly to indexing lists in Python.
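
One caveat to the list analogy: square brackets on a Series look values up by index *label*, which only matches position here because the DataFrame uses the default 0, 1, 2, ... index. A minimal sketch, using a made-up Series named `toy` (not part of the lab data) whose labels differ from positions:

```python
import pandas as pd

# Hypothetical toy Series whose index labels do NOT match positions
toy = pd.Series([31.5, 36.2, 40.1], index=[10, 20, 30])

first_by_position = toy.iloc[0]   # explicit position-based access
value_by_label = toy.loc[20]      # explicit label-based access
first_two = toy.iloc[0:2]         # position-based slice, end excluded

print(first_by_position)
print(value_by_label)
print(first_two.tolist())
```

When the index is not the default one, `.iloc` (position) and `.loc` (label) make the intent unambiguous.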

### Calling Methods on a Series
pandas Series come with many **built-in methods** that perform common calculations.

```python
df['delivery_time'].mean()
df['delivery_time'].std()
df['delivery_time'].min()
df['delivery_time'].max()
```

### A Helpful Reminder
You are **not expected to memorize** pandas or NumPy methods.

If you get stuck, **Google it**. More often than not, a method already exists. Learning how to find it is part of being a data analyst.

> **We'll dive deeper into this next week!**

In [None]:
# Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

In [None]:
# Dataset
data = {
    "delivery_time": np.random.normal(loc=35, scale=5, size=100)
}
df = pd.DataFrame(data)  # <--- Hey look, a DataFrame object!

In [None]:
# Practice: Exploring a pandas Series
delivery_series = df['delivery_time']

In [None]:
# Try indexing
delivery_series[0]

In [None]:
# Try slicing
delivery_series[0:5]

In [None]:
# Try calling a method
delivery_series.info()

In [None]:
# TODO 1: Inspect the dataset
# YOUR CODE HERE

## Sample Statistics
- The **sample mean** estimates the center of the population
- The **sample standard deviation** measures variability
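
As a small illustration of both statistics, the sketch below uses a hypothetical five-value sample called `toy_times` (not the lab data). It also shows a detail worth knowing: pandas `.std()` divides by n - 1 (the sample standard deviation) by default, while `np.std()` divides by n unless you pass `ddof=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy sample, for illustration only
toy_times = pd.Series([30.0, 32.0, 35.0, 38.0, 40.0])

toy_mean = toy_times.mean()

# pandas default: ddof=1, i.e. divide by n - 1 (sample standard deviation)
toy_std_sample = toy_times.std()

# NumPy default: ddof=0, i.e. divide by n (population formula)
toy_std_population = np.std(toy_times.to_numpy())

# NumPy matches pandas once ddof=1 is passed explicitly
toy_std_numpy_sample = np.std(toy_times.to_numpy(), ddof=1)

print(toy_mean, toy_std_sample, toy_std_population, toy_std_numpy_sample)
```

For confidence intervals, the n - 1 version is the one the t-based formula expects.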


In [None]:
# TODO 2: Compute sample mean and standard deviation
mean_delivery = None
std_delivery = None


In [None]:
# TODO 3: Store the sample size as n
# Hint: how can we get the number of observations in a Series? Try its *shape* or *length*.
n = None


In [None]:
# ✅ Check + Visualization: Sample statistics
try:
    assert mean_delivery is not None
    assert std_delivery is not None
    assert std_delivery > 0
    print('✅ Sample statistics computed correctly.')

    plt.hist(df['delivery_time'], bins=30, edgecolor='black')
    plt.axvline(mean_delivery, color='red', linestyle='--', label='Sample Mean')
    plt.title('Distribution of Delivery Times')
    plt.legend()
    plt.show()
except Exception:
    print('❌ Complete the sample statistics before visualizing.')


## Analytical Confidence Interval (t-based)

**Formula:**

$$
\bar{x} \pm t^* \times \frac{s}{\sqrt{n}}
$$
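
The formula combines the sample mean, the sample standard deviation $s$, the sample size $n$, and a critical value $t^*$ taken from a t-distribution with $n - 1$ degrees of freedom. A minimal sketch of the arithmetic on a hypothetical toy sample (`toy_times` and every `toy_` name are illustrative, not the lab's variables), cross-checked against `stats.t.interval`:

```python
import numpy as np
from scipy import stats

# Hypothetical toy sample, for illustration only
toy_times = np.array([30.0, 32.0, 35.0, 38.0, 40.0])
toy_n = len(toy_times)
toy_mean = toy_times.mean()
toy_s = toy_times.std(ddof=1)          # sample standard deviation

toy_se = toy_s / np.sqrt(toy_n)        # standard error of the mean

# Two-sided 95% interval: put 2.5% in each tail
alpha = 1 - 0.95
toy_t_star = stats.t.ppf(1 - alpha / 2, df=toy_n - 1)

toy_moe = toy_t_star * toy_se          # margin of error
toy_ci = (toy_mean - toy_moe, toy_mean + toy_moe)

# Cross-check with SciPy's built-in interval helper
check_ci = stats.t.interval(0.95, df=toy_n - 1, loc=toy_mean, scale=toy_se)

print(toy_ci)
print(check_ci)
```

Note that `ppf` takes a cumulative probability, so a 95% two-sided interval uses 0.975 rather than 0.95.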


In [None]:
confidence_level = 0.95

# TODO 4: Standard error
se = None

# TODO 5: t critical value
# There's a stats method for this! Use `stats.t.ppf()` to find the critical value
t_critical = None

# TODO 6: Margin of error
margin_of_error = None

# TODO 7: Confidence interval bounds
ci_lower = None
ci_upper = None


In [None]:
# ✅ Check + Visualization: Analytical CI
try:
    assert ci_lower < ci_upper
    plt.hist(df['delivery_time'], bins=30)
    plt.axvline(ci_lower, color='red', linestyle='--')
    plt.axvline(ci_upper, color='red', linestyle='--')
    plt.axvline(mean_delivery, color='black')
    plt.title('Analytical Confidence Interval')
    plt.show()
except Exception:
    print('❌ Complete the analytical CI before visualizing.')


## Bootstrap Confidence Interval
Bootstrapping approximates the sampling distribution by resampling **with replacement**.
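
A minimal sketch of the percentile bootstrap, again on a hypothetical toy sample rather than the lab data. It assumes NumPy's `default_rng` generator for the resampling step; `np.random.choice` or a Series' `.sample(frac=1, replace=True)` would serve the same purpose.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

# Hypothetical toy sample, for illustration only
toy_times = np.array([30.0, 32.0, 35.0, 38.0, 40.0, 33.0, 36.0, 34.0])

toy_boot_means = []
for _ in range(1000):
    # Draw a resample of the same size as the data, WITH replacement
    resample = rng.choice(toy_times, size=len(toy_times), replace=True)
    toy_boot_means.append(resample.mean())

# Percentile method: the middle 95% of the bootstrap means
toy_lower, toy_upper = np.percentile(toy_boot_means, [2.5, 97.5])

print(toy_lower, toy_upper)
```

Each resample has the same size as the original sample, so the spread of the bootstrap means reflects the sampling variability of a mean computed from that many observations.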


In [None]:
n_bootstraps = 200
bootstrap_means = []

# TODO 8: Bootstrap resampling loop
for i in range(n_bootstraps): # check out the range() function https://www.w3schools.com/python/ref_func_range.asp
    # sample = ...
    # mean = ...
    # bootstrap_means.append(mean)
    pass  # placeholder so the cell runs; remove once you fill in the loop body


In [None]:
# TODO 9: Bootstrap percentile CI
ci_lower_boot = None
ci_upper_boot = None


In [None]:
# ✅ Check + Visualization: Bootstrap CI
try:
    assert len(bootstrap_means) == n_bootstraps
    plt.hist(bootstrap_means, bins=30)
    plt.axvline(ci_lower_boot, color='red', linestyle='--')
    plt.axvline(ci_upper_boot, color='red', linestyle='--')
    plt.axvline(mean_delivery, color='black')
    plt.title('Bootstrap Distribution of the Mean')
    plt.show()
except Exception:
    print('❌ Complete the bootstrap steps before visualizing.')


## Reflection (Answer briefly: 1–2 sentences each)

1. What does a 95% confidence interval actually mean?
Explain what the confidence level refers to—and what it does not mean.

2. Why is the confidence interval centered around the sample mean instead of the population mean?
What does this tell us about what we do and do not know?

3. How does sample size affect the width of a confidence interval?
Explain why this happens in terms of sampling variability.

4. In this lab, the analytical and bootstrap confidence intervals were similar but not identical. Why might they differ?

5. When might a bootstrap confidence interval be preferred over an analytical (t-based) confidence interval?
Give one practical reason.

6. Looking at the bootstrap distribution of the mean, why does it appear more concentrated than the original data distribution?

7. If you repeated this entire process with a new random sample from the same population, what would you expect to change—and what would stay roughly the same?

8. Does a 95% confidence interval mean there is a 95% chance the true mean is inside the interval? Why or why not?

9. Can two different samples from the same population produce confidence intervals that do not overlap? What would that imply?