# Demystifying Confidence Intervals & Hypothesis Testing

In this assignment we will use a sample of shoe sizes from a population to explore two key statistical tools: **confidence intervals** and **hypothesis testing**.

Let's start off with the assumption that we measured everybody's shoe sizes and found that they follow a normal distribution with a mean of 10 and standard deviation of 2:
$$
X \sim N(\mu = 10, \sigma = 2)
$$

## Part 1: Confidence Intervals

### Simulating the Confidence Interval

**Task:**  
Using a simulation, we will generate many shoe sizes and then find the range that contains the central 95% of the values. This interval tells us that about 95% of people in the population have shoe sizes within that range. Strictly speaking, this is a population interval (it describes individual shoe sizes), not a confidence interval for the mean. (Hint: Compute the 2.5th and 97.5th percentiles.)

In [12]:
# pip install numpy matplotlib scipy
import numpy as np
from scipy.stats import norm, t, ttest_1samp

In [13]:
# Number of simulated shoe sizes
N = 10000

# Generate shoe sizes one at a time and store in a list

simulated_shoe_sizes = []

for i in range(N):
    shoe_size = np.random.normal(10, 2)  # Draw one shoe size from N(10,2^2)
    simulated_shoe_sizes.append(shoe_size)

# Sort the list to compute the percentile indices manually

simulated_shoe_sizes.sort()

# Find the indices for the 2.5th and 97.5th percentiles

lower_index = int(0.025 * N)
upper_index = int(0.975 * N)


# Compute and print the upper and lower bounds

lower_bound_sim = simulated_shoe_sizes[lower_index]
upper_bound_sim = simulated_shoe_sizes[upper_index]

print("Simulation-based 95% Population Interval:")
print(f"Lower bound: {lower_bound_sim:.2f}")
print(f"Upper bound: {upper_bound_sim:.2f}")

Simulation-based 95% Population Interval:
Lower bound: 6.01
Upper bound: 13.90
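
The loop above can also be written in vectorized form. The following is a minimal sketch rather than the assignment's required solution: the seed `42` and the names `rng`, `samples`, `lower_vec`, and `upper_vec` are illustrative assumptions, and `np.percentile` takes care of the sorting and indexing.

```python
import numpy as np

# Seeded generator so the result is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# Draw all 10,000 shoe sizes at once from N(10, 2^2)
samples = rng.normal(loc=10, scale=2, size=10_000)

# 2.5th and 97.5th percentiles bound the central 95% of the simulated values
lower_vec, upper_vec = np.percentile(samples, [2.5, 97.5])

print(f"Lower bound: {lower_vec:.2f}")
print(f"Upper bound: {upper_vec:.2f}")
```

By default `np.percentile` interpolates linearly between neighboring sorted values, so its bounds can differ slightly from the manual index lookup used above.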


### Theoretical Distribution Calculation

For a normal distribution $X \sim N(\mu,\sigma^2)$, the theoretical 95% population interval is computed using the inverse cumulative distribution function (quantile function). Specifically, we find:
$$
\text{Lower bound} = \mu + \sigma \cdot z_{0.025} \quad \text{and} \quad \text{Upper bound} = \mu + \sigma \cdot z_{0.975}
$$
where $z_{0.025}$ and $z_{0.975}$ are the 2.5th and 97.5th percentiles of the standard normal distribution.

For our parameters, $\mu = 10$ and $\sigma = 2$, the values are computed as:
$$
\text{Lower bound} = 10 + 2 \cdot z_{0.025}, \quad \text{Upper bound} = 10 + 2 \cdot z_{0.975}
$$

Let’s calculate these values.

Z-score: how far a point lies from the mean, measured in standard deviations.

`norm.ppf(p)` returns the z-score below which a fraction `p` of the standard normal distribution falls.
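
As a small illustration of these two ideas, the sketch below computes a z-score by hand and shows that `norm.ppf` and `norm.cdf` undo each other. The example shoe size `13.0` and the variable names are assumptions chosen for illustration.

```python
from scipy.stats import norm

mu, sigma = 10, 2

# Z-score of a hypothetical shoe size of 13: (13 - 10) / 2 = 1.5 standard deviations
x = 13.0
z_score = (x - mu) / sigma

# ppf maps a probability to a z-score; cdf maps the z-score back to the probability
z_975 = norm.ppf(0.975)
p_back = norm.cdf(z_975)

print(f"z-score of {x}: {z_score:.2f}")
print(f"norm.ppf(0.975) = {z_975:.3f}")
print(f"norm.cdf of that z-score = {p_back:.3f}")
```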

In [14]:
# Theoretical calculation using the standard normal quantiles - use norm.ppf
z_lower = norm.ppf(0.025)
z_upper = norm.ppf(0.975)

mu, sigma = 10, 2

# Compute and print the upper and lower bounds based on the formula above
theoretical_lower = mu + sigma * z_lower
theoretical_upper = mu + sigma * z_upper

print("Theoretical 95% Population Interval:")
print(f"Lower bound: {theoretical_lower:.2f}")
print(f"Upper bound: {theoretical_upper:.2f}")

Theoretical 95% Population Interval:
Lower bound: 6.08
Upper bound: 13.92
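
The same bounds can be obtained without scaling the z-scores by hand. This is a hedged sketch that assumes the `loc` and `scale` arguments of `scipy.stats.norm` stand in for $\mu$ and $\sigma$; the names `lower_direct`, `upper_direct`, and `coverage` are illustrative.

```python
from scipy.stats import norm

mu, sigma = 10, 2

# Quantiles of N(mu, sigma^2) directly, using loc and scale
lower_direct, upper_direct = norm.ppf([0.025, 0.975], loc=mu, scale=sigma)

# norm.interval returns the central interval holding the given probability mass
lower_int, upper_int = norm.interval(0.95, loc=mu, scale=sigma)

# Sanity check: the probability between the two bounds should be 0.95
coverage = norm.cdf(upper_direct, loc=mu, scale=sigma) - norm.cdf(lower_direct, loc=mu, scale=sigma)

print(f"ppf bounds:      {lower_direct:.2f}, {upper_direct:.2f}")
print(f"interval bounds: {lower_int:.2f}, {upper_int:.2f}")
print(f"coverage:        {coverage:.3f}")
```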


## Part 2: Hypothesis Testing – Is the Original Population Mean Flawed?

**Scenario:**  
For years, we've measured shoe sizes and concluded that the overall population has a mean of 10 (with a standard deviation of 2). However, new data collected from a particular neighborhood might suggest that the original measurement is flawed.

**Hypotheses:**

- **Null Hypothesis ($H_0$)**: The original calculation is correct and the population mean is $\mu = 10$.  
- **Alternative Hypothesis ($H_a$)**: The original calculation is flawed, meaning the true mean is not 10 ($\mu \neq 10$).

We'll use a one-sample t-test to evaluate whether the neighborhood sample provides evidence that the overall mean differs from 10.

In [15]:
# Generate a neighborhood sample using np.random.normal.
# Here we assume that the neighborhood might have a slight shift (e.g., a mean of 10.5) compared to the overall measurement.
sample_size = 30
np.random.seed(0)
neighborhood_sample = np.random.normal(10.5, 2, sample_size)

# Calculate the neighborhood sample's statistics: mean and standard deviation.
sample_mean = np.mean(neighborhood_sample)
sample_std = np.std(neighborhood_sample, ddof=1)

# Print the sample mean and standard deviation for the neighborhood
print("Neighborhood Sample Statistics:")
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Sample Standard Deviation: {sample_std:.2f}")

Neighborhood Sample Statistics:
Sample Mean: 11.39
Sample Standard Deviation: 2.20
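
With the sample mean and standard deviation in hand, a 95% confidence interval for the neighborhood mean can be sketched as $\bar{x} \pm t_{0.975,\,n-1} \cdot s/\sqrt{n}$. The code below is illustrative only: it regenerates the sample with `np.random.RandomState(0)`, which is assumed to reproduce the `np.random.seed(0)` draw above without disturbing the global random state, and the names `sem`, `t_crit`, `ci_lower`, and `ci_upper` are not part of the original notebook.

```python
import numpy as np
from scipy.stats import t

# Recreate the neighborhood sample (assumed to match the seed(0) draw above)
sample = np.random.RandomState(0).normal(10.5, 2, 30)
n = len(sample)

x_bar = sample.mean()
s = sample.std(ddof=1)           # sample standard deviation
sem = s / np.sqrt(n)             # standard error of the mean

# Critical value of the t-distribution with n - 1 degrees of freedom
t_crit = t.ppf(0.975, n - 1)

ci_lower = x_bar - t_crit * sem
ci_upper = x_bar + t_crit * sem

print(f"95% CI for the mean: ({ci_lower:.2f}, {ci_upper:.2f})")
```

A two-sided t-test at the 5% level rejects $\mu = 10$ exactly when this interval does not contain 10, so the interval and the test tell the same story.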


### Simulation-Based t-Test

**Task:**  
We now simulate many samples under the null hypothesis (i.e., assuming the true mean is 10 and the standard deviation is 2) to see how extreme our observed t statistic is.

**Procedure:**
1. Generate many samples (each of size 30) from $N(10,2^2)$.
2. For each simulated sample, compute the t statistic:
   $$
   t = \frac{\bar{x} - 10}{s/\sqrt{n}}
   $$
3. Count the fraction of simulated samples with an absolute t statistic at least as large as our observed value. This fraction is the empirical p‑value.


In [16]:
numSimulations = 10000
extremeCount = 0

# Observed t statistic for the neighborhood sample (computed once, outside the loop)
observed_t = (sample_mean - 10) / (sample_std / np.sqrt(sample_size))

for _ in range(numSimulations):
    # Generate a sample under the null hypothesis: N(10, 2^2)
    sim_sample = np.random.normal(10, 2, sample_size)
    sim_mean = np.mean(sim_sample)
    sim_std = np.std(sim_sample, ddof=1)
    sim_t = (sim_mean - 10) / (sim_std / np.sqrt(sample_size))

    # Count as extreme if the absolute simulated t statistic is at least as large as the observed one
    if abs(sim_t) >= abs(observed_t):
        extremeCount += 1

p_value_simulation = extremeCount / numSimulations
print("Simulation-Based t-Test Result:")
print(f"Simulated p-value: {p_value_simulation:.3f}")

Simulation-Based t-Test Result:
Simulated p-value: 0.002
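
The same procedure can be sketched without an explicit loop by drawing all simulated samples as one 2-D array. This is an assumption-laden illustration: the generator seed `0`, the `np.random.RandomState(0)` reconstruction of the neighborhood sample, and names such as `sims`, `t_sims`, and `p_value_vectorized` are choices made here, not part of the original notebook.

```python
import numpy as np

n = 30
num_simulations = 10_000

# Recreate the neighborhood sample (assumed to match the seed(0) draw above)
sample = np.random.RandomState(0).normal(10.5, 2, n)
t_obs = (sample.mean() - 10) / (sample.std(ddof=1) / np.sqrt(n))

# One row per simulated sample, drawn under the null hypothesis N(10, 2^2)
rng = np.random.default_rng(0)
sims = rng.normal(10, 2, size=(num_simulations, n))

# Row-wise t statistics
means = sims.mean(axis=1)
stds = sims.std(axis=1, ddof=1)
t_sims = (means - 10) / (stds / np.sqrt(n))

# Empirical two-sided p-value: share of simulated |t| at least as large as observed
p_value_vectorized = np.mean(np.abs(t_sims) >= abs(t_obs))

print(f"Simulated p-value: {p_value_vectorized:.4f}")
```

Because the p-value is estimated from random draws, it will vary a little with the seed and the number of simulations.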


### Analytical One-Sample t-Test

**Procedure:**  
We now compute the t statistic analytically using:
$$
t = \frac{\bar{x} - 10}{s/\sqrt{n}}
$$
with:
- $\bar{x}$ as the sample mean,
- $s$ as the sample standard deviation,
- $n$ as the sample size,
- and 10 as the hypothesized population mean.

Based on the t-distribution with $n-1$ degrees of freedom, we calculate the two-sided p‑value.

Hint: `t.cdf` gives the area under the curve to the left of the t statistic. We want the area in the right tail, so we use `1 - cdf` and double it for the two-sided test.


In [17]:
# Define the null hypothesis mean
mu0 = 10

# Compute the observed t statistic using the neighborhood sample
t_statistic = (sample_mean - mu0) / (sample_std / np.sqrt(sample_size))
df = sample_size - 1

# Calculate the two-sided p-value from the t-distribution
p_value_analytical = 2 * (1 - t.cdf(abs(t_statistic), df))

print("Analytical t-Test Results:")
print(f"t Statistic: {t_statistic:.3f}")
print(f"p-value: {p_value_analytical:.3f}")

Analytical t-Test Results:
t Statistic: 3.449
p-value: 0.002
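
As a cross-check, the manual calculation can be compared with SciPy's built-in `ttest_1samp`. The sketch below assumes that `np.random.RandomState(0)` reproduces the neighborhood sample drawn above; it also uses the survival function `t.sf`, which equals `1 - cdf` but is more accurate for very small tail areas. The names `t_manual`, `p_manual`, and `result` are illustrative.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Recreate the neighborhood sample (assumed to match the seed(0) draw above)
sample = np.random.RandomState(0).normal(10.5, 2, 30)
n = len(sample)

# Manual t statistic and two-sided p-value via the survival function
t_manual = (sample.mean() - 10) / (sample.std(ddof=1) / np.sqrt(n))
p_manual = 2 * t.sf(abs(t_manual), n - 1)

# Built-in one-sample t-test against a hypothesized mean of 10 (two-sided by default)
result = ttest_1samp(sample, popmean=10)

print(f"manual: t = {t_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:  t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```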


### Comprehension Question: Should we reject or fail to reject the null hypothesis? What is the significance of our p-value?

# 🚨Your answer here🚨

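We obtained a p-value of 0.002 from the analytical t-test, and the simulation-based test gave the same value. The p-value is the probability, assuming the null hypothesis is true, of observing a t statistic at least as extreme as ours; here that probability is only about 0.2%. Since 0.002 is less than 0.05, the observed results are statistically significant, so we reject the null hypothesis. The results suggest that the actual population mean shoe size is not 10.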