# Confidence Intervals: Quantifying Uncertainty 📊

## Introduction: From Point to Interval

In the previous notebook, we learned **point estimation**: Using sample data to estimate a parameter with a single number.

**Example**: Sample mean = 5.15 tons/hectare

But this doesn't tell the whole story! **How certain are we about this estimate?**

### The Problem with Point Estimates:

- Different samples give different estimates (sampling variability)
- A single number doesn't convey uncertainty
- We need a **range of plausible values**

### The Solution: Confidence Intervals! 🎯

Instead of saying: **"μ̂ = 5.15"**

We say: **"μ̂ = 5.15, 95% CI: [5.02, 5.28]"**

This communicates:
- Our best estimate (5.15)
- Our uncertainty (±0.13)
- Our confidence level (95%)

### ML Connection 🤖

**Always report ML model performance with confidence intervals!**

Bad: "Model accuracy = 85%"

Good: "Model accuracy = 85% ± 2% (95% CI: [83%, 87%])"

The second tells you how reliable the performance estimate is!

---

## Learning Objectives 🎯

By the end of this notebook, you will:

1. ✅ Understand the **confidence interval concept** ⭐⭐
2. ✅ Calculate CI for mean (σ known and unknown)
3. ✅ Calculate CI for proportions
4. ✅ **Interpret confidence levels correctly** (most critical!)
5. ✅ Understand factors affecting CI width
6. ✅ Apply to ML: **Model performance with uncertainty** ⭐⭐

⭐⭐ = Most critical concept

---

Let's quantify uncertainty! 🚀

In [None]:
# 📦 Setup: Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Set style for beautiful plots
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

# Set random seed for reproducibility
np.random.seed(42)

print("✓ Setup complete!")
print("📊 Ready to learn confidence intervals")

---

## 1. Confidence Interval Intuition 💡

### What is a Confidence Interval?

**General Form**:

$$
\text{CI} = \text{Point Estimate} \pm \text{Margin of Error}
$$

$$
\text{CI} = \hat{\theta} \pm (\text{Critical Value}) \times SE(\hat{\theta})
$$

### For Sample Mean:

$$
\bar{x} \pm z^* \times \frac{\sigma}{\sqrt{n}}
$$

Where:
- x̄ = sample mean (point estimate)
- z* = critical value (e.g., 1.96 for 95% confidence)
- σ/√n = standard error (uncertainty in estimate)
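The formula above translates directly into a few lines of code. This sketch assumes σ is known; the function name `z_interval` and the inputs (x̄ = 5.15, σ = 0.8, n = 50) are hypothetical, chosen only to illustrate the arithmetic.

```python
import numpy as np
from scipy import stats


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided CI for a mean, assuming the population SD sigma is known."""
    # Critical value: the (1 + confidence)/2 quantile of the standard normal
    z_star = stats.norm.ppf((1 + confidence) / 2)
    se = sigma / np.sqrt(n)        # standard error of the sample mean
    margin = z_star * se           # margin of error
    return x_bar - margin, x_bar + margin


# Hypothetical inputs for illustration
lower, upper = z_interval(x_bar=5.15, sigma=0.8, n=50)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # → 95% CI: [4.928, 5.372]
```

The interval is symmetric around x̄, so only the margin z*·σ/√n needs computing.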

### CRITICAL INTERPRETATION ⚠️

**WRONG**: "There is a 95% probability that μ is in [a, b]"

**CORRECT**: "If we repeat this procedure many times, 95% of the intervals will contain the true μ"

The parameter μ is **fixed** (not random). The interval is **random** (changes with different samples).

### Confidence Level:

- **90% CI**: 90% of such intervals will capture true parameter
- **95% CI**: 95% of such intervals will capture true parameter (most common)
- **99% CI**: 99% of such intervals will capture true parameter

Higher confidence → Wider interval (trade-off!)
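Both the repeated-sampling interpretation and the confidence/width trade-off can be checked numerically. This vectorized sketch draws samples straight from a normal distribution (an assumption; the simulation below samples from a finite population instead) and records, for each confidence level, how often the σ-known interval covers μ and how wide it is. The values μ = 5.2, σ = 0.8, n = 50 mirror the notebook's simulation; the number of repetitions is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, n_reps = 5.2, 0.8, 50, 20_000   # assumed values mirroring the simulation

# All samples at once: one row per repetition, shape (n_reps, n)
x_bars = rng.normal(mu, sigma, size=(n_reps, n)).mean(axis=1)
se = sigma / np.sqrt(n)                        # sigma treated as known

coverage = {}
for conf in (0.90, 0.95, 0.99):
    z_star = stats.norm.ppf((1 + conf) / 2)
    # |x_bar - mu| <= z* * SE is equivalent to lower <= mu <= upper
    coverage[conf] = np.mean(np.abs(x_bars - mu) <= z_star * se)
    print(f"{conf:.0%} CI: coverage = {coverage[conf]:.3f}, width = {2 * z_star * se:.3f}")
```

Each empirical coverage should land close to its nominal level, while the width grows with the confidence level.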

---

In [None]:
# 🎲 Simulation: Demonstrate CI interpretation
# Take 100 samples, construct 95% CI for each, see how many capture true μ

# True population
true_mu = 5.2
true_sigma = 0.8
population = np.random.normal(true_mu, true_sigma, 100000)

# Simulation parameters
n_intervals = 100
sample_size = 50
confidence_level = 0.95
z_star = stats.norm.ppf((1 + confidence_level) / 2)  # 1.96 for 95%

# Store CI information
ci_lower = []
ci_upper = []
captures_mu = []

for _ in range(n_intervals):
    # Take a sample
    sample = np.random.choice(population, size=sample_size, replace=False)
    
    # Calculate 95% CI (assuming σ known for simplicity)
    x_bar = sample.mean()
    se = true_sigma / np.sqrt(sample_size)
    margin = z_star * se
    
    lower = x_bar - margin
    upper = x_bar + margin
    
    ci_lower.append(lower)
    ci_upper.append(upper)
    captures_mu.append(lower <= true_mu <= upper)

ci_lower = np.array(ci_lower)
ci_upper = np.array(ci_upper)
captures_mu = np.array(captures_mu)

capture_rate = captures_mu.mean()

print("🎯 Confidence Interval Interpretation Simulation:")
print("=" * 60)
print(f"True population mean: μ = {true_mu}")
print(f"Confidence level: {confidence_level*100:.0f}%")
print(f"Number of intervals constructed: {n_intervals}")
print(f"\nRESULTS:")
print(f"  Intervals that captured true μ: {captures_mu.sum()}/{n_intervals}")
print(f"  Capture rate: {capture_rate*100:.1f}%")
print(f"  Expected: ~{confidence_level*100:.0f}%")
print(f"\n💡 Interpretation:")
print(f"   '{confidence_level*100:.0f}% confidence' means:")
print(f"   If we repeat this procedure many times, {confidence_level*100:.0f}% of intervals")
print(f"   will contain the true parameter μ")
print(f"\n⚠️ It does NOT mean 'μ has {confidence_level*100:.0f}% probability of being in this interval'!")

In [None]:
# 📊 Visualization 1: The famous CI interpretation plot
# 100 horizontal lines (CIs), green if captures μ, red if doesn't

plt.figure(figsize=(12, 10))

# Plot each CI as a horizontal line
for i in range(n_intervals):
    color = 'green' if captures_mu[i] else 'red'
    alpha = 0.6 if captures_mu[i] else 0.9
    linewidth = 1 if captures_mu[i] else 2
    
    # Horizontal line from lower to upper
    plt.plot([ci_lower[i], ci_upper[i]], [i, i], color=color, 
             alpha=alpha, linewidth=linewidth)
    
    # Mark the point estimate
    point_est = (ci_lower[i] + ci_upper[i]) / 2
    plt.scatter([point_est], [i], color=color, s=20, alpha=alpha, zorder=3)

# Mark the true parameter
plt.axvline(true_mu, color='blue', linestyle='--', linewidth=3, 
            label=f'True μ = {true_mu}', zorder=2)

plt.xlabel('Wheat Yield (tons/hectare)', fontsize=12)
plt.ylabel('Sample Number', fontsize=12)
plt.title(f'Confidence Interval Interpretation: {n_intervals} Different 95% CIs 🎯', 
          fontsize=14, fontweight='bold')

# Custom legend
from matplotlib.lines import Line2D
legend_elements = [
    Line2D([0], [0], color='blue', linestyle='--', linewidth=3, label=f'True μ = {true_mu}'),
    Line2D([0], [0], color='green', linewidth=2, label=f'Captures μ ({captures_mu.sum()})'),
    Line2D([0], [0], color='red', linewidth=2, label=f'Misses μ ({n_intervals - captures_mu.sum()})')
]
plt.legend(handles=legend_elements, fontsize=11, loc='upper right')

plt.grid(True, alpha=0.3, axis='x')
plt.xlim(4.5, 6.0)

# Add text box
textstr = f'Capture Rate: {capture_rate*100:.1f}%\nExpected: {confidence_level*100:.0f}%'
props = dict(boxstyle='round', facecolor='wheat', alpha=0.8)
plt.text(0.02, 0.98, textstr, transform=plt.gca().transAxes, fontsize=12,
         verticalalignment='top', bbox=props)

plt.tight_layout()
plt.show()

print("\n💡 What This Plot Shows:")
print("   - Each horizontal line is a 95% CI from a different sample")
print("   - GREEN lines: CI contains true μ ✓")
print("   - RED lines: CI misses true μ ✗")
print(f"   - About {confidence_level*100:.0f}% are green (as expected!)")
print("\n💡 Correct Interpretation:")
print("   'The procedure captures μ 95% of the time'")
print("   NOT 'μ is in this specific interval with 95% probability'")

---

## 2. CI for Mean (σ Known) 📏

### When to Use:

Rarely in practice (we usually don't know σ), but useful for understanding concepts.

### Formula:

$$
\bar{x} \pm z^* \times \frac{\sigma}{\sqrt{n}}
$$

### Critical Values (z*):

| Confidence Level | z* |
|-----------------|----|
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
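Rather than memorizing the table, the critical values can be computed from the standard normal quantile function. For a two-sided interval, (1 - confidence)/2 of the probability sits in each tail, so z* is the (1 + confidence)/2 quantile. A minimal sketch with SciPy's `stats.norm.ppf`:

```python
from scipy import stats

# Two-sided critical values: (1 - conf)/2 in each tail -> (1 + conf)/2 quantile
z_table = {conf: stats.norm.ppf((1 + conf) / 2) for conf in (0.90, 0.95, 0.99)}

for conf, z_star in z_table.items():
    print(f"{conf:.0%}: z* = {z_star:.3f}")
```

Rounded to three decimals, these reproduce the table entries.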

### Trade-off:

- **Higher confidence** → Larger z* → **Wider interval** (less precise)
- **Lower confidence** → Smaller z* → **Narrower interval** (more precise)

For a fixed sample size, you can't raise confidence without widening the interval; only a larger n buys both.

---

In [None]:
# 🌾 Calculate CIs with different confidence levels

# Sample data
np.random.seed(42)
sample_size = 50
sample = np.random.choice(population, size=sample_size, replace=False)
x_bar = sample.mean()
sigma = true_sigma  # Assume known
se = sigma / np.sqrt(sample_size)

# Calculate CIs for different confidence levels
confidence_levels = [0.90, 0.95, 0.99]
cis = {}

for conf in confidence_levels:
    z_star = stats.norm.ppf((1 + conf) / 2)
    margin = z_star * se
    lower = x_bar - margin
    upper = x_bar + margin
    cis[conf] = (lower, upper, margin, z_star)

print("📏 Confidence Intervals with Different Confidence Levels:")
print("=" * 60)
print(f"Sample size: n = {sample_size}")
print(f"Sample mean: x̄ = {x_bar:.3f} tons/hectare")
print(f"Population σ (assumed known): {sigma}")
print(f"Standard Error: SE = σ/√n = {se:.4f}")
print(f"\n{'Confidence':<12} {'z*':<8} {'Margin':<12} {'CI':<30}")
print("-" * 60)

for conf in confidence_levels:
    lower, upper, margin, z_star = cis[conf]
    print(f"{conf*100:.0f}%{'':<9} {z_star:<8.3f} {margin:<12.4f} [{lower:.3f}, {upper:.3f}]")

print("\n💡 Notice the trade-off:")
print("   - Higher confidence → Wider interval (less precise)")
print("   - Lower confidence → Narrower interval (more precise)")
print("   - You must choose based on your needs!")

In [None]:
# 📊 Visualization 2: Confidence vs Precision trade-off

plt.figure(figsize=(12, 6))

# Plot CIs as horizontal lines
colors = ['orange', 'green', 'blue']
y_positions = [3, 2, 1]

for idx, conf in enumerate(confidence_levels):
    lower, upper, margin, z_star = cis[conf]
    y = y_positions[idx]
    
    # Draw CI
    plt.plot([lower, upper], [y, y], color=colors[idx], linewidth=4, 
             label=f'{conf*100:.0f}% CI: [{lower:.2f}, {upper:.2f}]')
    
    # Mark endpoints
    plt.scatter([lower, upper], [y, y], color=colors[idx], s=100, 
                edgecolors='black', linewidths=1.5, zorder=3)
    
    # Mark point estimate
    plt.scatter([x_bar], [y], color='red', s=150, marker='D', 
                edgecolors='black', linewidths=1.5, zorder=4)

# Mark true parameter
plt.axvline(true_mu, color='black', linestyle='--', linewidth=2, alpha=0.7,
            label=f'True μ = {true_mu}')

plt.xlabel('Wheat Yield (tons/hectare)', fontsize=12)
plt.yticks(y_positions, [f'{c*100:.0f}% Confidence' for c in confidence_levels])
plt.title('Confidence vs Precision Trade-off 📊', fontsize=14, fontweight='bold')
plt.legend(fontsize=10, loc='upper right')
plt.grid(True, alpha=0.3, axis='x')
plt.xlim(4.7, 5.7)

# Annotations
plt.annotate('', xy=(cis[0.99][0], 0.5), xytext=(cis[0.99][1], 0.5),
             arrowprops=dict(arrowstyle='<->', lw=2, color='blue'))
plt.text((cis[0.99][0] + cis[0.99][1])/2, 0.3, 'Wider\n(less precise)', 
         ha='center', fontsize=10, color='blue', fontweight='bold')

plt.annotate('', xy=(cis[0.90][0], 3.5), xytext=(cis[0.90][1], 3.5),
             arrowprops=dict(arrowstyle='<->', lw=2, color='orange'))
plt.text((cis[0.90][0] + cis[0.90][1])/2, 3.7, 'Narrower\n(more precise)', 
         ha='center', fontsize=10, color='orange', fontweight='bold')

plt.tight_layout()
plt.show()

print("\n💡 Key Insight:")
print("   - All three CIs contain the true μ (in this case)")
print("   - But 99% CI is much wider than 90% CI")
print("   - Choose confidence level based on consequences of being wrong")

---

## 3. CI for Mean (σ Unknown) ⭐⭐

### The Realistic Case

In practice, we **don't know σ**! We must estimate it with the sample standard deviation s.

### Problem:

Using s instead of σ introduces additional uncertainty

### Solution: t-Distribution!

$$
\bar{x} \pm t^*_{df} \times \frac{s}{\sqrt{n}}
$$

Where:
- t* = critical value from t-distribution
- df = n - 1 (degrees of freedom)
- s = sample standard deviation

### t vs z Distribution:

- **t-distribution**: Heavier tails, accounts for uncertainty in estimating σ
- As n → ∞, t-distribution → normal distribution
- For n ≥ 30, t and z are close (at df = 29, t* ≈ 2.045 vs z* = 1.960, about 4% larger)
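To see how quickly t* approaches z*, the 97.5% quantiles can be compared directly for a few degrees of freedom. The df values here are arbitrary picks; df = 29 corresponds to n = 30.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)
t_stars = {df: stats.t.ppf(0.975, df) for df in (5, 10, 29, 100, 1000)}

for df, t_star in t_stars.items():
    # Relative difference: how much wider a t-based interval is than a z-based one
    print(f"df={df:>4}: t* = {t_star:.3f} ({t_star / z_star - 1:+.1%} vs z* = {z_star:.3f})")
```

The gap is large for very small samples and shrinks steadily, but never reaches exactly zero for finite df.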

### When to Use:

- ✅ **Use t**: When σ is unknown (almost always!)
- Use z: Only when σ is truly known (rare)
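A sketch of the t-based interval as a reusable function, cross-checked against SciPy's `stats.t.interval`. The helper name `t_interval` and the simulated sample of 15 yields are hypothetical. The confidence level is passed to `stats.t.interval` positionally because its keyword name differs across SciPy versions.

```python
import numpy as np
from scipy import stats


def t_interval(sample, confidence=0.95):
    """Two-sided CI for the mean when sigma is unknown (t-distribution, df = n - 1)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    x_bar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)              # s / sqrt(n), s computed with n - 1
    t_star = stats.t.ppf((1 + confidence) / 2, n - 1)
    return x_bar - t_star * se, x_bar + t_star * se


rng = np.random.default_rng(1)
yields = rng.normal(5.2, 0.8, size=15)                # hypothetical small sample

manual = t_interval(yields)
# SciPy equivalent; stats.sem uses ddof=1 by default
builtin = stats.t.interval(0.95, len(yields) - 1,
                           loc=yields.mean(), scale=stats.sem(yields))
print(f"manual:  [{manual[0]:.3f}, {manual[1]:.3f}]")
print(f"builtin: [{builtin[0]:.3f}, {builtin[1]:.3f}]")
```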

---

In [None]:
# 🌾 Calculate CI using t-distribution (σ unknown)

# Sample data (same as before)
x_bar = sample.mean()
s = sample.std(ddof=1)  # Sample SD (unbiased)
n = len(sample)
se = s / np.sqrt(n)

# 95% CI using t-distribution
confidence = 0.95
df = n - 1
t_star = stats.t.ppf((1 + confidence) / 2, df)
z_star = stats.norm.ppf((1 + confidence) / 2)

margin_t = t_star * se
margin_z = z_star * se

ci_t = (x_bar - margin_t, x_bar + margin_t)
ci_z = (x_bar - margin_z, x_bar + margin_z)

print("📊 Confidence Interval with Unknown σ (Using t-distribution):")
print("=" * 60)
print(f"Sample size: n = {n}")
print(f"Degrees of freedom: df = {df}")
print(f"Sample mean: x̄ = {x_bar:.4f} tons/hectare")
print(f"Sample SD: s = {s:.4f}")
print(f"Standard Error: SE = s/√n = {se:.4f}")
print(f"\nCRITICAL VALUES (95% confidence):")
print(f"  t* (df={df}) = {t_star:.4f}")
print(f"  z* = {z_star:.4f}")
print(f"  Difference: t* is {(t_star/z_star - 1)*100:.1f}% larger")
print(f"\n95% CONFIDENCE INTERVALS:")
print(f"  Using t-distribution: [{ci_t[0]:.4f}, {ci_t[1]:.4f}]")
print(f"  Using z-distribution: [{ci_z[0]:.4f}, {ci_z[1]:.4f}]")
print(f"\n💡 The t-distribution gives a slightly wider interval")
print(f"   to account for uncertainty in estimating σ with s")

In [None]:
# 📊 Visualization 3: t vs z distributions

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: PDF comparison
x = np.linspace(-4, 4, 200)
dfs = [5, 10, 30, 100]
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(dfs)))

for df, color in zip(dfs, colors):
    axes[0].plot(x, stats.t.pdf(x, df), color=color, linewidth=2, 
                 label=f't (df={df})', alpha=0.7)

axes[0].plot(x, stats.norm.pdf(x), 'r--', linewidth=2.5, label='z (Normal)')
axes[0].set_xlabel('Value', fontsize=11)
axes[0].set_ylabel('Density', fontsize=11)
axes[0].set_title('t-distribution vs Normal Distribution', fontsize=12, fontweight='bold')
axes[0].legend(fontsize=9)
axes[0].grid(True, alpha=0.3)

# Right: Critical values convergence
df_range = range(1, 101)
t_crits = [stats.t.ppf(0.975, df) for df in df_range]
z_crit = stats.norm.ppf(0.975)

axes[1].plot(df_range, t_crits, 'b-', linewidth=2, label='t* (df)')
axes[1].axhline(z_crit, color='r', linestyle='--', linewidth=2, label=f'z* = {z_crit:.3f}')
axes[1].axvline(30, color='green', linestyle=':', linewidth=1.5, alpha=0.7,
                label='df=30 (rule of thumb)')
axes[1].set_xlabel('Degrees of Freedom (df)', fontsize=11)
axes[1].set_ylabel('Critical Value (95% CI)', fontsize=11)
axes[1].set_title('Convergence: t* → z* as df Increases', fontsize=12, fontweight='bold')
axes[1].legend(fontsize=9)
axes[1].grid(True, alpha=0.3)
axes[1].set_xlim(0, 100)

plt.suptitle('Understanding the t-Distribution 📊', fontsize=14, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()

print("\n💡 Key Observations:")
print("   - t-distribution has heavier tails than normal (more probability in extremes)")
print("   - As df increases, t-distribution → normal distribution")
print("   - For df ≥ 30, t and z are close (t* ≈ 2.04 vs z* = 1.96 at df = 30)")
print("   - Always use t when σ is unknown (safest choice)")

In [None]:
# 📏 Effect of sample size on CI width
# Demonstrate that CI width ∝ 1/√n

sample_sizes = [10, 25, 50, 100, 200, 500]
ci_widths = []
theoretical_widths = []

for n in sample_sizes:
    # Take a sample
    sample = np.random.choice(population, size=n, replace=False)
    
    # Calculate 95% CI
    x_bar = sample.mean()
    s = sample.std(ddof=1)
    se = s / np.sqrt(n)
    t_star = stats.t.ppf(0.975, n-1)
    margin = t_star * se
    
    width = 2 * margin
    ci_widths.append(width)
    
    # Theoretical width (assuming σ known)
    theoretical_width = 2 * 1.96 * true_sigma / np.sqrt(n)
    theoretical_widths.append(theoretical_width)

print("📏 Effect of Sample Size on CI Width:")
print("=" * 60)
print(f"{'n':<8} {'CI Width':<12} {'Theoretical':<15} {'Improvement'}")
print("-" * 60)

for idx, n in enumerate(sample_sizes):
    if idx == 0:
        improvement = "baseline"
    else:
        improvement = f"{ci_widths[0]/ci_widths[idx]:.2f}× narrower"
    
    print(f"{n:<8} {ci_widths[idx]:<12.4f} {theoretical_widths[idx]:<15.4f} {improvement}")

print("\n💡 To cut CI width in half, you need 4× the sample size!")
print("   (Because SE ∝ 1/√n)")

In [None]:
# 📊 Visualization 4: CI width vs sample size

plt.figure(figsize=(12, 6))

# Plot CI width vs n
plt.plot(sample_sizes, ci_widths, 'bo-', linewidth=2, markersize=10, 
         label='Observed CI Width', alpha=0.7)
plt.plot(sample_sizes, theoretical_widths, 'r--', linewidth=2, 
         label='Theoretical (σ known)', alpha=0.7)

# Add reference line showing 1/√n relationship
reference = [ci_widths[0] * np.sqrt(sample_sizes[0]/n) for n in sample_sizes]
plt.plot(sample_sizes, reference, 'g:', linewidth=2, alpha=0.5,
         label='1/√n reference')

plt.xlabel('Sample Size (n)', fontsize=12)
plt.ylabel('95% CI Width', fontsize=12)
plt.title('CI Width Decreases with √n 📏', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)

# Add annotations
for i, n in enumerate(sample_sizes[::2]):
    idx = i * 2
    plt.annotate(f'n={n}\nwidth={ci_widths[idx]:.2f}', 
                 xy=(n, ci_widths[idx]), 
                 xytext=(n, ci_widths[idx] + 0.1),
                 fontsize=9, ha='center',
                 bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.6))

plt.tight_layout()
plt.show()

print("\n💡 Key Insight:")
print("   - CI width is proportional to 1/√n")
print("   - Diminishing returns: going from n=10 to n=100 narrows the CI a lot in absolute terms")
print("   - Going from n=100 to n=1000 shrinks it by the same factor (√10) but far less in absolute terms")

---

## 4. CI for Proportions 📊

### When to Use:

Estimating population proportion (e.g., germination rate, disease incidence, defect rate)

### Formula:

$$
\hat{p} \pm z^* \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
$$

Where:
- p̂ = sample proportion
- SE = √[p̂(1-p̂)/n]

### Requirements:

- np̂ ≥ 10 and n(1-p̂) ≥ 10 (for normal approximation)

### Example:

Out of 200 seeds, 176 germinated → p̂ = 0.88

What's the 95% CI for germination rate?
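Working the numbers for this example: p̂ = 176/200 = 0.88, SE = √(0.88 × 0.12 / 200) ≈ 0.0230, and the margin is 1.96 × 0.0230 ≈ 0.045, so the lower bound lands at about 83.5%. The sketch below wraps the same normal-approximation (Wald) formula in a small function; the name `wald_proportion_ci` is hypothetical, and it raises an error when the np̂ ≥ 10 and n(1 - p̂) ≥ 10 rule of thumb fails.

```python
import math
from scipy import stats


def wald_proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) CI for a population proportion."""
    p_hat = successes / n
    # n * p_hat == successes and n * (1 - p_hat) == n - successes
    if successes < 10 or n - successes < 10:
        raise ValueError("n*p_hat and n*(1 - p_hat) should both be >= 10")
    z_star = stats.norm.ppf((1 + confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


lower, upper = wald_proportion_ci(176, 200)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # → 95% CI: [0.835, 0.925]
```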

---

In [None]:
# 🌱 CI for proportion: Seed germination rate

# Data
n_seeds = 200
n_germinated = 176
p_hat = n_germinated / n_seeds

# Check requirements
check1 = n_seeds * p_hat
check2 = n_seeds * (1 - p_hat)

print("🌱 Confidence Interval for Seed Germination Rate:")
print("=" * 60)
print(f"Sample size: n = {n_seeds} seeds")
print(f"Germinated: {n_germinated} seeds")
print(f"Sample proportion: p̂ = {p_hat:.4f} ({p_hat*100:.1f}%)")
print(f"\nRequirement checks (need both ≥ 10):")
print(f"  np̂ = {check1:.1f} ✓" if check1 >= 10 else f"  np̂ = {check1:.1f} ✗")
print(f"  n(1-p̂) = {check2:.1f} ✓" if check2 >= 10 else f"  n(1-p̂) = {check2:.1f} ✗")

# Calculate 95% CI
se = np.sqrt(p_hat * (1 - p_hat) / n_seeds)
z_star = stats.norm.ppf(0.975)
margin = z_star * se

ci_lower = p_hat - margin
ci_upper = p_hat + margin

print(f"\n95% CONFIDENCE INTERVAL:")
print(f"  Standard Error: SE = √[p̂(1-p̂)/n] = {se:.4f}")
print(f"  Margin of Error: z* × SE = {margin:.4f}")
print(f"  CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"  CI: [{ci_lower*100:.1f}%, {ci_upper*100:.1f}%]")
print(f"\n💡 Interpretation:")
print(f"   We are 95% confident that the true germination rate")
print(f"   is between {ci_lower*100:.1f}% and {ci_upper*100:.1f}%")

In [None]:
# 📊 Visualization 5: CI for proportion on number line

plt.figure(figsize=(12, 4))

# Number line
plt.plot([0, 1], [0, 0], 'k-', linewidth=2, alpha=0.3)

# CI as thick line
plt.plot([ci_lower, ci_upper], [0, 0], 'b-', linewidth=8, alpha=0.6, 
         label=f'95% CI: [{ci_lower*100:.1f}%, {ci_upper*100:.1f}%]')

# Point estimate
plt.scatter([p_hat], [0], s=300, c='red', marker='D', zorder=5, 
            edgecolors='black', linewidths=2, label=f'p̂ = {p_hat*100:.1f}%')

# Endpoints
plt.scatter([ci_lower, ci_upper], [0, 0], s=200, c='blue', marker='|', 
            zorder=4, linewidths=3)

# Annotations
plt.text(p_hat, 0.08, f'Sample: {n_germinated}/{n_seeds}\ngerminated', 
         ha='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

plt.xlabel('Germination Rate', fontsize=12)
plt.xticks(np.arange(0, 1.1, 0.1), [f'{x*100:.0f}%' for x in np.arange(0, 1.1, 0.1)])
plt.yticks([])
plt.title('95% Confidence Interval for Germination Rate 🌱', 
          fontsize=14, fontweight='bold')
plt.legend(fontsize=11, loc='upper left')
plt.xlim(0.75, 0.98)
plt.ylim(-0.15, 0.15)
plt.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

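required_rate = 0.85  # minimum acceptable germination rate
print("\n💡 For agricultural decision-making:")
if ci_lower >= required_rate:
    print(f"   The whole CI lies above {required_rate*100:.0f}% (lower bound = {ci_lower*100:.1f}%),")
    print(f"   so the data support that this seed lot meets a ≥{required_rate*100:.0f}% requirement")
else:
    print(f"   The CI's lower bound ({ci_lower*100:.1f}%) is below {required_rate*100:.0f}%,")
    print(f"   so we cannot be confident this seed lot meets a ≥{required_rate*100:.0f}% requirement")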

---

## 5. Factors Affecting CI Width 📏

Three main factors determine confidence interval width:

### 1. Sample Size (n) ⭐

- Larger n → Smaller SE → **Narrower CI** (more precise)
- Width ∝ 1/√n
- Most controllable factor!

### 2. Confidence Level ⭐

- Higher confidence → Larger critical value → **Wider CI**
- A 90% CI is narrower than a 95% CI, which is narrower than a 99% CI
- Trade-off: confidence vs precision

### 3. Population Variability (σ) ⭐

- More variability → Larger SE → **Wider CI**
- Can't control, but can measure

### Formula Breakdown:

$$
\text{Width} = 2 \times z^* \times \frac{\sigma}{\sqrt{n}}
$$

- z* depends on confidence level
- σ depends on population variability
- √n is the sample size factor
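Because sample size is the most controllable factor, the width formula can be inverted for planning: a margin of error E = z*·σ/√n requires n = (z*·σ/E)², rounded up. A minimal sketch, assuming a z-based interval and a planning value σ = 0.8 (the value used in this notebook's simulations); the function name `required_n` is hypothetical.

```python
import math
from scipy import stats


def required_n(sigma, margin, confidence=0.95):
    """Smallest n whose z-based margin of error z* * sigma / sqrt(n) is <= margin."""
    z_star = stats.norm.ppf((1 + confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)


# Halving the target margin roughly quadruples the required sample size
for target in (0.2, 0.1, 0.05):
    print(f"margin ±{target}: n >= {required_n(sigma=0.8, margin=target)}")
```

With σ = 0.8 this gives 62, 246, and 984 samples for margins of ±0.2, ±0.1, and ±0.05 tons/hectare.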

---

In [None]:
# 🔬 Systematic exploration of factors affecting CI width

# Base case
base_n = 50
base_conf = 0.95
base_sigma = 0.8

# Calculate base width
base_z = stats.norm.ppf((1 + base_conf) / 2)
base_width = 2 * base_z * base_sigma / np.sqrt(base_n)

print("🔬 Factors Affecting Confidence Interval Width:")
print("=" * 60)
print(f"BASE CASE: n={base_n}, confidence={base_conf*100:.0f}%, σ={base_sigma}")
print(f"Base CI width: {base_width:.4f}")

# Factor 1: Sample size
print(f"\n1. EFFECT OF SAMPLE SIZE (confidence={base_conf*100:.0f}%, σ={base_sigma}):")
for n in [25, 50, 100, 200]:
    width = 2 * base_z * base_sigma / np.sqrt(n)
    ratio = width / base_width
    print(f"   n={n:3d}: width={width:.4f} ({ratio:.2f}× base width)")

# Factor 2: Confidence level
print(f"\n2. EFFECT OF CONFIDENCE LEVEL (n={base_n}, σ={base_sigma}):")
for conf in [0.90, 0.95, 0.99]:
    z = stats.norm.ppf((1 + conf) / 2)
    width = 2 * z * base_sigma / np.sqrt(base_n)
    ratio = width / base_width
    print(f"   {conf*100:.0f}%: width={width:.4f} ({ratio:.2f}× base width)")

# Factor 3: Population variability
print(f"\n3. EFFECT OF VARIABILITY (n={base_n}, confidence={base_conf*100:.0f}%):")
for sigma in [0.4, 0.8, 1.2, 1.6]:
    width = 2 * base_z * sigma / np.sqrt(base_n)
    ratio = width / base_width
    print(f"   σ={sigma:.1f}: width={width:.4f} ({ratio:.2f}× base width)")

print("\n💡 Key Insights:")
print("   - Doubling n divides width by √2 ≈ 1.41, i.e. ~29% narrower (diminishing returns)")
print("   - Higher confidence → wider interval (trade-off)")
print("   - More variable population → wider interval (can't control)")

In [None]:
# 📊 Visualization 6: Three-panel showing each factor's effect

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Panel 1: Sample size effect
sample_sizes_viz = np.arange(10, 201, 5)
widths_n = [2 * base_z * base_sigma / np.sqrt(n) for n in sample_sizes_viz]

axes[0].plot(sample_sizes_viz, widths_n, 'b-', linewidth=2)
axes[0].scatter([base_n], [base_width], s=200, c='red', marker='*', 
                zorder=5, edgecolors='black', linewidths=1.5,
                label=f'Base: n={base_n}')
axes[0].set_xlabel('Sample Size (n)', fontsize=11)
axes[0].set_ylabel('CI Width', fontsize=11)
axes[0].set_title('Effect of Sample Size\n(larger n → narrower CI)', 
                  fontsize=11, fontweight='bold')
axes[0].legend(fontsize=9)
axes[0].grid(True, alpha=0.3)

# Panel 2: Confidence level effect
confs = np.linspace(0.80, 0.99, 50)
widths_conf = [2 * stats.norm.ppf((1 + c) / 2) * base_sigma / np.sqrt(base_n) 
               for c in confs]

axes[1].plot(confs * 100, widths_conf, 'g-', linewidth=2)
axes[1].scatter([base_conf * 100], [base_width], s=200, c='red', marker='*', 
                zorder=5, edgecolors='black', linewidths=1.5,
                label=f'Base: {base_conf*100:.0f}%')
axes[1].set_xlabel('Confidence Level (%)', fontsize=11)
axes[1].set_ylabel('CI Width', fontsize=11)
axes[1].set_title('Effect of Confidence Level\n(higher confidence → wider CI)', 
                  fontsize=11, fontweight='bold')
axes[1].legend(fontsize=9)
axes[1].grid(True, alpha=0.3)

# Panel 3: Variability effect
sigmas = np.linspace(0.2, 2.0, 50)
widths_sigma = [2 * base_z * s / np.sqrt(base_n) for s in sigmas]

axes[2].plot(sigmas, widths_sigma, 'orange', linewidth=2)
axes[2].scatter([base_sigma], [base_width], s=200, c='red', marker='*', 
                zorder=5, edgecolors='black', linewidths=1.5,
                label=f'Base: σ={base_sigma}')
axes[2].set_xlabel('Population Std Dev (σ)', fontsize=11)
axes[2].set_ylabel('CI Width', fontsize=11)
axes[2].set_title('Effect of Variability\n(higher σ → wider CI)', 
                  fontsize=11, fontweight='bold')
axes[2].legend(fontsize=9)
axes[2].grid(True, alpha=0.3)

plt.suptitle('Factors Affecting Confidence Interval Width 📏', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("\n💡 Which factor can you control?")
print("   - Sample size (n): YES! ✓ Most practical way to get narrower CIs")
print("   - Confidence level: YES, but involves trade-offs")
print("   - Variability (œÉ): NO, it's a property of the population")

---

## 6. One-Sided Confidence Intervals 📊

### When to Use:

Sometimes we only care about one direction:

- **Upper bound**: "Is the defect rate less than 5%?"
- **Lower bound**: "Is yield greater than 4.5 tons/hectare?"

### One-Sided CI:

**Lower bound** ("at least"):
$$
\bar{x} - z^* \times SE < \mu
$$

**Upper bound** ("at most"):
$$
\mu < \bar{x} + z^* \times SE
$$

### Critical Values:

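For a 95% one-sided CI: z* = 1.645 (not 1.96!), because the whole 5% sits in one tail. When σ is unknown, use the t-distribution's 0.95 quantile with df = n - 1 instead.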
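The difference between one- and two-sided critical values comes down to which quantile you take: a two-sided 95% interval splits the 5% over two tails (0.975 quantile), while a one-sided bound puts all 5% in one tail (0.95 quantile). A short check with SciPy; df = 39 assumes n = 40, matching the pesticide example.

```python
from scipy import stats

conf = 0.95
z_two = stats.norm.ppf((1 + conf) / 2)   # 0.975 quantile: alpha split across two tails
z_one = stats.norm.ppf(conf)             # 0.95 quantile: all of alpha in one tail
t_one = stats.t.ppf(conf, 39)            # one-sided t* for n = 40 when sigma is unknown

print(f"two-sided z* = {z_two:.3f}")     # → two-sided z* = 1.960
print(f"one-sided z* = {z_one:.3f}")     # → one-sided z* = 1.645
print(f"one-sided t* (df=39) = {t_one:.3f}")
```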

### Example:

Pesticide effectiveness: We want to show it reduces pests by **at least** X%

---

In [None]:
# 🐛 One-sided CI: Pesticide effectiveness
# Question: What's the minimum reduction in pests we can be confident about?

# Data: Pest reduction (percentage)
np.random.seed(42)
n = 40
pest_reduction = np.random.normal(65, 12, n)  # Mean 65% reduction, SD 12%

x_bar = pest_reduction.mean()
s = pest_reduction.std(ddof=1)
se = s / np.sqrt(n)

# Two-sided 95% CI
t_two_sided = stats.t.ppf(0.975, n-1)
ci_two_lower = x_bar - t_two_sided * se
ci_two_upper = x_bar + t_two_sided * se

# One-sided 95% CI (lower bound)
t_one_sided = stats.t.ppf(0.95, n-1)  # 0.95, not 0.975!
ci_one_lower = x_bar - t_one_sided * se

print("🐛 Pesticide Effectiveness - One-Sided CI:")
print("=" * 60)
print(f"Sample size: n = {n} treated fields")
print(f"Mean pest reduction: {x_bar:.2f}%")
print(f"Sample SD: {s:.2f}%")
print(f"Standard Error: {se:.2f}%")
print(f"\nCRITICAL VALUES (95% confidence):")
print(f"  Two-sided: t* = {t_two_sided:.3f}")
print(f"  One-sided: t* = {t_one_sided:.3f}")
print(f"\n95% TWO-SIDED CI:")
print(f"  [{ci_two_lower:.2f}%, {ci_two_upper:.2f}%]")
print(f"  Interpretation: We are 95% confident μ is between {ci_two_lower:.1f}% and {ci_two_upper:.1f}%")
print(f"\n95% ONE-SIDED CI (LOWER BOUND):")
print(f"  {ci_one_lower:.2f}% < μ")
print(f"  Interpretation: We are 95% confident the mean reduction is AT LEAST {ci_one_lower:.1f}%")
print(f"\n💡 Use one-sided when you only care about one direction!")
print(f"   Here: We want to guarantee MINIMUM effectiveness")

In [None]:
# 📊 Visualization 7: One-sided vs two-sided CI

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Two-sided CI
axes[0].plot([ci_two_lower, ci_two_upper], [0, 0], 'b-', linewidth=8, alpha=0.6)
axes[0].scatter([x_bar], [0], s=300, c='red', marker='D', zorder=5, 
                edgecolors='black', linewidths=2)
axes[0].scatter([ci_two_lower, ci_two_upper], [0, 0], s=200, c='blue', 
                marker='|', zorder=4, linewidths=3)
axes[0].set_xlabel('Pest Reduction (%)', fontsize=11)
axes[0].set_yticks([])
axes[0].set_title('Two-Sided 95% CI\n(estimate range)', 
                  fontsize=12, fontweight='bold')
axes[0].set_xlim(50, 80)
axes[0].grid(True, alpha=0.3, axis='x')
axes[0].text(x_bar, 0.08, f'[{ci_two_lower:.1f}%, {ci_two_upper:.1f}%]',
             ha='center', fontsize=10,
             bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8))

# Right: One-sided CI (lower bound)
axes[1].arrow(ci_one_lower, 0, 15, 0, head_width=0.04, head_length=1.5, 
              fc='green', ec='darkgreen', linewidth=2, alpha=0.6)
axes[1].scatter([x_bar], [0], s=300, c='red', marker='D', zorder=5, 
                edgecolors='black', linewidths=2)
axes[1].scatter([ci_one_lower], [0], s=200, c='green', marker='|', 
                zorder=4, linewidths=3)
axes[1].set_xlabel('Pest Reduction (%)', fontsize=11)
axes[1].set_yticks([])
axes[1].set_title('One-Sided 95% CI (Lower Bound)\n(minimum guarantee)', 
                  fontsize=12, fontweight='bold')
axes[1].set_xlim(50, 80)
axes[1].grid(True, alpha=0.3, axis='x')
axes[1].text(ci_one_lower + 7, 0.08, f'≥ {ci_one_lower:.1f}%',
             ha='center', fontsize=11, fontweight='bold',
             bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8))

plt.suptitle('One-Sided vs Two-Sided Confidence Intervals 📊', 
             fontsize=14, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()

print("\n💡 When to use one-sided:")
print("   - Regulatory compliance: 'At most X% defects'")
print("   - Minimum performance: 'At least Y% effectiveness'")
print("   - Only care about one direction of deviation")
print(f"\n📊 For this pesticide:")
print(f"   We can claim with 95% confidence that it reduces pests by")
print(f"   AT LEAST {ci_one_lower:.1f}%")

---

## 7. Machine Learning Connection ⭐⭐⭐

### Always Report Model Performance with CIs!

**Bad Practice**: "Model accuracy = 85%"

**Good Practice**: "Model accuracy = 85% ± 2% (95% CI: [83%, 87%])"

### Why This Matters:

1. **Quantifies Uncertainty**: Is 85% really different from 83%?
2. **Helps Model Comparison**: If CIs overlap, performance may not be truly different
3. **Reflects Sample Size**: Larger test sets → narrower CIs → more reliable estimates

### Methods:

1. **Normal Approximation** (for large test sets):
   $$
   \text{acc} \pm z^* \times \sqrt{\frac{\text{acc}(1-\text{acc})}{n}}
   $$

2. **Bootstrap CI** (more robust, works for any metric):
   - Resample test set with replacement
   - Calculate metric for each resample
   - Use percentiles of bootstrap distribution
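A minimal percentile-bootstrap sketch following the three steps above. The helper names `bootstrap_ci` and `accuracy`, the number of resamples, and the toy labels (150 test cases, 127 correct) are all hypothetical; any function mapping `(y_true, y_pred)` to a number can be passed as `metric`.

```python
import numpy as np


def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for a metric computed on a fixed test set."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample test cases with replacement
        boot_stats[b] = metric(y_true[idx], y_pred[idx])
    alpha = 1 - confidence
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), lower, upper


# Toy test set: 150 cases, the first 127 predicted correctly
y_true = np.ones(150, dtype=int)
y_pred = np.r_[np.ones(127, dtype=int), np.zeros(23, dtype=int)]

acc, lo, hi = bootstrap_ci(y_true, y_pred, metric=accuracy)
print(f"accuracy = {acc:.3f}, 95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```

For accuracy on a reasonably large test set, the bootstrap interval should come out close to the normal-approximation interval; its advantage is that the same code works for metrics without a simple SE formula.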

### Cross-Validation:

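K-fold CV gives you K scores → calculate their mean ± a t-based CI (df = K - 1). Because the folds share training data, the scores are not independent, so treat this CI as approximate.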
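A sketch of the mean ± t-based CI over fold scores. Two details matter: the sample SD should use `ddof=1` (NumPy's `std` defaults to `ddof=0`), and because the folds are correlated this interval is a rough guide that is often too narrow. The five fold accuracies are made-up numbers, and `cv_mean_ci` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def cv_mean_ci(fold_scores, confidence=0.95):
    """t-based CI for the mean of K fold scores (treats folds as independent)."""
    scores = np.asarray(fold_scores, dtype=float)
    k = scores.size
    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(k)   # sample SD (ddof=1), not np.std's default ddof=0
    t_star = stats.t.ppf((1 + confidence) / 2, k - 1)
    return mean, mean - t_star * se, mean + t_star * se


# Hypothetical 5-fold accuracies
mean, lo, hi = cv_mean_ci([0.84, 0.87, 0.82, 0.86, 0.85])
print(f"CV accuracy = {mean:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```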

---

In [None]:
# 🤖 ML Example: Model performance with CI

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split

# Generate synthetic classification data (a stand-in for an agricultural dataset)
np.random.seed(42)
X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                          n_redundant=2, random_state=42)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, 
                                                      random_state=42)

# Train model
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train, y_train)

# Test set accuracy
y_pred = model.predict(X_test)
accuracy = (y_pred == y_test).mean()
n_test = len(y_test)

# Calculate 95% CI for accuracy (normal approximation)
se = np.sqrt(accuracy * (1 - accuracy) / n_test)
z_star = 1.96
margin = z_star * se
ci_lower = accuracy - margin
ci_upper = accuracy + margin

print("🤖 ML Model Performance with Confidence Interval:")
print("=" * 60)
print(f"Dataset: {X.shape[0]} agricultural observations, {X.shape[1]} features")
print(f"Train set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")
print(f"\nMODEL PERFORMANCE:")
print(f"  Test Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
print(f"  Standard Error: {se:.4f}")
print(f"  95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"  95% CI: [{ci_lower*100:.2f}%, {ci_upper*100:.2f}%]")
print(f"\n✅ PROPER REPORTING:")
print(f"   'Model accuracy = {accuracy*100:.1f}% ± {margin*100:.1f}%'")
print(f"   '95% CI: [{ci_lower*100:.1f}%, {ci_upper*100:.1f}%]'")
print(f"\n❌ IMPROPER REPORTING:")
print(f"   'Model accuracy = {accuracy*100:.1f}%' (no uncertainty!)")
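
The normal-approximation (Wald) interval above can spill past 0% or 100% when accuracy is near an extreme or the test set is small. A commonly used alternative is the Wilson score interval, which always stays within [0, 1]. The function below is a minimal sketch of the standard Wilson formula; `wilson_ci` is a hypothetical helper name, not a library function.

```python
import numpy as np
from scipy import stats

def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion (e.g., accuracy = correct / n)."""
    z = stats.norm.ppf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

# A perfect score on a small test set: the Wald interval collapses to [1, 1],
# while the Wilson interval still reports uncertainty below 100%
lo, hi = wilson_ci(successes=30, n=30)
print(f"Wilson 95% CI for 30/30 correct: [{lo:.3f}, {hi:.3f}]")
```

In the cell above it could be called as `wilson_ci((y_pred == y_test).sum(), n_test)` for comparison with the Wald interval. If statsmodels is available, `statsmodels.stats.proportion.proportion_confint(..., method='wilson')` computes the same interval.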

In [None]:
# 🔄 Cross-validation with confidence intervals

# Perform 10-fold CV
cv_scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')

# Calculate mean and CI
cv_mean = cv_scores.mean()
cv_std = cv_scores.std(ddof=1)  # sample SD (ddof=1), as required for a t-based CI
cv_se = cv_std / np.sqrt(len(cv_scores))

# 95% CI using t-distribution
t_star = stats.t.ppf(0.975, len(cv_scores) - 1)
cv_margin = t_star * cv_se
cv_ci_lower = cv_mean - cv_margin
cv_ci_upper = cv_mean + cv_margin

print("🔄 Cross-Validation Results:")
print("=" * 60)
print(f"10-fold CV scores:")
print(f"  {cv_scores.round(4)}")
print(f"\nSUMMARY STATISTICS:")
print(f"  Mean accuracy: {cv_mean:.4f} ({cv_mean*100:.2f}%)")
print(f"  Std deviation: {cv_std:.4f}")
print(f"  Standard error: {cv_se:.4f}")
print(f"  95% CI: [{cv_ci_lower:.4f}, {cv_ci_upper:.4f}]")
print(f"  95% CI: [{cv_ci_lower*100:.2f}%, {cv_ci_upper*100:.2f}%]")
print(f"\n💡 This CI describes uncertainty in the MEAN CV accuracy (approximate, since folds share training data).")
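
The t-based interval computed above can also be obtained in one call with SciPy's `stats.t.interval`. `stats.sem` uses `ddof=1` by default, matching the sample standard deviation needed here. The fold scores below are made-up values for illustration, and the same caveat applies: CV fold scores are not independent, so treat the result as approximate.

```python
import numpy as np
from scipy import stats

# Hypothetical 10-fold accuracy scores (illustrative values only)
scores = np.array([0.84, 0.88, 0.86, 0.82, 0.90, 0.86, 0.84, 0.88, 0.86, 0.86])

# One-call t interval: loc = mean, scale = standard error of the mean (ddof=1)
lower, upper = stats.t.interval(0.95, len(scores) - 1,
                                loc=scores.mean(), scale=stats.sem(scores))

# Manual equivalent, mirroring the cell above
se = scores.std(ddof=1) / np.sqrt(len(scores))
margin = stats.t.ppf(0.975, len(scores) - 1) * se
print(f"scipy:  [{lower:.4f}, {upper:.4f}]")
print(f"manual: [{scores.mean() - margin:.4f}, {scores.mean() + margin:.4f}]")
```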

In [None]:
# 📊 Visualization 8: CV scores distribution with CI

plt.figure(figsize=(12, 6))

# Histogram of CV scores
plt.hist(cv_scores, bins=8, alpha=0.7, color='steelblue', 
         edgecolor='black', density=True, label='CV Scores')

# Mark mean
plt.axvline(cv_mean, color='red', linestyle='-', linewidth=2, 
            label=f'Mean = {cv_mean:.3f}')

# Mark the 95% CI for the MEAN accuracy (narrower than the spread of individual fold scores)
plt.axvline(cv_ci_lower, color='green', linestyle='--', linewidth=1.5, alpha=0.7)
plt.axvline(cv_ci_upper, color='green', linestyle='--', linewidth=1.5, alpha=0.7)
plt.axvspan(cv_ci_lower, cv_ci_upper, alpha=0.2, color='green', 
            label=f'95% CI: [{cv_ci_lower:.3f}, {cv_ci_upper:.3f}]')

# Overlay normal distribution
x = np.linspace(cv_scores.min() - 0.02, cv_scores.max() + 0.02, 100)
plt.plot(x, stats.norm.pdf(x, cv_mean, cv_std), 'orange', linewidth=2, 
         alpha=0.7, label='Normal fit')

plt.xlabel('Accuracy', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('Cross-Validation Scores Distribution with 95% CI 🔄', 
          fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)

# Add text box
textstr = f'10-Fold CV\nMean: {cv_mean:.3f}\n95% CI: [{cv_ci_lower:.3f}, {cv_ci_upper:.3f}]'
props = dict(boxstyle='round', facecolor='wheat', alpha=0.8)
plt.text(0.02, 0.98, textstr, transform=plt.gca().transAxes, fontsize=11,
         verticalalignment='top', bbox=props)

plt.tight_layout()
plt.show()

print("\n💡 Proper ML Reporting:")
print(f"   'Our model achieves {cv_mean*100:.1f}% accuracy'")
print(f"   '(95% CI: [{cv_ci_lower*100:.1f}%, {cv_ci_upper*100:.1f}%])'")
print(f"   'based on 10-fold cross-validation'")
print(f"\n🎯 This communicates both performance AND uncertainty!")

---

## Key Takeaways 🎯

### Confidence Intervals:

1. ✅ **Definition**:
   - CI = Point Estimate ± Margin of Error
   - CI = θ̂ ± (Critical Value) × SE

2. ✅ **Correct Interpretation** ⭐⭐:
   - ✓ "If we repeated the sampling many times, about 95% of the intervals built this way would capture the true parameter"
   - ✗ "There's a 95% probability that μ is in this particular interval"
   - The parameter is **fixed**; the interval is **random**

3. ✅ **CI for Mean**:
   - σ known: x̄ ± z* × (σ/√n)
   - σ unknown: x̄ ± t* × (s/√n) ⭐ (use this!)

4. ✅ **CI for Proportion**:
   - p̂ ± z* × √[p̂(1-p̂)/n]
   - Requires np̂ ≥ 10 and n(1-p̂) ≥ 10

5. ✅ **Factors Affecting Width**:
   - **Sample size** (n): Larger → narrower (controllable!)
   - **Confidence level**: Higher → wider (trade-off)
   - **Variability** (σ): Higher → wider (largely outside our control)

6. ✅ **ML Application** ⭐⭐⭐:
   - **Always report model performance with CIs**
   - Quantifies uncertainty in performance estimates
   - Helps compare models properly
   - Bootstrap CIs work for almost any metric
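
The "repeated sampling" interpretation in point 2 can be checked by simulation. This sketch assumes normally distributed data with a known true mean (μ = 5.0, σ = 0.5, n = 25, and 5,000 repetitions are arbitrary illustrative choices), builds a 95% t-interval from each sample, and counts how often the interval captures μ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, n_reps = 5.0, 0.5, 25, 5000  # arbitrary illustrative settings

# Draw n_reps independent samples of size n from the same population
samples = rng.normal(mu, sigma, size=(n_reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Build a 95% t-interval from each sample
t_star = stats.t.ppf(0.975, df=n - 1)
lower = means - t_star * ses
upper = means + t_star * ses

# Fraction of intervals that contain the fixed true mean
coverage = np.mean((lower <= mu) & (mu <= upper))
print(f"Empirical coverage: {coverage:.3f} (nominal 0.95)")
```

Each individual interval either contains μ or it does not; the 95% describes the long-run behavior of the procedure, which is why the empirical coverage lands near 0.95.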

### Critical Formulas:

$$
\boxed{\text{CI for mean (}\sigma\text{ unknown)} = \bar{x} \pm t^*_{df} \times \frac{s}{\sqrt{n}}}
$$

$$
\boxed{\text{Width} \propto \frac{1}{\sqrt{n}} \text{ (to halve width, need } 4\times \text{ sample size)}}
$$
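
A quick numeric check of the 1/√n rule: with a fixed z*, quadrupling n halves the margin exactly; with t*, the margin shrinks by slightly more than half, because t* also decreases as the degrees of freedom grow. The sample standard deviation s = 0.5 and starting size n = 25 are arbitrary values, held fixed for illustration.

```python
import numpy as np
from scipy import stats

s, n = 0.5, 25  # assumed sample SD and starting sample size (illustrative)

def margin_z(n, s, level=0.95):
    z_star = stats.norm.ppf(1 - (1 - level) / 2)
    return z_star * s / np.sqrt(n)

def margin_t(n, s, level=0.95):
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    return t_star * s / np.sqrt(n)

# Ratio of margins after quadrupling the sample size
ratio_z = margin_z(4 * n, s) / margin_z(n, s)
ratio_t = margin_t(4 * n, s) / margin_t(n, s)
print(f"z-based margin ratio (4n vs n): {ratio_z:.4f}")
print(f"t-based margin ratio (4n vs n): {ratio_t:.4f}")
```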

---

## Next Steps 🚀

**Coming Up Next: Bootstrap Methods** ⭐⭐

We've learned classical confidence intervals (based on formulas and distributional assumptions).

But what if:
- You want CI for a complex statistic (median, correlation, percentile)?
- Distributional assumptions don't hold?
- You want a modern, flexible approach?

**Answer: Bootstrap!**

In the next notebook, we'll learn:
- **Bootstrap resampling**: Computer-intensive inference
- **Bootstrap CIs**: Work for almost any statistic!
- **Connection to ML**: Bootstrap aggregating (bagging) → Random Forests ⭐⭐

**This is where statistical inference meets modern ML!**

See you in **`05_bootstrap_methods.ipynb`**!

---

**Excellent work! You now know how to quantify uncertainty properly!** 📊✨🌾