### The Practical Impact of Bessel's Correction

Curious about why the variance formula divides by \( n-1 \) instead of \( n \), I found the traditional explanation involving "degrees of freedom" unsatisfying, so I decided to investigate it empirically. This experiment demonstrates the practical significance of **Bessel's correction** (dividing by \( n-1 \) rather than \( n \)) in the sample standard deviation formula. It highlights how **sample size** influences the accuracy of statistical estimators:

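- **Small Sample Sizes**:  
  The correction matters. With \( n-1 \), the estimate had the lower average absolute error in a clear majority of trials (65% at \( n = 10 \) and 60% at \( n = 20 \)). Dividing by \( n \) systematically underestimates the spread of the population, and the correction reduces this downward bias.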

- **Large Sample Sizes**:  
  The difference between \( n \) and \( n-1 \) becomes negligible. At \( n = 100 \) and \( n = 200 \), each divisor wins about half of the trials (50.2% and 49.6%), so neither estimator has a meaningful edge.
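Because both estimators share the same sum of squared deviations, they differ by a fixed factor for every sample: \( s_n = \sqrt{(n-1)/n}\, s_{n-1} \). The short sketch below (`shrink_factor` is a hypothetical helper name) prints that factor for the sample sizes used in the experiment. It suggests why the divisors become interchangeable: at \( n = 10 \) the \( n \)-divisor estimate is about 5% smaller, while at \( n = 200 \) the gap is about 0.25%.

```python
import math


def shrink_factor(n: int) -> float:
    """Ratio of the n-divisor estimate to the (n-1)-divisor estimate.

    Both estimates use the same sum of squared deviations, so
    s_n / s_(n-1) = sqrt((n - 1) / n) exactly, for any sample.
    """
    return math.sqrt((n - 1) / n)


for n in [10, 20, 50, 100, 200]:
    f = shrink_factor(n)
    print(f"n = {n:>3}: s_n = {f:.4f} * s_(n-1)  ({1 - f:.2%} smaller)")
```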

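This behavior aligns with **statistical theory**:  
as the sample size increases, the bias that \( n-1 \) corrects diminishes. Dividing by \( n \) yields a variance estimate with expected value \( \frac{n-1}{n}\sigma^2 \), so its bias, \( -\sigma^2/n \), shrinks toward zero as \( n \) grows.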
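The shrinking bias can be derived directly. Here is a short derivation. It assumes the observations \( x_1, \dots, x_n \) are i.i.d. with mean \( \mu \) and variance \( \sigma^2 \), and that \( \bar{x} \) is their sample mean. The experiment below samples without replacement from a finite population, which is close to, but not exactly, this setting.

\[
s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
\]

Both share the same sum of squares, which decomposes as

\[
\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2 .
\]

Taking expectations, with \( \mathbb{E}[(x_i-\mu)^2] = \sigma^2 \) and \( \mathbb{E}[(\bar{x}-\mu)^2] = \operatorname{Var}(\bar{x}) = \sigma^2/n \):

\[
\mathbb{E}\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2 ,
\]

\[
\mathbb{E}[s_{n-1}^2] = \sigma^2, \qquad \mathbb{E}[s_n^2] = \frac{n-1}{n}\sigma^2, \qquad \mathbb{E}[s_n^2] - \sigma^2 = -\frac{\sigma^2}{n}.
\]

Because the square root is concave, Jensen's inequality gives \( \mathbb{E}[s_{n-1}] \le \sqrt{\mathbb{E}[s_{n-1}^2]} = \sigma \). Bessel's correction therefore removes the bias of the *variance* exactly. The standard deviation \( s_{n-1} \) is still slightly biased low, just less so than \( s_n = \sqrt{(n-1)/n}\, s_{n-1} \).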


```python
import numpy as np

trials = 500                           # independent populations per sample size
sample_sizes = [10, 20, 50, 100, 200]
results = []

# Run the experiment for each sample size
for sample_size in sample_sizes:
    count_n_minus_1_better = 0

    for _ in range(trials):
        # Fresh standard-normal population; its exact std (ddof=0) is the target
        population = np.random.randn(3000)
        population_std = population.std()

        abs_errors_n_minus_1 = []
        abs_errors_n = []

        for _ in range(100):
            # Draw a sample without replacement from the finite population
            samples = np.random.choice(population, size=sample_size, replace=False)
            sum_sq = np.sum((samples - samples.mean()) ** 2)
            st_dev_n_minus_1 = np.sqrt(sum_sq / (sample_size - 1))  # Bessel's correction
            st_dev_n = np.sqrt(sum_sq / sample_size)                 # uncorrected

            # Compute the absolute errors
            abs_errors_n_minus_1.append(abs(st_dev_n_minus_1 - population_std))
            abs_errors_n.append(abs(st_dev_n - population_std))

        # Compare the average errors for this trial
        avg_error_n_minus_1 = np.mean(abs_errors_n_minus_1)
        avg_error_n = np.mean(abs_errors_n)

        # Increment count if n-1 performs better
        if avg_error_n_minus_1 < avg_error_n:
            count_n_minus_1_better += 1

    results.append((sample_size, count_n_minus_1_better / trials))

for sample_size, proportion in results:
    print(f"Sample size {sample_size}: n-1 performed better in {proportion:.2%} of trials.")
```

```text
Sample size 10: n-1 performed better in 65.00% of trials.
Sample size 20: n-1 performed better in 60.00% of trials.
Sample size 50: n-1 performed better in 55.40% of trials.
Sample size 100: n-1 performed better in 50.20% of trials.
Sample size 200: n-1 performed better in 49.60% of trials.
```
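The win-rate experiment compares mean absolute errors. A complementary check is to average each estimator directly and compare it with the true value, which exposes the bias itself. The sketch below is minimal and rests on several assumptions not in the original. It uses i.i.d. draws from a normal distribution with \( \sigma = 1 \) instead of subsampling a finite population, a fixed seed, and 20,000 replications per sample size. `mean_estimates` is a hypothetical helper.

```python
import numpy as np

# Fixed seed so this sketch is reproducible (the experiment above is unseeded).
rng = np.random.default_rng(0)

sigma = 1.0      # true standard deviation of the simulated normal data
reps = 20_000    # independent samples drawn for each sample size


def mean_estimates(n: int) -> dict:
    """Average each estimator over `reps` i.i.d. normal samples of size n."""
    x = rng.normal(0.0, sigma, size=(reps, n))
    var_n1 = x.var(axis=1, ddof=1)  # divide by n - 1 (Bessel's correction)
    var_n = x.var(axis=1, ddof=0)   # divide by n
    return {
        "var_n-1": var_n1.mean(),
        "var_n": var_n.mean(),
        "std_n-1": np.sqrt(var_n1).mean(),
        "std_n": np.sqrt(var_n).mean(),
    }


bias_results = {}
for n in [10, 20, 50, 100, 200]:
    r = mean_estimates(n)
    bias_results[n] = r
    print(f"n = {n:>3}: mean var  n-1 = {r['var_n-1']:.4f}, n = {r['var_n']:.4f} | "
          f"mean std  n-1 = {r['std_n-1']:.4f}, n = {r['std_n']:.4f}")
```

With these settings, the \( n-1 \) variance should average close to \( \sigma^2 = 1 \), and the \( n \)-divisor variance should sit lower by exactly the factor \( (n-1)/n \). Both standard-deviation averages should fall below \( \sigma \), with the gap closing as \( n \) grows.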
