### Confidence Intervals

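A confidence interval is a range of values, computed from sample data, that is constructed to contain the true value of an unknown population parameter with a stated level of confidence. For a 95% confidence level, the procedure that builds the interval captures the true parameter in about 95% of repeated samples.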

Let's use the `scipy.stats` module in Python to compute a confidence interval for a mean.

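We'll use knowledge from various studies that suggest the average height of an adult male is approximately 175 cm with a standard deviation of about 7 cm. Note that the code in this section passes a larger standard deviation of 15 cm to the random number generator, so the printed intervals reflect that value. We'll generate normally distributed samples of 100, 1,000, and 10,000 people's heights.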

In these examples, `np.random.normal` generates a random sample from a normal (Gaussian) distribution, `stats.sem` computes the standard error of the mean, and `stats.t.interval` computes the confidence interval from Student's t distribution.

Remember that the resulting confidence interval represents the range that we're 95% confident contains the true mean height of the population from which we're sampling. The larger the sample size, the narrower our confidence interval will be, all else being equal, because the standard error of the mean shrinks in proportion to the inverse square root of the sample size.
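
To make the "95% confident" statement concrete, here is a minimal simulation sketch. It assumes the same population parameters as the code in this section (mean 175, standard deviation 15), draws many independent samples, and counts how often the interval contains the true mean. The names `n_trials`, `covered`, and `coverage` are illustrative, and `np.random.default_rng` is used here instead of `np.random.seed` only to keep the simulation self-contained.

```python
import numpy as np
import scipy.stats as stats

# Assumed population parameters (same values the code in this section uses)
true_mean, true_std = 175, 15
n, n_trials = 100, 2000

rng = np.random.default_rng(0)
covered = 0
for _ in range(n_trials):
    sample = rng.normal(true_mean, true_std, n)
    low, high = stats.t.interval(
        0.95, n - 1, loc=np.mean(sample), scale=stats.sem(sample)
    )
    # Count the trial if the interval contains the true mean
    covered += int(low <= true_mean <= high)

coverage = covered / n_trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% figure describes how often the procedure succeeds over many samples.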

In [1]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Set a seed for reproducibility
np.random.seed(0)

# Generate a random sample of 100 people's heights (mean 175 cm, std 15 cm)
heights = np.random.normal(175, 15, 100)


# Compute a 95% confidence interval for the mean
mean_height = np.mean(heights)
standard_error = stats.sem(heights)

confidence_interval = stats.t.interval(0.95, len(heights) - 1, loc=mean_height, scale=standard_error)

print(f"The 95% confidence interval for the mean height is {confidence_interval}")

The 95% confidence interval for the mean height is (172.88222231494893, 178.9120181510856)
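
As a cross-check, the same interval can be computed by hand as the sample mean plus or minus a critical t value times the standard error. The sketch below assumes the same seed and parameters as the cell above; `t_crit`, `manual_interval`, and `library_interval` are illustrative names.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(0)
heights = np.random.normal(175, 15, 100)

n = len(heights)
mean_height = np.mean(heights)

# Standard error of the mean: sample standard deviation (ddof=1) over sqrt(n).
# This matches the default behaviour of stats.sem.
standard_error = np.std(heights, ddof=1) / np.sqrt(n)

# Two-sided 95% interval: use the 97.5th percentile of Student's t
# with n - 1 degrees of freedom as the critical value.
t_crit = stats.t.ppf(0.975, df=n - 1)

manual_interval = (
    mean_height - t_crit * standard_error,
    mean_height + t_crit * standard_error,
)
library_interval = stats.t.interval(
    0.95, n - 1, loc=mean_height, scale=standard_error
)

print(f"Manual:  {manual_interval}")
print(f"Library: {library_interval}")
```

Both lines should print the same pair of numbers, which shows that `stats.t.interval` is a convenience wrapper around this formula.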


In [2]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Set a seed for reproducibility
np.random.seed(0)

# Generate a random sample of 1000 people's heights (mean 175 cm, std 15 cm)
heights = np.random.normal(175, 15, 1000)


# Compute a 95% confidence interval for the mean
mean_height = np.mean(heights)
standard_error = stats.sem(heights)

confidence_interval = stats.t.interval(0.95, len(heights) - 1, loc=mean_height, scale=standard_error)

print(f"The 95% confidence interval for the mean height is {confidence_interval}")

The 95% confidence interval for the mean height is (173.40193918189914, 175.240359593395)


In [3]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Set a seed for reproducibility
np.random.seed(0)

# Generate a random sample of 10000 people's heights (mean 175 cm, std 15 cm)
heights = np.random.normal(175, 15, 10000)


# Compute a 95% confidence interval for the mean
mean_height = np.mean(heights)
standard_error = stats.sem(heights)

confidence_interval = stats.t.interval(0.95, len(heights) - 1, loc=mean_height, scale=standard_error)

print(f"The 95% confidence interval for the mean height is {confidence_interval}")

The 95% confidence interval for the mean height is (174.43310823303932, 175.01388016221273)
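
The three cells above differ only in sample size, so they can be condensed into one loop that also reports the width of each interval. This is a sketch under the same seed and parameters; the `widths` dictionary is an illustrative addition.

```python
import numpy as np
import scipy.stats as stats

widths = {}
for n in (100, 1000, 10000):
    # Reset the seed each time so every run matches the corresponding cell above
    np.random.seed(0)
    heights = np.random.normal(175, 15, n)

    low, high = stats.t.interval(
        0.95, n - 1, loc=np.mean(heights), scale=stats.sem(heights)
    )
    widths[n] = high - low
    print(f"n = {n:>5}: interval = ({low:.2f}, {high:.2f}), width = {widths[n]:.2f}")
```

From the outputs printed above, the widths are roughly 6.03, 1.84, and 0.58 cm. Each tenfold increase in sample size shrinks the interval by a factor of about 3.2, close to the square root of 10, which is what the standard error formula predicts.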
