# Central Limit Theorem and Confidence Intervals

In this section we explore the central limit theorem, a phenomenon in which the normal distribution makes a surprising appearance when we take the means of samples. We will then apply it to one practical use: confidence intervals. Along the way, we will also learn how to use the inverse CDF to find critical z-values.

For this section, let's bring in these dependencies. 

In [None]:
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm 

## Central Limit Theorem

Let's use the function below called `plot_central_limit_theorem()`. It draws `sample_count` samples, each containing `sample_size` random numbers generated uniformly between 0 and 10, meaning that any number in that range is equally likely. It then takes the average value of each sample and plots those averages in a histogram.

In [None]:
def plot_central_limit_theorem(sample_size, sample_count):
    # mean of each sample of uniform random numbers between 0.0 and 10.0
    X = [sum(random.uniform(0.0, 10.0) for _ in range(sample_size)) / sample_size
         for _ in range(sample_count)]

    # plot a histogram of the sample means
    counts, bins = np.histogram(X)
    plt.stairs(counts, bins)
    plt.show()

Now let's say we get 1000 samples, each with a size of 1. Notice that the histogram is roughly flat, reflecting the uniform distribution the numbers were drawn from.

In [None]:
plot_central_limit_theorem(sample_size=1, sample_count=1000)

But as we make the sample size larger, something interesting happens. Here are 1000 samples, each of size 2.

In [None]:
plot_central_limit_theorem(sample_size=2, sample_count=1000)

Here is another where the sample size is 31. Note how at a sample size of 31, the histogram of sample means has taken on a shape close to a normal distribution.

In [None]:
plot_central_limit_theorem(sample_size=31, sample_count=1000)

What we are discovering here is the **central limit theorem**, which states that the means of samples drawn from a population approach a normal distribution as the sample size grows, even when the population itself is not normal. Here are the key points of the central limit theorem.

* The mean of the sample means is equal to the population mean.
* If the population is normal, then the sample means will be normal.
* If the population is not normal, but the sample size is 31 or more (a common rule of thumb), the sample means will still roughly form a normal distribution.
* The standard deviation of the sample means ($ s $ below) equals the population standard deviation $ \sigma $ divided by the square root of the sample size $ n $.

$$
\Large{s = \frac{\sigma}{\sqrt{n}}}
$$
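The key points above can be checked numerically. The following is a minimal sketch, assuming the same uniform population between 0 and 10 used earlier. The helper name `simulate_sample_means()` and the fixed seed are illustrative choices, not part of the function defined above. It relies on the fact that a uniform distribution between $ a $ and $ b $ has a mean of $ (a + b) / 2 $ and a standard deviation of $ (b - a) / \sqrt{12} $.

```python
import numpy as np

def simulate_sample_means(sample_size, sample_count, low=0.0, high=10.0, seed=0):
    """Return the mean of each of `sample_count` uniform samples of size `sample_size`.

    Hypothetical helper for illustration; the seed only makes the run repeatable.
    """
    rng = np.random.default_rng(seed)
    # one row per sample, one column per observation in that sample
    samples = rng.uniform(low, high, size=(sample_count, sample_size))
    return samples.mean(axis=1)

# population parameters of a uniform distribution between 0 and 10
population_mean = (0.0 + 10.0) / 2
population_std = (10.0 - 0.0) / np.sqrt(12)

sample_size = 31
means = simulate_sample_means(sample_size, sample_count=100_000)

# the two columns in each pair should be close to each other
print("mean of sample means:", means.mean())
print("population mean:     ", population_mean)
print("std of sample means: ", means.std())
print("sigma / sqrt(n):     ", population_std / np.sqrt(sample_size))
```

Because the samples are random, the simulated values will only approximately match the theoretical ones, and the gap should shrink as `sample_count` grows.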

## Critical Z Values

Before we move on and apply the central limit theorem to confidence intervals, we need to cover one more concept. The **critical z-value** marks the boundaries of a symmetrical middle range that contains a specific area under the standard normal bell curve. For example, let's say we want to find the center 95% of the area under the bell curve below. What is the range of x-values that provides this?

[Figure: bell curve with the center 95% of the area highlighted]

Of course, we need to use the inverse cumulative distribution function (the `ppf()` in SciPy), but first we need to deduce which areas to look up. Let's home in on the remaining area in the tails. Since the area under the entire curve is 1.0, there is .05 remaining in the two tails combined.

svg image

That means each tail has .025 area. Therefore, the left tail is cut off at the x-value where the cumulative area is .025.

svg image

Let's just use a standard normal distribution with a mean of 0 and standard deviation of 1, and calculate this using SciPy. That left tail is cut off at $ x = -1.9599639845400545 $. 

In [None]:
mean = 0
std = 1
norm.ppf(.025, mean, std)

Now what is the x-value that cuts off the right tail? This is a bit trickier to deduce, but think logically. The cumulative area up to that point is the .95 area in the center plus the .025 area in the left tail. That means looking up the x-value yielding area $ .975 $.

svg image

Unsurprisingly, we get a symmetrical opposite x-value of $ 1.959963984540054 $ using the `ppf()` function.

In [None]:
norm.ppf(.975, mean, std)
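One way to confirm these two lookups is to go in the other direction with the CDF: the area between the two x-values should come out to .95. Here is a small sketch using `norm.cdf()` from SciPy. The variable names `lower`, `upper`, and `center_area` are illustrative.

```python
from scipy.stats import norm

# ppf() defaults to the standard normal distribution (mean 0, standard deviation 1)
lower = norm.ppf(0.025)
upper = norm.ppf(0.975)

# area to the left of `upper` minus area to the left of `lower`
center_area = norm.cdf(upper) - norm.cdf(lower)
print(lower, upper, center_area)
```

The two x-values should mirror each other around 0, and the area between them should match the .95 we started with, up to floating-point rounding.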

So our z-value is as follows, which gives us the $ .95 $ area at the center of the curve. 

$$
Z = \pm 1.95996
$$

[Figure: bell curve with the center .95 area between the two z-values]
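The steps above generalize to any center area, not just .95. The following is a minimal sketch that wraps them into a helper. The function name `critical_z_value()` and its input check are assumptions made for illustration, and it works only with the standard normal distribution.

```python
from scipy.stats import norm

def critical_z_value(p):
    """Return the (lower, upper) z-values that enclose center area `p`
    under the standard normal curve. Hypothetical helper for illustration."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # the area outside the center is split evenly between the two tails
    tail_area = (1.0 - p) / 2.0
    lower = norm.ppf(tail_area)
    upper = norm.ppf(1.0 - tail_area)
    return lower, upper

# print the critical z-value for a few center areas
for p in (0.90, 0.95, 0.99):
    lower, upper = critical_z_value(p)
    print(f"{p:.0%} center area: z = ±{upper:.5f}")
```

Computing the tail area first keeps the logic identical to the manual walkthrough: look up the left tail at `(1 - p) / 2`, then the right cutoff at the center area plus the left tail.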