# Margin of Error

In sampling experiments (e.g., election polling or census data), the data are often presented as percentages. We can think of these percentages as measured probabilities of a certain outcome in the experiment.

The question is:  What is the uncertainty in the measured probability?  That is, what is the uncertainty in the mean value of the probability?

The answer to this question depends on three things:

1.  What is the sample size, N?
2.  What is the probability of the outcome, p?
3.  How confident do we need to be in reporting our result (indicated by $\alpha$)?

The margin of error (i.e. the uncertainty in the mean) is defined by:

$MOE = z_\gamma \times \sqrt{\frac{\sigma^2}{N}}$

where $z_\gamma$ is the z value associated with the chosen confidence level $\gamma = 1-\alpha$ (for a two-sided interval, this is the $1-\alpha/2$ quantile of the standard normal distribution), and $\sigma^2$ is the variance of the measured probability distribution.
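As a quick illustration of how the z value depends on the chosen confidence level, the sketch below evaluates the two-sided critical value for a few common choices of $\alpha$. The helper name `z_value` is hypothetical and is introduced here only for illustration.

```python
from scipy import stats


def z_value(alpha):
    """Two-sided critical value: the (1 - alpha/2) quantile of the standard normal."""
    return stats.norm.ppf(1 - alpha / 2)


# Common confidence levels: 90%, 95%, and 99%.
for alpha in (0.10, 0.05, 0.01):
    print(f"confidence level {1 - alpha:.0%}: z = {z_value(alpha):.3f}")
```

These are the familiar values of roughly 1.645, 1.960, and 2.576: demanding more confidence widens the margin of error for the same data.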

For a Bernoulli distribution (a single trial of a binomial experiment with success probability $p$), $\sigma^2 = p(1-p)$.
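One way to convince yourself that $\sqrt{p(1-p)/N}$ is the right scale for the uncertainty in a measured fraction is a small simulation. The sketch below is only an illustration: the true probability, sample size, number of repetitions, and random seed are all arbitrary assumptions and are not taken from the example data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

p_true = 0.3           # hypothetical true probability of the outcome
N = 1000               # sample size of each simulated experiment
n_experiments = 20000  # number of repeated experiments

# Each experiment counts successes in N Bernoulli trials and
# converts the count into a measured fraction.
p_measured = rng.binomial(N, p_true, size=n_experiments) / N

# Spread of the measured fractions versus the formula sqrt(p(1-p)/N).
empirical_std = p_measured.std()
predicted_std = np.sqrt(p_true * (1 - p_true) / N)

print(f"empirical std of measured p = {empirical_std:.5f}")
print(f"predicted sqrt(p(1-p)/N)    = {predicted_std:.5f}")
```

The two numbers should agree closely; multiplying this standard deviation by $z_\gamma$ gives the margin of error.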

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

In [7]:
# Example:  On August 3rd, 2020, it was reported that of the 982 new COVID-19 cases in Virginia for that day, 43%
#           were in the Hampton Roads region.

p = 0.43
N = 982
alpha = 0.05

confidence_level = 1 - alpha

z_gamma = stats.norm.ppf(confidence_level + alpha/2)  # Two-sided interval: alpha/2 lies in each tail, so we need the 1 - alpha/2 quantile.

sigma2 = p*(1-p)

MOE = z_gamma * np.sqrt(sigma2/N)

# MOE is already the absolute uncertainty in p, so it is not multiplied by p.
print(f"Measured probability = {p:.3f} +/- {MOE:.3f}")

Measured probability = 0.430 +/- 0.031


What value do we expect?  To estimate this, we would need to know the population of Virginia, and the population of Hampton Roads.  The former is 8.536 million, as of 2019, and the latter is 1.78 million, as of this year.

Thus, the expected probability, based only on population, is $p_{theory} = 1.78/8.536 \approx 0.209$.
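A minimal sketch of one way to compare the measured fraction with this population-based expectation is shown below. It applies the margin of error formula from above, $z_\gamma \sqrt{p(1-p)/N}$, directly as the absolute uncertainty in $p$. The variable names are illustrative, and the population figures are simply the ones quoted in the text.

```python
import numpy as np
from scipy import stats

# Measured fraction, sample size, and significance level from the example.
p_measured = 0.43
N = 982
alpha = 0.05

# Population figures quoted in the text.
pop_virginia = 8.536e6
pop_hampton_roads = 1.78e6
p_theory = pop_hampton_roads / pop_virginia

# Two-sided margin of error on the measured fraction.
z = stats.norm.ppf(1 - alpha / 2)
moe = z * np.sqrt(p_measured * (1 - p_measured) / N)
lower, upper = p_measured - moe, p_measured + moe

inside = lower <= p_theory <= upper

print(f"p_theory = {p_theory:.4f}")
print(f"{1 - alpha:.0%} interval for measured p: [{lower:.3f}, {upper:.3f}]")
print(f"p_theory inside interval: {inside}")
```

With these inputs the population-based expectation falls well below the interval around the measured value, so the difference is much larger than the sampling uncertainty alone would explain.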