# The $t$-distribution and confidence intervals: an election example

Suppose you want to know the proportion of an electorate that will vote for a certain candidate. (For this example, we will assume there are only two candidates.) The electorate is too large to ask everybody (this is a poll, not the actual election), so we take samples instead. Suppose we take $N=10$ samples, and the proportions of votes for our candidate in each sample (after removing blank/invalid responses) are as follows:

In [1]:
import numpy
x = numpy.array([0.53,0.56,0.47,0.50,0.59,0.42,0.65,0.62,0.53,0.56])

The sample mean $\bar{x}$ of the above samples is
$$\bar{x}=\frac{\sum_{i=1}^N x_i}{N}$$

In [2]:
m = numpy.mean(x)
m

0.5429999999999999

This is larger than 50 percent. But how sure are we that our candidate will actually win the election?

Assume that our samples come from a normal distribution with unknown mean and variance. The sample mean is $\bar{x}$, which we computed above. The **sample standard deviation** is
$$\sigma_x=\sqrt{\frac{\sum_{i=1}^N \left(x_i - \bar{x}\right)^2}{N-1}}$$

In [3]:
sd = numpy.std(x,ddof=1)
sd

0.06896859188548558

Note that the "ddof" parameter above stands for "delta degrees of freedom": `numpy.std` divides the sum of squared deviations by $N - \mathrm{ddof}$, so `ddof=1` corresponds to the "- 1" term in the above formula.
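To see the effect of "ddof" concretely, the following sketch evaluates the formula for the sample standard deviation term by term and compares it with `numpy.std`. The names `sd_manual`, `sd_ddof1` and `sd_ddof0` are introduced here only for illustration.

```python
import numpy

# Same poll results as in the cells above.
x = numpy.array([0.53,0.56,0.47,0.50,0.59,0.42,0.65,0.62,0.53,0.56])
N = len(x)

# Sample standard deviation written out from the formula (divisor N - 1).
sd_manual = numpy.sqrt(numpy.sum((x - numpy.mean(x))**2) / (N - 1))

# numpy.std with ddof=1 uses the same divisor, N - 1.
sd_ddof1 = numpy.std(x, ddof=1)

# The default, ddof=0, divides by N instead and gives a smaller value.
sd_ddof0 = numpy.std(x)

print(sd_manual, sd_ddof1, sd_ddof0)
```

The first two values should agree up to floating-point rounding, while the third is slightly smaller because of the larger divisor.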

The **standard error of the mean** is
$$s_{\bar{x}}=\frac{\sigma_x}{\sqrt{N}}$$

In [4]:
import scipy.stats
sem = scipy.stats.sem(x) # note this defaults to using ddof = 1 when computing
                         # the sample standard deviation
sem

0.02180978373727412
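As a quick check, the standard error can also be computed directly from its definition, the sample standard deviation divided by $\sqrt{N}$. This is only a sketch; the name `sem_manual` is introduced here for illustration.

```python
import numpy
import scipy.stats

# Same poll results as in the cells above.
x = numpy.array([0.53,0.56,0.47,0.50,0.59,0.42,0.65,0.62,0.53,0.56])

# Definition: sample standard deviation (ddof=1) divided by sqrt(N).
sem_manual = numpy.std(x, ddof=1) / numpy.sqrt(len(x))

# Library routine, which also uses ddof=1 by default.
sem_scipy = scipy.stats.sem(x)

print(sem_manual, sem_scipy)
```

Both values should agree up to floating-point rounding.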

The construction of a confidence interval is as follows:


1.   Construct a **Student's $t$-distribution** with $N-1$ degrees of freedom.
2.   Shift the distribution so that its new mean is $\bar{x}$.
3.   Scale the distribution about its mean $\bar{x}$ by a factor of $s_{\bar{x}}$. A value of $x$ in the new distribution now corresponds to a value of $(x-\bar{x})/s_{\bar{x}}$ in the original distribution from Step 1.
4.   Find the interval, centered on $\bar{x}$, that contains 95% (or some other confidence level) of the total area under the $t$-distribution from Step 3.

All of this can be done in a single line of Python:



In [5]:
scipy.stats.t.interval(0.95,len(x)-1,loc=m,scale=sem)

(0.4936628415008933, 0.5923371584991066)
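For comparison, the four steps above can also be carried out by hand. The sketch below assumes a central (equal-tailed) interval, so that 2.5% of the area is left in each tail; the names `t_crit`, `lower` and `upper` are introduced here only for illustration.

```python
import numpy
import scipy.stats

# Same poll results as in the cells above.
x = numpy.array([0.53,0.56,0.47,0.50,0.59,0.42,0.65,0.62,0.53,0.56])
m = numpy.mean(x)
sem = scipy.stats.sem(x)
df = len(x) - 1

# Steps 1 and 4: in the standard t-distribution with N - 1 degrees of
# freedom, the central 95% lies between -t_crit and +t_crit, where
# t_crit is the 97.5th percentile.
t_crit = scipy.stats.t.ppf(0.975, df)

# Steps 2 and 3: shift by the sample mean and scale by the standard error.
lower = m - t_crit * sem
upper = m + t_crit * sem

print(t_crit, (lower, upper))
```

The resulting endpoints should match the output of `scipy.stats.t.interval` up to floating-point rounding.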

Because this interval contains values below 0.5, we cannot be confident (at the 95% confidence level) that our candidate will win the election, even though the sample mean of his or her vote share is greater than 50 percent.
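One way to quantify how close the call is, offered here only as a sketch, is to measure the distance between the sample mean and the 50 percent threshold in units of the standard error, and then ask at which confidence level the interval would just touch 0.5. The names `t_stat` and `level` are introduced here for illustration and are not part of the analysis above.

```python
import numpy
import scipy.stats

# Same poll results as in the cells above.
x = numpy.array([0.53,0.56,0.47,0.50,0.59,0.42,0.65,0.62,0.53,0.56])
m = numpy.mean(x)
sem = scipy.stats.sem(x)
df = len(x) - 1

# Distance of the sample mean from 0.5, in standard errors.
t_stat = (m - 0.5) / sem

# Confidence level of the central interval whose lower end sits exactly
# at 0.5: one minus the area in both tails beyond +/- t_stat.
level = 1 - 2 * scipy.stats.t.sf(abs(t_stat), df)

print(t_stat, level)
```

Under the same normality assumption, intervals at confidence levels below `level` lie entirely above 0.5, while intervals at higher levels, including 95%, extend below it.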

Finally, note that due to how polling is conducted, there may be biases in the samples that will affect our result above, but we will ignore them here.