Here, we flip a coin n times and we observe h heads. We want to know whether the coin is fair (null hypothesis). This example is particularly simple yet quite useful for pedagogical purposes. Besides, it is the basis of many more complex methods.

We denote the Bernoulli distribution by B(q) with the unknown parameter q. You can refer to https://en.wikipedia.org/wiki/Bernoulli_distribution for more information.

A Bernoulli variable is:

*   0 (tail) with probability 1−q
*   1 (head) with probability q

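
To make this definition concrete, here is a minimal sketch that simulates Bernoulli draws with NumPy. The seed, the number of flips, and the value q_true = 0.5 are arbitrary choices made for illustration; they are not part of the recipe.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
q_true = 0.5    # assumed probability of heads (illustrative)
n_flips = 100   # assumed number of flips (illustrative)

# A Bernoulli(q) draw is a binomial draw with a single trial:
# 1 (head) with probability q, 0 (tail) with probability 1 - q.
flips = rng.binomial(1, q_true, size=n_flips)

n_heads = int(flips.sum())   # number of heads observed
xbar_sim = flips.mean()      # sample mean, an estimate of q
print(n_heads, xbar_sim)
```

The number of heads and the sample mean computed this way play the roles of h and xbar in the test.
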
Here are the steps required to conduct a simple statistical z-test:

1.  Let's suppose that after n=100 flips, we get h=61 heads. We choose a significance level of 0.05: is the coin fair or not? Our null hypothesis is: the coin is fair (q=1/2). We set these variables:

In [9]:
import numpy as np
import scipy.stats as stats

n = 100  # number of coin flips
h = 61  # number of heads
q = .5  # null-hypothesis of fair coin

2.  Let's compute the z-score, which is defined by z = (xbar − q) · sqrt(n / (q(1 − q))), where xbar = h/n is the estimated average of the distribution. We will explain this formula in the next section, How it works...

In [11]:
xbar = float(h) / n
z = (xbar - q) * np.sqrt(n / (q * (1 - q)))
# We don't want to display more than 4 decimals.
z = round(z,4)
z

2.2

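Note that stats.zscore() is not a shortcut for this computation. It standardizes each element of an array with the array's own mean and standard deviation, so it returns one score per observation instead of the single test statistic computed above. For a numeric column of a pandas DataFrame (here a hypothetical df, which is not defined in this recipe), the call looks like this:

z_per_observation = np.abs(stats.zscore(df['column']))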
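
A word of caution about stats.zscore(): according to SciPy's documented behavior, it standardizes each element of an array with the array's own mean and standard deviation, so it returns one value per observation and does not involve the null value q. The sketch below rebuilds the 100 flips as a hypothetical array named flips (61 ones and 39 zeros, which the recipe itself never defines) to contrast the two quantities.

```python
import numpy as np
import scipy.stats as stats

n, h, q = 100, 61, 0.5

# Hypothetical raw data consistent with the recipe: 61 heads, 39 tails.
flips = np.array([1] * h + [0] * (n - h))

# stats.zscore standardizes every observation: (x - mean) / std.
per_obs = stats.zscore(flips)
print(per_obs.shape)  # one score per flip

# The test statistic compares the sample mean with q under the null.
z_stat = (flips.mean() - q) * np.sqrt(n / (q * (1 - q)))
print(round(z_stat, 4))
```

Only z_stat matches the value obtained in step 2; the array per_obs answers a different question (how far each flip is from the sample mean).
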

3.  Now, from the z-score, we can compute the p-value as follows:

In [10]:
# Two-sided test: use |z| so that the formula also holds when z is negative.
pval = 2 * (1 - stats.norm.cdf(np.abs(z)))
pval = round(pval,4)
pval

0.0278
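
An equivalent way to obtain this p-value is the survival function stats.norm.sf, which computes 1 - cdf directly and tends to be more accurate when |z| is large. This is a sketch of an alternative, not the recipe's own code; z is set by hand to the value found in step 2.

```python
import scipy.stats as stats

z = 2.2  # z-score from step 2

p_cdf = 2 * (1 - stats.norm.cdf(abs(z)))  # formula used in the recipe
p_sf = 2 * stats.norm.sf(abs(z))          # same quantity via the survival function

print(round(p_cdf, 4), round(p_sf, 4))
```
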

4.  This p-value is less than our significance level of 0.05, so we reject the null hypothesis and conclude that the coin is probably not fair.
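
The four steps can be gathered into one small helper. The function name proportion_ztest_sketch and its alpha parameter are hypothetical, introduced here only for illustration; the arithmetic is the same as in the cells above.

```python
import numpy as np
import scipy.stats as stats


def proportion_ztest_sketch(h, n, q=0.5, alpha=0.05):
    """Two-sided z-test of H0: P(head) = q, given h heads in n flips."""
    xbar = h / n                                  # estimated probability of heads
    z = (xbar - q) * np.sqrt(n / (q * (1 - q)))   # z-score under the null
    pval = 2 * stats.norm.sf(abs(z))              # two-sided p-value
    return z, pval, pval < alpha                  # last item: reject H0?


z, pval, reject = proportion_ztest_sketch(61, 100)
print(round(z, 4), round(pval, 4), reject)
```

Calling the helper with h=50 gives a z-score of 0 and a p-value of 1, so the null hypothesis is not rejected in that case.
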