## Dangers of Multiple T-Tests
Suppose that we own a chain of stores that sell ants, called VeryAnts. There are three different locations: A, B, and C. We want to know if the average ant sales over the past year are significantly different between the three locations.

At first, it seems that we could perform a t-test between each pair of stores: A vs. B, A vs. C, and B vs. C.

Recall that the significance level (here, 0.05) is the probability that we incorrectly reject the null hypothesis on a single t-test when the null hypothesis is actually true. The more t-tests we perform, the more likely we are to get at least one false positive, a Type I error.

For a significance level of 0.05, if the null hypothesis is true then the probability of correctly failing to reject it on one test is 1 – 0.05 = 0.95. When we run a second t-test, the probability that both results are correct is 0.95 * 0.95, or 0.9025 (treating the tests as independent). That means our probability of making at least one error is now 1 – 0.9025 = 0.0975, close to 10%! This error probability keeps growing with each additional t-test.
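The arithmetic above generalizes to any number of tests. The following is a minimal sketch, assuming the tests are independent and each uses the same significance level. The helper names `num_pairwise_tests` and `familywise_error_rate` are illustrative and not part of the original notebook.

```python
from math import comb


def num_pairwise_tests(num_groups):
    """Number of distinct pairs, i.e. t-tests needed to compare every pair of groups."""
    return comb(num_groups, 2)


def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across num_tests tests.

    Assumes the tests are independent and the null hypothesis is true for each.
    """
    return 1 - (1 - alpha) ** num_tests


# Show how the error probability grows as more groups are compared pairwise
for groups in (2, 3, 4, 5):
    tests = num_pairwise_tests(groups)
    print(groups, tests, round(familywise_error_rate(tests), 4))
```

With three stores there are three pairwise tests, so the function returns 1 – 0.95³ = 0.142625. Adding stores makes things worse quickly, because the number of pairs grows faster than the number of stores.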

In [1]:
from scipy.stats import ttest_ind
import numpy as np

a = np.genfromtxt("store_a.csv",  delimiter=",")
b = np.genfromtxt("store_b.csv",  delimiter=",")
c = np.genfromtxt("store_c.csv",  delimiter=",")

a_mean = np.mean(a)
b_mean = np.mean(b)
c_mean = np.mean(c)
a_std = np.std(a)
b_std = np.std(b)
c_std = np.std(c)

# Print summary statistics for each store
print("Store A mean:{} standard deviation:{}".format(a_mean, a_std))
print("Store B mean:{} standard deviation:{}".format(b_mean, b_std))
print("Store C mean:{} standard deviation:{}".format(c_mean, c_std))

# Two-sample t-test for each pair of stores
# (ttest_ind returns the t statistic and the p-value)
tstat, a_b_pval = ttest_ind(a, b)
tstat, a_c_pval = ttest_ind(a, c)
tstat, b_c_pval = ttest_ind(b, c)
print("Store A & B sales difference pval:{}".format(a_b_pval))
print("Store A & C sales difference pval:{}".format(a_c_pval))
print("Store B & C sales difference pval:{}".format(b_c_pval))

# Probability of at least one false positive across three tests at a
# 0.05 significance level, treating the tests as independent
error_prob = 1 - (0.95**3)

# Three t-tests raise the probability of a Type I error to about 14.26%
print(error_prob)

Store A mean:58.349636084 standard deviation:14.7537040523
Store B mean:65.6262871356 standard deviation:14.7465644902
Store C mean:62.3611731859 standard deviation:15.0924585109
Store A & B sales difference pval:2.76676293987e-05
Store A & C sales difference pval:0.0210120516986
Store B & C sales difference pval:0.0598856352397
0.142625
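One way to check the inflated error rate empirically is to simulate stores that truly have the same average sales and count how often at least one pairwise t-test comes out significant. This is only a sketch under stated assumptions: the mean of 60, standard deviation of 15, and group size of 50 are hypothetical values chosen to be roughly on the scale of the printed store statistics, and the function name `simulate_false_positive_rate` is illustrative.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind


def simulate_false_positive_rate(num_trials=2000, group_size=50, alpha=0.05, seed=0):
    """Fraction of trials in which at least one pairwise t-test is significant,
    even though all three groups are drawn from the same distribution."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(num_trials):
        # Three "stores" with identical true mean and spread (assumed values)
        groups = [rng.normal(60, 15, group_size) for _ in range(3)]
        # p-value for every pair of stores
        pvals = [ttest_ind(x, y).pvalue for x, y in combinations(groups, 2)]
        # Any p-value below alpha is a false positive, because the null is true
        if min(pvals) < alpha:
            hits += 1
    return hits / num_trials


rate = simulate_false_positive_rate()
print(rate)
```

The simulated rate should land well above 5%. It need not equal the 14.26% figure exactly, because the three comparisons share data and so are not fully independent.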
