<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Fisher's-Exact-Test" data-toc-modified-id="Fisher's-Exact-Test-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Fisher's Exact Test</a></span></li><li><span><a href="#Chi-Square-Test-of-Independence" data-toc-modified-id="Chi-Square-Test-of-Independence-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Chi-Square Test of Independence</a></span></li><li><span><a href="#Wilcoxon-Signed-Rank-Test" data-toc-modified-id="Wilcoxon-Signed-Rank-Test-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Wilcoxon Signed-Rank Test</a></span></li><li><span><a href="#Mann–Whitney-U-test" data-toc-modified-id="Mann–Whitney-U-test-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Mann–Whitney U test</a></span></li><li><span><a href="#Welch's-T-Test" data-toc-modified-id="Welch's-T-Test-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Welch's T-Test</a></span></li><li><span><a href="#Paired-Student's-T-test" data-toc-modified-id="Paired-Student's-T-test-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Paired Student's T-test</a></span></li></ul></div>

In [1]:
%matplotlib inline
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import IPython as ip
mpl.style.use('ggplot')
mpl.rc('font', family='Noto Sans CJK TC')
ip.display.set_matplotlib_formats('svg')

In [2]:
np.random.seed(20180701)

# Fisher's Exact Test

In [3]:
# sp.stats.f(...)
# # or
# from scipy import stats
# stats.f(...)

In [4]:
sp.stats.fisher_exact([
    # right-handed, left-handed
    [43, 9],  # men
    [44, 4],  # women
])
# -> (odds ratio, p-value)

(0.43434343434343436, 0.23915695682224306)

In [5]:
# # odds ratio (OR) of being right-handed, men vs. women
# (43/9) / (44/4)
# # -> 0.4343434343434343

# # odds of being right-handed among men
# 43/9
# # -> 4.777777777777778

# # risk ratio (RR, relative risk) of being left-handed, men vs. women
# (9/(43+9)) / (4/(44+4))
# # -> 2.076923076923077

In [6]:
sp.stats.fisher_exact([
    # studying, no-studying
    [1, 8],  # men
    [7, 4],  # women
])

(0.07142857142857142, 0.028101929030721572)

In [7]:
# # odds ratio
# (1/8) / (7/4)
# # -> 0.07142857142857142

# Chi-Square Test of Independence

In [8]:
# Yates's continuity correction is not recommended, so it is disabled here
sp.stats.chi2_contingency([
    # right-handed, left-handed
    [43, 9],  # men
    [44, 4],  # women
], correction=False)
# -> (
#     chi-squared, p-value,
#     degrees of freedom, expected frequencies
# )

(1.7774150400145103, 0.1824670652605479, 1, array([[45.24,  6.76],
        [41.76,  6.24]]))

In [9]:
# expected frequency
(43+9)/(43+9+44+4) * (43+44)

45.24

In [10]:
sp.stats.chi2_contingency([
    # studying, no-studying
    [1, 8],  # men
    [7, 4],  # women
], correction=False)

(5.6902356902356885, 0.017059563200794218, 1, array([[3.6, 5.4],
        [4.4, 6.6]]))

In [11]:
# expected frequency
(1+8)/(1+8+7+4) * (1+7)

3.6
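The two hand computations above generalize to the whole table. As a minimal sketch (the variable names are illustrative and not part of the original notebook), the expected frequencies are the outer product of the row and column totals divided by the grand total, and the uncorrected statistic is the sum of (observed - expected)² / expected. Under these assumptions the result should agree with `chi2_contingency(..., correction=False)` for the handedness table.

```python
import numpy as np
from scipy import stats

# observed counts: rows = men, women; columns = right-handed, left-handed
observed = np.array([[43, 9], [44, 4]])

# expected count of each cell = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson's chi-squared statistic without Yates's correction
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value = upper tail of the chi-squared distribution
p_value = stats.chi2.sf(chi2, dof)

print(expected)
print(chi2, p_value, dof)
```

The approximation enters only in the last step, where the statistic is compared against a continuous chi-squared distribution.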

Fisher's exact test is based on the exact [hypergeometric](https://en.wikipedia.org/wiki/Hypergeometric_distribution) distribution (n draws without replacement), so its p-values are exact probabilities rather than approximations. In both examples above, the chi-squared p-values (0.182 and 0.017) are smaller than the Fisher's exact p-values (0.239 and 0.028), which implies a higher chance of a false positive. In other words, the chi-squared test performs worse on samples this small.

However, Fisher's exact test requires much more computation, and the chi-squared test approximates it well when the sample size is large, so choose between them based on the sample size.
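To make the hypergeometric connection concrete, the sketch below recomputes the two-sided Fisher p-value for the studying table by hand. It assumes the usual two-sided rule of summing the probabilities of every table that is no more likely than the observed one. The small tolerance factor is an illustrative choice to absorb floating-point ties and is not taken from the original notebook.

```python
import numpy as np
from scipy import stats

# rows = men, women; columns = studying, not studying
table = np.array([[1, 8], [7, 4]])

total = table.sum()             # population size
col1_total = table[:, 0].sum()  # number of "studying" people in the population
row1_total = table[0].sum()     # number of draws (the men)

# distribution of the top-left cell when all margins are fixed
dist = stats.hypergeom(total, col1_total, row1_total)

# every value the top-left cell can take given the margins
support = np.arange(
    max(0, row1_total + col1_total - total),
    min(row1_total, col1_total) + 1,
)
pmf = dist.pmf(support)

# two-sided p-value: total probability of tables no more likely than the observed one
p_observed = dist.pmf(table[0, 0])
p_manual = pmf[pmf <= p_observed * (1 + 1e-7)].sum()

p_scipy = stats.fisher_exact(table)[1]
print(p_manual, p_scipy)
```

The enumeration grows with the margins, which is where the extra computational cost of the exact test comes from.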

# Wilcoxon Signed-Rank Test

In [12]:
# https://en.wikipedia.org/wiki/Zipf%27s_law
#
# Zipf's law states that given some corpus of natural language utterances, the 
# frequency of any word is inversely proportional to its rank in the frequency 
# table.
#
# Like the discrete Pareto (power-law) distribution.
#
# the two groups are paired and different
group_ctl = sp.stats.zipf.rvs(loc=0, a=1.01, size=12)
group_exp = group_ctl*2  # some treatment effect
group_ctl, group_exp

(array([   2706094497551,                2,             6912,
                     456,              454,           782392,
                    2141,           528962,       1156649224,
               106258447, 2061749587818853,               47]),
 array([   5412188995102,                4,            13824,
                     912,              908,          1564784,
                    4282,          1057924,       2313298448,
               212516894, 4123499175637706,               94]))

In [13]:
name_pvalue_pairs = [
    ("Student's t-test", sp.stats.ttest_ind(group_ctl, group_exp).pvalue),
    ("Welch's t-test", sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False).pvalue),
    ("Paired Student's t-test", sp.stats.ttest_rel(group_ctl, group_exp).pvalue),
    ('Mann–Whitney U test', sp.stats.mannwhitneyu(group_ctl, group_exp).pvalue),
    ('Wilcoxon signed-rank test', sp.stats.wilcoxon(group_ctl, group_exp).pvalue),
]
name_pvalue_pairs.sort(key=lambda x: x[1])

for name, p_value in name_pvalue_pairs:
    print(f'{name:26} {p_value:.4f}  {p_value < 0.05}')

Wilcoxon signed-rank test  0.0022  True
Mann–Whitney U test        0.3325  False
Paired Student's t-test    0.3381  False
Student's t-test           0.6586  False
Welch's t-test             0.6602  False


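Only the Wilcoxon signed-rank test gives the correct result: it is the only one that detects the difference between the paired groups (p < 0.05).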
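A rough sketch of why the signed-rank test succeeds here: every paired difference has the same sign, so one of the two signed-rank sums is zero, which is the most extreme outcome possible regardless of how heavy-tailed the values are. The data below are copied from the output of cell `In [12]`; the variable names and the two p-value formulas are illustrative assumptions. The 0.0022 printed above is consistent with a normal approximation without continuity correction, and newer SciPy releases may report the exact value instead.

```python
import numpy as np
from scipy import stats

# values copied from the output of cell In [12]
group_ctl = np.array([
    2706094497551, 2, 6912, 456, 454, 782392,
    2141, 528962, 1156649224, 106258447, 2061749587818853, 47,
], dtype=np.int64)
group_exp = group_ctl * 2

# rank the absolute differences, then sum the ranks by sign
diff = group_ctl - group_exp
ranks = stats.rankdata(np.abs(diff))
w_plus = ranks[diff > 0].sum()   # sum of ranks of positive differences
w_minus = ranks[diff < 0].sum()  # sum of ranks of negative differences
statistic = min(w_plus, w_minus)

# normal approximation of the null distribution (no continuity correction)
n = len(diff)
mean = n * (n + 1) / 4
sd = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_normal = 2 * stats.norm.cdf((statistic - mean) / sd)

# exact null distribution: each of the 2**n sign patterns is equally likely,
# and only two of them (all positive, all negative) are this extreme
p_exact = 2 / 2 ** n

print(w_plus, w_minus, p_normal, p_exact)
```

The t-tests, by contrast, work on the raw magnitudes, so the two huge values dominate both the means and the variances.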

# Mann–Whitney U test

In [14]:
# the two groups are the same
group_ctl = [11, 22, 33, 44, 55, 66, 77]
group_exp = [11, 22, 33, 44, 55, 66, 7700]  # contains an outlier 

In [15]:
name_pvalue_pairs = [
    ("Student's t-test", sp.stats.ttest_ind(group_ctl, group_exp).pvalue),
    ("Welch's t-test", sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False).pvalue),
    ("Paired Student's t-test", sp.stats.ttest_rel(group_ctl, group_exp).pvalue),
    ('Mann–Whitney U test', sp.stats.mannwhitneyu(group_ctl, group_exp).pvalue),
    ('Wilcoxon signed-rank test', sp.stats.wilcoxon(group_ctl, group_exp).pvalue),
]
name_pvalue_pairs.sort(key=lambda x: x[1])

for name, p_value in name_pvalue_pairs:
    print(f'{name:26} {p_value:.4f}  {p_value < 0.05}')

Wilcoxon signed-rank test  0.3173  False
Student's t-test           0.3394  False
Paired Student's t-test    0.3559  False
Welch's t-test             0.3582  False
Mann–Whitney U test        0.5000  False




None of the tests rejects the null hypothesis, which is the correct outcome, but the Mann–Whitney U test is the most robust to the outlier (its p-value is the largest).
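The robustness comes from the fact that the U statistic depends only on ranks. As a minimal sketch (the helper `u_statistics` and the list `group_mild` are hypothetical and not part of the original notebook), computing U from pooled midranks shows that replacing the outlier 7700 with 78 leaves the statistic unchanged.

```python
import numpy as np
from scipy import stats


def u_statistics(x, y):
    """Return (U_x, U_y) computed from the pooled midranks of x and y."""
    x, y = np.asarray(x), np.asarray(y)
    ranks = stats.rankdata(np.concatenate([x, y]))  # ties get the average rank
    rank_sum_x = ranks[:len(x)].sum()
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return float(u_x), float(u_y)


group_ctl = [11, 22, 33, 44, 55, 66, 77]
group_exp = [11, 22, 33, 44, 55, 66, 7700]  # contains an outlier
group_mild = [11, 22, 33, 44, 55, 66, 78]   # same ranks, no extreme value

print(u_statistics(group_ctl, group_exp))   # (24.0, 25.0)
print(u_statistics(group_ctl, group_mild))  # (24.0, 25.0)
```

Both U values sit next to the null expectation of 7 * 7 / 2 = 24.5, so the outlier's magnitude has no influence on the result.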

# Welch's T-Test

In [16]:
# the two groups are the same
group_ctl = sp.stats.norm.rvs(loc=170, scale=5, size=100)
group_exp = sp.stats.norm.rvs(loc=170, scale=5, size=3)  # the only difference is the group size

In [17]:
name_pvalue_pairs = [
    ("Student's t-test", sp.stats.ttest_ind(group_ctl, group_exp).pvalue),
    ("Welch's t-test", sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False).pvalue),
    # ("Paired Student's t-test", sp.stats.ttest_rel(group_ctl, group_exp).pvalue),  # requires equal lengths
    ('Mann–Whitney U test', sp.stats.mannwhitneyu(group_ctl, group_exp).pvalue),
    # ('Wilcoxon signed-rank test', sp.stats.wilcoxon(group_ctl, group_exp).pvalue),  # requires equal lengths
]
name_pvalue_pairs.sort(key=lambda x: x[1])

for name, p_value in name_pvalue_pairs:
    print(f'{name:26} {p_value:.4f}  {p_value < 0.05}')

Student's t-test           0.1468  False
Mann–Whitney U test        0.2556  False
Welch's t-test             0.5563  False


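None of the tests rejects the null hypothesis, which is the correct outcome because both groups come from the same distribution. Welch's t-test copes best with the very unequal group sizes, since it does not assume equal variances (its p-value is the largest).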
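For reference, here is a minimal sketch of what Welch's t-test computes: each group contributes its own variance estimate, and the degrees of freedom follow the Welch–Satterthwaite formula. The seed and variable names are arbitrary assumptions, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
group_ctl = rng.normal(loc=170, scale=5, size=100)
group_exp = rng.normal(loc=170, scale=5, size=3)

# per-group squared standard errors (no pooled variance)
se2_ctl = group_ctl.var(ddof=1) / len(group_ctl)
se2_exp = group_exp.var(ddof=1) / len(group_exp)

# Welch's t statistic
t = (group_ctl.mean() - group_exp.mean()) / np.sqrt(se2_ctl + se2_exp)

# Welch–Satterthwaite degrees of freedom
dof = (se2_ctl + se2_exp) ** 2 / (
    se2_ctl ** 2 / (len(group_ctl) - 1) + se2_exp ** 2 / (len(group_exp) - 1)
)

# two-sided p-value from Student's t distribution with dof degrees of freedom
p_manual = 2 * stats.t.sf(abs(t), dof)

result = stats.ttest_ind(group_ctl, group_exp, equal_var=False)
print(t, dof, p_manual)
print(result.statistic, result.pvalue)
```

With only three observations in one group, the degrees of freedom drop to roughly that group's size, which widens the reference distribution and makes the test appropriately cautious.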

# Paired Student's T-test

In [18]:
# the two groups are different
group_ctl = sp.stats.norm.rvs(loc=170, scale=5, size=100)
# ctl + norm.rvs(...) === ctl + some treatment effect
group_exp = group_ctl + sp.stats.norm.rvs(loc=1, scale=1, size=100)

In [19]:
name_pvalue_pairs = [
    ("Student's t-test", sp.stats.ttest_ind(group_ctl, group_exp).pvalue),
    ("Welch's t-test", sp.stats.ttest_ind(group_ctl, group_exp, equal_var=False).pvalue),
    ("Paired Student's t-test", sp.stats.ttest_rel(group_ctl, group_exp).pvalue),
    ('Mann–Whitney U test', sp.stats.mannwhitneyu(group_ctl, group_exp).pvalue),
    ('Wilcoxon signed-rank test', sp.stats.wilcoxon(group_ctl, group_exp).pvalue),
]
name_pvalue_pairs.sort(key=lambda x: x[1])

for name, p_value in name_pvalue_pairs:
    print(f'{name:26} {p_value:.4f}  {p_value < 0.05}')

Paired Student's t-test    0.0000  True
Wilcoxon signed-rank test  0.0000  True
Mann–Whitney U test        0.0428  True
Student's t-test           0.0993  False
Welch's t-test             0.0993  False


The independent-sample tests cannot detect the effect in paired samples, or can do so only with difficulty.
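The reason is that a paired t-test is equivalent to a one-sample t-test on the per-pair differences, so the large between-subject spread cancels out. The sketch below illustrates this under the same setup as above. The seed and variable names are arbitrary assumptions, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
group_ctl = rng.normal(loc=170, scale=5, size=100)
group_exp = group_ctl + rng.normal(loc=1, scale=1, size=100)  # treatment effect

# paired test vs. one-sample test on the differences: same statistic, same p-value
paired = stats.ttest_rel(group_exp, group_ctl)
one_sample = stats.ttest_1samp(group_exp - group_ctl, 0)
print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)

# spread that each kind of test has to overcome
std_between = group_ctl.std(ddof=1)             # seen by the independent tests
std_within = (group_exp - group_ctl).std(ddof=1)  # seen by the paired tests
print(std_between, std_within)
```

An effect of about 1 is small next to a between-subject standard deviation of about 5, but large next to a within-pair standard deviation of about 1.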