### Multiple hypothesis testing

In [1]:
import pandas as pd
# Import the data samples
from multipy.data import neuhaus
# Import the FWER methods
from multipy.fwer import bonferroni, holm_bonferroni
# Import the FDR methods (LSU, "linear step-up", is another name for the Benjamini-Hochberg method)
from multipy.fdr import lsu

Let’s assume we have 15 features and have already run a hypothesis test for each of them.

In [2]:
# Load the example p-values and build one row per feature
pvals = neuhaus()
df = pd.DataFrame({'Features': [f'Feature {i}' for i in range(1, len(pvals) + 1)], 'P-value': pvals})
# df

In [3]:
# Now, let's apply the Bonferroni correction to our data sample
# Set alpha to the desired family-wise significance level
df['bonferroni'] = bonferroni(pvals, alpha=0.05)
df

| | Features | P-value | bonferroni |
|---|---|---|---|
| 0 | Feature 1 | 0.0001 | True |
| 1 | Feature 2 | 0.0004 | True |
| 2 | Feature 3 | 0.0019 | True |
| 3 | Feature 4 | 0.0095 | False |
| 4 | Feature 5 | 0.0201 | False |
| 5 | Feature 6 | 0.0278 | False |
| 6 | Feature 7 | 0.0298 | False |
| 7 | Feature 8 | 0.0344 | False |
| 8 | Feature 9 | 0.0459 | False |
| 9 | Feature 10 | 0.324 | False |


MultiPy's functions return a boolean for each test: True means we reject the null hypothesis, and False means we fail to reject it.

With the Bonferroni correction, only three features are significant. Let’s try the Holm-Bonferroni method to see whether the result changes.

In [5]:
df['holm_bonferroni'] = holm_bonferroni(pvals, alpha = 0.05)
df

| | Features | P-value | bonferroni | holm_bonferroni |
|---|---|---|---|---|
| 0 | Feature 1 | 0.0001 | True | True |
| 1 | Feature 2 | 0.0004 | True | True |
| 2 | Feature 3 | 0.0019 | True | True |
| 3 | Feature 4 | 0.0095 | False | False |
| 4 | Feature 5 | 0.0201 | False | False |
| 5 | Feature 6 | 0.0278 | False | False |
| 6 | Feature 7 | 0.0298 | False | False |
| 7 | Feature 8 | 0.0344 | False | False |
| 8 | Feature 9 | 0.0459 | False | False |
| 9 | Feature 10 | 0.324 | False | False |


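The result is unchanged: Holm-Bonferroni also flags only the first three features. Both FWER-controlling methods are conservative, which limits how many significant results we can find. Let’s see whether the Benjamini-Hochberg (BH) method, which controls the false discovery rate (FDR) instead, gives a different result.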
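The Holm-Bonferroni method is a step-down refinement of Bonferroni: sort the p-values in ascending order, compare the $k$-th smallest with $\alpha / (m - k + 1)$, and stop at the first one that fails. Below is a minimal, dependency-free sketch; `holm_reject` is a hypothetical name, not MultiPy's API. On the toy input it rejects all three hypotheses, while Bonferroni at $\alpha / m \approx 0.0167$ would reject only the first, even though on this notebook's data the two methods happen to agree.

```python
def holm_reject(pvals, alpha=0.05):
    """Hypothetical helper: Holm-Bonferroni step-down procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # the first failure retains this and every larger p-value
    return reject


# Toy p-values: Holm compares them with 0.05/3, 0.05/2 and 0.05/1
print(holm_reject([0.01, 0.02, 0.04]))  # [True, True, True]
```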

In [6]:
# Set q to the desired false discovery rate (FDR)
df['benjamin_hochberg'] = lsu(pvals, q=0.05)
df

| | Features | P-value | bonferroni | holm_bonferroni | benjamin_hochberg |
|---|---|---|---|---|---|
| 0 | Feature 1 | 0.0001 | True | True | True |
| 1 | Feature 2 | 0.0004 | True | True | True |
| 2 | Feature 3 | 0.0019 | True | True | True |
| 3 | Feature 4 | 0.0095 | False | False | True |
| 4 | Feature 5 | 0.0201 | False | False | False |
| 5 | Feature 6 | 0.0278 | False | False | False |
| 6 | Feature 7 | 0.0298 | False | False | False |
| 7 | Feature 8 | 0.0344 | False | False | False |
| 8 | Feature 9 | 0.0459 | False | False | False |
| 9 | Feature 10 | 0.324 | False | False | False |


The less strict FDR method gives a different result from the FWER methods: four features are now significant. Controlling the FDR is indeed more lenient than controlling the FWER, so it finds more significant features.
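The BH (linear step-up) procedure works from the other end: find the largest $k$ such that $p_{(k)} \leq k q / m$ and reject the $k$ hypotheses with the smallest p-values. The sketch below uses the hypothetical name `bh_reject`, and we assume MultiPy's `lsu` implements this same rule. Because it is a step-up procedure, on the toy input it rejects three hypotheses even though the smallest p-value (0.015) is above its own threshold $q / m = 0.0125$.

```python
def bh_reject(pvals, q=0.05):
    """Hypothetical helper: Benjamini-Hochberg (linear step-up) procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * q / m:
            k_max = k  # keep the largest k with p_(k) <= k * q / m
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True  # reject the k_max smallest p-values
    return reject


# Toy p-values with m = 4 and q = 0.05: thresholds 0.0125, 0.025, 0.0375, 0.05
print(bh_reject([0.015, 0.02, 0.03, 0.9]))  # [True, True, True, False]
```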

If you want to learn more about the methods available for multiple hypothesis correction, visit the MultiPy homepage.

In [7]:
from statsmodels.stats.multitest import multipletests

In [8]:
# The same Benjamini-Hochberg (FDR) correction with statsmodels
reject, p_value_corrected, sidak_corr, bonf_corr = multipletests(pvals, alpha=0.05, method='fdr_bh')
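`multipletests` also returns corrected p-values, which can be compared directly with `alpha`. For `method='fdr_bh'` we believe these are the standard BH-adjusted p-values, $\tilde p_{(i)} = \min_{j \geq i} \min\left(1, \frac{m\, p_{(j)}}{j}\right)$, and a hypothesis is rejected exactly when $\tilde p_{(i)} \leq q$. The hypothetical helper below is a dependency-free sketch of that formula, not statsmodels' own code.

```python
def bh_adjusted_pvalues(pvals):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, so each value is the minimum over j >= i.
    for k in range(m, 0, -1):
        i = order[k - 1]
        running_min = min(running_min, pvals[i] * m / k)
        adjusted[i] = running_min
    return adjusted


adj = bh_adjusted_pvalues([0.015, 0.02, 0.03, 0.9])
print([round(a, 4) for a in adj])  # [0.04, 0.04, 0.04, 0.9]
print([a <= 0.05 for a in adj])    # [True, True, True, False]
```

Rejecting where the adjusted value is at most 0.05 gives the same decisions as the step-up rule applied directly to the raw p-values.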

In [9]:
# Use names that do not shadow the imported multipy bonferroni() function
reject, p_value_corrected, alpha_sidak, alpha_bonf = multipletests(pvals, alpha=0.05, method='sidak')
df['sidak'] = reject
reject, p_value_corrected, alpha_sidak, alpha_bonf = multipletests(pvals, alpha=0.05, method='holm-sidak')
df['holm-sidak'] = reject



In [10]:
df

| | Features | P-value | bonferroni | holm_bonferroni | benjamin_hochberg | sidak | holm-sidak |
|---|---|---|---|---|---|---|---|
| 0 | Feature 1 | 0.0001 | True | True | True | True | True |
| 1 | Feature 2 | 0.0004 | True | True | True | True | True |
| 2 | Feature 3 | 0.0019 | True | True | True | True | True |
| 3 | Feature 4 | 0.0095 | False | False | True | False | False |
| 4 | Feature 5 | 0.0201 | False | False | False | False | False |
| 5 | Feature 6 | 0.0278 | False | False | False | False | False |
| 6 | Feature 7 | 0.0298 | False | False | False | False | False |
| 7 | Feature 8 | 0.0344 | False | False | False | False | False |
| 8 | Feature 9 | 0.0459 | False | False | False | False | False |
| 9 | Feature 10 | 0.324 | False | False | False | False | False |



`multipletests` returns four values:
- `reject`: a boolean array of length $m$; True where the null hypothesis can be rejected, False where it cannot
- `pvals_corrected`: an array of length $m$ with the corrected p-values
- `alphacSidak`: the Šidák-corrected significance level
- `alphacBonf`: the Bonferroni-corrected significance level
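As a rough illustration of these four values, here is a dependency-free sketch for `method='sidak'`. The helper name `sidak_sketch` is hypothetical, and the formulas are our assumption of what statsmodels computes: the Šidák-adjusted p-value $1 - (1 - p)^m$, the per-test levels $1 - (1 - \alpha)^{1/m}$ and $\alpha / m$, and rejection where the adjusted p-value is at most $\alpha$.

```python
import math


def sidak_sketch(pvals, alpha=0.05):
    """Hypothetical helper mirroring the four values returned for method='sidak'."""
    m = len(pvals)
    alphac_sidak = 1.0 - (1.0 - alpha) ** (1.0 / m)  # Šidák per-test level
    alphac_bonf = alpha / m                          # Bonferroni per-test level
    corrected = []
    for p in pvals:
        if p >= 1.0:
            corrected.append(1.0)  # math.log1p(-1.0) would raise ValueError
        else:
            # 1 - (1 - p)**m, written with expm1/log1p for accuracy when p is tiny
            corrected.append(-math.expm1(m * math.log1p(-p)))
    reject = [pc <= alpha for pc in corrected]
    return reject, corrected, alphac_sidak, alphac_bonf


r, c, s, b = sidak_sketch([0.001, 0.02, 1.0], alpha=0.05)
print(r)  # [True, False, False]
```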



Šidák correction


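Like the Bonferroni correction, the Šidák correction adjusts $\alpha$ (the significance level used to test each individual hypothesis). It also guarantees $FWER \leq \alpha$, provided the tests are independent.
Let us derive the Šidák correction. Let $V$ be the number of Type I errors (true null hypotheses that are rejected) and $m_0 \leq m$ the number of true null hypotheses. If every hypothesis is tested at level $\alpha_{1}$ and the tests are independent, each true null is retained with probability at least $1-\alpha_{1}$, so $P(V=0) \geq \left(1-\alpha_{1}\right)^{m_0} \geq \left(1-\alpha_{1}\right)^{m}$. Hence $FWER = P(V \geq 1)=1-P(V=0) \leq 1-\left(1-\alpha_{1}\right)^{m}=\alpha$, where $\alpha$ is the significance level we set for the whole family of hypotheses and $\alpha_{1}$ is the per-hypothesis significance level we are looking for.
Solving $1-\left(1-\alpha_{1}\right)^{m}=\alpha$ for $\alpha_{1}$ gives $\alpha_{1}=1-(1-\alpha)^{1 / m}$.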
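A quick numeric check of the derivation, as a sketch: for $m = 15$ (the number of features in our example) and $\alpha = 0.05$, the Šidák level $\alpha_1$ is slightly larger than the Bonferroni level $\alpha / m$, and a small simulation with independent, uniformly distributed null p-values should give a family-wise error rate close to $\alpha$. The helper names and the simulation setup (all nulls true, independent exact tests, a fixed seed) are assumptions made for illustration.

```python
import random


def sidak_level(alpha, m):
    """Per-hypothesis level alpha_1 = 1 - (1 - alpha)^(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def simulated_fwer(alpha, m, n_runs=20000, seed=0):
    """Share of runs with at least one rejection when all m null hypotheses are true."""
    rng = random.Random(seed)
    a1 = sidak_level(alpha, m)
    hits = 0
    for _ in range(n_runs):
        # Assumption: exact tests, so null p-values are independent Uniform(0, 1) draws.
        if any(rng.random() <= a1 for _ in range(m)):
            hits += 1
    return hits / n_runs


alpha, m = 0.05, 15
print(sidak_level(alpha, m), alpha / m)  # Šidák level is slightly above alpha / m
print(simulated_fwer(alpha, m))          # should be close to alpha
```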

Holm-Šidák method


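As in the previous method that Holm contributed to (Holm-Bonferroni), the p-values are corrected iteratively. We again sort the p-values in ascending order, $p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)}$, and compare them with Šidák-corrected levels:
$$
\begin{array}{l}
\alpha_{1}=1-(1-\alpha)^{\frac{1}{m}} \\
\ldots \\
\alpha_{i}=1-(1-\alpha)^{\frac{1}{m-i+1}} \\
\ldots \\
\alpha_{m}=\alpha
\end{array}
$$
Starting from $p_{(1)}$, we reject $H_{(i)}$ while $p_{(i)} \leq \alpha_{i}$; at the first $i$ with $p_{(i)} > \alpha_{i}$ we stop and retain that hypothesis and all the remaining ones.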
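The step-down procedure above can be sketched as follows. The helper name `holm_sidak_reject` is hypothetical, and the code is a plain-Python illustration, not statsmodels' `method='holm-sidak'` implementation. In the toy example, the smallest p-value 0.0253 passes the Holm-Šidák level $1 - 0.95^{1/2} \approx 0.02532$ but would fail the Holm level $0.05 / 2 = 0.025$.

```python
def holm_sidak_reject(pvals, alpha=0.05):
    """Hypothetical helper: Holm-Šidák step-down procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for i, idx in enumerate(order, start=1):
        alpha_i = 1.0 - (1.0 - alpha) ** (1.0 / (m - i + 1))  # alpha_1 ... alpha_m = alpha
        if pvals[idx] <= alpha_i:
            reject[idx] = True
        else:
            break  # stop: this and all larger p-values are retained
    return reject


print(holm_sidak_reject([0.0253, 0.04]))  # [True, True]
```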
The method has several properties:
1. It controls the FWER at level $\alpha$ if the test statistics are mutually independent.
2. If the test statistics are mutually independent, no procedure that controls the FWER at level $\alpha$ can be more powerful than the Holm-Šidák method.
3. For large $m$ it differs little from the Holm method.
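Property 3 can be illustrated numerically. The sketch compares each Holm-Šidák level $1 - (1 - \alpha)^{1/k}$ with the matching Holm level $\alpha / k$, where $k = m - i + 1$ is the number of hypotheses still in play; the helper names are hypothetical. The ratio is never below 1 and, for $\alpha = 0.05$, stays below its limit $-\ln(0.95) / 0.05 \approx 1.026$, which is consistent with the `holm_bonferroni` and `holm-sidak` columns above agreeing.

```python
import math


def holm_level(alpha, k):
    """Holm threshold when k hypotheses are still in play."""
    return alpha / k


def holm_sidak_level(alpha, k):
    """Holm-Šidák threshold for the same step."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


alpha, m = 0.05, 100
ratios = [holm_sidak_level(alpha, k) / holm_level(alpha, k) for k in range(1, m + 1)]
bound = -math.log(1.0 - alpha) / alpha  # limit of the ratio as k grows
print(min(ratios), max(ratios), bound)
```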