# Notes

- There are two examples:
  - From the article _Why, When and How to Adjust Your P Values_ (jafari.ansari-pour_2019)
  - https://stackoverflow.com/questions/25185205/calculating-adjusted-p-values-in-python
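- There are several methods for correcting p-values:
  - Bonferroni (one-step correction)
  - Benjamini-Hochberg (FDR), for independent or non-negatively correlated tests
  - Benjamini-Yekutieli, which also covers negatively correlated tests
- In Python, both `scipy` and `statsmodels` can compute the adjusted p-values.
- The rough checks below compare the results from the different Python packages. The library functions give the same outputs, so any one of them will do for our needs. The hand-written `fdr` function differs in one value because it skips the monotonicity step.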
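
To make the three corrections listed above concrete, here is a minimal pure-Python sketch. It is not the code used by `scipy` or `statsmodels`; the helper names `bonferroni` and `step_up_fdr` are hypothetical, and the input is the p-value vector from the article example. It assumes the usual textbook definitions: Bonferroni multiplies each p-value by the number of tests m; Benjamini-Hochberg scales the i-th smallest p-value by m/i and then takes a running minimum from the largest p-value down; Benjamini-Yekutieli additionally multiplies by the harmonic sum 1 + 1/2 + ... + 1/m.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def step_up_fdr(p_values, arbitrary_dependence=False):
    """Benjamini-Hochberg adjustment; Benjamini-Yekutieli if arbitrary_dependence."""
    m = len(p_values)
    # Extra factor used by Benjamini-Yekutieli: the m-th harmonic number.
    factor = sum(1.0 / i for i in range(1, m + 1)) if arbitrary_dependence else 1.0
    # Indices of the p-values, from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps the adjusted values at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank m for the largest p-value, 1 for the smallest
        running_min = min(running_min, p_values[idx] * m * factor / rank)
        adjusted[idx] = running_min
    return adjusted


ps_article = [0.0001, 0.001, 0.006, 0.03, 0.095, 0.117, 0.234, 0.552, 0.751, 0.985]
bonf = bonferroni(ps_article)
bh = step_up_fdr(ps_article)
by = step_up_fdr(ps_article, arbitrary_dependence=True)
for name, values in [("Bonferroni", bonf), ("Benjamini-Hochberg", bh), ("Benjamini-Yekutieli", by)]:
    print(name, [round(v, 4) for v in values])
```

Benjamini-Yekutieli is never less conservative than Benjamini-Hochberg here, since it only adds a factor greater than or equal to 1.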

# Example from the article

- Article: _Why, When and How to Adjust Your P Values_ (jafari.ansari-pour_2019). The authors use R.
- It covers the Bonferroni and FDR corrections.

```R
P_values <- c(0.0001, 0.001, 0.006, 0.03, 0.095, 0.117, 0.234, 0.552, 0.751, 0.985)
```

```R
p.adjust(P_values, method="bonferroni")
## [1] 0.001 0.010 0.060 0.300 0.950 1.000 1.000 1.000 1.000 1.000
```


```R
p.adjust(P_values, method="fdr")
## [1] 0.001 0.005 0.02 0.075 0.19 0.195
## [7] 0.334 0.690 0.834 0.985
```

```python
# Bonferroni correction in Python
from statsmodels.stats.multitest import multipletests

ps = [0.0001, 0.001, 0.006, 0.03, 0.095, 0.117, 0.234, 0.552, 0.751, 0.985]
multipletests(ps, method='bonferroni')

# Expected (from R):
## [1] 0.001 0.010 0.060 0.300 0.950 1.000 1.000 1.000 1.000 1.000
```

```
(array([ True,  True, False, False, False, False, False, False, False,
        False]),
 array([0.001, 0.01 , 0.06 , 0.3  , 0.95 , 1.   , 1.   , 1.   , 1.   ,
        1.   ]),
 0.005116196891823743,
 0.005)
```
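
The tuple returned by `multipletests` holds, in order, the rejection mask, the adjusted p-values, and the Sidak- and Bonferroni-corrected significance levels. The last two numbers can be reproduced by hand. The sketch below assumes the default `alpha=0.05`; the names `alpha`, `m`, `alpha_sidak`, `alpha_bonf`, `adjusted`, and `reject` are hypothetical and chosen only for illustration.

```python
alpha = 0.05   # default significance level assumed here
m = 10         # number of p-values in the article example

# Sidak-corrected level: 1 - (1 - alpha)^(1/m)
alpha_sidak = 1 - (1 - alpha) ** (1 / m)
# Bonferroni-corrected level: alpha / m
alpha_bonf = alpha / m

# Adjusted p-values copied from the output above; rejecting where they are
# at or below alpha should reproduce the boolean mask.
adjusted = [0.001, 0.01, 0.06, 0.3, 0.95, 1.0, 1.0, 1.0, 1.0, 1.0]
reject = [p <= alpha for p in adjusted]

print(alpha_sidak, alpha_bonf)
print(reject)
```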

```python
from scipy.stats import false_discovery_control as sp_fdc     # False Discovery Control

ps = [0.0001, 0.001, 0.006, 0.03, 0.095, 0.117, 0.234, 0.552, 0.751, 0.985]
sp_fdc(ps)

# Expected (from R):
## [1] 0.001 0.005 0.02 0.075 0.19 0.195
## [7] 0.334 0.690 0.834 0.985
```

```
array([0.001     , 0.005     , 0.02      , 0.075     , 0.19      ,
       0.195     , 0.33428571, 0.69      , 0.83444444, 0.985     ])
```
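
The article prints its R results to three decimals, while `scipy` returns the unrounded values. A quick sanity check, sketched here with both vectors copied by hand from the outputs above (the names `r_fdr`, `scipy_fdr`, and `max_diff` are hypothetical), is to confirm that the largest difference is small enough to be explained by rounding alone.

```python
# Values printed in the article (R, rounded) and returned by scipy (copied from above).
r_fdr = [0.001, 0.005, 0.02, 0.075, 0.19, 0.195, 0.334, 0.690, 0.834, 0.985]
scipy_fdr = [0.001, 0.005, 0.02, 0.075, 0.19, 0.195,
             0.33428571, 0.69, 0.83444444, 0.985]

# Largest absolute difference; anything below 0.0005 is consistent with
# rounding to three decimals.
max_diff = max(abs(r - s) for r, s in zip(r_fdr, scipy_fdr))
same_to_3_decimals = max_diff < 5e-4

print(max_diff, same_to_3_decimals)
```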

# Example from Stack Overflow

- https://stackoverflow.com/questions/25185205/calculating-adjusted-p-values-in-python

## Use `scipy.stats.false_discovery_control`

```python
# Docs: https://scipy.github.io/devdocs/reference/generated/scipy.stats.false_discovery_control.html

from scipy.stats import false_discovery_control as sp_fdc     # False Discovery Control

ps = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
      0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]
sp_fdc(ps)

# Expected:
# array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
#        0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
#        0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])
```

```
array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
       0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
       0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])
```

## Use `scipy`

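```python
from scipy.stats import rankdata

def fdr(p_vals):
    # Rank each p-value (1 = smallest), then scale it by n / rank.
    ranked_p_values = rankdata(p_vals)
    adjusted = p_vals * (len(p_vals) / ranked_p_values)
    # Cap the adjusted values at 1.
    adjusted[adjusted > 1] = 1
    # Note: there is no monotonicity (cumulative minimum) step here,
    # so the sixth value differs from the expected output.
    return adjusted

fdr(ps)

# Expected:
# array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
#        0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
#        0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])
```

```
array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
       0.0695    , 0.06385714, 0.0645    , 0.0765    , 0.486     ,
       0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])
```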
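
The sixth value above is 0.0695 rather than the expected 0.06385714. The likely reason is that `fdr` computes only p × n / rank, whereas the Benjamini-Hochberg adjustment also takes a running minimum from the largest p-value downwards. The pure-Python sketch below isolates that step; `raw` and `monotone` are hypothetical names, and the input is assumed to be sorted in ascending order, as `ps` is here.

```python
ps = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
      0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]  # already ascending
n = len(ps)

# Step 1: p * n / rank, capped at 1 (all that the hand-written function computes).
raw = [min(1.0, p * n / rank) for rank, p in enumerate(ps, start=1)]

# Step 2: running minimum from the largest p-value down, so the adjusted
# values never decrease as the raw p-values increase.
monotone = raw[:]
for i in range(n - 2, -1, -1):
    monotone[i] = min(monotone[i], monotone[i + 1])

print(round(raw[5], 8), round(monotone[5], 8))
```

With this step added, the sixth value drops to the seventh value's level, which is what the library functions report.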

## Use `statsmodels`

- https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html

```python
from statsmodels.stats.multitest import multipletests

multipletests(ps, method='fdr_bh')

# Expected:
# array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
#        0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
#        0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])
```

```
(array([ True,  True,  True,  True, False, False, False, False, False,
        False, False, False, False, False, False]),
 array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
        0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
        0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ]),
 0.0034137129465903193,
 0.0033333333333333335)
```

```python
from statsmodels.stats.multitest import fdrcorrection

fdrcorrection(ps, alpha=0.05, method='indep', is_sorted=False)

# Expected:
# array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
#        0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
#        0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])
```

```
(array([ True,  True,  True,  True, False, False, False, False, False,
        False, False, False, False, False, False]),
 array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
        0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
        0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ]))
```