# Cohen's d Value
## Definition
- Cohen's $d$ value for two sample populations $X$ and $Y$ is defined as follows:

    $d:=\frac{m_X-m_Y}{\sqrt{\frac{n_Xs_X^2+n_Ys_Y^2}{n_X+n_Y}}}$
    
    where $m_X$ and $m_Y$ are the means, $s_X$ and $s_Y$ are the sample standard deviations, and $n_X$ and $n_Y$ are the sample sizes
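
The definition above can be sketched directly with NumPy. This is a minimal illustration of the formula, not the implementation in `tools`; the name `cohens_d_sketch` and the `ddof` parameter are hypothetical, and using `ddof=1` for the sample standard deviation is an assumption.

```python
import numpy as np


def cohens_d_sketch(x, y, ddof=1):
    """Cohen's d for two samples, following the definition above.

    Hypothetical helper for illustration only; ddof=1 (sample standard
    deviation) is an assumption about how s_X and s_Y are computed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.size, y.size

    # s_X^2 and s_Y^2
    var_x = x.var(ddof=ddof)
    var_y = y.var(ddof=ddof)

    # Denominator: sqrt((n_X s_X^2 + n_Y s_Y^2) / (n_X + n_Y))
    pooled_sd = np.sqrt((n_x * var_x + n_y * var_y) / (n_x + n_y))
    return (x.mean() - y.mean()) / pooled_sd


d_xy = cohens_d_sketch([2, 4, 6], [1, 3, 5])  # X has the larger mean
d_yx = cohens_d_sketch([1, 3, 5], [2, 4, 6])  # roles of X and Y swapped
print(d_xy, d_yx)
```

Swapping the roles of $X$ and $Y$ only changes the sign of $d$, because the numerator is negated while the denominator is symmetric in the two samples.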

In [1]:
from tools import cohens_d, gene_selection

## Documentation
- **Note**: [`EDA`](https://takemura-hgf.readthedocs.io/en/latest/jupyternb/eda.html) calls the functions below internally; they are not called explicitly in the analysis code
### 1. `cohens_d`
- arguments
    - `data`: `pandas.DataFrame`
    - `group`: `pandas.Series`
        - **Note**: these arguments expect the DataFrame and Series held by [`SuematsuData`](https://takemura-hgf.readthedocs.io/en/latest/jupyternb/suematsudata.html)
        - see **Examples** for more details
    - `regex`: `str`
        - argument for `pandas.DataFrame.filter`
        - pass a string so that `pandas.DataFrame.filter` can select the groups of interest
        - see **Examples** for more details
    - `flip`: `bool` (default: `False`)
        - pass `True` to swap which of the given populations is assigned to $X$ and which to $Y$
- return: `pandas.Series` of Cohen's d values
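
Because `regex` is handed to `pandas.DataFrame.filter`, it is interpreted as a regular expression, in which `+` is a quantifier rather than a literal character. The sketch below illustrates this on a toy frame; the column labels are hypothetical and may not match the labels used in `SuematsuData`, and whether the functions in `tools` escape the pattern internally is not stated here.

```python
import re

import pandas as pd

# Hypothetical column labels; the real SuematsuData labels may differ.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0]],
    index=["GeneA"],
    columns=["day2-HGF+", "day7-HGF+", "day2-HGF-", "day7-HGF-"],
)

# "HGF+" means "HG followed by one or more F", so every label matches.
unescaped = list(toy.filter(regex="HGF+").columns)

# re.escape turns "+" into a literal plus sign.
escaped = list(toy.filter(regex=re.escape("HGF+")).columns)

print(unescaped)
print(escaped)
```

If labels containing `HGF-` exist alongside `HGF+`, escaping the pattern (for example with `re.escape`) keeps the filter restricted to the literal `HGF+` labels.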
### 2. `gene_selection`
- arguments
    - `data`: `pandas.DataFrame`
    - `group`: `pandas.Series`
        - **Note**: these arguments expect the DataFrame and Series held by [`SuematsuData`](https://takemura-hgf.readthedocs.io/en/latest/jupyternb/suematsudata.html)
        - see **Examples** for more details
    - `regex`: `str`
        - argument for `pandas.DataFrame.filter`
        - pass a string so that `pandas.DataFrame.filter` can select the groups of interest
        - see **Examples** for more details
    - `d`: `float` (default: `0.8`)
        - Cohen's d value used as the threshold
        - upregulated genes are those whose Cohen's d value exceeds the threshold (> `d`)
        - downregulated genes are those whose Cohen's d value falls below the negative threshold (< -`d`), as in the **Examples** output
    - `neg`: `bool` (default: `False`)
        - pass `True` to return downregulated genes; otherwise, upregulated genes are returned
    - `flip`: `bool` (default: `False`)
        - pass `True` to flip the definition of upregulated/downregulated genes
- return: `pandas.Series` of upregulated/downregulated genes
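
The thresholding step can be sketched on a plain `pandas.Series` of Cohen's d values. This is an illustration only, not the code in `tools`: the function name `select_by_threshold` is hypothetical, and both the cut at -`d` for downregulated genes and the sort order are assumptions inferred from the output shown in **Examples**.

```python
import pandas as pd


def select_by_threshold(d_values: pd.Series, d: float = 0.8, neg: bool = False) -> pd.Series:
    """Hypothetical sketch of selecting genes by a Cohen's d threshold."""
    if neg:
        # Downregulated: below -d, most negative first (assumed ordering).
        return d_values[d_values < -d].sort_values(ascending=True)
    # Upregulated: above d, largest first (assumed ordering).
    return d_values[d_values > d].sort_values(ascending=False)


# Toy Cohen's d values with made-up gene names.
toy = pd.Series({"GeneA": 2.5, "GeneB": 0.3, "GeneC": -1.2, "GeneD": 0.9, "GeneE": -0.5})

up = select_by_threshold(toy, d=0.8)
down = select_by_threshold(toy, d=0.8, neg=True)
print(up)
print(down)
```

Genes with values between -`d` and `d` (here `GeneB` and `GeneE`) fall into neither set under this assumption.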
---
## Examples
- data: `SuematsuData`
- comparison: day7-HGF+ vs day2-HGF+
    - **Note**: upregulated genes are enriched in day7-HGF+
    - `regex`: "HGF+"
- threshold: 0.8
    - `d`: 0.8

In [2]:
from tools import SuematsuData
data = SuematsuData()

print(f"""
data.data: {type(data.data)}
data.group: {type(data.group)}
""")


data.data: <class 'pandas.core.frame.DataFrame'>
data.group: <class 'pandas.core.series.Series'>



In [3]:
cohens_d(data=data.data, group=data.group, regex="HGF+")

Adcyap1        -1.109006
Adcyap1r1      -2.471432
Add2           -1.433537
Adra2c         -1.538979
Agt            -3.277344
                  ...   
LOC108353194   -3.710465
LOC108353205   -6.457637
LOC108353206   -2.336801
LOC108353207   -2.988864
LOC108353295   -4.645315
Length: 3355, dtype: float64

In [4]:
# upregulated genes
gene_selection(data=data.data, group=data.group, regex="HGF+", d=0.8)

S100a8          24.464911
Hba1            23.445783
Hba2            21.395015
Hbb             21.221507
S100a9          20.945858
                  ...    
Frem2            0.897932
Rmrp             0.893452
LOC100912347     0.890102
LOC686035        0.853276
Pf4              0.800051
Length: 242, dtype: float64

In [5]:
# downregulated genes
gene_selection(data=data.data, group=data.group, regex="HGF+", d=0.8, neg=True)

Rn45s          -30.924380
S100b          -22.132822
mt-Rnr2        -20.783350
Dusp1          -20.763104
Tubb4a         -20.751691
                  ...    
Pdzd7           -0.823694
LOC501317       -0.816093
Adprhl1         -0.815504
Kcnma1          -0.809355
LOC108349559    -0.805570
Length: 2990, dtype: float64