# `OFFSIDES`

`OFFSIDES` is a table in the `effect_nsides` database that contains disproportionality statistics (PRR, PRR_error, A, B, C, D, and the mean reporting frequency) for single drug--single outcome relationships.

When computing the OFFSIDES data, it turns out that the drug IDs used were RxNorm CUIs.
For the database, I would prefer to have OMOP CDM `concept_id`s in the `OFFSIDES.drug_concept_id` field (similarly for TWOSIDES).

Once OFFSIDES was computed on AWS, I transferred the resulting file to `data/tables/offsides.csv.xz`.
This notebook simply loads that file and maps drug IDs to OMOP CDM `concept_id`s.

The schema of `OFFSIDES` is the following:

```mysql
CREATE TABLE OFFSIDES (
    drug_concept_id int
    condition_concept_id int
    prr float
    prr_error float
    a float
    b float 
    c float
    d float 
    mean_reporting_frequency float
)
```

These fields have the following meanings:

* `drug_concept_id` - OMOP CDM `concept_id` for the drug. Foreign key to `DRUG_CONCEPT.concept_id`.
* `condition_concept_id` - OMOP CDM `concept_id` for the condition. Foreign key to `CONDITION_CONCEPT.concept_id`.
* `prr` - Disproportionality statistic
* `prr_error` - Error term for PRR
* `a` - Number of drug-exposed reports having the condition
* `b` - Number of drug-exposed reports not having the condition
* `c` - Number of unexposed reports having the condition
* `d` - Number of unexposed reports not having the condition
* `mean_reporting_frequency` - ?

$$\mathrm{PRR} = \frac{\frac{A}{A + B}}{\frac{C}{C + D}}$$

$$\mathrm{PRR_{error}} = \sqrt{1 / A + 1 / C - 1 / (A + B) - 1 / (C + D)}$$

In [1]:
import pandas as pd

## Load OFFSIDES table

In [2]:
offsides = pd.read_csv('../../data/tables/offsides.csv.xz')

offsides.head(2)

Unnamed: 0,drug_id,outcome_id,A,B,C,D,PRR,PRR_error
0,33835,35104074,2.0,34.0,8.0,352.0,2.5,0.771002
1,33835,35104113,0.0,36.0,1.0,359.0,0.0,inf


## Load DRUG_CONCEPT table (for mapping RXCUI - OMOP)

In [3]:
drug_concept = pd.read_csv('../../data/tables/drug_concept.csv.xz')

drug_concept.head(2)

Unnamed: 0,concept_id,concept_name,rxnorm_concept_id,drugbank_concept_id,chebi_concept_id
0,19080523,"silicon dioxide, colloidal",314826,DB11132,30563.0
1,42903427,Aldosterone,1312358,DB04630,27584.0


In [4]:
rxnorm_to_concept_id = (
    drug_concept
    .set_index('rxnorm_concept_id')['concept_id']
    .to_dict()
)

## Apply mapping and save

In [5]:
offsides_omop = (
    offsides
    .assign(
        drug_concept_id=lambda df: df['drug_id'].map(rxnorm_to_concept_id),
    )
    .renma
    .filter(items=['drug_concept_id', 'condition_concept_id', 'prr', 'prr'])
)

offsides_omop.head(2)

Unnamed: 0,drug_id,outcome_id,A,B,C,D,PRR,PRR_error,drug_concept_id
0,33835,35104074,2.0,34.0,8.0,352.0,2.5,0.771002,19025693
1,33835,35104113,0.0,36.0,1.0,359.0,0.0,inf,19025693
