Skip to content

Latest commit

 

History

History
260 lines (183 loc) · 14.8 KB

v0.1.md

File metadata and controls

260 lines (183 loc) · 14.8 KB

NSIDES project data release v0.1

Data are available at http://tatonettilab.org/resources/nsides/.

This release includes data for drug side effects (OFFSIDES) and drug-drug pair side effects (TWOSIDES). These data represent an update from the data released with the publication of [1], to include adverse events reported to the FDA through FAERS up to and including 2014. We are releasing source code (notebooks and scripts) on GitHub, as well as data in the form both of database dumps and compressed .csv files.

[1] Tatonetti, Nicholas P., P. Ye Patrick, Roxana Daneshjou, and Russ B. Altman. "Data-driven prediction of drug effects and interactions." Science translational medicine 4, no. 125 (2012): 125ra31-125ra31. doi:10.1126/scitranslmed.3003377

Summary information

Number of Value
Drugs (≥ 1 exposure) 3,394
Adverse events types (≥ 1 occurrence) 17,552
Drug-event pairs 9,505,200
Significant* drug-event pairs 125,647
Drug-drug-event triplets 222,155,888
Significant* drug-drug-event triplets 5,729,992

* Significant determined by LOG(PRR) - 1.96 * PRR_error > LOG(2)

This is not filtered by OFFSIDES, meaning a drug-drug-event triplet can be significant even if one of the drugs is more significantly associated with the event by itself.

Notes on methods used to compute data

A contingency table can be drawn using exposed and unexposed cohorts produced by propensity score matching.

Had outcome Didn't have outcome
Drug exposed A B
Not drug exposed C D

Using these definitions,

and the error is

Several consequences of these definitions should be taken into account when inspecting the data.

  • PRR is NaN when both A and C are zero.
  • PRR is Inf when C is zero but A is greater than zero.
  • PRR is zero when A is zero and C is not zero.
  • PRR_s is Inf when A or C or both are zero.

Computational notes

Propensity score matching was used to account for covariates. OFFSIDES and TWOSIDES were handled in the same way.

The (drug exposure) propensity scores for were computed using 10 bootstrap iterations. For each iteration, only a fraction of reports were used to fit a logistic regression model that predicted drug exposure. Specifically, non-exposed reports were sampled in equal number to exposed reports. If fewer than 100 reports were exposed, then exposed reports (however many available) were used with 100 unexposed reports for propensity score calculation. Final propensity scores were calculated using an average across the 10 bootstrap iterations.

Propensity score matching used the following bins: [0, 0.2, 0.4, 0.6, 0.8, 1]. These bins were used to create case and control sets with similar compositions. For each bin, 10 times the number of cases were sampled from control (unexposed) reports. We only added cases or controls from bins that had at least one case and at least one control.

While some filtering was conducted for both OFFSIDES and TWOSIDES (A > 0 for both flat files and database dumps and PRR > 0.1 for flat files), TWOSIDES contains pairwise associations even for pairs where one of the drugs alone has a higher association score than the pair.

Flat files

Two flat files are made available for download: OFFSIDES.csv.xz and TWOSIDES.csv.xz. The files have the following schemata:

column name data type description
drug_rxnorn_id integer RxNorm CUI (RxCUI) of each drug
drug_concept_name string String name of each drug
condition_meddra_id integer MedDRA code of each (adverse event) condition
condition_concept_name string String name of each condition
A integer Number of reports prescribed the drug that had the condition
B integer Number of reports prescribed the drug that did not have the condition
C integer Number of reports not prescribed the drug that had the condition
D integer Number of reports not prescribed the drug that did not have the condition
PRR float Proportional reporting ratio*
PRR_error float Proportional reporting ratio error*
mean_reporting_frequency float A / (A + B)

Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.

* See signal detection methods

Note that this table does not include any drug-condition combinations for which both A and C were zero. In other words, no information is presented for conditions that did not occur or for drugs that caused no adverse events.

Each drug pair in the TWOSIDES file occur each only once, since drug pair ordering is not meaningful. Each pair's order is determined by OMOP CDM IDs in the TWOSIDES database table, though this means there is no particular order between RxNorm IDs in this flat file.

column name data type description
drug_1_rxnorm_id integer RxNorm CUI (RxCUI) of the first drug
drug_1_concept_name string String name of the first drug
drug_2_rxnorm_id integer RxNorm CUI (RxCUI) of the second drug
drug_2_concept_name string String name of the second drug
condition_meddra_id integer MedDRA code of each (adverse event) condition
condition_concept_name string String name of each condition
A integer Number of reports prescribed the drug that had the condition
B integer Number of reports prescribed the drug that did not have the condition
C integer Number of reports not prescribed the drug that had the condition
D integer Number of reports not prescribed the drug that did not have the condition
PRR float Proportional reporting ratio*
PRR_error float Proportional reporting ratio error*
mean_reporting_frequency float A / (A + B)

Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.

* See signal detection methods

Note that this table does not include any drug-drug-condition triplets for which both A and C were zero. In other words, no information is presented for conditions that did not occur or for drug pairs that caused no adverse events.

Database dumps

The data contained in this release is a stored in a MySQL database called effect_nsides (MySQL Ver 14.14 Distrib 5.7.26, for Linux (x86_64)). The code used to build and populate the tables in this database are located in nb/4.insert_tables/.

The dump was created using

usr/bin/mysqldump -p$PASSWORD effect_nsides | gzip > effect_nsides-2019-11-13.sql.gz

To load the database dumps into a local MySQL database, use the following commands:

mysqladmin create effect_nsides
gunzip < effect_nsides-2019-11-13.sql.gz | mysql effect_nsides

For more information about loading from these dump files, see the MySQL documentation on this topic.

Table schemata

effect_nsides has the following tables: REPORT, CONDITION_CONCEPT, CONDITION_OCCURRENCE, DRUG_CONCEPT, DRUG_EXPOSURE, OFFSIDES, and TWOSIDES. These tables have the following schemata:

REPORT

column name data type description
report_id int Unique ID for a report
report_year int Year in which the report was received
person_age int Age of the affected person
person_sex char(1) Sex of the affected person

Note that both person_age and person_sex columns contain NULL values. person_sex has been coded to be one of three values: 'M', 'F', or 'U'. person_age is normalized to units of years, though some unreasonable values are clearly erroneous reports (eg. 1054, 869, 5200).

CONDITION_CONCEPT

column name data type description
condition_concept_id int OMOP CDM concept_id of each (adverse event) condition
condition_concept_name varchar(255) String name of each (adverse event) condition
condition_meddra_id int MedDRA code of each (adverse event) condition
condition_snomed_id int SNOMED-CT code of each (adverse event) condition

Note that the condition_snomed_id column does contain NULL values, as the map between MedDRA and SNOMED-CT is imperfect.

CONDITION_OCCURRENCE

column name data type description
report_id int Unique ID for a report
condition_concept_id int OMOP CDM concept_id of an (adverse event) condition

This table simply connects rows from REPORT with rows in CONDITION_CONCEPT.

DRUG_CONCEPT

column name data type description
drug_concept_id int OMOP CDM concept_id of each drug
drug_concept_name varchar(255) String name of each drug
rxnorm_concept_id int RxNorm CUI (RxCUI) of each drug
drugbank_concept_id varchar(255) DrugBank Accession Number (DB*) of each drug
chebi_concept_id int ChEBI ID of each drug

Note that drugbank_concept_id and chebi_concept_id both contain NULL values, as the maps among these terminologies are imperfect.

DRUG_EXPOSURE

column name data type description
report_id int Unique ID for a report
drug_concept_id int OMOP CDM concept_id of a drug

This table simply connects rows from REPORT with rows in DRUG_CONCEPT.

OFFSIDES

column name data type description
drug_concept_id int OMOP CDM concept_id of a drug
condition_concept_id int OMOP CDM concept_id of an (adverse event) condition
A integer Number of reports prescribed the drug that had the condition
B integer Number of reports prescribed the drug that did not have the condition
C integer Number of reports not prescribed the drug that had the condition
D integer Number of reports not prescribed the drug that did not have the condition
PRR float Proportional reporting ratio*
PRR_error float Proportional reporting ratio error*
mean_reporting_frequency float A / (A + B)

Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.

* See signal detection methods

Note that this table does not include any drug-condition combinations for which both A and C were zero. In other words, no information is presented for conditions that did not occur or for drugs that caused no adverse events.

TWOSIDES

column name data type description
drug_concept_id_1 int OMOP CDM concept_id of first drug
drug_concept_id_2 int OMOP CDM concept_id of second drug
condition_concept_id int OMOP CDM concept_id of an (adverse event) condition
A integer Number of reports prescribed the drug that had the condition
B integer Number of reports prescribed the drug that did not have the condition
C integer Number of reports not prescribed the drug that had the condition
D integer Number of reports not prescribed the drug that did not have the condition
PRR float Proportional reporting ratio*
PRR_error float Proportional reporting ratio error*
mean_reporting_frequency float A / (A + B)

Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.

* See signal detection methods

Note that this table does not include any drug-drug-condition triplets for which both A and C were zero. In other words, no information is presented for conditions that did not occur or for drug pairs that caused no adverse events.

Questions

This work is the result of efforts by a number of people. Questions or comments are best made by opening an issue on the GitHub repository. Additionally, as this work is the result of a rotation project in the Tatonetti Lab at Columbia University, questions can also be sent to Michael Zietz (zietzm) or Nicholas Tatonetti directly.