# Data Commons Pilot - TOPMed MESA Data Analysis demo

This demo walks through data visualization and data analysis procedures that interact with the Gen3 Data Commons provided for the Data Commons Pilot (DCP) project.

More specifically, the demo focuses on the TOPMed project entitled:

"Multi-Ethnic Study of Atherosclerosis (MESA)"
(https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000209.v13.p3)

## DATA EXPLORATION: High-level Python library based on GraphQL queries

Install the required Python packages, then import the library that provides the pipeline functions and GraphQL queries, and load our credentials:

In [None]:
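# Install the plotting and survival-analysis dependencies first
!pip install matplotlib
!pip install lifelines
# Render plots inline in the notebook
%matplotlib inline
import dcp_analysis_functions as dcp
# Load our credentials from credentials.json
dcp.add_keys('credentials.json')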

### Get summary metrics for each data type in the data-model for one project:

In [None]:
dcp.query_summary_counts('topmed-MESA')
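
The `dcp` library hides the underlying GraphQL from the notebook, and its actual queries are not shown here. As a rough sketch of what a per-project summary-count query could look like, the hypothetical helper below builds a query string using the `_<node>_count` field convention found in Gen3 GraphQL endpoints. Both the helper name `build_count_query` and the exact field names are assumptions for illustration, not the library's implementation.

```python
def build_count_query(project_id, node_types):
    """Build a GraphQL query asking for one record count per node type.

    Hypothetical helper: assumes the endpoint exposes `_<node>_count`
    fields that accept a `project_id` argument.
    """
    fields = "\n".join(
        f'  _{node}_count(project_id: "{project_id}")' for node in node_types
    )
    return "{\n" + fields + "\n}"


# Node types taken from the ones queried elsewhere in this notebook
query = build_count_query("topmed-MESA", ["demographic", "diagnosis", "lab_result"])
print(query)
```

The string is only built and printed, not sent anywhere; submitting it would require the endpoint URL and an access token derived from the credentials file.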

### Get summary counts per field across all projects or for a specific project:

In [None]:
gender_counts = dcp.query_summary_field('gender', 'demographic')
race_counts = dcp.query_summary_field('race', 'demographic')
age_counts = dcp.query_summary_field('age_range', 'demographic')
ht_counts = dcp.query_summary_field('hypertension_stage', 'diagnosis', 'topmed-MESA')
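
Conceptually, a per-field summary is a tally of how many records carry each value of that field. The sketch below illustrates the idea on a handful of made-up records; the function name `summarize_field`, the record layout, and the `'no data'` label for missing values are assumptions for illustration and do not reflect how `dcp.query_summary_field` is implemented.

```python
from collections import Counter


def summarize_field(records, field):
    """Count occurrences of each value of `field`, labelling gaps as 'no data'."""
    return Counter(record.get(field) or "no data" for record in records)


# Made-up demographic records, for illustration only
records = [
    {"gender": "female", "age_range": "45-54"},
    {"gender": "male", "age_range": "55-64"},
    {"gender": "female", "age_range": "55-64"},
    {"gender": "female"},  # age_range missing
]

gender_tally = summarize_field(records, "gender")
age_tally = summarize_field(records, "age_range")
print(dict(gender_tally))
print(dict(age_tally))
```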

### Get field distribution for one variable:

Visualize the distribution of cholesterol levels across subjects:

In [None]:
distribution = dcp.field_distribution('chol1', 'lab_result', 'topmed-MESA')

## DATA ANALYSIS: TOPMed MESA Data Project

This analysis uses TOPMed MESA project data to reproduce some preliminary results from the following published studies:

[1] Jean L. Olson, Diane E. Bild, Richard A. Kronmal, Gregory L. Burke (2016). Legacy of MESA. Global Heart, 11(3), 269-274.
https://www.sciencedirect.com/science/article/pii/S2211816016307098 (Table 1, MESA recruitment study)


[2] Blaha, M. J., Yeboah, J., Al Rifai, M., Liu, K. J., Kronmal, R., & Greenland, P. (2016). The Legacy of MESA – Providing Evidence for Subclinical Cardiovascular Disease in Risk Assessment. Global Heart, 11(3), 275–285.
http://doi.org/10.1016/j.gheart.2016.08.003 (Figure 2, Kaplan-Meier cumulative-event curves for coronary events)

In [None]:
project = 'topmed-MESA'

### Recruitment summary based on demographic variables (gender, race, age range):

Quoting from [1]: "*MESA sampling was to obtain balanced recruitment across strata defined by sex, ethnicity, and age group rather than to represent the demographic distribution*". 

The resulting distribution by gender, age group, and race can be observed in the following line chart:

In [None]:
dcp.demographic_study(project)

### Kaplan-Meier curves for coronary events:

Coronary Artery Calcium (CAC) scores are clearly associated with a graded increase in the risk of both hard and all CHD events [2]. This is shown in the following Kaplan-Meier survival curves based on CAC ranges:

In [None]:
times = dcp.calcium_score_survival('frncep1c', project, "Coronary Artery Calcium (CAC) Kaplan-Meier")
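
For readers unfamiliar with the method, the Kaplan-Meier (product-limit) estimate multiplies, at each observed event time, the fraction of at-risk subjects who did not have the event. The following is a minimal pure-Python sketch of that estimator on made-up data. It is not the code behind `dcp.calcium_score_survival` (which presumably relies on `lifelines`, installed above), and the function name `kaplan_meier` is introduced here only for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events plus censored subjects
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: share of at-risk subjects surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set
        at_risk -= exit_counts[t]
    return curve


# Made-up follow-up times (years) and event indicators, for illustration only
durations = [2, 3, 3, 5, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

To compare groups, as the curves above do for CAC ranges, the same estimator would be applied separately to the subjects in each range.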