<img style="float: right;" src="files/resources/general/thehyve_logo.png">
# TranSMART 17.1 REST API demonstration
---------------
Copyright (c) 2017 The Hyve B.V. This notebook is licensed under the GNU General Public License, version 3. Authors: 
 - Ward Weistra
 - Jochem Bijlard


We start by importing the tranSMART Python library (https://pypi.python.org/pypi/transmart) and connecting to the tranSMART server.

In [None]:
import transmart as tm
print('transmart python client version: {}'.format(tm.__version__))

In [None]:
# Instead of logging in with your own account, you can use "demo-user" as both username and password.

user = password = 'demo-user'

api = tm.TransmartApi(
    host = 'http://transmart-test.thehyve.net',
    user = user,
    password = password,
    api_version = 2,
    print_urls = True)

api.access()

Next we import and configure Pandas, a Python library that helps us work with the data. One of the main concepts it borrows from `R` is the dataframe, which you can use for most of your data manipulation.

In future versions we would like to integrate Pandas and Jupyter more closely into the Python client, but for now we will do some of the dirty work manually.

In [None]:
import pandas as pd
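# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

pd.set_option('display.max_colwidth', 1000)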
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 100)

## What is in the box?

As a first REST API call it would be nice to see what studies are available in this tranSMART server.
  
You will see a list of all studies, their names (i.e. `studyId`) and which dimensions are available for each study. Remember that tranSMART previously only supported the dimensions patients, concepts and studies. Now you should see studies with many more dimensions!

**NOTE: Because we set `print_urls = True`, the client prints the URL of each API call it makes!**

In [None]:
studies = api.get_studies()
studies
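
The URLs that the client prints can also be put together by hand. The following is a minimal sketch, assuming that the server exposes the v2 endpoints `/v2/studies` and `/v2/observations` and accepts a `study_name` constraint as URL-encoded JSON. The helpers `build_url` and `study_constraint` are hypothetical and not part of the `transmart` client. The code only builds the URLs and does not contact the server.

```python
import json
from urllib.parse import urlencode

HOST = 'http://transmart-test.thehyve.net'  # host used in this notebook


def build_url(host, endpoint, constraint=None, **params):
    """Hypothetical helper: build a v2 URL, adding a JSON constraint if given."""
    query = dict(params)
    if constraint is not None:
        # Compact JSON keeps the URL short; urlencode percent-escapes it.
        query['constraint'] = json.dumps(constraint, separators=(',', ':'))
    url = '{}/v2/{}'.format(host.rstrip('/'), endpoint)
    return url + ('?' + urlencode(query) if query else '')


def study_constraint(study_id):
    """Assumed shape of a v2 constraint that selects a single study."""
    return {'type': 'study_name', 'studyId': study_id}


studies_url = build_url(HOST, 'studies')
observations_url = build_url(HOST, 'observations',
                             constraint=study_constraint('TRAINING'),
                             type='clinical')
print(studies_url)
print(observations_url)
```

Comparing these with the URLs printed by the client is a quick way to check whether the assumptions above hold for your server.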

## Part 1: Plotting blood pressure over time
To answer a real question using the REST API and the new time dimension, we will plot a sample dataset with blood pressure measured at multiple time points. To explore what we have in this project we will:
 
 1. have a look at the patients
 1. have a look at observations
 1. create a subset of data we want to plot
 1. create aggregated plots


For this tutorial we will be using a preloaded project with the `studyId` `TRAINING`.

In [None]:
STUDY_ID = 'TRAINING'

### 1.1 Getting the patients for this study
We choose the TRAINING study and ask for all patients in this study using the `get_patients()` function. You will get a list with their patient details and patient identifiers. The variables you see (e.g. `Age`, `Sex`, and `Race`) historically have a special place in tranSMART and are often stored both as an observation and in this patient relationship table.

In [None]:
patients = api.get_patients(study=STUDY_ID)
patients

### 1.2 Getting the observations

Next we ask for the full list of observations for this study. This list will include one row per observation, with information from all their dimensions. The columns will have headers like `<dimension name>.<field name>` and `numericValue` or `stringValue` for the actual observation value.

In [None]:
observations = api.get_observations(study=STUDY_ID)
observations
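
To illustrate where column names such as `patient.inTrialId` come from, here is a small sketch that flattens nested records with `pandas.json_normalize` (available at the top level from pandas 1.0 onwards). The records are made up for this example. The real server response may be structured differently, and the client takes care of that conversion, so this only shows the `<dimension name>.<field name>` naming convention.

```python
import pandas as pd

# Made-up records: one item per observation, with nested dimension fields.
records = [
    {'numericValue': 120.0,
     'patient': {'inTrialId': 'SUBJ1'},
     'concept': {'conceptPath': '\\Example\\Blood pressure\\'}},
    {'numericValue': 135.0,
     'patient': {'inTrialId': 'SUBJ2'},
     'concept': {'conceptPath': '\\Example\\Blood pressure\\'}},
]

# json_normalize flattens nested dicts into '<outer>.<inner>' column names.
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```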

In [None]:
# A quick overview of the columns in the returned dataframe.
print('The columns in our dataframe are:')
for index, column in enumerate(observations.columns):
    print(' {:>5} {}'.format(index, column))

In [None]:
# And a list with the available concepts in this dataset:
available_concepts = observations.loc[:, 'concept.conceptPath'].unique()

print('Available concepts for this study:')
for index, concept in enumerate(sorted(available_concepts)):
    print('{:>5} {}'.format(index, concept))
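
Concept paths are backslash-delimited, so it can help to split them into levels, for example to use the last level as a short label. A minimal sketch, where `split_concept_path` is a hypothetical helper and the path is the blood pressure concept used further on in this notebook:

```python
def split_concept_path(path):
    """Hypothetical helper: split a backslash-delimited concept path into levels."""
    # Leading and trailing backslashes produce empty strings, which we drop.
    return [level for level in path.split('\\') if level]


path = '\\Public Studies\\Training\\Measurements\\Blood pressure\\'
levels = split_concept_path(path)
print(levels)      # ['Public Studies', 'Training', 'Measurements', 'Blood pressure']
print(levels[-1])  # Blood pressure
```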

In [None]:
observations[:10]

### 1.3 Subsetting on the data we want

For this exercise we are primarily interested in subject blood pressure. So let's select only the data we want to use.

In [None]:
concept_groups = observations.groupby('concept.conceptPath')
blood_pressure_observations = concept_groups.get_group('\\Public Studies\\Training\\Measurements\\Blood pressure\\')
blood_pressure_observations[:10]

In [None]:
# Get the columns I want to use and give them better names.
columns_of_interest = ['patient.inTrialId', 'trial visit.relTimeLabel', 'numericValue']
# Take a copy so that adding columns later does not trigger a SettingWithCopyWarning.
blood_pressure_subset = blood_pressure_observations.loc[:, columns_of_interest].copy()
blood_pressure_subset.columns = ['subject', 'visit_label', 'blood_pressure']
blood_pressure_subset

### 1.4 So let's create some plots!

First import our plotting library and tell Jupyter to directly show the images we create using `matplotlib`.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

TranSMART's new data model allows for an arbitrary number of observations per concept and/or trial visit. As you may have noticed in the previous step, blood pressure in our example was measured twice per patient at each visit. In the plot we want to show the mean of these two values.

In [None]:
bp_pivot = blood_pressure_subset.pivot_table(index='visit_label', 
                                             values='blood_pressure', 
                                             columns='subject', 
                                             aggfunc='mean')
bp_pivot;
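
To see what the pivot does with repeated measurements, here is the same call on a tiny made-up dataframe in which each subject has two readings per visit. The subjects, labels and values are invented for illustration only.

```python
import pandas as pd

# Made-up measurements: two readings per subject per visit.
toy = pd.DataFrame({
    'subject':        ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'visit_label':    ['Baseline', 'Baseline', 'Week 1', 'Week 1'] * 2,
    'blood_pressure': [120.0, 124.0, 118.0, 116.0, 140.0, 144.0, 130.0, 134.0],
})

# Each cell of the result is the mean of the readings for that visit and subject.
toy_pivot = toy.pivot_table(index='visit_label',
                            values='blood_pressure',
                            columns='subject',
                            aggfunc='mean')
print(toy_pivot)
```

Note that `pivot_table` sorts the index alphabetically, so visit labels in real data may not come out in chronological order.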

In [None]:
subject_plot = bp_pivot.plot(legend=False, 
                             figsize=(12, 5),
                             title='Mean blood pressure per subject')

### 1.4b Masking with treatment

Included in our example project is a treatment which we suspect lowers blood pressure. Let's include that in our analysis by adding it to our dataframe and creating a pivot based on that.

In [None]:
control_concept = '\\Public Studies\\Training\\Study Design\\Group\\Control\\'
treated_concept = '\\Public Studies\\Training\\Study Design\\Group\\Treatment\\'

control_group = list(concept_groups.get_group(control_concept).loc[:, 'patient.inTrialId'])
treated_group = list(concept_groups.get_group(treated_concept).loc[:, 'patient.inTrialId'])

In [None]:
def control_or_treated(observation):
    if observation.subject in control_group:
        return 'Control'
    if observation.subject in treated_group:
        return 'Treated'

blood_pressure_subset['treatment_group'] = blood_pressure_subset.apply(control_or_treated, axis=1)
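
An alternative to the row-wise `apply` is to build a lookup dictionary once and use `Series.map`. This is a sketch on made-up data, not the approach used in this notebook. The names `control_ids`, `treated_ids` and `group_of` are hypothetical stand-ins for the real group lists.

```python
import pandas as pd

# Made-up subjects and group membership, standing in for the real lists.
control_ids = ['S1', 'S3']
treated_ids = ['S2']

# One dictionary lookup per subject instead of scanning two lists per row.
group_of = {subject: 'Control' for subject in control_ids}
group_of.update({subject: 'Treated' for subject in treated_ids})

toy = pd.DataFrame({'subject': ['S1', 'S2', 'S3', 'S4'],
                    'blood_pressure': [120.0, 131.0, 118.0, 125.0]})

# Subjects that are in neither group (here S4) get a missing value.
toy['treatment_group'] = toy['subject'].map(group_of)
print(toy)
```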

In [None]:
treatment_pivot = blood_pressure_subset.pivot_table(index='visit_label', 
                                                    values='blood_pressure', 
                                                    columns='treatment_group', 
                                                    aggfunc='mean')

In [None]:
treatment_plot = treatment_pivot.plot(kind='bar',
                                      figsize=(12, 5),
                                      title='Mean blood pressure per treatment group')

## Part 2: Combining Glowing Bear and the Python client

For the second part we will work with the Glowing Bear user interface that was developed at The Hyve, funded by IMI Translocation and BBMRI.

An API is great for extracting exactly the data you need and analyzing it. But it is harder to get a good overview of all the data that is available and to define the exact set to extract. That is what Glowing Bear was built for.

Please go to http://glowingbear2-head.thehyve.net and create a Patient Set on the Data Selection tab (under Select patients). Once you have saved your patient set, copy the patient set identifier and paste that below.

In [None]:
patient_set_id = 28758

Now let's return all patients for the patient set we made!

In [None]:
patients = api.get_patients(patientSet = patient_set_id)
patients

And do the same for all observations for this patient set.

In [None]:
observations = api.get_observations(study = STUDY_ID, patientSet = patient_set_id)
observations
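
Restricting observations to both a study and a patient set amounts to combining two constraints. The sketch below shows how such a combined constraint might look as JSON, assuming the v2 constraint types `study_name`, `patient_set` and `and`. The helper functions are hypothetical, and whether the client builds exactly this structure is an assumption.

```python
import json


def study_constraint(study_id):
    """Assumed shape of a v2 constraint that selects a single study."""
    return {'type': 'study_name', 'studyId': study_id}


def patient_set_constraint(patient_set_id):
    """Assumed shape of a v2 constraint that selects a saved patient set."""
    return {'type': 'patient_set', 'patientSetId': patient_set_id}


def all_of(*constraints):
    """Hypothetical helper: combine constraints so that all of them must hold."""
    return {'type': 'and', 'args': list(constraints)}


constraint = all_of(study_constraint('TRAINING'),
                    patient_set_constraint(28758))
print(json.dumps(constraint, indent=2))
```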

Now you know how to retrieve data from a tranSMART 2017 server and analyze it with Python! Please feel free to change this code in any way you like. If you have any questions, reach us on our public forum via development@thehyve.nl or https://groups.google.com/a/thehyve.nl/forum/#!forum/development.