# Example TranSMART API calls for querying sample data
 
This notebook demonstrates the querying features of the transmart package, using an example study that contains patient-level and sample-level data.

The example study is loaded into https://glowingbear.thehyve.net as the `CSR` study.
The source files are available at https://github.com/thehyve/pmc-conversion/tree/master/test_data/test_logic.

## Preparation: load libraries and connect

Follow these first steps to set up the dependencies and the connection.

### Install dependencies

The `transmart` package requires Python 3. The f-strings used in this notebook require Python 3.6 or later.

```
# Install the transmart package and its dependencies
pip install transmart[full]
```

### Import packages

In [1]:
import json
import copy
import requests
from getpass import getpass

# python API client
import transmart
from transmart.api.v2.api import Query
from transmart.api.v2.constraints import atomic
print('transmart python client version: {}'.format(transmart.__version__))

transmart python client version: 0.2.6


### Configuration

In [2]:
# Demo environment settings

keycloak_url = 'https://keycloak-dwh-test.thehyve.net'
transmart_url = 'https://transmart.thehyve.net'
keycloak_realm = 'transmart'
keycloak_client_id = 'transmart-client'

### Retrieve offline token for API access

You need to provide your user credentials for https://glowingbear.thehyve.net.
If you do not have an account yet, please visit the website to register.

In [3]:
# User credentials
user = getpass(prompt='username: ')
password = getpass(prompt='password: ')

username: ········
password: ········


In [4]:
# Fetch offline token for API access
r = requests.post(url=f'{keycloak_url}/auth/realms/{keycloak_realm}/protocol/openid-connect/token',
                  data=dict(grant_type='password',
                            client_id=keycloak_client_id,
                            scope='offline_access',
                            username=user,
                            password=password
                           )
                 )
if r.status_code != 200:
    print(r.json())
    raise Exception(f'Error: {r.status_code}')
offline_token = r.json().get('refresh_token')
print('Offline token retrieved successfully')

Offline token retrieved successfully
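
An offline token is a long-lived refresh token, so a client normally exchanges it for a short-lived access token before calling the API. The `transmart` client presumably takes care of this when it is given `offline_token`; the sketch below only illustrates the standard OAuth2 refresh-token grant against the same Keycloak token endpoint used above. The helper name `build_access_token_request` is hypothetical, and no request is actually sent.

```python
def build_access_token_request(keycloak_url, realm, client_id, offline_token):
    """Return (url, form_data) for an OAuth2 refresh-token grant.

    Hypothetical helper: nothing is sent here. The result could be passed
    to requests.post(url, data=form_data).
    """
    url = f'{keycloak_url}/auth/realms/{realm}/protocol/openid-connect/token'
    form_data = {
        'grant_type': 'refresh_token',   # standard OAuth2 grant type
        'client_id': client_id,
        'refresh_token': offline_token,  # the offline token acts as the refresh token
    }
    return url, form_data


# Demo values from the configuration cell; the token is a placeholder
url, form_data = build_access_token_request(
    'https://keycloak-dwh-test.thehyve.net',
    'transmart',
    'transmart-client',
    '<offline token>',
)
print(url)
print(sorted(form_data))
```

The access token would then be read from the `access_token` field of the JSON response, in the same way the cell above reads `refresh_token`.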


### Connect to the API server

In [5]:
# Create an API object to perform API queries with, using our user credentials

api = transmart.get_api(
    host=transmart_url,
    kc_url=keycloak_url,
    kc_realm=keycloak_realm,
    client_id=keycloak_client_id,
    offline_token=offline_token,
    print_urls=True)

# Common errors:
# * '401 Client Error: Unauthorized' - Wrong username/password
# * 'HTTPSConnectionPool' - Wrong tranSMART or Keycloak URL or no internet
# * '404 Client Error: Not Found' - Wrong Keycloak realm

https://transmart.thehyve.net/v2/studies
https://transmart.thehyve.net/v2/tree_nodes?depth=0&counts=False&tags=True
Existing index cache found. Loaded 10295 tree nodes. Hooray!
https://transmart.thehyve.net/v2/pedigree/relation_types


## Querying the data

### Explore available studies and tree structure

#### Get list of available studies

In [6]:
studies = api.get_studies().dataframe
studies

https://transmart.thehyve.net/v2/studies


| | bioExperimentId | dimensions | id | secureObjectToken | studyId |
|---|---|---|---|---|---|
| 0 | | [start time, patient, concept, study] | 2 | PUBLIC | SYNTHETICMASS |
| 1 | | [Images Id, patient, concept, study] | 4 | PUBLIC | IMAGES |
| 2 | | [Biosource ID, Diagnosis ID, Biomaterial ID, p... | 7 | PUBLIC | Tumor Samples |
| 3 | | [Biomaterial, Biosource, start time, Diagnosis... | 10 | PUBLIC | CSR |


#### Get observation and subject counts for a given study 

In [7]:
csr_study_constraint = atomic.StudyConstraint('CSR')
study_counts = api.observations.counts(constraint=csr_study_constraint)
study_counts

https://transmart.thehyve.net/v2/observations/counts


{'observationCount': 341, 'patientCount': 17}

#### Get observation and subject counts for all studies

In [8]:
all_counts = api.observations.counts()
all_counts

https://transmart.thehyve.net/v2/observations/counts


{'observationCount': 19227, 'patientCount': 1488}

#### Get ontology tree

Visualize the tree structure up to a given number of levels deep (here 5)

In [9]:
tree = api.tree_nodes(depth=5, counts=True)
tree

https://transmart.thehyve.net/v2/tree_nodes?depth=5&counts=True&tags=True


Central Subject Registry  (None)/
  01. Patient information  (None)/
    01. Date of birth  (17)
    02. Taxonomy  (None)
    03. Gender  (17)
    04. Date of death  (3)
    Informed_consent  (None)/
      01. Informed consent type  (17)
      02. Date informed Consent given  (1)
      03. Date informed consent withdrawn  (1)
      04. Informed consent material  (None)
      05. Informed consent data  (None)
      06. Informed consent linking external database  (None)
      07. Report hereditary susceptibility  (None)
      Informed consent version  (None)
  02. Diagnosis information  (None)/
    01. Date of diagnosis  (17)
    02. Tumor type  (17)
    03. Topography  (17)
    04. Tumor stage  (None)
    05. Center of treatment  (17)
    Treatment  (None)
  03. Biosource information  (None)/
    01. Biosource parent  (2)
    02. Date of biosource  (17)
    03. Tissue  (17)
    04. Disease status  (17)
    05. Tumor percentage  (17)
    06. Biosource dedicated for specific study  (17)
 

#### Get subtree

Visualize tree structure only for a certain top node

In [10]:
tree = api.tree_nodes(root='\\Central Subject Registry\\', depth=3, counts=False)
tree

https://transmart.thehyve.net/v2/tree_nodes?root=\Central Subject Registry\&depth=3&counts=False&tags=True


Central Subject Registry  (None)/
  01. Patient information  (None)/
    01. Date of birth  (None)
    02. Taxonomy  (None)
    03. Gender  (None)
    04. Date of death  (None)
    Informed_consent  (None)
  02. Diagnosis information  (None)/
    01. Date of diagnosis  (None)
    02. Tumor type  (None)
    03. Topography  (None)
    04. Tumor stage  (None)
    05. Center of treatment  (None)
    Treatment  (None)
  03. Biosource information  (None)/
    01. Biosource parent  (None)
    02. Date of biosource  (None)
    03. Tissue  (None)
    04. Disease status  (None)
    05. Tumor percentage  (None)
    06. Biosource dedicated for specific study  (None)
  04. Biomaterial information  (None)/
    01. Biomaterial parent  (None)
    02. Date of biomaterial  (None)
    03. Biomaterial type  (None)
  05. Study information  (None)/
    01. Study ID  (None)
    02. Study acronym  (None)
    03. Study title  (None)
    04. Individual Study ID  (None)
    Study datadictionary  (None)
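
Tree paths are delimited by backslashes, which is why the `root` argument above is written with doubled backslashes in the Python source. The sketch below shows how such a path could be assembled from node names and what it looks like once percent-encoded for a query string. The helper `tree_path` is hypothetical, and it is an assumption that child node paths follow the same pattern as the root path shown above.

```python
from urllib.parse import urlencode


def tree_path(*segments):
    """Join node names into a backslash-delimited path.

    The path gets a leading and a trailing backslash, like the root
    path used above. Hypothetical helper, not part of the transmart package.
    """
    return '\\' + '\\'.join(segments) + '\\'


# '\\' in source code is a single backslash character in the resulting string
root = tree_path('Central Subject Registry')
print(root)

# Assumed path of a child node, following the same pattern
child = tree_path('Central Subject Registry', '01. Patient information')
print(child)

# Percent-encoded form of the query parameters
query = urlencode({'root': root, 'depth': 3})
print(query)
```

A raw string literal cannot end in a single backslash, so `r'\Central Subject Registry\'` is a syntax error; building the path with a helper or with escaped backslashes avoids that pitfall.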

### Obtain list of tree nodes and corresponding concept codes

In [11]:
# Keep only the tree nodes that refer to a concept (i.e., have a concept code)
ftree = tree.dataframe[~tree.dataframe['conceptCode'].isna()]

# Display result (sorted by concept path)
concepts = ftree.loc[:, ['conceptPath', 'name', 'metadata.subject_dimension', 'conceptCode']].dropna(how='all').sort_values(by=['conceptPath'])
concepts

| | conceptPath | name | metadata.subject_dimension | conceptCode |
|---|---|---|---|---|
| 23 | \CSR\Biomaterial.biomaterial_date | 02. Date of biomaterial | Biomaterial | Biomaterial.biomaterial_date |
| 22 | \CSR\Biomaterial.src_biomaterial_id | 01. Biomaterial parent | Biomaterial | Biomaterial.src_biomaterial_id |
| 24 | \CSR\Biomaterial.type | 03. Biomaterial type | Biomaterial | Biomaterial.type |
| 16 | \CSR\Biosource.biosource_date | 02. Date of biosource | Biosource | Biosource.biosource_date |
| 20 | \CSR\Biosource.biosource_dedicated | 06. Biosource dedicated for specific study | Biosource | Biosource.biosource_dedicated |
| 18 | \CSR\Biosource.disease_status | 04. Disease status | Biosource | Biosource.disease_status |
| 15 | \CSR\Biosource.src_biosource_id | 01. Biosource parent | Biosource | Biosource.src_biosource_id |
| 17 | \CSR\Biosource.tissue | 03. Tissue | Biosource | Biosource.tissue |
| 19 | \CSR\Biosource.tumor_percentage | 05. Tumor percentage | Biosource | Biosource.tumor_percentage |
| 12 | \CSR\Diagnosis.center_treatment | 05. Center of treatment | Diagnosis | Diagnosis.center_treatment |


#### Get available values for a given concept

Get aggregates for a given concept (e.g., value counts for a categorical concept)

In [12]:
tumor_type_concept_code = concepts[concepts['name']=='02. Tumor type']['conceptCode'].unique()[0]
tumor_type_constraint = atomic.ConceptCodeConstraint(tumor_type_concept_code)
aggregates_per_concept = api.observations.aggregates_per_concept(constraint=tumor_type_constraint)
print(json.dumps(aggregates_per_concept, indent=2))

https://transmart.thehyve.net/v2/observations/aggregates_per_concept
{
  "aggregatesPerConcept": {
    "Diagnosis.tumor_type": {
      "categoricalValueAggregates": {
        "nullValueCounts": 0,
        "valueCounts": {
          "Angioimmunoblastic T-cell lymphoma": 7,
          "Malignant lymphoma, non-Hodgkin": 12
        }
      }
    }
  }
}
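
The aggregates response is a nested dictionary, so the value counts for a concept can be pulled out with a few lookups. The sketch below works on a copy of the response printed above; the helper name `categorical_value_counts` is hypothetical and it only covers the categorical case shown here.

```python
import json

# Response copied from the output above
response = json.loads('''
{
  "aggregatesPerConcept": {
    "Diagnosis.tumor_type": {
      "categoricalValueAggregates": {
        "nullValueCounts": 0,
        "valueCounts": {
          "Angioimmunoblastic T-cell lymphoma": 7,
          "Malignant lymphoma, non-Hodgkin": 12
        }
      }
    }
  }
}
''')


def categorical_value_counts(response, concept_code):
    """Return the value -> count mapping for one categorical concept, or {} if absent."""
    per_concept = response.get('aggregatesPerConcept', {})
    aggregates = per_concept.get(concept_code, {})
    return aggregates.get('categoricalValueAggregates', {}).get('valueCounts', {})


counts = categorical_value_counts(response, 'Diagnosis.tumor_type')

# Print the values, most frequent first
for value, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f'{n:3d}  {value}')
```

The keys of `counts` are the values that can be passed in `value_list` when building a constraint for this concept.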


### Querying observations

#### Query with a concept constraint

Fetch aggregates for the gender concept

In [13]:
# find the concept code for the gender concept
gender_concept_code = concepts[concepts['name']=='03. Gender']['conceptCode'].unique()[0]
# Constraint: all gender observations
gender_constraint = api.new_constraint(concept=gender_concept_code)

print(json.dumps(api.observations.aggregates_per_concept(constraint=gender_constraint), indent=2))
print(api.observations.counts(constraint=gender_constraint))

# Display the first observation
display(api.observations(constraint=gender_constraint).dataframe.head(1))

https://transmart.thehyve.net/v2/observations/aggregates_per_concept
{
  "aggregatesPerConcept": {
    "Individual.gender": {
      "categoricalValueAggregates": {
        "nullValueCounts": 0,
        "valueCounts": {
          "V": 3,
          "female": 3,
          "M": 3,
          "male": 8
        }
      }
    }
  }
}
https://transmart.thehyve.net/v2/observations/counts
{'observationCount': 17, 'patientCount': 17}
https://transmart.thehyve.net/v2/observations


Unnamed: 0,concept.conceptCode,concept.conceptPath,concept.name,patient.age,patient.birthDate,patient.deathDate,patient.id,patient.inTrialId,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,Individual.gender,\CSR\Individual.gender,03. Gender,,,,1472,,,,,male,M,PAT1,,,M,CSR
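
The aggregates above show that gender is recorded with four different values (`V`, `female`, `M`, `male`), so a constraint on `female` alone would match only 3 of the 17 observations. A minimal sketch of merging the counts client-side follows. The alias table is an assumption (in particular that `V` stands for female, as in Dutch "vrouw"), and the helper name `normalize_counts` is hypothetical.

```python
from collections import Counter

# Value counts copied from the aggregates output above
raw_counts = {'V': 3, 'female': 3, 'M': 3, 'male': 8}

# ASSUMPTION: 'V' means female and 'M' means male; verify against the source data
GENDER_ALIASES = {'V': 'female', 'female': 'female', 'M': 'male', 'male': 'male'}


def normalize_counts(raw, aliases):
    """Merge counts of values that map to the same normalized label."""
    merged = Counter()
    for value, n in raw.items():
        # Unknown values are kept under their own label
        merged[aliases.get(value, value)] += n
    return dict(merged)


normalized = normalize_counts(raw_counts, GENDER_ALIASES)
print(normalized)
```

Under that assumption, selecting all women on the server would need both spellings in the value list, for example `value_list=['female', 'V']`.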


#### Query with concept and value constraint

Fetch observations for malignant lymphoma tumor type

In [14]:
# Constraint: all malignant lymphoma tumor type observations

malignant_lymphoma_constraint = api.new_constraint(concept=tumor_type_concept_code, value_list=['Malignant lymphoma, non-Hodgkin'])

print(api.observations.counts(constraint=malignant_lymphoma_constraint))
display(api.observations(constraint=malignant_lymphoma_constraint).dataframe.head())

https://transmart.thehyve.net/v2/observations/counts
{'observationCount': 12, 'patientCount': 12}
https://transmart.thehyve.net/v2/observations


Unnamed: 0,concept.conceptCode,concept.conceptPath,concept.name,patient.age,patient.birthDate,patient.deathDate,patient.id,patient.inTrialId,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,Diagnosis.tumor_type,\CSR\Diagnosis.tumor_type,02. Tumor type,,,,1472,,,,,male,M,PAT1,,,"Malignant lymphoma, non-Hodgkin",CSR
1,Diagnosis.tumor_type,\CSR\Diagnosis.tumor_type,02. Tumor type,,,,1473,,,,,female,female,PAT13,,,"Malignant lymphoma, non-Hodgkin",CSR
2,Diagnosis.tumor_type,\CSR\Diagnosis.tumor_type,02. Tumor type,,,,1478,,,,,male,male,PAT10,,,"Malignant lymphoma, non-Hodgkin",CSR
3,Diagnosis.tumor_type,\CSR\Diagnosis.tumor_type,02. Tumor type,,,,1480,,,,,female,female,PAT12,,,"Malignant lymphoma, non-Hodgkin",CSR
4,Diagnosis.tumor_type,\CSR\Diagnosis.tumor_type,02. Tumor type,,,,1481,,,,,male,M,PAT2,,,"Malignant lymphoma, non-Hodgkin",CSR


### Query patients using observation constraints

#### Select patients using a concept constraint

Select all patients with an observation for the gender concept

In [15]:
# Constraint: all patients with a gender observation
gender_constraint_sub = copy.copy(gender_constraint)
gender_constraint_sub.subselection = 'patient'

print(json.dumps(gender_constraint_sub.json(), indent=2))
print(api.observations.counts(constraint=gender_constraint_sub))

# Display first observation for the selected group of patients
display(api.observations(constraint=gender_constraint_sub).dataframe.head(1))

{
  "type": "subselection",
  "dimension": "patient",
  "constraint": {
    "type": "concept",
    "conceptCode": "Individual.gender"
  }
}
https://transmart.thehyve.net/v2/observations/counts
{'observationCount': 341, 'patientCount': 17}
https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,numericValue,patient.age,patient.birthDate,patient.deathDate,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM1,BIOS1,DIA1,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,M,PAT1,,,2018-03-07,CSR
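
The jump from 17 to 341 observations suggests what patient subselection does: it first finds the patients that have at least one matching observation, and then returns all observations of those patients. The sketch below mimics that behaviour in memory on a handful of made-up observations. It illustrates the semantics only and is not the server's implementation; the data and function names are hypothetical.

```python
# Hypothetical toy observations: (patient, concept code, value)
observations = [
    ('PAT1', 'Individual.gender', 'M'),
    ('PAT1', 'Diagnosis.tumor_type', 'Malignant lymphoma, non-Hodgkin'),
    ('PAT1', 'Biosource.tissue', 'eye'),
    ('PAT2', 'Individual.gender', 'female'),
    ('PAT2', 'Diagnosis.tumor_type', 'Angioimmunoblastic T-cell lymphoma'),
    ('PAT3', 'Diagnosis.tumor_type', 'Malignant lymphoma, non-Hodgkin'),
]


def matching(observations, predicate):
    """Plain constraint: keep only the observations that match."""
    return [obs for obs in observations if predicate(obs)]


def patient_subselection(observations, predicate):
    """Subselection on patient: all observations of patients with a match."""
    patients = {obs[0] for obs in observations if predicate(obs)}
    return [obs for obs in observations if obs[0] in patients]


def is_gender(obs):
    return obs[1] == 'Individual.gender'


plain = matching(observations, is_gender)
expanded = patient_subselection(observations, is_gender)
print(len(plain), len(expanded))
```

In the toy data, PAT3 has no gender observation and is therefore excluded from both results, while PAT1 and PAT2 contribute all of their observations to the subselection.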


#### Select patients using concept and value constraint

Select all patients with an observation for malignant lymphoma tumor type

In [16]:
# Constraint: all patients with an observation for malignant lymphoma tumor type
malignant_lymphoma_constraint_sub = copy.copy(malignant_lymphoma_constraint)
malignant_lymphoma_constraint_sub.subselection = 'patient'

print(json.dumps(malignant_lymphoma_constraint_sub.json(), indent=2))
print(api.observations.counts(constraint=malignant_lymphoma_constraint_sub))

# Display first observation for the selected group of patients
display(api.observations(constraint=malignant_lymphoma_constraint_sub).dataframe.head(1))

{
  "type": "subselection",
  "dimension": "patient",
  "constraint": {
    "args": [
      {
        "type": "concept",
        "conceptCode": "Diagnosis.tumor_type"
      },
      {
        "type": "value",
        "valueType": "STRING",
        "operator": "=",
        "value": "Malignant lymphoma, non-Hodgkin"
      }
    ],
    "type": "and"
  }
}
https://transmart.thehyve.net/v2/observations/counts
{'observationCount': 246, 'patientCount': 12}
https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,numericValue,patient.age,patient.birthDate,patient.deathDate,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM1,BIOS1,DIA1,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,M,PAT1,,,2018-03-07,CSR


#### Look up concepts by path

In [17]:
# find concept by concept_path
gender_concept_path = '\\CSR\\Individual.gender'
gender_concept_code = concepts[concepts['conceptPath']==gender_concept_path]['conceptCode'].unique()[0]
gender_concept_code

'Individual.gender'

In [18]:
gender_constraint = api.new_constraint(concept=gender_concept_code)

print(json.dumps(api.observations.aggregates_per_concept(constraint=gender_constraint), indent=2))
print(api.observations.counts(constraint=gender_constraint))

https://transmart.thehyve.net/v2/observations/aggregates_per_concept
{
  "aggregatesPerConcept": {
    "Individual.gender": {
      "categoricalValueAggregates": {
        "nullValueCounts": 0,
        "valueCounts": {
          "V": 3,
          "female": 3,
          "M": 3,
          "male": 8
        }
      }
    }
  }
}
https://transmart.thehyve.net/v2/observations/counts
{'observationCount': 17, 'patientCount': 17}


### Advanced queries

Combine groups of patients with And (intersection) and Or (union) operators.

In [19]:
# Print all the possible parameters for a query constraint
for key in api.new_constraint().params:
    print("* {}".format(key))

* concept
* study
* trial_visit
* min_value
* max_value
* min_date_value
* max_date_value
* value_list
* min_start_date
* max_start_date
* subject_set_id


In [20]:
# Constraint: select all gender observations with value 'female'
all_females = api.new_constraint(concept=gender_concept_code, value_list=['female'])
# Constraint: select all gender observations with value 'male'
all_males = api.new_constraint(concept=gender_concept_code, value_list=['male'])

### Query for women with a certain tumor type

In [21]:
# Constraint: women with malignant lymphoma
women_with_malignant_lymphoma = all_females & malignant_lymphoma_constraint

# Retrieve and print the counts for observations and patients matching our constraint
output = api.observations.counts(constraint=women_with_malignant_lymphoma)
print(json.dumps(output, indent=2))

# Retrieve the patients matching our constraint and display the first five
display(api.patients(constraint=women_with_malignant_lymphoma).dataframe.head())

# Retrieve the observations matching our constraint and display the first five
display(api.observations(constraint=women_with_malignant_lymphoma).dataframe.head())

https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 40,
  "patientCount": 2
}
https://transmart.thehyve.net/v2/patients


Unnamed: 0,age,birthDate,deathDate,id,inTrialId,maritalStatus,race,religion,sex,subjectIds.SUBJ_ID,trial
0,,,,1480,,,,,FEMALE,PAT12,
1,,,,1473,,,,,FEMALE,PAT13,


https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,numericValue,patient.age,patient.birthDate,patient.deathDate,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM13,BIOS13,DIA13,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT13,,,2018-03-07,CSR
1,BIOM12,BIOS12,DIA12,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT12,,,2011-06-05,CSR
2,BIOM21,BIOS12,DIA12,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT12,,,2011-06-05,CSR
3,BIOM21,BIOS12,DIA12,Biomaterial.src_biomaterial_id,\CSR\Biomaterial.src_biomaterial_id,01. Biomaterial parent,,,,,...,,,,female,female,PAT12,,,BIOM12,CSR
4,BIOM13,BIOS13,DIA13,Biomaterial.type,\CSR\Biomaterial.type,03. Biomaterial type,,,,,...,,,,female,female,PAT13,,,mRNA,CSR


### Query for men with a certain tumor type

In [22]:
# Constraint: men with malignant lymphoma
men_with_malignant_lymphoma = all_males & malignant_lymphoma_constraint

# Retrieve and print the counts for observations and patients matching our constraint
output = api.observations.counts(constraint=men_with_malignant_lymphoma)
print(json.dumps(output, indent=2))

# Retrieve the patients matching our constraint and display the first five
display(api.patients(constraint=men_with_malignant_lymphoma).dataframe.head())

# Retrieve the observations matching our constraint and display the first five
display(api.observations(constraint=men_with_malignant_lymphoma).dataframe.head())

https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 127,
  "patientCount": 7
}
https://transmart.thehyve.net/v2/patients


Unnamed: 0,age,birthDate,deathDate,id,inTrialId,maritalStatus,race,religion,sex,subjectIds.SUBJ_ID,trial
0,,,,1487,,,,,MALE,PAT8,
1,,,,1478,,,,,MALE,PAT10,
2,,,,1486,,,,,MALE,PAT7,
3,,,,1483,,,,,MALE,PAT4,
4,,,,1485,,,,,MALE,PAT6,


https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,numericValue,patient.age,patient.birthDate,patient.deathDate,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM10,BIOS10,DIA10,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT10,,,2018-03-07,CSR
1,BIOM4,BIOS4,DIA4,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT4,,,2018-03-07,CSR
2,BIOM5,BIOS5,DIA5,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT5,,,2011-06-05,CSR
3,BIOM6,BIOS6,DIA6,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT6,,,2011-06-05,CSR
4,BIOM7,BIOS7,DIA7,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT7,,,2018-03-07,CSR


### Combining groups with And (intersection)

Create a constraint that selects individuals that are in both the male and the female group.
Because the two groups are disjoint, the result should be 0 patients and 0 observations.

In [23]:
# Constraint: the intersection of men with malignant lymphoma and women with malignant lymphoma
both_male_and_female_with_malignant_lymphoma = women_with_malignant_lymphoma & copy.copy(men_with_malignant_lymphoma)

# Retrieve and print the counts for observations and patients matching our constraint
output = api.observations.counts(constraint=both_male_and_female_with_malignant_lymphoma)
print(json.dumps(output, indent=2))

https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 0,
  "patientCount": 0
}
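
Judging from the counts above, combining constraints with `&` and `|` behaves like set operations on groups of patients. The sketch below replays that with plain Python sets, using the subject IDs displayed in the patient tables above. Note that only the first five of the seven men were displayed, so the second set is incomplete.

```python
# Subject IDs copied from the patient tables above
women = {'PAT12', 'PAT13'}
# Only the first five of the seven matching men were displayed
men_displayed = {'PAT8', 'PAT10', 'PAT7', 'PAT4', 'PAT6'}

# Intersection: patients that are in both groups
both = women & men_displayed

# Union: patients that are in at least one group
either = women | men_displayed

print(both)
print(len(either))
```

The intersection is empty because no subject ID occurs in both groups, which matches the zero counts returned by the server.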


### Combining groups with Or (union)

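Create a constraint that selects the union of all men, all women, and all patients diagnosed with malignant lymphoma.
One might expect the result to include all patients and observations. However, `all_females` and `all_males` only match the values `female` and `male`, while some patients have their gender recorded as `V` or `M` (see the gender aggregates above). Those patients are included only if they have malignant lymphoma, so the counts below are lower than the study totals.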

In [24]:
# Constraint: the union of men, women and patients with malignant lymphoma
male_or_female_or_malignant_lymphoma = (all_females | all_males) | malignant_lymphoma_constraint

# Retrieve and print the counts for observations and patients matching our constraint
output = api.observations.counts(constraint=male_or_female_or_malignant_lymphoma)
print(json.dumps(output, indent=2))

# Retrieve the patients matching our constraint and display the first five
display(api.patients(constraint=male_or_female_or_malignant_lymphoma).dataframe.head())

# Retrieve the observations matching our constraint and display the first five
display(api.observations(constraint=male_or_female_or_malignant_lymphoma).dataframe.head())

https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 287,
  "patientCount": 14
}
https://transmart.thehyve.net/v2/patients


Unnamed: 0,age,birthDate,deathDate,id,inTrialId,maritalStatus,race,religion,sex,subjectIds.SUBJ_ID,trial
0,,,,1472,,,,,MALE,PAT1,
1,,,,1473,,,,,FEMALE,PAT13,
2,,,,1474,,,,,FEMALE,PAT14,
3,,,,1478,,,,,MALE,PAT10,
4,,,,1479,,,,,MALE,PAT11,


https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,numericValue,patient.age,patient.birthDate,patient.deathDate,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM1,BIOS1,DIA1,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,M,PAT1,,,2018-03-07,CSR
1,BIOM18,BIOS18,DIA18,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,M,PAT1,,,2011-06-05,CSR
2,BIOM13,BIOS13,DIA13,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT13,,,2018-03-07,CSR
3,BIOM14,BIOS14,DIA14,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT14,,,2011-06-05,CSR
4,BIOM10,BIOS10,DIA10,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT10,,,2018-03-07,CSR
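
The patient count of 14 (out of 17 in the study) can be reconciled with the earlier outputs by simple arithmetic. The sketch below assumes that every patient has exactly one gender observation, which is consistent with the 17 gender observations for 17 patients reported earlier.

```python
# Counts copied from the outputs earlier in this notebook
female_patients = 3       # gender value 'female'
male_patients = 8         # gender value 'male'
lymphoma_patients = 12    # 'Malignant lymphoma, non-Hodgkin'
women_with_lymphoma = 2   # 'female' AND lymphoma
men_with_lymphoma = 7     # 'male' AND lymphoma

# Lymphoma patients whose gender is recorded as something other than
# 'female' or 'male' (i.e., 'V' or 'M')
lymphoma_other_gender = lymphoma_patients - (women_with_lymphoma + men_with_lymphoma)

# The three groups are disjoint, so the size of the union is their sum
union_size = female_patients + male_patients + lymphoma_other_gender
print(lymphoma_other_gender, union_size)
```

The three patients missing from the union are those with gender `V` or `M` who do not have malignant lymphoma.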


### Combine And with Or (intersection and union)

Create a constraint that selects patients who are either women with malignant lymphoma or men with malignant lymphoma.

In [25]:
# Combine OR with AND: both women with a particular tumor type and men with the tumor type
all_with_malignant_lymphoma = women_with_malignant_lymphoma | men_with_malignant_lymphoma

# Print a representation of the constraint
#print(json.dumps(all_with_malignant_lymphoma.json(), indent=2))

# Retrieve and print the counts for observations and patients matching the constraint
output = api.observations.counts(constraint=all_with_malignant_lymphoma)
print(json.dumps(output, indent=2))

# Retrieve the patients matching the constraint and display the first five
display(api.patients(constraint=all_with_malignant_lymphoma).dataframe.head())

# Retrieve the observations matching the constraint and display the first five
display(api.observations(constraint=all_with_malignant_lymphoma).dataframe.head())

https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 167,
  "patientCount": 9
}
https://transmart.thehyve.net/v2/patients


Unnamed: 0,age,birthDate,deathDate,id,inTrialId,maritalStatus,race,religion,sex,subjectIds.SUBJ_ID,trial
0,,,,1473,,,,,FEMALE,PAT13,
1,,,,1478,,,,,MALE,PAT10,
2,,,,1480,,,,,FEMALE,PAT12,
3,,,,1483,,,,,MALE,PAT4,
4,,,,1484,,,,,MALE,PAT5,


https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,numericValue,patient.age,patient.birthDate,patient.deathDate,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM13,BIOS13,DIA13,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT13,,,2018-03-07,CSR
1,BIOM10,BIOS10,DIA10,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT10,,,2018-03-07,CSR
2,BIOM12,BIOS12,DIA12,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT12,,,2011-06-05,CSR
3,BIOM21,BIOS12,DIA12,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,female,female,PAT12,,,2011-06-05,CSR
4,BIOM4,BIOS4,DIA4,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,,...,,,,male,male,PAT4,,,2018-03-07,CSR


### Specifying custom constraints in JSON format

Use subselection to select biomaterials related to the eye tissue biosources
(this leaves out the observations on patient, diagnosis and biosource level).

In [26]:
tissue_concept_code = concepts[concepts['name']=='03. Tissue']['conceptCode'].unique()[0]
tissue_constraint = api.new_constraint(concept=tissue_concept_code)

print('Tissue types:', api.observations(constraint=tissue_constraint).dataframe.stringValue.unique())
eye_tissue_constraint = api.new_constraint(concept=tissue_concept_code, value_list=['eye'])
eye_tissue_constraint.subselection = 'Biosource'

# Retrieve and print the counts for observations and patients for eye tissue biosources
output = api.observations.counts(constraint=eye_tissue_constraint)
print(json.dumps(output, indent=2))

print(json.dumps(eye_tissue_constraint.json(), indent=2))

# Use subselection to select biomaterials related to the eye tissue biosources
# (this leaves out the observations on patient, diagnosis and biosource level)
# Note that this is a plain Python dictionary, not a constraint class from the transmart library.
eye_tissue_biomaterials_constraint = {
    'type': 'subselection',
    'dimension': 'Biomaterial',
    'constraint': eye_tissue_constraint.json()
}
output = api.observations.counts(constraint=eye_tissue_biomaterials_constraint)
print(json.dumps(output, indent=2))


display(api.observations(constraint=eye_tissue_biomaterials_constraint).dataframe.head())

https://transmart.thehyve.net/v2/observations
Tissue types: ['eye' 'nerve']
https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 62,
  "patientCount": 8
}
{
  "type": "subselection",
  "dimension": "Biosource",
  "constraint": {
    "args": [
      {
        "type": "concept",
        "conceptCode": "Biosource.tissue"
      },
      {
        "type": "value",
        "valueType": "STRING",
        "operator": "=",
        "value": "eye"
      }
    ],
    "type": "and"
  }
}
https://transmart.thehyve.net/v2/observations/counts
{
  "observationCount": 22,
  "patientCount": 8
}
https://transmart.thehyve.net/v2/observations


Unnamed: 0,Biomaterial,Biosource,Diagnosis,concept.conceptCode,concept.conceptPath,concept.name,patient.age,patient.birthDate,patient.deathDate,patient.id,...,patient.maritalStatus,patient.race,patient.religion,patient.sex,patient.sexCd,patient.subjectIds.SUBJ_ID,patient.trial,start time,stringValue,study.name
0,BIOM1,BIOS1,DIA1,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,1472,...,,,,male,M,PAT1,,,2018-03-07,CSR
1,BIOM10,BIOS10,DIA10,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,1478,...,,,,male,male,PAT10,,,2018-03-07,CSR
2,BIOM11,BIOS11,DIA11,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,1479,...,,,,male,male,PAT11,,,2011-06-05,CSR
3,BIOM20,BIOS11,DIA11,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,1479,...,,,,male,male,PAT11,,,2011-06-05,CSR
4,BIOM12,BIOS12,DIA12,Biomaterial.biomaterial_date,\CSR\Biomaterial.biomaterial_date,02. Date of biomaterial,,,,1480,...,,,,female,female,PAT12,,,2011-06-05,CSR
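
The nested constraint used above can also be written entirely as plain dictionaries, without any constraint objects from the library. The sketch below rebuilds the same JSON structure that was printed for `eye_tissue_constraint` and wraps it in a second subselection. The helper names `concept_value` and `subselection` are hypothetical, and only the string-equality case shown in this notebook is covered.

```python
import json


def concept_value(concept_code, value):
    """Constraint for observations of a concept with a given string value."""
    return {
        'type': 'and',
        'args': [
            {'type': 'concept', 'conceptCode': concept_code},
            {'type': 'value', 'valueType': 'STRING', 'operator': '=', 'value': value},
        ],
    }


def subselection(dimension, constraint):
    """Wrap a constraint in a subselection on the given dimension."""
    return {'type': 'subselection', 'dimension': dimension, 'constraint': constraint}


# Biosources with eye tissue, as printed above for eye_tissue_constraint
eye_biosources = subselection('Biosource', concept_value('Biosource.tissue', 'eye'))

# Biomaterials related to those biosources
eye_biomaterials = subselection('Biomaterial', eye_biosources)

print(json.dumps(eye_biomaterials, indent=2))
```

A dictionary built this way could be passed as `constraint=` in the same way as `eye_tissue_biomaterials_constraint` above.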
