# T030 · Compound data acquisition (GtoPDB)

**Note:** This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.

Authors:

- Dominique Sydow, 2022, [Volkamer lab, Charité](https://volkamerlab.org/)

## Aim of this talktorial

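This talktorial shows how to acquire target and ligand data from the Guide to Pharmacology (GtoPDB) database. We load the downloadable targets-and-families table with `pandas`, look up the GtoPDB target ID for a UniProt ID, and query the GtoPDB web services for the target's interactions and for ligand annotations.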

### Contents in *Theory*

* Guide to Pharmacology (GtoPDB) database

### Contents in *Practical*

* Targets and families
* Bioactivity space of query target
* Ligand annotations

### References

* GtoPDB website: [Guide to Pharmacology](https://www.guidetopharmacology.org/)
* GtoPDB downloads: [download page](https://www.guidetopharmacology.org/download.jsp)
* GtoPDB web services: [web services page](https://www.guidetopharmacology.org/webServices.jsp)

## Theory

### Guide to Pharmacology (GtoPDB) database

- Website: https://www.guidetopharmacology.org/
- Downloads: https://www.guidetopharmacology.org/download.jsp
- Web services: https://www.guidetopharmacology.org/webServices.jsp

## Practical

In [1]:
import requests
import pandas as pd

In [2]:
pd.set_option("display.max_columns", 50)

### Targets and families

In [3]:
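# Load the targets-and-families table; skiprows=1 skips the file's first line,
# so that the second line is read as the header row
gtopdb_targets = pd.read_csv(
    "https://www.guidetopharmacology.org/DATA/targets_and_families.csv", skiprows=1
)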

In [4]:
print(gtopdb_targets.shape)
gtopdb_targets.head()

(3226, 37)


Unnamed: 0,Type,Family id,Family name,Target id,Target name,Subunit id,Subunit name,Target systematic name,Target abbreviated name,synonyms,HGNC id,HGNC symbol,HGNC name,Human genetic localisation,Human nucleotide RefSeq,Human protein RefSeq,Human SwissProt,Human Entrez Gene,Human Ensembl Gene,RGD id,RGD symbol,RGD name,Rat genetic localisation,Rat nucleotide RefSeq,Rat protein RefSeq,Rat SwissProt,Rat Entrez Gene,Rat Ensembl Gene,MGI id,MGI symbol,MGI name,Mouse genetic localisation,Mouse nucleotide RefSeq,Mouse protein RefSeq,Mouse SwissProt,Mouse Entrez Gene,Mouse Ensembl Gene
0,gpcr,1,5-Hydroxytryptamine receptors,1,5-HT<sub>1A</sub> receptor,,,,,ADRBRL1|5-HT1A|ADRB2RL1|serotonin receptor 1A|...,5286,HTR1A,5-hydroxytryptamine receptor 1A,5q12.3,NM_000524,NP_000515,P08908,3350,ENSG00000178394,2845.0,Htr1a,5-hydroxytryptamine receptor 1A,2q13,NM_012585,NP_036717,P19327,24473.0,ENSRNOG00000010254,MGI:96273,Htr1a,5-hydroxytryptamine (serotonin) receptor 1A,13 56.92 cM,NM_008308,NP_032334,Q64264,15550.0,ENSMUSG00000021721
1,gpcr,1,5-Hydroxytryptamine receptors,2,5-HT<sub>1B</sub> receptor,,,,,5-HT1B|5-HT1DB|HTR1D2|5-HT1B serotonin recepto...,5287,HTR1B,5-hydroxytryptamine receptor 1B,6q14.1,NM_000863,NP_000854,P28222,3351,ENSG00000135312,2846.0,Htr1b,5-hydroxytryptamine receptor 1B,8q31,NM_022225,NP_071561,P28564,25075.0,ENSRNOG00000013042,MGI:96274,Htr1b,5-hydroxytryptamine (serotonin) receptor 1B,9 44.61 cM,NM_010482,NP_034612,P28334,15551.0,ENSMUSG00000049511
2,gpcr,1,5-Hydroxytryptamine receptors,3,5-HT<sub>1D</sub> receptor,,,,,5-HT<sub>1D&alpha;</sub>|HTRL|5-HT1D|HT1DA|ser...,5289,HTR1D,5-hydroxytryptamine receptor 1D,1p36.12,NM_000864,NP_000855,P28221,3352,ENSG00000179546,2847.0,Htr1d,5-hydroxytryptamine receptor 1D,5q36,NM_012852,NP_036984,P28565,25323.0,ENSRNOG00000012038,MGI:96276,Htr1d,5-hydroxytryptamine (serotonin) receptor 1D,4 68.74 cM,NM_008309,NP_032335,Q61224,15552.0,ENSMUSG00000070687
3,gpcr,1,5-Hydroxytryptamine receptors,4,5-ht<sub>1e</sub> receptor,,,,,5-HT<sub>1E</sub><sub>&alpha;</sub>|5-HT1E|5-h...,5291,HTR1E,5-hydroxytryptamine receptor 1E,6q14.3,NM_000865,NP_000856,P28566,3354,ENSG00000168830,,,,,,,,,,,,,,,,,,
4,gpcr,1,5-Hydroxytryptamine receptors,5,5-HT<sub>1F</sub> receptor,,,,,5-HT<sub>1E&beta;</sub>|5-HT<sub>6</sub>|5-HT1...,5292,HTR1F,5-hydroxytryptamine receptor 1F,3p12,NM_000866,NP_000857,P30939,3355,ENSG00000179097,71083.0,Htr1f,5-hydroxytryptamine receptor 1F,11p12,NM_021857,NP_068629,P30940,60448.0,ENSRNOG00000000716,MGI:99842,Htr1f,5-hydroxytryptamine (serotonin) receptor 1F,16 37.1 cM,NM_008310,NP_032336,Q02284,15557.0,ENSMUSG00000050783
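
The `skiprows=1` argument used when loading the table tells `pandas` to ignore the first line of the file before it reads the header. The following minimal sketch illustrates this with a made-up CSV string; the leading line and the shortened rows are illustrative assumptions, not actual GtoPDB file content.

```python
import io

import pandas as pd

# Hypothetical file content: one leading line that is not part of the table,
# followed by the header row and two data rows
csv_text = (
    "illustrative leading line\n"
    "Type,Target id,Target name\n"
    "gpcr,1,5-HT1A receptor\n"
    "gpcr,2,5-HT1B receptor\n"
)

# With skiprows=1, the second line of the text becomes the header row
toy_targets = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(toy_targets.columns.tolist())
print(toy_targets.shape)
```

If the layout of the downloaded file changes, the value of `skiprows` would need to be adapted accordingly.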


In [5]:
print(gtopdb_targets.columns)

Index(['Type', 'Family id', 'Family name', 'Target id', 'Target name',
       'Subunit id', 'Subunit name', 'Target systematic name',
       'Target abbreviated name', 'synonyms', 'HGNC id', 'HGNC symbol',
       'HGNC name', 'Human genetic localisation', 'Human nucleotide RefSeq',
       'Human protein RefSeq', 'Human SwissProt', 'Human Entrez Gene',
       'Human Ensembl Gene', 'RGD id', 'RGD symbol', 'RGD name',
       'Rat genetic localisation', 'Rat nucleotide RefSeq',
       'Rat protein RefSeq', 'Rat SwissProt', 'Rat Entrez Gene',
       'Rat Ensembl Gene', 'MGI id', 'MGI symbol', 'MGI name',
       'Mouse genetic localisation', 'Mouse nucleotide RefSeq',
       'Mouse protein RefSeq', 'Mouse SwissProt', 'Mouse Entrez Gene',
       'Mouse Ensembl Gene'],
      dtype='object')


In [6]:
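# Drop the rat- and mouse-specific columns; the filter matches on "Rat" and "Mouse"
# in the column name only, so the RGD (rat) and MGI (mouse) columns are kept
columns_human = [
    column
    for column in gtopdb_targets.columns
    if not any(species in column for species in ["Rat", "Mouse"])
]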
gtopdb_targets = gtopdb_targets[columns_human]
gtopdb_targets

Unnamed: 0,Type,Family id,Family name,Target id,Target name,Subunit id,Subunit name,Target systematic name,Target abbreviated name,synonyms,HGNC id,HGNC symbol,HGNC name,Human genetic localisation,Human nucleotide RefSeq,Human protein RefSeq,Human SwissProt,Human Entrez Gene,Human Ensembl Gene,RGD id,RGD symbol,RGD name,MGI id,MGI symbol,MGI name
0,gpcr,1,5-Hydroxytryptamine receptors,1,5-HT<sub>1A</sub> receptor,,,,,ADRBRL1|5-HT1A|ADRB2RL1|serotonin receptor 1A|...,5286,HTR1A,5-hydroxytryptamine receptor 1A,5q12.3,NM_000524,NP_000515,P08908,3350,ENSG00000178394,2845,Htr1a,5-hydroxytryptamine receptor 1A,MGI:96273,Htr1a,5-hydroxytryptamine (serotonin) receptor 1A
1,gpcr,1,5-Hydroxytryptamine receptors,2,5-HT<sub>1B</sub> receptor,,,,,5-HT1B|5-HT1DB|HTR1D2|5-HT1B serotonin recepto...,5287,HTR1B,5-hydroxytryptamine receptor 1B,6q14.1,NM_000863,NP_000854,P28222,3351,ENSG00000135312,2846,Htr1b,5-hydroxytryptamine receptor 1B,MGI:96274,Htr1b,5-hydroxytryptamine (serotonin) receptor 1B
2,gpcr,1,5-Hydroxytryptamine receptors,3,5-HT<sub>1D</sub> receptor,,,,,5-HT<sub>1D&alpha;</sub>|HTRL|5-HT1D|HT1DA|ser...,5289,HTR1D,5-hydroxytryptamine receptor 1D,1p36.12,NM_000864,NP_000855,P28221,3352,ENSG00000179546,2847,Htr1d,5-hydroxytryptamine receptor 1D,MGI:96276,Htr1d,5-hydroxytryptamine (serotonin) receptor 1D
3,gpcr,1,5-Hydroxytryptamine receptors,4,5-ht<sub>1e</sub> receptor,,,,,5-HT<sub>1E</sub><sub>&alpha;</sub>|5-HT1E|5-h...,5291,HTR1E,5-hydroxytryptamine receptor 1E,6q14.3,NM_000865,NP_000856,P28566,3354,ENSG00000168830,,,,,,
4,gpcr,1,5-Hydroxytryptamine receptors,5,5-HT<sub>1F</sub> receptor,,,,,5-HT<sub>1E&beta;</sub>|5-HT<sub>6</sub>|5-HT1...,5292,HTR1F,5-hydroxytryptamine receptor 1F,3p12,NM_000866,NP_000857,P30939,3355,ENSG00000179097,71083,Htr1f,5-hydroxytryptamine receptor 1F,MGI:99842,Htr1f,5-hydroxytryptamine (serotonin) receptor 1F
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3221,other_protein,904,Tumour-associated antigens,2959,glypican 3,,,,,,4451,GPC3,glypican 3,Xq26.2,NM_001164617,NP_001158089,P51654,2719,ENSG00000147257,2725,Gpc3,glypican 3,MGI:104903,Gpc3,glypican 3
3222,other_protein,904,Tumour-associated antigens,3185,MAGE family member A3,,,,,"antigen MZ2-D|cancer/testis antigen family 1, ...",6801,MAGEA3,MAGE family member A3,Xq28,NM_005362,NP_005353,P43357,4102,ENSG00000221867,,,,,,
3223,other_protein,904,Tumour-associated antigens,3009,trophoblast glycoprotein,,,,,5T4|5T4-Ag|5T4 oncofetal antigen|Wnt-activated...,12004,TPBG,trophoblast glycoprotein,6q14.1,NM_006670,NP_006661,Q13641,7162,ENSG00000146242,621453,Tpbg,trophoblast glycoprotein,MGI:1341264,Tpbg,trophoblast glycoprotein
3224,other_protein,904,Tumour-associated antigens,2837,tumor associated calcium signal transducer 2,,,,,M1S1|EGP-1|TROP2|TROP-2|GA733-1|RS7 antigen|tu...,11530,TACSTD2,tumor associated calcium signal transducer 2,1p32.1,NM_002353,NP_002344,P09758,4070,ENSG00000184292,1359498,Tacstd2,tumor-associated calcium signal transducer 2,MGI:1861606,Tacstd2,tumor-associated calcium signal transducer 2
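
The filtered table still contains the `RGD ...` (Rat Genome Database) and `MGI ...` (Mouse Genome Informatics) columns, because their names contain neither "Rat" nor "Mouse". If a strictly human-only view is wanted, one option is to filter on column-name prefixes instead. The sketch below uses a toy table with a few of the column names shown above; the assumption that these four prefixes cover all non-human columns is ours.

```python
import pandas as pd

# Toy table with a subset of the real column names (values from the first row above)
toy_targets = pd.DataFrame(
    {
        "Target id": [1],
        "HGNC symbol": ["HTR1A"],
        "Human SwissProt": ["P08908"],
        "RGD symbol": ["Htr1a"],
        "Rat SwissProt": ["P19327"],
        "MGI symbol": ["Htr1a"],
        "Mouse SwissProt": ["Q64264"],
    }
)

# Assumption: these prefixes mark all rat- and mouse-specific columns
non_human_prefixes = ("Rat", "Mouse", "RGD", "MGI")
columns_human = [
    column for column in toy_targets.columns if not column.startswith(non_human_prefixes)
]
toy_targets_human = toy_targets[columns_human]
print(toy_targets_human.columns.tolist())
```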


In [7]:
gtopdb_targets.groupby("Type").size().sort_values(ascending=False)

Type
enzyme                1313
transporter            554
gpcr                   411
catalytic_receptor     314
other_protein          299
vgic                   145
lgic                    86
other_ic                55
nhr                     49
dtype: int64
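
As a side note, `Series.value_counts()` gives the same counts in one call, since it sorts in descending order by default. A minimal sketch on toy data (the target types are taken from the output above, the counts are made up):

```python
import pandas as pd

# Toy column of target types; counts are illustrative only
toy_types = pd.DataFrame({"Type": ["enzyme", "enzyme", "enzyme", "gpcr", "gpcr", "nhr"]})

# Variant used in this talktorial
counts_groupby = toy_types.groupby("Type").size().sort_values(ascending=False)

# Equivalent shortcut
counts_shortcut = toy_types["Type"].value_counts()

print(counts_shortcut)
```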

In [8]:
gtopdb_targets[gtopdb_targets["Type"] == "enzyme"]

Unnamed: 0,Type,Family id,Family name,Target id,Target name,Subunit id,Subunit name,Target systematic name,Target abbreviated name,synonyms,HGNC id,HGNC symbol,HGNC name,Human genetic localisation,Human nucleotide RefSeq,Human protein RefSeq,Human SwissProt,Human Entrez Gene,Human Ensembl Gene,RGD id,RGD symbol,RGD name,MGI id,MGI symbol,MGI name
1060,enzyme,922,1.1.1.42 Isocitrate dehydrogenases,2884,isocitrate dehydrogenase (NADP(+)) 1,,,IDH1,,isocitrate dehydrogenase 1 (NADP)|isocitrate d...,5382,IDH1,isocitrate dehydrogenase (NADP(+)) 1,2q34,NM_005896,NP_005887,O75874,3417,ENSG00000138413,2862,Idh1,isocitrate dehydrogenase (NADP(+)) 1,MGI:96413,Idh1,"isocitrate dehydrogenase 1 (NADP+), soluble"
1061,enzyme,922,1.1.1.42 Isocitrate dehydrogenases,2885,isocitrate dehydrogenase (NADP(+)) 2,,,IDH2,,IDH-2|IDPm|isocitrate dehydrogenase 2 (NADP+)|...,5383,IDH2,isocitrate dehydrogenase (NADP(+)) 2,15q26.1,NM_002168,NP_002159,P48735,3418,ENSG00000182054,1597139,Idh2,isocitrate dehydrogenase (NADP(+)) 2,MGI:96414,Idh2,"isocitrate dehydrogenase 2 (NADP+), mitochondrial"
1062,enzyme,899,1.13.11.- Dioxygenases,2829,"indoleamine 2,3-dioxygenase 1",,,,IDO1,"INDO|indoleamine-pyrrole 2,3 dioxygenase|IDO-1...",6059,IDO1,"indoleamine 2,3-dioxygenase 1",8p11.21,NM_002164,NP_002155,P14902,3620,ENSG00000131203,619989,Ido1,"indoleamine 2,3-dioxygenase 1",MGI:96416,Ido1,"indoleamine 2,3-dioxygenase 1"
1063,enzyme,899,1.13.11.- Dioxygenases,3019,"indoleamine 2,3-dioxygenase 2",,,,IDO2,"indoleamine 2|INDOL1|indoleamine-pyrrole 2,3 d...",27269,IDO2,"indoleamine 2,3-dioxygenase 2",8p11.21,NM_194294,NP_919270,Q6ZQW0,169355,ENSG00000188676,1596771,Ido2,"indoleamine 2,3-dioxygenase 2",MGI:2142489,Ido2,"indoleamine 2,3-dioxygenase 2"
1064,enzyme,899,1.13.11.- Dioxygenases,2887,"tryptophan 2,3-dioxygenase",,,,TDO2,tryptophan peroxidase|tryptophan pyrrolase|try...,11708,TDO2,"tryptophan 2,3-dioxygenase",4q32.1,NM_005651|NM_019911,NP_005642,P48775,6999,ENSG00000151790,68370,Tdo2,"tryptophan 2,3-dioxygenase",MGI:1928486,Tdo2,"tryptophan 2,3-dioxygenase"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2368,enzyme,644,YANK family,1538,serine/threonine kinase 32B,,,,YANK2,STK32|STKG6,14217,STK32B,serine/threonine kinase 32B,4p16.2,NM_018401,NP_060871,Q9NY57,55351,ENSG00000152953,1306173,Stk32b,serine/threonine kinase 32B,MGI:1927552,Stk32b,serine/threonine kinase 32B
2369,enzyme,644,YANK family,1539,serine/threonine kinase 32C,,,,YANK3,PKE|Pkek,21332,STK32C,serine/threonine kinase 32C,10q26.3,NM_173575,NP_775846,Q86UX6,282974,ENSG00000165752,1305864,Stk32c,serine/threonine kinase 32C,MGI:2385336,Stk32c,serine/threonine kinase 32C
2370,enzyme,552,YSK subfamily,2217,serine/threonine kinase 24,,,,MST3,mammalian STE20-like protein kinase 3|MST3B|STK3,11403,STK24,serine/threonine kinase 24,13q32.2,NM_001032296,NP_001027467,Q9Y6E0,8428,ENSG00000102572,1561742,Stk24,serine/threonine kinase 24,MGI:2385007,Stk24,serine/threonine kinase 24
2371,enzyme,552,YSK subfamily,2218,serine/threonine kinase 25,,,,YSK1,SOK1,11404,STK25,serine/threonine kinase 25,2q37.3,NM_006374,NP_006365,O00506,10494,ENSG00000115694,727809,Stk25,serine/threonine kinase 25,MGI:1891699,Stk25,serine/threonine kinase 25 (yeast)


### Bioactivity space of query target

In [9]:
uniprot_id = "P00533"  # Human epidermal growth factor receptor (EGFR)

In [10]:
# Map the UniProt ID to its GtoPDB target ID (first matching row)
gtopdb_target_id = gtopdb_targets[gtopdb_targets["Human SwissProt"] == uniprot_id][
    "Target id"
].iloc[0]
gtopdb_target_id

1797
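
Note that `.iloc[0]` raises an `IndexError` if the UniProt ID is not found in the table. A small helper can make this case explicit. The following is a sketch only: the function name `find_target_id` and the ID `"X00000"` are hypothetical, and the toy table contains just two of the ID pairs shown above.

```python
import pandas as pd


def find_target_id(targets, uniprot_id):
    """Return the GtoPDB target ID for a human UniProt ID, or None if absent."""
    matches = targets.loc[targets["Human SwissProt"] == uniprot_id, "Target id"]
    if matches.empty:
        return None
    # Take the first match, as in the cell above
    return int(matches.iloc[0])


# Toy table with two ID pairs from the outputs above
toy_targets = pd.DataFrame(
    {"Target id": [1, 1797], "Human SwissProt": ["P08908", "P00533"]}
)

found = find_target_id(toy_targets, "P00533")
missing = find_target_id(toy_targets, "X00000")  # hypothetical, non-existing ID
print(found, missing)
```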

In [12]:
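# Query the GtoPDB web services for all interactions of the target
url = f"https://www.guidetopharmacology.org/services/targets/{gtopdb_target_id}/interactions"
response = requests.get(url)
response.raise_for_status()  # Fail early on HTTP errors
# One row per interaction record in the JSON response
data = pd.DataFrame(response.json())
data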

Unnamed: 0,interactionId,targetId,ligandAsTargetId,targetSpecies,primaryTarget,targetBindingSite,ligandId,ligandContext,endogenous,type,action,actionComment,selectivity,concentrationRange,affinity,affinityParameter,originalAffinity,originalAffinityType,originalAffinityRelation,assayDescription,assayConditions,useDependent,voltageDependent,voltage,physiologicalVoltage,conciseView,refs
0,85854,1797,0,Human,False,,10194,,False,Inhibitor,Inhibition,,Not Determined,,7.5,pIC50,3x10<sup>-8</sup>,IC50,,,,False,False,,False,False,"[{'referenceId': 36725, 'pmid': 30633509, 'typ..."
1,85713,1797,0,Human,False,,10136,,False,Inhibitor,Inhibition,,Not Determined,,7.7,pIC50,2x10<sup>-8</sup>,IC50,<,Measuring inhibition of mutant EGFR kinase act...,,False,False,,False,False,"[{'referenceId': 36378, 'pmid': None, 'type': ..."
2,81247,1797,0,Human,True,,5675,,False,Inhibitor,Inhibition,,Not Determined,,8.8,pIC50,1.5x10<sup>-9</sup>,IC50,,Inhibition of kinase activity,,False,False,,False,False,"[{'referenceId': 26779, 'pmid': 10753475, 'typ..."
3,85831,1797,0,Human,False,,10181,,False,Inhibitor,Inhibition,,Not Determined,,6.3,pIC50,5.2x10<sup>-7</sup>,IC50,,,,False,False,,False,False,"[{'referenceId': 36665, 'pmid': 30503936, 'typ..."
4,78743,1797,0,Human,True,,4941,,False,Inhibitor,Inhibition,,Not Determined,,8.3,pKi,5.5x10<sup>-9</sup>,Ki,,,,False,False,,False,False,"[{'referenceId': 22940, 'pmid': 17416531, 'typ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,82681,1797,0,Human,True,,6883,,False,Antibody,Antagonist,,Selective,,10.3,pKd,5x10<sup>-11</sup>,Kd,,,,False,False,,False,False,"[{'referenceId': 29633, 'pmid': 14967460, 'typ..."
91,83041,1797,0,Human,False,,8894,,False,Inhibitor,Inhibition,,Non-selective,,8.6,pIC50,2.4x10<sup>-9</sup>,IC50,,,,False,False,,False,False,"[{'referenceId': 30079, 'pmid': 20143778, 'typ..."
92,83527,1797,0,Human,True,,9125,,False,Antibody,Binding,,Selective,,,,,,,,,False,False,,False,False,"[{'referenceId': 30893, 'pmid': 25911688, 'typ..."
93,84556,1797,0,Human,False,,9662,,False,Inhibitor,Irreversible inhibition,,Not Determined,,7.9,pIC50,1.3x10<sup>-8</sup>,IC50,,,,False,False,,False,False,"[{'referenceId': 33558, 'pmid': 28115222, 'typ..."
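
The `originalAffinity` column contains HTML-formatted strings such as `3x10<sup>-8</sup>`, while the `affinity` column holds the corresponding negative decadic logarithm (e.g. `pIC50`). The sketch below converts such strings to numbers and recomputes the logarithmic value for two rows of the table above. It assumes that the original values are molar concentrations and that all strings follow the pattern seen here; the helper `parse_affinity` is hypothetical.

```python
import math
import re


def parse_affinity(text):
    """Convert a string such as '3x10<sup>-8</sup>' to a float."""
    match = re.fullmatch(r"([\d.]+)x10<sup>(-?\d+)</sup>", text)
    if match is None:
        raise ValueError(f"Unexpected affinity format: {text!r}")
    mantissa, exponent = match.groups()
    return float(mantissa) * 10 ** int(exponent)


# (originalAffinity, affinity) pairs copied from rows 0 and 2 of the table above
examples = [("3x10<sup>-8</sup>", 7.5), ("1.5x10<sup>-9</sup>", 8.8)]

recomputed = []
for original, reported in examples:
    value = parse_affinity(original)
    p_value = round(-math.log10(value), 1)
    recomputed.append(p_value)
    print(original, p_value, reported)
```

For these two rows, the recomputed values agree with the reported `affinity` values after rounding to one decimal place.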


### Ligand annotations

In [14]:
# Fetch general annotations for one ligand (ligand ID from the first interaction above)
gtopdb_ligand_id = 10194
url = f"https://www.guidetopharmacology.org/services/ligands/{gtopdb_ligand_id}"
response = requests.get(url)
data = response.json()
data

{'ligandId': 10194,
 'name': 'compound 6g [PMID: 30633509]',
 'abbreviation': '',
 'inn': '',
 'type': 'Synthetic organic',
 'species': None,
 'radioactive': False,
 'labelled': False,
 'approved': False,
 'withdrawn': False,
 'whoEssential': False,
 'immuno': False,
 'malaria': False,
 'approvalSource': '',
 'subunitIds': [],
 'complexIds': [],
 'prodrugIds': [],
 'activeDrugIds': []}

In [15]:
# Fetch structure identifiers (IUPAC name, SMILES, InChI, InChIKey) for the same ligand
gtopdb_ligand_id = 10194
url = f"https://www.guidetopharmacology.org/services/ligands/{gtopdb_ligand_id}/structure"
response = requests.get(url)
data = response.json()
data

{'iupacName': '6-(p-Tolyl)-N-(3,4,5-trimethoxyphenyl)thieno[3,2-d]pyrimidin-4-amine',
 'smiles': 'COc1cc(cc(c1OC)OC)Nc1ncnc2c1sc(c2)c1ccc(cc1)C',
 'inchi': 'InChI=1S/C22H21N3O3S/c1-13-5-7-14(8-6-13)19-11-16-21(29-19)22(24-12-23-16)25-15-9-17(26-2)20(28-4)18(10-15)27-3/h5-12H,1-4H3,(H,23,24,25)',
 'inchiKey': 'IXKCNJIWRNIRNO-UHFFFAOYSA-N',
 'oneLetterSeq': None,
 'threeLetterSeq': None,
 'postTranslationalModifications': None,
 'chemicalModifications': None}
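
The ligand queries share the same URL pattern and differ only in an optional sub-resource. They could be wrapped in a small helper, sketched below. The function names and the `get_json` parameter are hypothetical; `get_json` is any callable that maps a URL to parsed JSON, which keeps the sketch runnable offline with a stand-in function.

```python
BASE_URL = "https://www.guidetopharmacology.org/services/ligands"


def ligand_url(ligand_id, resource=None):
    """Build the web-service URL for a ligand, optionally for a sub-resource."""
    url = f"{BASE_URL}/{ligand_id}"
    return f"{url}/{resource}" if resource else url


def fetch_ligand(ligand_id, get_json, resources=("structure",)):
    """Collect the base record and the given sub-resources into one dictionary."""
    record = {"ligand": get_json(ligand_url(ligand_id))}
    for resource in resources:
        record[resource] = get_json(ligand_url(ligand_id, resource))
    return record


def fake_get_json(url):
    """Offline stand-in for an HTTP call; returns the requested URL only."""
    return {"requested": url}


record = fetch_ligand(10194, fake_get_json)
print(sorted(record))
print(record["structure"]["requested"])
```

In the notebook, one could pass `lambda url: requests.get(url).json()` as `get_json`; other sub-resources of the ligand endpoint can be requested via the `resources` argument.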

In [16]:
# Fetch precomputed molecular properties for the same ligand
gtopdb_ligand_id = 10194
url = (
    f"https://www.guidetopharmacology.org/services/ligands/{gtopdb_ligand_id}/molecularProperties"
)
response = requests.get(url)
data = response.json()
data

{'hydrogenBondAcceptors': 3,
 'hydrogenBondDonors': 1,
 'rotatableBonds': 6,
 'topologicalPolarSurfaceArea': 93.74,
 'molecularWeight': 407.1303627,
 'logP': 4.012,
 'lipinskisRuleOfFive': 0}
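
The `lipinskisRuleOfFive` field is presumably the number of rule-of-five violations; this interpretation is an assumption, as is the exact way GtoPDB computes it. As a plausibility check, the sketch below counts violations from the returned properties using the commonly cited thresholds (at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors, molecular weight at most 500, logP at most 5). The function name is hypothetical.

```python
# Property values copied from the web-service response above
properties = {
    "hydrogenBondAcceptors": 3,
    "hydrogenBondDonors": 1,
    "molecularWeight": 407.1303627,
    "logP": 4.012,
}


def count_ro5_violations(props):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        props["hydrogenBondDonors"] > 5,
        props["hydrogenBondAcceptors"] > 10,
        props["molecularWeight"] > 500,
        props["logP"] > 5,
    ]
    return sum(checks)


violations = count_ro5_violations(properties)
print(violations)
```

For this ligand, the count is zero, which is consistent with the value returned by the web service.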

## Discussion

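In this talktorial, we loaded the GtoPDB targets-and-families table, mapped the UniProt ID P00533 to the GtoPDB target ID 1797, and used the GtoPDB web services to retrieve the target's interactions as well as the annotations, structure, and molecular properties of one of its ligands.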

## Quiz

1. Which column of the targets-and-families table links a UniProt ID to a GtoPDB target ID?
2. Which web-service endpoint returns the interactions of a target?
3. Which ligand endpoints return the structure and the molecular properties of a ligand?
