# T023 · What is a kinase?

Authors:

- Talia B. Kimber, 2021, [Volkamer lab, Charité](https://volkamerlab.org/)
- Dominique Sydow, 2021, [Volkamer lab, Charité](https://volkamerlab.org/)
- Andrea Volkamer, 2021, [Volkamer lab, Charité](https://volkamerlab.org/)

## Aim of this talktorial

In this talktorial, we will talk about kinases: why are they important in life and drug design, what do they look like, and what data resources are available?

### Contents in *Theory*

- Kinases in a nutshell
    - The human kinome
    - Kinase structures and important motifs
- Kinase resources
    - Kinase structures and related information
    - Bioactivity data
- Kinase-similarity: off-target, promiscuous binding
- Kinase dataset compilation

### Contents in *Practical*

- Retrieve and preprocess data
- Show kinase coverage
- Compare kinases
- Visualize similarity as kinase matrix
- Visualize similarity as phylogenetic tree

### References

- Kinase dataset: [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629) 
- Kinase similarity descriptor: XXX
- Sequence-based kinase clustering: Manning et al. [<i>Science</i> (2002), <b>298(5600)</b>, 1912-1934](https://doi.org/10.1126/science.1075762)
- KLIFS
  - KLIFS URL: https://klifs.net/
  - KLIFS database: [<i>Nucleic Acids Res.</i> (2020), <b>49(D1)</b>, D562-D569](https://doi.org/10.1093/nar/gkaa895)
  - KLIFS binding site definition: [<i>J. Med. Chem.</i> (2014), <b>57(2)</b>, 249-277](https://doi.org/10.1021/jm400378w)
- Bioactivity data
  - Karaman et al. dataset: [<i>Nature Biotechnology</i> (2008), <b>26</b>, 127-132](https://doi.org/10.1038/nbt1358)
  - Davis et al. dataset: [<i>Nature Biotechnology</i> (2011), <b>29</b>, 1046-1051](https://doi.org/10.1038/nbt.1990)
  - KIBA dataset: [<i>J. Chem. Inf. Model.</i> (2014), <b>54(3)</b>, 735-743](https://doi.org/10.1021/ci400709d)
  - PKIS dataset: [<i>PLOS ONE</i> (2017), <b>12</b>, 1-20](https://doi.org/10.1371/journal.pone.0181585)

## Theory

### Kinases in a nutshell

Kinases are established drug targets to combat cancer and inflammatory diseases ([TODO ref](https://onlinelibrary.wiley.com/doi/book/10.1002/9783527633470))
* Regulate protein activity by phosphorylation, i.e., by transferring a phosphate group from ATP to a substrate
* Among the most frequently mutated proteins in tumors
* 5782 X-ray structures of human kinases (as of Sept. 2021, see the [KLIFS](https://klifs.net/) database)
* 67 FDA-approved small molecule protein kinase inhibitors on the market (as of Sept. 2021, see [link](http://www.brimr.org/PKI/PKIs.htm))
* Most of the approved drugs bind in the ATP-binding pocket and its immediate surroundings

Although kinases are the subject of intense research, open challenges remain:
* A large fraction of the kinome is un-/underexplored
* Many kinase inhibitors are promiscuous binders -> off-target effects or polypharmacology
* Occurrence of drug resistance due to mutations

[TODO add a 3D figure of a kinase structure with ATP bound, and maybe DFG-in motif highlighted]

Nevertheless, the focus on this protein family has led to a plethora of freely available data on compounds, bioactivity, and structures that are being used for computational drug development.
[TODO cite Kooistra, Volkamer, ARMC V.50, Elsevier, 2017, 153-192]

#### The human kinome 

* Human kinome consists of ~540 protein kinases
    * [TODO: maybe comment that numbers differ depending on the source, ref to kinodata]
* Grouped by Manning et al. [<i>Science</i> (2002), <b>298(5600)</b>, 1912-1934](https://doi.org/10.1126/science.1075762) into eight major groups and one 'other' group
    * AGC, CAMK, CK1, CMGC, RGC, STE, TK, TKL, Other
    * Phylogenetic tree, clustered by overall sequence similarity
    
[TODO add a kinome tree figure from kinhub]

[numbers taken from Molecules publication, maybe add some references]

#### Kinase structures and important motifs

Important regions
* hinge region: key hydrogen bonds 
* DFG motif: the flip of aspartate (D) and phenylalanine (F) drives the switch between the active (DFG-in) and inactive (DFG-out) state
* αC-helix: in the αC-in conformation, a conserved salt bridge is formed
* glycine-rich (G-rich) loop: stabilizes ATP binding

[Maybe a 3D figure, or include motifs already in the above mentioned figure]


### Kinase resources

#### Kinase structures and related information: KLIFS

The KLIFS database ([<i>Nucleic Acids Res.</i> (2020), <b>49(D1)</b>, D562-D569](https://doi.org/10.1093/nar/gkaa895), [<i>J. Med. Chem.</i> (2014), <b>57(2)</b>, 249-277](https://doi.org/10.1021/jm400378w)) fetches all kinase structures deposited in the structural database PDB ([<i>Acta Cryst.</i> (2002), <b>D58</b>, 899-907](https://doi.org/10.1107/S0907444902003451), [<i>Structure</i> (2012), <b>20(3)</b>, 391-396](https://doi.org/10.1016/j.str.2012.01.010)) and processes them as follows: all multi-chain structures are split into monomers, and the monomers are aligned to each other with a special focus on a pre-defined binding site of 85 residues (Figure 1). Because every structure shares this residue numbering, the gatekeeper (GK) residue at KLIFS position 45, for example, can be looked up quickly in any of the over 10,000 monomeric kinase structures in KLIFS.

![KLIFS binding site](https://klifs.net/images/faq/xcolors.png.pagespeed.ic.dprMuoZGzn.webp)

*Figure 1:* 
Kinase binding site residues as defined by KLIFS.
Figure and description taken from: [<i>J. Med. Chem.</i> (2014), <b>57(2)</b>, 249-277](https://doi.org/10.1021/jm400378w).
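
Because all KLIFS pockets share the same 85-position numbering, looking up a residue reduces to indexing an aligned string. The following is a minimal sketch, assuming the pocket is available as an 85-character one-letter sequence. The sequence used here is a made-up placeholder (not a real kinase), and `residue_at_klifs_position` is a hypothetical helper, not part of KLIFS or any KLIFS client.

```python
KLIFS_POCKET_LENGTH = 85  # number of residues in the KLIFS binding site
GATEKEEPER_POSITION = 45  # KLIFS position of the gatekeeper (1-based)


def residue_at_klifs_position(pocket_sequence, position):
    """Return the one-letter residue at a 1-based KLIFS pocket position.

    Hypothetical helper for illustration only.
    """
    if len(pocket_sequence) != KLIFS_POCKET_LENGTH:
        raise ValueError(
            f"Expected {KLIFS_POCKET_LENGTH} residues, got {len(pocket_sequence)}."
        )
    if not 1 <= position <= KLIFS_POCKET_LENGTH:
        raise ValueError(f"Position must be in 1..{KLIFS_POCKET_LENGTH}.")
    # KLIFS positions are 1-based, Python strings are 0-based
    return pocket_sequence[position - 1]


# Placeholder pocket (NOT real data): alanine everywhere, threonine at position 45
placeholder_pocket = "A" * 44 + "T" + "A" * 40

gatekeeper = residue_at_klifs_position(placeholder_pocket, GATEKEEPER_POSITION)
print(gatekeeper)
```

With real data, the same lookup would apply to the aligned pocket sequence of any KLIFS structure, which is what makes residue-wise comparisons across kinases straightforward.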

Each structure, kinase, and ligand in KLIFS is associated with an identifier:

- Structure KLIFS ID
- Kinase KLIFS ID
- Ligand KLIFS ID

#### Bioactivity data

TODO - short !
* maybe an overview of data points per kinase on Chembl (from kinodata)
* other profiling data (as available in karaman, xxx)

- Karaman et al. dataset: TODO
  - Paper: [<i>Nature Biotechnology</i> (2008), <b>26</b>, 127-132](https://doi.org/10.1038/nbt1358)
  - Data: [KinMap data (JSON)](http://kinhub.org/js/Karaman_profiling.js)
- Davis et al. dataset: TODO
  - Paper: [<i>Nature Biotechnology</i> (2011), <b>29</b>, 1046-1051](https://doi.org/10.1038/nbt.1990)
  - Data: [KinMap data (JSON)](http://kinhub.org/js/Davis_profiling.js)
- KIBA dataset: TODO
  - Paper: [<i>J. Chem. Inf. Model.</i> (2014), <b>54(3)</b>, 735-743](https://doi.org/10.1021/ci400709d)
  - Data: [SI data (XLSX)](https://ndownloader.figstatic.com/files/3950161)
- PKIS dataset: 
  - Paper: [<i>PLOS ONE</i> (2017), <b>12</b>, 1-20](https://doi.org/10.1371/journal.pone.0181585)
  - Data: [SI data (XLSX)](https://doi.org/10.1371/journal.pone.0181585.s004)

### Kinase-similarity: off-target, promiscuous binding

TODO Problem statement. Introduce
* problem of promiscuous binding (e.g. by profiling results for known inhibitors, (kinhub figure), example from molecules paper?)
* different perspectives: sequence, structure and ligand-profiling data
* that is why we investigate kinase similarity from these different perspectives


### Kinase dataset compilation

In the course of the kinase similarity talktorials (**Talktorials T024-T028**), we will use nine kinases from [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629), which were selected for the following reasons:

> - Profile 1 combined __EGFR__ and __ErbB2__ as targets and __BRAF__ as a (general) anti-target. 
> - Out of similar considerations, Profile 2 consisted of EGFR and __PI3K__ as targets and BRAF as anti-target. This profile is expected to be more challenging as PI3K is an atypical kinase and thus less similar to EGFR than for example ErbB2 used in Profile 1. 
> - Profile 3, comprised of EGFR and __VEGFR2__ as targets and BRAF as anti-target, was contrasted with the hit rate that we found with a standard docking against the single target VEGFR2 (Profile 4).
> - To broaden the comparison and obtain an estimate for the promiscuity of each compound, the kinases __CDK2__, __LCK__, __MET__ and __p38α__ were included in the experimental assay panel and the structure-based bioinformatics comparison as commonly used anti-targets.
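
To see how the four profiles and the additional anti-targets add up to nine kinases, the selection can be written down as a small data structure. This is only an illustrative sketch: the dictionary layout and the key names `targets` and `anti_targets` are our own choice and do not come from the publication.

```python
# Profiles as described in the quoted text above
profiles = {
    "Profile 1": {"targets": {"EGFR", "ErbB2"}, "anti_targets": {"BRAF"}},
    "Profile 2": {"targets": {"EGFR", "PI3K"}, "anti_targets": {"BRAF"}},
    "Profile 3": {"targets": {"EGFR", "VEGFR2"}, "anti_targets": {"BRAF"}},
    # Profile 4 is the single-target docking against VEGFR2
    "Profile 4": {"targets": {"VEGFR2"}, "anti_targets": set()},
}

# Kinases added to the panel as commonly used anti-targets
additional_anti_targets = {"CDK2", "LCK", "MET", "p38α"}

# Union of all kinases mentioned in any profile or in the extended panel
all_kinases = set(additional_anti_targets)
for profile in profiles.values():
    all_kinases |= profile["targets"] | profile["anti_targets"]

print(len(all_kinases))
print(sorted(all_kinases))
```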

We have collected information about these nine kinases in the CSV file `kinase_selection.csv`:

- `kinase`: Kinase name as used in [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629)
- `kinase_klifs`: Kinase name as used in the KLIFS database
- `uniprot_id`: Kinase UniProt ID
- `group`: Kinase group as defined by Manning et al. [<i>Science</i> (2002), <b>298(5600)</b>, 1912-1934](https://doi.org/10.1126/science.1075762)
- `full_kinase_name`: Full kinase name as used in [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629)

Note: You can run the kinase similarity **Talktorials T024-T028** with your own set of kinases. Please update the CSV file with your kinases; the only mandatory columns are `kinase_klifs` and `uniprot_id`.
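
If you prepare your own kinase file, it may help to check the header before running the other talktorials. The snippet below is a minimal sketch using only the standard library; `missing_mandatory_columns` is a hypothetical helper, and the CSV text is an in-memory example built from two rows of the selection used here.

```python
import csv
import io

# Columns required by the kinase similarity talktorials (see note above)
MANDATORY_COLUMNS = ("kinase_klifs", "uniprot_id")


def missing_mandatory_columns(csv_text):
    """Return the mandatory column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []  # fieldnames is None for empty input
    return [column for column in MANDATORY_COLUMNS if column not in header]


# Example content of a custom kinase selection file
custom_csv = "kinase_klifs,uniprot_id\nEGFR,P00533\nBRAF,P15056\n"

missing = missing_mandatory_columns(custom_csv)
print(missing)  # an empty list means the file has all mandatory columns
```

For a file on disk, the same check could be applied to the text returned by `Path(...).read_text()`.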

In [1]:
from pathlib import Path

import pandas as pd

In [2]:
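# Directory of this notebook; `_dh` is IPython's directory history
HERE = Path(_dh[-1])
# Folder containing the input data
DATA = HERE / "data"

In [3]:
# Load the kinase selection and display it
kinase_selection_df = pd.read_csv(DATA / "kinase_selection.csv")
kinase_selection_df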

| Unnamed: 0 | kinase | kinase_klifs | uniprot_id | group | full_kinase_name |
|---|---|---|---|---|---|
| 0 | EGFR | EGFR | P00533 | TK | Epidermal growth factor receptor |
| 1 | ErbB2 | ErbB2 | P04626 | TK | Erythroblastic leukemia viral oncogene homolog 2 |
| 2 | PI3K | p110a | P42336 | Atypical | Phosphatidylinositol-3-kinase |
| 3 | VEGFR2 | KDR | P35968 | TK | Vascular endothelial growth factor receptor 2 |
| 4 | BRAF | BRAF | P15056 | TKL | Rapidly accelerated fibrosarcoma isoform B |
| 5 | CDK2 | CDK2 | P24941 | CMGC | Cyclic-dependent kinase 2 |
| 6 | LCK | LCK | P06239 | TK | Lymphocyte-specific protein tyrosine kinase |
| 7 | MET | MET | P08581 | TK | Mesenchymal-epithelial transition factor |
| 8 | p38a | p38a | Q16539 | CMGC | p38 mitogen activated protein kinase alpha |
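
As a quick look at how the nine kinases are distributed over the kinase groups, the `group` column can be tallied. The sketch below copies the `kinase_klifs` and `group` values from the table above into a plain dictionary so that it runs on its own; the variable names are illustrative.

```python
from collections import Counter

# KLIFS kinase name -> kinase group, copied from the table above
kinase_groups = {
    "EGFR": "TK",
    "ErbB2": "TK",
    "p110a": "Atypical",
    "KDR": "TK",
    "BRAF": "TKL",
    "CDK2": "CMGC",
    "LCK": "TK",
    "MET": "TK",
    "p38a": "CMGC",
}

# Count how many of the selected kinases fall into each group
group_counts = Counter(kinase_groups.values())
for group, count in group_counts.most_common():
    print(f"{group}: {count}")
```

In the notebook itself, `kinase_selection_df["group"].value_counts()` gives the same tally directly from the loaded table.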
