# T023 · What is a kinase?

Authors:

- Talia B. Kimber, 2021, [Volkamer lab, Charité](https://volkamerlab.org/)
- Dominique Sydow, 2021, [Volkamer lab, Charité](https://volkamerlab.org/)
- Andrea Volkamer, 2021, [Volkamer lab, Charité](https://volkamerlab.org/)

## Aim of this talktorial

Add a short summary of this talktorial's content.

### Contents in *Theory*

* Kinases in a nutshell
    * The human kinome
    * Kinase structures and important motifs
    * Collection of kinase related information: KLIFS
    * Available bioactivity data
* Kinase-similarity: off-target, promiscuous binding
* Kinase dataset compilation

### Contents in *Practical*

* Retrieve and preprocess data
* Show kinase coverage
* Compare kinases
* Visualize similarity as kinase matrix
* Visualize similarity as phylogenetic tree

### References

* Kinase dataset: [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629) 
* Kinase similarity descriptor: XXX

## Theory

### Kinases in a nutshell

Kinases are established drug targets to combat cancer and inflammatory diseases ([TODO ref](https://onlinelibrary.wiley.com/doi/book/10.1002/9783527633470))
* Activate proteins by phosphorylation
* Most frequently mutated proteins in tumors
* 5782 x-ray structures of human kinases (as of Sept. 2021, see [KLIFS](https://klifs.net/) database
* 67 FDA approved small molecule protein kinase inhibitors on the market (as of Sept. 2021, see [link](http://www.brimr.org/PKI/PKIs.htm))
* Most of the approved drugs bind in the ATP-binding pocketand intermediate surroundings 

While there is so much research also going on in the field of kinases, there are still open challenges
* A large fraction of the kinome is un-/underexplored
* Many kinase inhibitors are promiscous binders -> off-target effects or polypharmacology
* Occurence of drug resistances due to mutations

[TODO add a 3D figure of a kinase structure with ATP bound, and maybe DFG-in motif highlighted]

Nevertheless the focus on this protein family has led to a plethora of freely available data on compounds, bioactivity, and structures that are being used for computational drug development. 
[TODO cite Kooistra, Volkamer, ARMC V.50, Elsevier, 2017, 153-192]

#### The human kinome 

* Human kinome consists of ~540 protein kinases
    * [TODO: maybe comment on numbers differ depedning on the source, ref to kinodata]
* Grouped by Manning [TODO add ref] into eight major groups and one 'other' group
    * AGC, CAMK, CK1, CMGC, RGC, STE, TK, TKL, Other
    * Phylogenetic tree, clustered by overall sequence similarity
    
[TODO add a kinome tree figure from kinhub]

[numbers taken from Molecules publication, maybe add some references]

#### Kinase structures and important motifs

Important regions
* hinge region: key hydrogen bonds 
* DFG motif: Flip between phenylalanine (F) and aspartate (D) driver for active and inactive state
* αC-helix: αC-in conformation -> salt bridge
* glycine-rich (G-rich) loop: stabilizes ATP binding

[Maybe a 3D figure, or include motifs already in the above mentioned figure]


#### Collection of kinase related information: KLIFS

TODO - short!

#### Available bioactivity data

TODO - short !
* maybe an overview of data points per kinase on Chembl (from kinodata)
* other profiling data (as available in karaman, xxx)

### Kinase-similarity: off-target, promiscuous binding

TODO Problem statement. Introduce
* problem of promiscuous binding (e.g. by profiling results for known inhibitors, (kinhub figure), example from molecules paper?)
* different perspectives: sequence, structure and ligand-profiling data
* that's why we investigate in similarity from the different perspectives


### Kinase dataset compilation

We will use nine kinases from [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629) because:

> We aggregated the investigated kinases in “profiles” (Table 2). Profile 1 combined EGFR and ErbB2 as targets (indicated by a ‘+’) and BRAF (from rapidly accelerated fibrosarcoma isoform B) as a (general) anti-target (designated by a ‘—’). Out of similar considerations, Profile 2 consisted of EGFR and PI3K as targets and BRAF as anti-target. This profile is expected to be more challenging as PI3K is an atypical kinase and thus less similar to EGFR than for example ErbB2 used in Profile 1. Profile 3, comprised of EGFR and VEGFR2 as targets and BRAF as anti-target, was contrasted with the hit rate that we found with a standard docking against the single target VEGFR2 (Profile 4).
> To broaden the comparison and obtain an estimate for the promiscuity of each compound, the kinases CDK2 (cyclic-dependent kinase 2), LCK (lymphocyte-specific protein tyrosine kinase), MET (mesenchymal-epithelial transition factor) and p38α (p38 mitogen activated protein kinase α) were included in the experimental assay panel and the structure-based bioinformatics comparison as commonly used anti-targets.

![image.png](attachment:814048cb-e723-4b10-b9f7-2b56848688d9.png)

*Figure 1:* 
Kinases used in this notebook, taken from [<i>Molecules</i> (2021), <b>26(3)</b>, 629](https://www.mdpi.com/1420-3049/26/3/629) (Table 1).

### TODO

Data set used throughout the following notebooks, maybe directly shaped in a way that it has the correct information per task in place


## Practical

In [1]:
from pathlib import Path

import pandas as pd

In [2]:
HERE = Path(_dh[-1])
DATA = HERE / "data"

### Retrieve and preprocess data

In [15]:
kinase_selection = pd.read_csv(DATA/"kinase_selection.csv")
kinase_selection

Unnamed: 0,kinase,kinase_klifs,uniprot_id,group,full_kinase_name
0,EGFR,EGFR,P00533,TK,Epidermal growth factor receptor
1,ErbB2,ErbB2,P04626,TK,Erythroblastic leukemia viral oncogene homolog 2
2,PI3K,p110a,P42336,Atypical,Phosphatidylinositol-3-kinase
3,VEGFR2,KDR,P35968,TK,Vascular endothelial growth factor receptor 2
4,BRAF,BRAF,P15056,TKL,Rapidly accelerated fibrosarcoma isoform B
5,CDK2,CDK2,P24941,CMGC,Cyclic-dependent kinase 2
6,LCK,LCK,P06239,TK,Lymphocyte-specific protein tyrosine kinase
7,MET,MET,P08581,TK,Mesenchymal-epithelial transition factor
8,p38a,p38a,Q16539,CMGC,p38 mitogen activated protein kinase alpha


### Show kinase coverage

### Compare kinases

### Visualize similarity as kinase matrix

### Visualize similarity as phylogenetic tree

## Discussion

Wrap up the talktorial's content here and discuss pros/cons and open questions/challenges.

## Quiz

Ask three questions that the user should be able to answer after doing this talktorial. Choose important take-aways from this talktorial for your questions.

1. Question
2. Question
3. Question