# KLIFS kinase names

Explore the different kinase name columns that KLIFS provides, both remotely (API) and locally (export files).

In [None]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd

from opencadd.databases.klifs import setup_remote, setup_local

INFO:opencadd.databases.klifs.api:If you want to see an non-truncated version of the DataFrames in this module, use `pd.set_option('display.max_columns', 50)` in your notebook.


In [3]:
pd.set_option('display.max_columns', 50)

In [4]:
remote = setup_remote()

INFO:opencadd.databases.klifs.api:Set up remote session...
INFO:opencadd.databases.klifs.api:Remote session is ready!


## Get kinase details 1 (kinase names)

**KLIFS Swagger: Information/get_kinase_names**

In [5]:
kinases1 = remote.kinases.all_kinases()
kinases1.sort_values("kinase.klifs_id", inplace=True)
kinases1.reset_index(drop=True, inplace=True)
kinases1.head()

Unnamed: 0,kinase.klifs_id,kinase.klifs_name,kinase.full_name,kinase.gene_name,kinase.uniprot,species.klifs
0,1,AKT1,v-akt murine thymoma viral oncogene homolog 1,AKT1,,Human
1,2,AKT2,v-akt murine thymoma viral oncogene homolog 2,AKT2,,Human
2,3,AKT3,v-akt murine thymoma viral oncogene homolog 3,AKT3,,Human
3,4,CRIK,citron rho-interacting serine/threonine kinase,CIT,,Human
4,5,DMPK1,dystrophia myotonica protein kinase,DMPK,,Human


## Get kinase details 2 (kinase information)

**KLIFS Swagger: Information/get_kinase_information**

In [6]:
kinase_ids = kinases1["kinase.klifs_id"].to_list()
print(f"Number of IDs: {len(kinase_ids)}")

Number of IDs: 1127


In [7]:
kinases2 = remote.kinases.by_kinase_klifs_id(kinase_ids)
print(f"Number of kinases: {kinases2.shape[0]}")
kinases2.sort_values("kinase.klifs_id", inplace=True)
kinases2.reset_index(drop=True, inplace=True)
kinases2.head()

Number of kinases: 1127


Unnamed: 0,kinase.klifs_id,kinase.klifs_name,kinase.full_name,kinase.gene_name,kinase.family,kinase.group,kinase.class,species.klifs,kinase.uniprot,kinase.iuphar,kinase.pocket
0,1,AKT1,v-akt murine thymoma viral oncogene homolog 1,AKT1,Akt,AGC,,Human,P31749,1479,KLLGKGTFGKVILYAMKILHTLTENRVLQNSRPFLTALKYSCFVME...
1,2,AKT2,v-akt murine thymoma viral oncogene homolog 2,AKT2,Akt,AGC,,Human,P31751,1480,KLLGKGTFGKVILYAMKILHTVTESRVLQNTRPFLTALKYACFVME...
2,3,AKT3,v-akt murine thymoma viral oncogene homolog 3,AKT3,Akt,AGC,,Human,Q9Y243,2286,KLLGKGTFGKVILYAMKILHTLTESRVLKNTRPFLTSLKYSCFVME...
3,4,CRIK,citron rho-interacting serine/threonine kinase,CIT,DMPK,AGC,CRIK,Human,O14578,1509,SLVGCGHFAEVQVYAMKVMFFEEERNILSRSTPWIPQLQYAYLVME...
4,5,DMPK1,dystrophia myotonica protein kinase,DMPK,DMPK,AGC,GEK,Human,Q09013,1505,KVIGRGAFSEVAVYAMKIMCFREERDVLVNGDRWITQLHFAYLVME...


## **Questions** regarding kinases 1 or kinases 2

### How many kinases have ambiguous kinase names (`klifs_name` != `gene_name`)?

In [8]:
kinases1[kinases1["kinase.klifs_name"] != kinases1["kinase.gene_name"]].shape

(817, 6)

In [9]:
kinases2[kinases2["kinase.klifs_name"] != kinases2["kinase.gene_name"]].shape

(817, 11)

### Which columns are matched for kinase name?

In [10]:
remote.kinases.by_kinase_name(kinase_names='CRIK')


Unnamed: 0,kinase.klifs_id,kinase.klifs_name,kinase.full_name,kinase.gene_name,kinase.family,kinase.group,kinase.class,species.klifs,kinase.uniprot,kinase.iuphar,kinase.pocket
0,4,CRIK,citron rho-interacting serine/threonine kinase,CIT,DMPK,AGC,CRIK,Human,O14578,1509,SLVGCGHFAEVQVYAMKVMFFEEERNILSRSTPWIPQLQYAYLVME...
1,637,CRIK,citron,Cit,DMPK,AGC,,Mouse,P49025,0,SLVGCGHFAEVQVYAMKIMFFEEERNILSRSTPWIPQLQYAYLVME...


In [11]:
remote.kinases.by_kinase_name(kinase_names='CIT')


Unnamed: 0,kinase.klifs_id,kinase.klifs_name,kinase.full_name,kinase.gene_name,kinase.family,kinase.group,kinase.class,species.klifs,kinase.uniprot,kinase.iuphar,kinase.pocket
0,4,CRIK,citron rho-interacting serine/threonine kinase,CIT,DMPK,AGC,CRIK,Human,O14578,1509,SLVGCGHFAEVQVYAMKVMFFEEERNILSRSTPWIPQLQYAYLVME...
1,637,CRIK,citron,Cit,DMPK,AGC,,Mouse,P49025,0,SLVGCGHFAEVQVYAMKIMFFEEERNILSRSTPWIPQLQYAYLVME...


__Note__: Apparently, the kinase name is matched against both `kinase.klifs_name` and `kinase.gene_name`: querying `CRIK` or `CIT` returns the same two entries.
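
The observed behavior can be mimicked locally with a small sketch. This is not the KLIFS server logic, which is not shown here. The `records` list and the `match_kinase_name` helper are hypothetical, and the case-insensitive comparison is an assumption inferred from the mouse entry (`Cit`) being returned for the query `CIT`.

```python
# Hypothetical local records; field names follow the columns shown above.
records = [
    {"kinase.klifs_id": 4, "kinase.klifs_name": "CRIK", "kinase.gene_name": "CIT"},
    {"kinase.klifs_id": 637, "kinase.klifs_name": "CRIK", "kinase.gene_name": "Cit"},
    {"kinase.klifs_id": 1, "kinase.klifs_name": "AKT1", "kinase.gene_name": "AKT1"},
]


def match_kinase_name(records, name):
    """Return records whose KLIFS name or gene name equals `name`.

    Assumption: the comparison ignores case.
    """
    key = name.casefold()
    return [
        record
        for record in records
        if key
        in (
            record["kinase.klifs_name"].casefold(),
            record["kinase.gene_name"].casefold(),
        )
    ]


ids_crik = [r["kinase.klifs_id"] for r in match_kinase_name(records, "CRIK")]
ids_cit = [r["kinase.klifs_id"] for r in match_kinase_name(records, "CIT")]
print(ids_crik, ids_cit)
```

Under these assumptions both queries select the human and the mouse entry, as in the two outputs above.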

## Merge details for kinases 1 and 2

In [12]:
kinases = kinases1.merge(kinases2, on="kinase.klifs_id", how="left")
kinases = kinases[["kinase.klifs_id", "kinase.gene_name_x", "kinase.full_name_x", "kinase.klifs_name_x", "kinase.klifs_name_y", "kinase.gene_name_y", "kinase.full_name_y"]]
kinases

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y
0,1,AKT1,v-akt murine thymoma viral oncogene homolog 1,AKT1,AKT1,AKT1,v-akt murine thymoma viral oncogene homolog 1
1,2,AKT2,v-akt murine thymoma viral oncogene homolog 2,AKT2,AKT2,AKT2,v-akt murine thymoma viral oncogene homolog 2
2,3,AKT3,v-akt murine thymoma viral oncogene homolog 3,AKT3,AKT3,AKT3,v-akt murine thymoma viral oncogene homolog 3
3,4,CIT,citron rho-interacting serine/threonine kinase,CRIK,CRIK,CIT,citron rho-interacting serine/threonine kinase
4,5,DMPK,dystrophia myotonica protein kinase,DMPK1,DMPK1,DMPK,dystrophia myotonica protein kinase
...,...,...,...,...,...,...,...
1122,1123,Pip5k1a,"phosphatidylinositol-4-phosphate 5-kinase, typ...",Pip5k1a,Pip5k1a,Pip5k1a,"phosphatidylinositol-4-phosphate 5-kinase, typ..."
1123,1124,Map4k2,mitogen-activated protein kinase kinase kinase...,Map4k2,Map4k2,Map4k2,mitogen-activated protein kinase kinase kinase...
1124,1125,Pan3,PAN3 poly(A) specific ribonuclease subunit,Pan3,Pan3,Pan3,PAN3 poly(A) specific ribonuclease subunit
1125,1126,Plk5,polo like kinase 5,Plk5,Plk5,Plk5,polo like kinase 5
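
The `_x` and `_y` suffixes in the merged table come from pandas: columns that exist in both frames and are not part of the join key get a suffix. A minimal sketch on hand-made toy frames (not the KLIFS data) makes this explicit and adds two optional safety checks, `validate` and `indicator`, which the merge above does not use.

```python
import pandas as pd

# Toy frames with overlapping column names; values copied from rows shown above.
left = pd.DataFrame(
    {
        "kinase.klifs_id": [1, 4],
        "kinase.klifs_name": ["AKT1", "CRIK"],
        "kinase.gene_name": ["AKT1", "CIT"],
    }
)
right = pd.DataFrame(
    {
        "kinase.klifs_id": [4, 1],
        "kinase.klifs_name": ["CRIK", "AKT1"],
        "kinase.gene_name": ["CIT", "AKT1"],
    }
)

merged = left.merge(
    right,
    on="kinase.klifs_id",
    how="left",
    suffixes=("_x", "_y"),    # pandas default, spelled out for clarity
    validate="one_to_one",    # raises MergeError if an ID occurs more than once
    indicator=True,           # adds a `_merge` column: left_only / right_only / both
)
print(merged.columns.to_list())
```

With `how="left"`, a row whose `_merge` value is `left_only` would indicate an ID that is missing from the right frame.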


## **Questions** comparing kinases 1 and kinases 2 details

### Differing `kinase.klifs_name` (kinases 1 vs. kinases 2)?

In [13]:
kinases[kinases["kinase.klifs_name_x"] != kinases["kinase.klifs_name_y"]]

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y


In [14]:
kinases[kinases["kinase.klifs_name_x"].isin(["", " ", 0, "0", None])]

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y


### Differing `kinase.full_name` (kinases 1 vs. kinases 2)?

In [15]:
kinases[kinases["kinase.full_name_x"] != kinases["kinase.full_name_y"]]

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y


In [16]:
kinases[kinases["kinase.full_name_x"].isin(["", " ", 0, "0", None])]

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y
528,529,,0,A6,A6,,0
529,530,,0,A6r,A6r,,0


### Differing `kinase.gene_name` (kinases 1 vs. kinases 2)?

In [17]:
kinases[kinases["kinase.gene_name_x"] != kinases["kinase.gene_name_y"]]

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y


In [18]:
kinases[kinases["kinase.gene_name_x"].isin(["", " ", 0, "0", None])]

Unnamed: 0,kinase.klifs_id,kinase.gene_name_x,kinase.full_name_x,kinase.klifs_name_x,kinase.klifs_name_y,kinase.gene_name_y,kinase.full_name_y
528,529,,0,A6,A6,,0
529,530,,0,A6r,A6r,,0


**Note 1**: The `kinase.klifs_name`, `kinase.full_name`, and `kinase.gene_name` columns are identical in kinases 1 and kinases 2.

**Note 2**: Kinases A6 and A6r are the only kinases with missing name information: both have an empty `kinase.gene_name` and `0` as `kinase.full_name`.
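
Both kinds of checks in this section (columns differing between the two sources, and placeholder values) can be wrapped in small helpers. The sketch below uses a hand-made toy frame. The helper names `differing_rows` and `placeholder_rows` are hypothetical, and the placeholder `0` is assumed to be stored as the string `"0"`.

```python
import pandas as pd

# Toy stand-in for the merged table; values follow the rows shown above.
toy = pd.DataFrame(
    {
        "kinase.klifs_id": [4, 5, 529],
        "kinase.gene_name_x": ["CIT", "DMPK", ""],
        "kinase.gene_name_y": ["CIT", "DMPK", ""],
        "kinase.full_name_x": ["citron rho-interacting serine/threonine kinase", "dystrophia myotonica protein kinase", "0"],
        "kinase.full_name_y": ["citron rho-interacting serine/threonine kinase", "dystrophia myotonica protein kinase", "0"],
    }
)


def differing_rows(df, column):
    """Rows where the `_x` (left) and `_y` (right) versions of a column differ."""
    return df[df[f"{column}_x"] != df[f"{column}_y"]]


def placeholder_rows(df, column, placeholders=("", " ", "0")):
    """Rows where a column holds an empty or placeholder value."""
    return df[df[column].isin(placeholders)]


n_differing = len(differing_rows(toy, "kinase.gene_name"))
placeholder_ids = placeholder_rows(toy, "kinase.full_name_x")["kinase.klifs_id"].to_list()
print(n_differing, placeholder_ids)
```

Note that `!=` treats two missing values (`NaN`) as different, so columns containing `NaN` would need `fillna` or a similar step before such a comparison.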

## Local kinase details

In [19]:
from opencadd.databases.klifs.local import _LocalDatabaseGenerator
local = _LocalDatabaseGenerator()
klifs_export_path = "data/KLIFS_export.20201020.csv.zip"
klifs_export = local._from_klifs_export_file(klifs_export_path)
klifs_overview_path = "data/overview.20201020.csv.zip"
klifs_overview = local._from_klifs_overview_file(klifs_overview_path)
print(klifs_export.shape, klifs_overview.shape)

(11592, 15) (11592, 26)


In [20]:
klifs_export.sort_values(["structure.pdb_id", "structure.chain", "structure.alternate_model"], inplace=True, ignore_index=True)
klifs_export.head()

Unnamed: 0,kinase.names,kinase.gene_name,kinase.klifs_name,kinase.family,kinase.group,structure.pdb_id,structure.chain,structure.alternate_model,species.klifs,ligand.name,ligand.expo_id,ligand_allosteric.name,ligand_allosteric.expo_id,structure.dfg,structure.ac_helix
0,"[MAPK14, p38a]",MAPK14,p38a,MAPK,CMGC,1a9u,A,-,Human,4-[5-(4-FLUORO-PHENYL)-2-(4-METHANESULFINYL-PH...,SB2,-,-,in,out-like
1,[HCK],HCK,HCK,Src,TK,1ad5,A,-,Human,PHOSPHOAMINOPHOSPHONIC ACID-ADENYLATE ESTER,ANP,-,-,in,out
2,[HCK],HCK,HCK,Src,TK,1ad5,B,-,Human,PHOSPHOAMINOPHOSPHONIC ACID-ADENYLATE ESTER,ANP,-,-,in,out-like
3,[FGFR1],FGFR1,FGFR1,FGFR,TK,1agw,A,A,Human,3-[4-(1-FORMYLPIPERAZIN-4-YL)-BENZYLIDENYL]-2-...,SU2,-,-,in,out-like
4,[FGFR1],FGFR1,FGFR1,FGFR,TK,1agw,A,B,Human,3-[4-(1-FORMYLPIPERAZIN-4-YL)-BENZYLIDENYL]-2-...,SU2,-,-,in,out-like


In [21]:
klifs_export[klifs_export["kinase.names"].apply(len) == 2].shape

(4867, 15)
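
Judging from the rows shown above, `kinase.names` appears to hold the distinct values of gene name and KLIFS name, so a list of length 2 means the two names differ. The following sketch illustrates that reading. It is an assumption based on the displayed rows, not the opencadd implementation, and `distinct_names` is a hypothetical helper.

```python
def distinct_names(gene_name, klifs_name):
    """Collect gene name and KLIFS name, dropping the duplicate if they are equal."""
    names = []
    for name in (gene_name, klifs_name):
        if name not in names:
            names.append(name)
    return names


names_p38a = distinct_names("MAPK14", "p38a")
names_hck = distinct_names("HCK", "HCK")
print(names_p38a, names_hck)
```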

In [22]:
klifs_overview.sort_values(["structure.pdb_id", "structure.chain", "structure.alternate_model"], inplace=True, ignore_index=True)
klifs_overview.head()

Unnamed: 0,species.klifs,structure.pdb_id,structure.alternate_model,structure.chain,ligand.expo_id,ligand_allosteric.expo_id,structure.rmsd1,structure.rmsd2,structure.qualityscore,structure.pocket,structure.resolution,structure.missing_residues,structure.missing_atoms,interaction.fingerprint,structure.fp_i,structure.fp_ii,structure.bp_i_a,structure.bp_i_b,structure.bp_ii_in,structure.bp_ii_a_in,structure.bp_ii_b_in,structure.bp_ii_out,structure.bp_ii_b,structure.bp_iii,structure.bp_iv,structure.bp_v
0,Human,1a9u,-,A,SB2,-,0.828,2.186,8.0,SPVGSGAYGSVCAVAVKKLRTYRELRLLKHMKENVIGLLDVYLVTH...,2.5,0,0,0000000000000000000000000000000000000000000000...,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Human,1ad5,-,A,ANP,-,0.816,2.141,9.6,KKLGAGQFGEVWMVAVKTMAFLAEANVMKTLQDKLVKLHAVYIITE...,2.6,0,4,0000000000000010000000000000000000000000000000...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Human,1ad5,-,B,ANP,-,0.817,2.141,9.6,KKLGAGQFGEVWMVAVKTMAFLAEANVMKTLQDKLVKLHAVYIITE...,2.6,0,4,0000000000000010000001000000000000000000000000...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Human,1agw,A,A,SU2,-,0.831,2.001,7.6,KPLG_____QVVLVAVKMLDLISEMEMMKMIGKNIINLLGAYVIVE...,2.4,5,4,0000000000000010000000000000000000000000000000...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Human,1agw,B,A,SU2,-,0.831,2.001,7.6,KPLG_____QVVLVAVKMLDLISEMEMMKMIGKNIINLLGAYVIVE...,2.4,5,4,0000000000000010000000000000000000000000000000...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Kinase name mismatches locally and remotely?

In [23]:
klifs_export[~klifs_export["kinase.klifs_name"].isin(kinases["kinase.klifs_name_x"].to_list())]

Unnamed: 0,kinase.names,kinase.gene_name,kinase.klifs_name,kinase.family,kinase.group,structure.pdb_id,structure.chain,structure.alternate_model,species.klifs,ligand.name,ligand.expo_id,ligand_allosteric.name,ligand_allosteric.expo_id,structure.dfg,structure.ac_helix


### Kinase HGNC name mismatches locally and remotely?

In [24]:
klifs_export[~klifs_export["kinase.gene_name"].isin(kinases["kinase.gene_name_x"].to_list())]

Unnamed: 0,kinase.names,kinase.gene_name,kinase.klifs_name,kinase.family,kinase.group,structure.pdb_id,structure.chain,structure.alternate_model,species.klifs,ligand.name,ligand.expo_id,ligand_allosteric.name,ligand_allosteric.expo_id,structure.dfg,structure.ac_helix
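
The two `isin` checks above only look in one direction (local names that are missing remotely). A minimal sketch with plain sets covers both directions. The name lists are hand-made examples, not the KLIFS data, and the comparison ignores species.

```python
# Hand-made example names; in the notebook these would come from the
# local export and the remote kinase table, respectively.
local_names = ["p38a", "HCK", "HCK", "FGFR1"]
remote_names = ["p38a", "HCK", "FGFR1", "AKT1"]

# Names used locally that the remote table does not know.
only_local = sorted(set(local_names) - set(remote_names))
# Names in the remote table without any local structure.
only_remote = sorted(set(remote_names) - set(local_names))

print(only_local, only_remote)
```

An empty `only_local` corresponds to the empty tables above. A non-empty `only_remote` is expected whenever the remote table lists kinases that have no structure in the local export.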
