# Closer look at EGFR-GAK KiSSim ranks

We see in `006_evaluation/004_profiling_karaman_davis` that GAK is not detected as a kinase similar to EGFR, although Erlotinib targets both EGFR and GAK (Karaman profiling data).

Let's reload our KiSSim kinase matrix and check the rank of GAK when the kinases are sorted by KiSSim similarity to EGFR, and the rank of EGFR when they are sorted by KiSSim similarity to GAK.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from src import data



## KiSSim kinase matrix

In [3]:
# Full dataset (reported only, not further used)
kissim_df = data.distances.kissim(kinmap_kinases=False)
print(f"Kinase distance matrix shape: {kissim_df.shape}")

Kinase distance matrix shape: (257, 257)


In `006_evaluation/004_profiling_karaman_davis`, we only consider the subset of kinases for which profiling data are available. Let's get the list of those kinases so that we can look at the ranks for (a) all kinases and (b) only this subset.

In [4]:
profiling_df = data.profiling.karaman_davis()
kinases_selected = profiling_df["Erlotinib"].dropna().index

No measurements                                                                 20611
One measurement                                                                  6296
Two identical measurements                                                       1291
Two measurements <= or > cutoff 100; keep lower value                              30
One measurement <=, one > cutoff 100 but difference <= 100; keep lower value        2
One measurement <=, one > cutoff 100 but difference > 100; remove values           10
dtype: int64


Let's define a small function that lists kinases ranked by their similarity to a query kinase based on KiSSim results.

In [5]:
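def kissim_ranks_against(kissim_df, kinase):
    """
    Get kinases ranked by KiSSim distance to a query kinase.

    The index of the returned DataFrame is the 0-based rank; given a zero
    diagonal in the distance matrix, rank 0 is the query kinase itself.
    """
    # Sort ascending: the smaller the distance, the more similar the kinase
    ranks = kissim_df[kinase].sort_values().reset_index()
    ranks.columns = ["kinase", f"distance to {kinase}"]
    return ranks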
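
As a quick sanity check of this function, the following minimal sketch applies it to a made-up 3x3 distance matrix. The kinase names `A`, `B`, `C` and all distance values are hypothetical and only illustrate the shape of the output; the function is repeated here so that the snippet runs on its own.

```python
import pandas as pd


def kissim_ranks_against(kissim_df, kinase):
    """Get KiSSim-ranked kinases for query kinase (repeated from the notebook)."""
    ranks = kissim_df[kinase].sort_values().reset_index()
    ranks.columns = ["kinase", f"distance to {kinase}"]
    return ranks


# Hypothetical symmetric distance matrix with a zero diagonal
names = ["A", "B", "C"]
toy_df = pd.DataFrame(
    [
        [0.0, 0.1, 0.3],
        [0.1, 0.0, 0.2],
        [0.3, 0.2, 0.0],
    ],
    index=names,
    columns=names,
)

# The query kinase comes first (distance 0.0), followed by its nearest neighbors
toy_ranks = kissim_ranks_against(toy_df, "A")
print(toy_ranks)
```

The DataFrame index after `reset_index()` serves as the 0-based rank, which is why the query kinase itself always appears at index 0.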

## Kinases similar to EGFR

In [6]:
kissim_ranks_egfr = kissim_ranks_against(kissim_df, "EGFR")
print(kissim_ranks_egfr.shape)
kissim_ranks_egfr.head()

(257, 2)


Unnamed: 0,kinase,distance to EGFR
0,EGFR,0.0
1,ErbB4,0.024878
2,ErbB2,0.039648
3,ErbB3,0.046541
4,SYK,0.053504


In [7]:
kissim_ranks_egfr[kissim_ranks_egfr["kinase"] == "GAK"]

Unnamed: 0,kinase,distance to EGFR
182,GAK,0.103189


GAK is placed in the lower half of the KiSSim ranks sorted by similarity to EGFR (rank 182 of 257).

### Subset (kinases with Erlotinib-profiling data)

In [8]:
kissim_ranks_egfr_selected = kissim_ranks_egfr[
    kissim_ranks_egfr["kinase"].isin(kinases_selected)
].reset_index(drop=True)
print(kissim_ranks_egfr_selected.shape)
kissim_ranks_egfr_selected[kissim_ranks_egfr_selected["kinase"] == "GAK"]

(50, 2)


Unnamed: 0,kinase,distance to EGFR
43,GAK,0.103189


## Kinases similar to GAK

In [9]:
kissim_ranks_gak = kissim_ranks_against(kissim_df, "GAK")
print(kissim_ranks_gak.shape)
kissim_ranks_gak.head()

(257, 2)


Unnamed: 0,kinase,distance to GAK
0,GAK,0.0
1,BIKE,0.062587
2,AAK1,0.063517
3,DCLK1,0.081385
4,MSK1-b,0.085052


In [10]:
kissim_ranks_gak[kissim_ranks_gak["kinase"] == "EGFR"]

Unnamed: 0,kinase,distance to GAK
102,EGFR,0.103189


When we turn the query around, EGFR is ranked somewhat better with respect to GAK, but still only at rank 102 of 257.
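
Note that the EGFR-GAK distance is the same in both directions (0.103189), while the ranks differ: a rank depends on how many other kinases lie closer to the respective query. The sketch below illustrates this with a hypothetical 5x5 distance matrix, in which `K1` has several close relatives and `K5` is comparatively isolated. The kinase names, the distance values, and the helper `rank_of` are assumptions made for illustration only and are not part of the KiSSim data.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric distance matrix: K1 sits in a tight family (K2-K4),
# whereas K5 is far from everything but closest to K1
names = ["K1", "K2", "K3", "K4", "K5"]
toy_df = pd.DataFrame(
    np.array(
        [
            [0.00, 0.02, 0.04, 0.05, 0.10],
            [0.02, 0.00, 0.03, 0.04, 0.20],
            [0.04, 0.03, 0.00, 0.05, 0.25],
            [0.05, 0.04, 0.05, 0.00, 0.30],
            [0.10, 0.20, 0.25, 0.30, 0.00],
        ]
    ),
    index=names,
    columns=names,
)


def rank_of(distance_df, query, target):
    """0-based rank of `target` when kinases are sorted by distance to `query`."""
    ordered = distance_df[query].sort_values(kind="stable").index
    return int(ordered.get_loc(target))


# Same distance in both directions ...
same_distance = toy_df.loc["K1", "K5"] == toy_df.loc["K5", "K1"]
# ... but different ranks, because K1 has three closer neighbors than K5
rank_k5_from_k1 = rank_of(toy_df, "K1", "K5")
rank_k1_from_k5 = rank_of(toy_df, "K5", "K1")
print(same_distance, rank_k5_from_k1, rank_k1_from_k5)
```

In this toy setup, `K5` ends up last when querying with `K1`, while `K1` is the nearest neighbor of `K5`. The EGFR query may behave similarly, since its closest neighbors in the table above are the other ErbB family members.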

### Subset (kinases with Erlotinib-profiling data)

In [11]:
kissim_ranks_gak_selected = kissim_ranks_gak[
    kissim_ranks_gak["kinase"].isin(kinases_selected)
].reset_index(drop=True)
print(kissim_ranks_gak_selected.shape)
kissim_ranks_gak_selected[kissim_ranks_gak_selected["kinase"] == "EGFR"]

(50, 2)


Unnamed: 0,kinase,distance to GAK
25,EGFR,0.103189
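
The ranks in the full set (257 kinases) and in the subset (50 kinases) are on different scales. One simple way to compare them, sketched below, is to divide each 0-based rank by the number of non-query kinases in the respective list. The helper `relative_rank` is hypothetical and this normalization is an assumption, not part of the KiSSim evaluation; the rank and list-size values are taken from the outputs above.

```python
def relative_rank(rank, n_kinases):
    """
    Scale a 0-based rank to [0, 1].

    The query kinase itself sits at rank 0, and `n_kinases` counts all kinases
    in the ranked list (query included), so the last kinase maps to 1.0.
    """
    return rank / (n_kinases - 1)


# (rank, list size) pairs as reported in the outputs above
observed = {
    "GAK for EGFR query, all kinases": (182, 257),
    "GAK for EGFR query, subset": (43, 50),
    "EGFR for GAK query, all kinases": (102, 257),
    "EGFR for GAK query, subset": (25, 50),
}

relative = {label: relative_rank(rank, n) for label, (rank, n) in observed.items()}
for label, value in relative.items():
    print(f"{label}: {value:.2f}")
```

Under this normalization, the relative ranks are 0.71 (all kinases) and 0.88 (subset) for GAK in the EGFR query, and 0.40 (all kinases) and 0.51 (subset) for EGFR in the GAK query.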
