**Notebook - U2OS Kinase Expression**

This notebook examines kinase expression data for the U2OS cell line. The results formed part of the basis for including or excluding specific kinase inhibitors.

Data Sources & Information:
- https://depmap.org/portal/download/all/?release=CCLE+2019&file=CCLE_RPPA_20181003.csv
- https://depmap.org/portal/cell_line/ACH-000364?tab=mutation
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861052/
- https://gracebio.com/cancer-cell-line-encyclopedia-expanded-to-include-rppa-analysis-data/


# Imports:

In [1]:
import pandas as pd
import numpy as np

# Load Data:
Data Source: https://depmap.org/portal/download/all/?release=DepMap+Public+22Q4&file=OmicsExpressionProteinCodingGenesTPMLogp1.csv


"Gene expression TPM values of the protein coding genes for DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after log2 transformation, using a pseudo-count of 1; log2(TPM+1)."

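Because the values are stored as log2(TPM+1), recovering linear TPM means inverting that transform. A minimal sketch, using the U2OS EGFR value reported in this notebook as the input; the helper name `logp1_to_tpm` is our own and not part of DepMap:

```python
import numpy as np

def logp1_to_tpm(log_value):
    """Invert log2(TPM + 1) to recover TPM (hypothetical helper, not a DepMap function)."""
    return np.exp2(log_value) - 1

# U2OS EGFR value from this notebook, in log2(TPM + 1) units
egfr_tpm = logp1_to_tpm(2.163499)
print(round(float(egfr_tpm), 2))
```

A stored value of 0.0 maps back to 0 TPM, so zeros in the table mean no detected expression rather than missing data.
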
In [2]:
# Read in gene expression data:
u2os_omics = pd.read_csv('data/OmicsExpressionProteinCodingGenesTPMLogp1.csv')
u2os_omics = u2os_omics.rename(columns={u2os_omics.columns[0]: 'cell_type'})
print(u2os_omics.shape)
u2os_omics.head(2)

(1408, 19194)


Unnamed: 0,cell_type,TSPAN6 (7105),TNMD (64102),DPM1 (8813),SCYL3 (57147),C1orf112 (55732),FGR (2268),CFH (3075),FUCA2 (2519),GCLC (2729),...,H3C2 (8358),H3C3 (8352),AC098582.1 (8916),DUS4L-BCAP29 (115253422),C8orf44-SGK3 (100533105),ELOA3B (728929),NPBWR1 (2831),ELOA3D (100506888),ELOA3 (162699),CDR1 (1038)
0,ACH-001113,4.331992,0.0,7.36466,2.792855,4.471187,0.028569,1.226509,3.044394,6.500005,...,2.689299,0.189034,0.201634,2.130931,0.555816,0.0,0.275007,0.0,0.0,0.0
1,ACH-001289,4.567424,0.584963,7.106641,2.543496,3.50462,0.0,0.189034,3.813525,4.221877,...,1.286881,1.049631,0.321928,1.464668,0.632268,0.0,0.014355,0.0,0.0,0.0


In [3]:
# U2OS Dep-Map ID - https://depmap.org/portal/cell_line/ACH-000364?tab=mutation:
u2os_depmap_id = 'ACH-000364'

In [4]:
# Subsetting the data to only include the U2OS cell line:
u2os_data = u2os_omics[u2os_omics['cell_type'] == u2os_depmap_id]
u2os_data

Unnamed: 0,cell_type,TSPAN6 (7105),TNMD (64102),DPM1 (8813),SCYL3 (57147),C1orf112 (55732),FGR (2268),CFH (3075),FUCA2 (2519),GCLC (2729),...,H3C2 (8358),H3C3 (8352),AC098582.1 (8916),DUS4L-BCAP29 (115253422),C8orf44-SGK3 (100533105),ELOA3B (728929),NPBWR1 (2831),ELOA3D (100506888),ELOA3 (162699),CDR1 (1038)
832,ACH-000364,4.519164,0.0,6.673698,2.134221,4.107688,0.014355,0.422233,5.758889,4.806324,...,2.503349,0.0,0.863938,2.157044,0.0,0.0,0.505891,0.0,0.389567,0.097611


In [5]:
# Example filtering:
u2os_data.filter(regex='EGFR')

Unnamed: 0,EGFR (1956)
832,2.163499


# Relate DepMap Data to Kinase Inhibitor Classes:
Below are the kinase inhibitor classes included in our dataset, each mapped to a regex pattern that matches its associated genes:

In [6]:
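# Each value is a regex matched against DepMap column labels such as 'EGFR (1956)'.
# A trailing space or a leading '^' stops a pattern from matching longer gene symbols.
ki_tgt_dict = {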
    'PI3K': 'PIK3CA|PIK3CB|PIK3CD|PIK3CG',
    'EGFR': 'EGFR',
    'p38 MAPK': 'MAPK11|MAPK12|MAPK13|MAPK14',
    'JAK': 'JAK1|JAK2|JAK3|TYK2',
    'RAF': '^ARAF|BRAF|^RAF1',
    'AURK': 'AURKA |AURKB|AURKC',
    'ALK': 'ALK ',
    'SRC': '^SRC |LYN |FYN|YES1|FGR|BLK|HCK|LCK',
    'ROCK': 'ROCK1|ROCK2',
    'MEK': 'MAP2K1|MAP2K2',
    'GSK': 'GSK3A|GSK3B',
    'CDK': 'CDK1 |CDK2 |CDK4 |CDK5 |CDK6 |CDK7 |CDK9 ',
    'VEGFR': 'FLT1|^KDR|FLT4',
    'BCR-ABL': '^ABL1 |^ABL2 |BCR',
    'PDGFR': 'PDGFRA|PDGFRB',
    'FGFR': 'FGFR1 |FGFR2|FGFR3|FGFR4',
    'BTK': '^BTK',    # '^' added to exclude IBTK
    'AKT': 'AKT1 |AKT2|AKT3',
    'mTOR': '^MTOR '
    }
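
`DataFrame.filter(regex=...)` keeps every column whose label contains a match for the pattern, so an unanchored pattern can pick up unintended genes. The sketch below shows the effect on a toy frame; the expression values and the bracketed IDs are placeholders, not DepMap data:

```python
import pandas as pd

# Toy frame with labels in the 'SYMBOL (ID)' style; IDs and values are placeholders.
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0]],
    columns=['BTK (1)', 'IBTK (2)', 'AKT1 (3)', 'AKT1S1 (4)'],
)

unanchored = list(toy.filter(regex='BTK').columns)   # also picks up IBTK
anchored = list(toy.filter(regex='^BTK').columns)    # '^' pins the match to the label start
loose = list(toy.filter(regex='AKT1').columns)       # also picks up AKT1S1
spaced = list(toy.filter(regex='AKT1 ').columns)     # trailing space requires the symbol to end

print(unanchored, anchored, loose, spaced)
```

Combining both guards (for example `'^AKT1 '`) would be the strictest form; the dictionary above applies them only where a collision was a concern.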

In [7]:
u2os_data.filter(regex=ki_tgt_dict['EGFR'])

Unnamed: 0,EGFR (1956)
832,2.163499


## Return Gene Expression Values:

In [8]:
exp_values = []

for ki in ki_tgt_dict:
    exp_data = u2os_data.filter(regex=ki_tgt_dict[ki]).T.reset_index().values
    exp_values.append(exp_data)

In [9]:
u2os_evs_df = pd.DataFrame(np.concatenate(exp_values), columns=['gene', 'exp_val'])
# Retain just the gene symbol, dropping the bracketed Entrez ID (e.g. 'EGFR (1956)' -> 'EGFR'):
u2os_evs_df['gene'] = u2os_evs_df['gene'].str.split(' ').str[0]
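# Map each gene back to its kinase inhibitor class by substring lookup in the pattern strings
# (this relies on each gene symbol appearing in exactly one pattern):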
ki_list = [[key for key, value in ki_tgt_dict.items() if x in value] for x in u2os_evs_df['gene']]
u2os_evs_df['k_inhib'] = [item for sublist in ki_list for item in sublist]
u2os_evs_df.head()

Unnamed: 0,gene,exp_val,k_inhib
0,PIK3CB,2.950468,PI3K
1,PIK3CG,0.028569,PI3K
2,PIK3CA,2.9241,PI3K
3,PIK3CD,2.733354,PI3K
4,EGFR,2.163499,EGFR
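
The substring lookup used above works only while every gene symbol appears in exactly one pattern string. One alternative, sketched here on a toy row, is to record the class label while filtering so that no reverse lookup is needed. The names `toy_row`, `toy_patterns` and `records` are illustrative, and the values and bracketed IDs are placeholders:

```python
import pandas as pd

# Toy single-row frame standing in for u2os_data; values and IDs are placeholders.
toy_row = pd.DataFrame(
    [[2.0, 6.0, 5.0, 1.0]],
    columns=['EGFR (1)', 'MAP2K1 (2)', 'MAP2K2 (3)', 'TNMD (4)'],
)
toy_patterns = {'EGFR': '^EGFR ', 'MEK': '^MAP2K1 |^MAP2K2 '}

records = []
for ki_class, pattern in toy_patterns.items():
    matched = toy_row.filter(regex=pattern)
    # Walk the matched columns of the single row, keeping the class alongside each gene.
    for label, value in matched.iloc[0].items():
        records.append({
            'gene': label.split(' ')[0],   # drop the bracketed ID
            'exp_val': float(value),       # keep the column numeric
            'k_inhib': ki_class,
        })

toy_evs_df = pd.DataFrame(records)
print(toy_evs_df)
```

Building the frame from records also keeps `exp_val` as a float column, whereas concatenating mixed string and number arrays yields an object column.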


## Group by Kinase Inhibitor Class:

In [10]:
grouped_ki = (u2os_evs_df.groupby('k_inhib')['exp_val'].mean()
              .reset_index()
              .sort_values('exp_val', ascending=False))
grouped_ki

Unnamed: 0,k_inhib,exp_val
10,MEK,6.034046
5,CDK,6.030705
8,GSK,5.909385
0,AKT,5.856681
2,AURK,5.759314
17,mTOR,5.100557
13,RAF,5.077632
14,ROCK,4.744178
7,FGFR,4.057675
18,p38 MAPK,3.689417
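
When reading these class means, note that averaging log2(TPM+1) values is equivalent to taking the log of the geometric mean of TPM+1, which is never larger than the log of the arithmetic mean. A short sketch using the four PI3K gene values listed in the earlier `u2os_evs_df` output:

```python
import numpy as np

# log2(TPM + 1) values for PIK3CB, PIK3CG, PIK3CA, PIK3CD from the table above
pi3k_log = np.array([2.950468, 0.028569, 2.9241, 2.733354])

mean_of_logs = pi3k_log.mean()

# Back-transform to TPM + 1 and compare the two kinds of mean on the log scale.
tpm_plus_1 = np.exp2(pi3k_log)
log_of_geo_mean = np.log2(np.prod(tpm_plus_1) ** (1 / len(tpm_plus_1)))
log_of_arith_mean = np.log2(tpm_plus_1.mean())

print(mean_of_logs, log_of_geo_mean, log_of_arith_mean)
```

A single near-zero gene such as PIK3CG therefore pulls a class mean down more strongly than it would if linear TPM values were averaged.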
