# Filter deconvolution

I obtained a large table of infiltration estimates from the [TIMER website](http://timer.cistrome.org/), containing the results of several deconvolution algorithms. Now we need to extract the CIBERSORT estimates from this table and filter them.

The differences between the deconvolution algorithms are shown at http://timex.moffitt.org/tcga.

In [1]:
```python
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import spearmanr
```

In [2]:
```python
# pandas infers gzip compression from the .gz extension
deconv = pd.read_csv("infiltration_estimation_for_tcga.csv.gz")
deconv.head(3)
```

| | cell_type | B cell_TIMER | T cell CD4+_TIMER | T cell CD8+_TIMER | Neutrophil_TIMER | Macrophage_TIMER | Myeloid dendritic cell_TIMER | B cell naive_CIBERSORT | B cell memory_CIBERSORT | B cell plasma_CIBERSORT | ... | stroma score_XCELL | microenvironment score_XCELL | B cell_EPIC | Cancer associated fibroblast_EPIC | T cell CD4+_EPIC | T cell CD8+_EPIC | Endothelial cell_EPIC | Macrophage_EPIC | NK cell_EPIC | uncharacterized cell_EPIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TCGA-OR-A5J1-01 | 0.108257 | 0.117024 | 0.201176 | 0.112595 | 0.05366 | 0.493232 | 0.002937 | 0.002283 | 0.0 | ... | 0.007461 | 0.042267 | 0.001269 | 0.002373 | 0.040845 | 0.022052 | 0.045725 | 0.015717 | 7.288868e-09 | 0.872019 |
| 1 | TCGA-OR-A5J2-01 | 0.114475 | 0.106788 | 0.213193 | 0.112099 | 0.065909 | 0.490548 | 0.04638 | 0.0 | 0.151495 | ... | 0.080493 | 0.106181 | 0.002224 | 0.012606 | 0.048541 | 0.007944 | 0.070874 | 0.00402 | 4.301966e-10 | 0.853792 |
| 2 | TCGA-OR-A5J3-01 | 0.102441 | 0.105615 | 0.202602 | 0.108904 | 0.047359 | 0.470324 | 0.061035 | 0.0 | 0.190538 | ... | 0.03207 | 0.039554 | 0.001711 | 0.003025 | 0.040014 | 0.01109 | 0.029778 | 0.012461 | 7.304094e-09 | 0.90192 |
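Each estimate column is named `<cell type>_<ALGORITHM>`. A minimal sketch for listing which algorithms a table contains and how many columns each contributes, splitting once on the last underscore with `rsplit("_", n=1)`. It assumes cell type names never contain an underscore themselves; this holds for the visible columns, but the elided ones were not checked. The `columns` list is a hand-picked subset of the header above, not the full table.

```python
import pandas as pd

# Hand-picked subset of the column names shown above (the real table has many more).
columns = pd.Index(
    [
        "cell_type",
        "B cell_TIMER",
        "T cell CD8+_TIMER",
        "B cell naive_CIBERSORT",
        "B cell plasma_CIBERSORT",
        "Monocyte_CIBERSORT-ABS",
        "stroma score_XCELL",
        "B cell_EPIC",
        "NK cell_EPIC",
    ]
)

# Everything except the sample barcode column is a deconvolution estimate.
estimate_cols = columns.drop("cell_type")

# The algorithm name is whatever follows the last underscore
# (assumption: cell type names contain no underscore).
algorithms = pd.Series(estimate_cols).str.rsplit("_", n=1).str[-1]

# Number of columns contributed by each algorithm.
counts = algorithms.value_counts()
print(counts)
```

Applied to `deconv.columns`, the same expression gives a quick inventory of the algorithms in the full table.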


In [3]:
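```python
# Keep cell_type plus every column whose name contains "CIBERSORT"
# (this also matches the CIBERSORT-ABS columns)
CIBERSORT = deconv.filter(regex="CIBERSORT|cell_type")
CIBERSORT.head(3)
```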

| | cell_type | B cell naive_CIBERSORT | B cell memory_CIBERSORT | B cell plasma_CIBERSORT | T cell CD8+_CIBERSORT | T cell CD4+ naive_CIBERSORT | T cell CD4+ memory resting_CIBERSORT | T cell CD4+ memory activated_CIBERSORT | T cell follicular helper_CIBERSORT | T cell regulatory (Tregs)_CIBERSORT | ... | Monocyte_CIBERSORT-ABS | Macrophage M0_CIBERSORT-ABS | Macrophage M1_CIBERSORT-ABS | Macrophage M2_CIBERSORT-ABS | Myeloid dendritic cell resting_CIBERSORT-ABS | Myeloid dendritic cell activated_CIBERSORT-ABS | Mast cell activated_CIBERSORT-ABS | Mast cell resting_CIBERSORT-ABS | Eosinophil_CIBERSORT-ABS | Neutrophil_CIBERSORT-ABS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TCGA-OR-A5J1-01 | 0.002937 | 0.002283 | 0.0 | 0.112941 | 0.0 | 0.20875 | 0.0 | 0.020154 | 0.049803 | ... | 0.018473 | 0.0 | 0.0 | 0.07128 | 0.0 | 0.002659 | 0.011477 | 0.0 | 0.00032 | 0.0 |
| 1 | TCGA-OR-A5J2-01 | 0.04638 | 0.0 | 0.151495 | 0.073519 | 0.0 | 0.083533 | 0.0 | 0.057336 | 0.0 | ... | 0.006484 | 0.012895 | 0.005895 | 0.120971 | 0.0 | 0.009007 | 0.0 | 0.002818 | 0.0 | 0.0 |
| 2 | TCGA-OR-A5J3-01 | 0.061035 | 0.0 | 0.190538 | 0.016495 | 0.060717 | 0.138748 | 0.0 | 0.0 | 0.0 | ... | 0.004296 | 0.0 | 0.0 | 0.02618 | 0.0 | 0.018127 | 0.005667 | 0.0 | 0.0 | 0.0 |
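`DataFrame.filter(regex=...)` keeps every column whose name matches the pattern anywhere (it performs a search, not a full match), which is why the table above contains both the relative `_CIBERSORT` columns and the `_CIBERSORT-ABS` columns. A small sketch on a toy frame shows how anchoring the pattern at the end of the name separates the two CIBERSORT modes; the column names are copied from the header above, but the values are invented.

```python
import pandas as pd

# Toy frame: column names copied from the table above, values made up.
df = pd.DataFrame(
    {
        "cell_type": ["sample_1", "sample_2"],
        "B cell naive_CIBERSORT": [0.1, 0.2],
        "Monocyte_CIBERSORT": [0.3, 0.1],
        "B cell naive_CIBERSORT-ABS": [0.02, 0.05],
        "Monocyte_CIBERSORT-ABS": [0.06, 0.03],
        "B cell_TIMER": [0.11, 0.10],
    }
)

# Unanchored pattern: matches relative AND absolute CIBERSORT columns.
both = df.filter(regex="CIBERSORT")

# Anchored at the end of the name: one mode each.
relative = df.filter(regex=r"_CIBERSORT$")
absolute = df.filter(regex=r"_CIBERSORT-ABS$")
```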


In [4]:
```python
# The last three characters of each barcode are "-" plus the sample type code
suffixes = set(deconv.cell_type.str[-3:])
suffixes
```

{'-01', '-02', '-03', '-05', '-06', '-07', '-11'}

The suffix is the TCGA sample type code; all codes are listed [here](https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/sample-type-codes).

We need only "Primary Solid Tumor" samples, i.e. code `-01`.
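A substring test such as `str.contains("-01")` also matches a barcode whose participant code merely starts with `01`, so it is safer to compare the last field exactly. A minimal sketch, assuming every barcode has the layout `TCGA-<TSS>-<participant>-<sample type>` seen in the rows above; the barcodes below are made-up examples, not real samples.

```python
import pandas as pd

# Made-up barcodes in the TCGA-<TSS>-<participant>-<sample type> layout.
barcodes = pd.Series(
    [
        "TCGA-AA-A001-01",  # primary solid tumor
        "TCGA-AA-A001-11",  # solid tissue normal
        "TCGA-XX-0123-11",  # participant code happens to start with "01"
        "TCGA-BB-B002-06",  # metastatic
    ]
)

# Naive substring test: also picks up the "-0123" participant field.
naive = barcodes[barcodes.str.contains("-01")]

# Split off the last field only: column 0 is the patient ID,
# column 1 is the sample type code.
parts = barcodes.str.rsplit("-", n=1, expand=True)
patient_id, sample_type = parts[0], parts[1]

# Exact comparison on the sample type code keeps only primary solid tumors.
is_primary = sample_type == "01"
primary = barcodes[is_primary]
primary_patients = patient_id[is_primary]
```

For barcodes of this shape, `str.endswith("-01")` is an equivalent shortcut.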

In [5]:
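```python
# Keep only primary solid tumors: the barcode must END with "-01".
# str.contains("-01") would also match "-01" elsewhere in the barcode.
CIBERSORT = CIBERSORT[CIBERSORT.cell_type.str.endswith("-01")]
CIBERSORT.head(3)
```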

| | cell_type | B cell naive_CIBERSORT | B cell memory_CIBERSORT | B cell plasma_CIBERSORT | T cell CD8+_CIBERSORT | T cell CD4+ naive_CIBERSORT | T cell CD4+ memory resting_CIBERSORT | T cell CD4+ memory activated_CIBERSORT | T cell follicular helper_CIBERSORT | T cell regulatory (Tregs)_CIBERSORT | ... | Monocyte_CIBERSORT-ABS | Macrophage M0_CIBERSORT-ABS | Macrophage M1_CIBERSORT-ABS | Macrophage M2_CIBERSORT-ABS | Myeloid dendritic cell resting_CIBERSORT-ABS | Myeloid dendritic cell activated_CIBERSORT-ABS | Mast cell activated_CIBERSORT-ABS | Mast cell resting_CIBERSORT-ABS | Eosinophil_CIBERSORT-ABS | Neutrophil_CIBERSORT-ABS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TCGA-OR-A5J1-01 | 0.002937 | 0.002283 | 0.0 | 0.112941 | 0.0 | 0.20875 | 0.0 | 0.020154 | 0.049803 | ... | 0.018473 | 0.0 | 0.0 | 0.07128 | 0.0 | 0.002659 | 0.011477 | 0.0 | 0.00032 | 0.0 |
| 1 | TCGA-OR-A5J2-01 | 0.04638 | 0.0 | 0.151495 | 0.073519 | 0.0 | 0.083533 | 0.0 | 0.057336 | 0.0 | ... | 0.006484 | 0.012895 | 0.005895 | 0.120971 | 0.0 | 0.009007 | 0.0 | 0.002818 | 0.0 | 0.0 |
| 2 | TCGA-OR-A5J3-01 | 0.061035 | 0.0 | 0.190538 | 0.016495 | 0.060717 | 0.138748 | 0.0 | 0.0 | 0.0 | ... | 0.004296 | 0.0 | 0.0 | 0.02618 | 0.0 | 0.018127 | 0.005667 | 0.0 | 0.0 | 0.0 |


In [6]:
```python
CIBERSORT.columns
```

```
Index(['cell_type', 'B cell naive_CIBERSORT', 'B cell memory_CIBERSORT',
       'B cell plasma_CIBERSORT', 'T cell CD8+_CIBERSORT',
       'T cell CD4+ naive_CIBERSORT', 'T cell CD4+ memory resting_CIBERSORT',
       'T cell CD4+ memory activated_CIBERSORT',
       'T cell follicular helper_CIBERSORT',
       'T cell regulatory (Tregs)_CIBERSORT', 'T cell gamma delta_CIBERSORT',
       'NK cell resting_CIBERSORT', 'NK cell activated_CIBERSORT',
       'Monocyte_CIBERSORT', 'Macrophage M0_CIBERSORT',
       'Macrophage M1_CIBERSORT', 'Macrophage M2_CIBERSORT',
       'Myeloid dendritic cell resting_CIBERSORT',
       'Myeloid dendritic cell activated_CIBERSORT',
       'Mast cell activated_CIBERSORT', 'Mast cell resting_CIBERSORT',
       'Eosinophil_CIBERSORT', 'Neutrophil_CIBERSORT',
       'B cell naive_CIBERSORT-ABS', 'B cell memory_CIBERSORT-ABS',
       'B cell plasma_CIBERSORT-ABS', 'T cell CD8+_CIBERSORT-ABS',
       'T cell CD4+ naive_CIBERSORT-ABS',
       'T cell CD4+ memory 
```

Done!

### Keep only CIBERSORT-ABS: absolute-mode scores, not relative fractions

In [7]:
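```python
# Keep only the CIBERSORT-ABS columns, indexed by sample barcode
CIBERSORT_abs = CIBERSORT.set_index("cell_type").filter(regex="-ABS$").copy()
# Drop the algorithm suffix from the column names
CIBERSORT_abs.columns = CIBERSORT_abs.columns.str.replace(
    "_CIBERSORT-ABS", "", regex=False
)
# Strip the sample type code: "TCGA-OR-A5J1-01" -> "TCGA-OR-A5J1"
CIBERSORT_abs.index = CIBERSORT_abs.index.map(lambda x: "-".join(x.split("-")[:-1]))
CIBERSORT_abs.head(3)
```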

| cell_type | B cell naive | B cell memory | B cell plasma | T cell CD8+ | T cell CD4+ naive | T cell CD4+ memory resting | T cell CD4+ memory activated | T cell follicular helper | T cell regulatory (Tregs) | T cell gamma delta | ... | Monocyte | Macrophage M0 | Macrophage M1 | Macrophage M2 | Myeloid dendritic cell resting | Myeloid dendritic cell activated | Mast cell activated | Mast cell resting | Eosinophil | Neutrophil |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA-OR-A5J1 | 0.000552 | 0.000429 | 0.0 | 0.021223 | 0.0 | 0.039226 | 0.0 | 0.003787 | 0.009358 | 0.0 | ... | 0.018473 | 0.0 | 0.0 | 0.07128 | 0.0 | 0.002659 | 0.011477 | 0.0 | 0.00032 | 0.0 |
| TCGA-OR-A5J2 | 0.013554 | 0.0 | 0.044274 | 0.021486 | 0.0 | 0.024412 | 0.0 | 0.016756 | 0.0 | 0.0 | ... | 0.006484 | 0.012895 | 0.005895 | 0.120971 | 0.0 | 0.009007 | 0.0 | 0.002818 | 0.0 | 0.0 |
| TCGA-OR-A5J3 | 0.007283 | 0.0 | 0.022736 | 0.001968 | 0.007245 | 0.016556 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.004296 | 0.0 | 0.0 | 0.02618 | 0.0 | 0.018127 | 0.005667 | 0.0 | 0.0 | 0.0 |
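In the rows shown, each absolute score is the corresponding relative fraction multiplied by a per-sample factor: for TCGA-OR-A5J1, 0.000552 / 0.002937 and 0.021223 / 0.112941 are both about 0.188. Assuming that relationship holds for every cell type (only the visible columns were checked), dividing each row of the absolute table by its sum recovers relative fractions. A minimal sketch on made-up values:

```python
import pandas as pd

# Toy absolute-mode scores (made-up values) for two samples.
abs_scores = pd.DataFrame(
    {
        "B cell naive": [0.02, 0.00],
        "T cell CD8+": [0.06, 0.10],
        "Monocyte": [0.12, 0.30],
    },
    index=pd.Index(["sample_1", "sample_2"], name="cell_type"),
)

# Per-sample scale factor: the sum of all absolute scores in the row.
scale = abs_scores.sum(axis=1)

# Assumption: relative fraction = absolute score / per-sample sum,
# so every row of the result sums to 1.
relative = abs_scores.div(scale, axis=0)
```

The per-sample factor is what the relative fractions discard, which is why the absolute scores are the ones to keep when comparing the amount of a cell type across samples. A sample whose absolute scores are all zero would produce NaN in this division.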


Write the final table to CSV:

In [8]:
```python
# to_csv writes the patient-ID index as the first column
CIBERSORT_abs.to_csv("CIBERSORT-ABS_dec.csv")
```
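Because the sample type code was stripped from the index, two rows would share a patient ID if a patient had more than one `-01` sample. A sketch of a uniqueness check before writing, plus reading the file back with `index_col=0`; it uses a small stand-in frame (values copied from the first rows above) and a temporary directory instead of `CIBERSORT-ABS_dec.csv`, and the check itself is an addition, not part of the original notebook.

```python
import os
import tempfile

import pandas as pd

# Stand-in for CIBERSORT_abs: two rows and two columns copied from the table above.
table = pd.DataFrame(
    {"B cell naive": [0.000552, 0.013554], "Neutrophil": [0.0, 0.0]},
    index=pd.Index(["TCGA-OR-A5J1", "TCGA-OR-A5J2"], name="cell_type"),
)

# Stripping the suffix must not have produced duplicate patient IDs.
if not table.index.is_unique:
    raise ValueError("duplicate patient IDs after stripping the sample type code")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.csv")
    # The index is written as the first column, headed by its name.
    table.to_csv(path)
    # index_col=0 turns that first column back into the index.
    restored = pd.read_csv(path, index_col=0)
```

Reading the real file the same way, `pd.read_csv("CIBERSORT-ABS_dec.csv", index_col=0)`, should give back patient IDs as the index.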

All done