# YADA Deconvolution

---



Run the following cells for deconvolution using YADA.

## 1 - Import Prerequisites.

In [2]:
%load_ext autoreload
%autoreload 2

from IPython.display import FileLink, FileLinks
import pandas as pd
from YADA import *

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## 2 - Configure Input Files.

Example input files are in the ../data/ folder. This demonstration uses RNA-seq input files.


In [5]:
# Path to the reference (pure) matrix. It should be normalized the same way as the mix data.
pure = '../data/Challenge/pure-107019_RNASeq.csv'

# Path to the mixture file. Format: columns are mix1, mix2, ...; rows are gene names.
mix = '../data/Challenge/mix-107019_RNASeq.csv'

# True cell type proportions file.
labels = 'Challenge'
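
To check that a file matches the layout described in the comments above (gene names in rows, one column per mixture or cell type), it can be loaded with pandas before it is passed to YADA. The following is a minimal sketch: the helper `load_expression_matrix` and the toy CSV are hypothetical and are not part of YADA, and it assumes the gene names sit in the first column of the CSV.

```python
import io
import pandas as pd

def load_expression_matrix(source):
    """Read a CSV whose first column holds gene names (hypothetical helper, not a YADA function)."""
    return pd.read_csv(source, index_col=0)

# Toy mixture file held in memory: rows are gene names, columns are mix1, mix2.
toy_mix_csv = io.StringIO("gene,mix1,mix2\ncd1e,5.0,7.5\nklrf1,0.0,2.0\n")
toy_mix = load_expression_matrix(toy_mix_csv)

print(toy_mix.shape)           # (2, 2)
print(list(toy_mix.columns))   # ['mix1', 'mix2']
```

The same call accepts a file path, so `load_expression_matrix(mix)` could be used to inspect the mixture file before preprocessing.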

## 3 - Preprocess Data.
YADA preprocessing involves the following steps:
- Filling missing values with 0.
- If the maximum value of all genes is less than 20, we raise all values to the power of two.
- We consider only genes that are common to both the marker gene list and the mix dataset.
- Standardization is performed by column (i.e., by cell type) by subtracting the minimum value and dividing by the mean.
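
The steps listed above can be sketched with pandas as follows. This is an illustration under stated assumptions, not YADA's actual `preprocess` implementation. The function name `preprocess_sketch` is hypothetical. The second step is read here as `2 ** x` (undoing a log2 scale, which a maximum below 20 would suggest); the literal reading `x ** 2` is a different operation. The threshold is applied to each table separately, and the mean used for scaling is taken after the minimum has been subtracted. All three choices are assumptions.

```python
import pandas as pd

def preprocess_sketch(pure_df, mix_df, threshold=20):
    """Illustrative version of the listed steps (hypothetical, not YADA's preprocess)."""
    # Step 1: fill missing values with 0.
    pure_df = pure_df.fillna(0)
    mix_df = mix_df.fillna(0)

    # Step 2 (assumed interpretation): small maxima suggest log2-scaled data,
    # so map every value x to 2 ** x. Applied per table (assumption).
    if pure_df.to_numpy().max() < threshold:
        pure_df = 2.0 ** pure_df
    if mix_df.to_numpy().max() < threshold:
        mix_df = 2.0 ** mix_df

    # Step 3: keep only the genes present in both tables.
    common = pure_df.index.intersection(mix_df.index)
    pure_df = pure_df.loc[common]
    mix_df = mix_df.loc[common]

    # Step 4: per column, subtract the minimum and divide by the mean
    # (mean computed on the shifted column, which is an assumption).
    def standardize(df):
        shifted = df - df.min()
        return shifted / shifted.mean()

    return standardize(pure_df), standardize(mix_df)

# Toy data: the reference looks log-scaled (max below 20), the mixture does not.
toy_pure = pd.DataFrame({"A": [1.0, None, 3.0], "B": [2.0, 4.0, 6.0]},
                        index=["g1", "g2", "g3"])
toy_mix = pd.DataFrame({"mix0": [10.0, 20.0, 30.0]}, index=["g2", "g3", "g4"])

pure_std, mix_std = preprocess_sketch(toy_pure, toy_mix)
print(pure_std)
print(mix_std)
```

A column whose values are all identical would have a shifted mean of zero and produce a division by zero; the sketch does not guard against that case.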

In [6]:
pure, mix = preprocess(pure, mix)

## 4 - If Needed, Run Gene Differentiation Algorithm. 

In [7]:
# Gene differentiation algorithm.
gene_list_df = gene_diff(pure, mix)

YADA doesn't need the full pure gene expression matrix; it only needs the marker gene list for each cell type. Given a complete reference table, it can derive these lists with the gene_diff function. In most cases, however, only marker gene lists are available, and they can be provided by creating a gene_list_df dataframe directly.
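
As a minimal sketch of building such a dataframe by hand, the example below assumes the layout visible in the gene_diff output in this notebook: one column per cell type, with that cell type's marker genes listed down the rows. The three cell types and gene names are copied from that output purely for illustration. Whether YADA accepts columns of unequal length (which pandas pads with NaN) is not confirmed here.

```python
import pandas as pd

# Marker genes per cell type (names copied from the gene_diff output for illustration).
marker_genes = {
    "naive.B.cells": ["linc01013", "kcnh8", "bmp3"],
    "NK.cells": ["trdj1", "spon2", "klrf1"],
    "neutrophils": ["fcgr3b", "tnfrsf10c", "kcnj15"],
}

# Wrapping each list in a Series lets pandas align lists of different
# lengths, padding the shorter ones with NaN.
gene_list_df = pd.DataFrame({cell: pd.Series(genes) for cell, genes in marker_genes.items()})

print(gene_list_df)
```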

In [8]:
gene_list_df

| | naive.B.cells | memory.B.cells | naive.CD4.T.cells | naive.CD8.T.cells | memory.CD8.T.cells | regulatory.T.cells | monocytes | NK.cells | myeloid.dendritic.cells | neutrophils |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | linc01013 | rp4-809f18.1 | fst | hsfy2 | rapgef4-as1 | mtnd1p23 | rp11-73m18.2 | trdj1 | cd1e | fcgr3b |
| 1 | kcnh8 | mir568 | phf2p2 | rp11-20p5.2 | rp11-347c18.1 | krt1 | rna5sp154 | spon2 | bx255923.2 | tnfrsf10c |
| 2 | bmp3 | ac007003.1 | ctd-2358c21.4 | frg2b | rp13-1032i1.10 | rp11-47i22.4 | rp11-747h12.5 | klrf1 | fcer1a | kcnj15 |
| 3 | mybpc2 | borcs7-asmt | pin4p1 | nr1i2 | glra2 | golga6l6 | ch17-125a10.1 | sh2d1b | rp11-290h9.2 | mme |
| 4 | snord84 | rn7sl152p | cnn2p8 | rp11-677m14.6 | ac015849.2 | kynup3 | adamts5 | s1pr5 | znf366 | cmtm2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | st6galnac4p1 | rp11-512m8.11 | linc00933 | ctd-2036p10.6 | rp1-95l4.3 | znf75bp | rps3ap43 | nuak1 | slc2a12 | cdh2 |
| 76 | prelid3bp6 | prdx2p1 | tmem256-plscr3 | rp11-89n17.2 | pgam4 | ccr8 | rp11-280o1.2 | ttc38 | ppargc1a | gp1bb |
| 77 | ctb-179i1.1 | rps10p14 | ranp8 | fcf1p1 | rp11-112l6.3 | tnfrsf4 | smarce1p6 | copz2 | wnt5a | cxcr2 |
| 78 | rpl3p1 | kynup2 | or7e36p | hmgn2p17 | znf536 | rnu6-1091p | cd300e | rnf165 | spns3 | kcnh7 |


## 5 - Run Deconvolution.

In [9]:
result = run_dtw_deconv_ensemble(pure, mix, gene_list_df)
result

# Download the result.
# FileLink('data/results.csv')
# from google.colab import files
# files.download('data/results.csv')

| | naive.B.cells | memory.B.cells | naive.CD4.T.cells | naive.CD8.T.cells | memory.CD8.T.cells | regulatory.T.cells | monocytes | NK.cells | myeloid.dendritic.cells | neutrophils |
|---|---|---|---|---|---|---|---|---|---|---|
| mix0 | 0.06503912 | 0.142507 | 0.02399 | 0.087828 | 0.059746 | 0.023759 | 0.179455 | 0.01337636 | 0.122108 | 0.231983 |
| mix1 | 0.1314857 | 0.056707 | 0.027507 | 0.012981 | 0.022228 | 0.13059 | 0.260642 | 0.1025668 | 0.237409 | 0.004163 |
| mix2 | 0.03611347 | 0.153005 | 0.065271 | 0.070005 | 0.205123 | 0.12097 | 0.159211 | 0.05648946 | 0.016257 | 0.135436 |
| mix3 | 0.2182881 | 0.000704 | 0.084932 | 0.015611 | 0.107218 | 0.065285 | 0.183174 | 0.1201982 | 0.034488 | 0.149653 |
| mix4 | 0.06355393 | 0.002325 | 0.000645 | 0.018862 | 0.198445 | 0.039979 | 0.332993 | 0.3260868 | 0.008024 | 0.000154 |
| mix5 | 0.2018532 | 0.077789 | 0.07535 | 0.102962 | 0.004957 | 0.18946 | 0.130377 | 0.03537647 | 0.116168 | 0.041872 |
| mix6 | 0.03152259 | 0.030757 | 0.030833 | 0.008515 | 0.101863 | 0.044703 | 0.390222 | 0.3120015 | 0.021239 | 0.030546 |
| mix7 | 0.2228929 | 0.13206 | 0.300013 | 0.007182 | 0.006187 | 0.179296 | 0.052823 | 0.01276459 | 0.03585 | 0.074555 |
| mix8 | 3.898724e-18 | 0.223633 | 0.076265 | 0.350764 | 0.0014 | 0.148981 | 0.02238 | 0.05816777 | 0.077069 | 0.035163 |
| mix9 | 0.0861234 | 0.008664 | 0.177272 | 0.085976 | 0.092007 | 0.0399 | 0.198931 | 0.007773977 | 0.256746 | 0.053201 |


## 6 - Evaluate Results.

If the true proportions are available, evaluate the result against them.

In [10]:
res = calc_corr(labels, result)  # Columns: dataset, celltype, Pearson, Spearman, p
res

| | Challenge | celltype | Pearson | Spearman | p |
|---|---|---|---|---|---|
| 0 | Challenge | naive.B.cells | 0.990379 | 0.969925 | 1.714356e-12 |
| 1 | Challenge | memory.B.cells | 0.995523 | 0.986466 | 1.377568e-15 |
| 2 | Challenge | naive.CD4.T.cells | 0.991046 | 0.980451 | 3.689359e-14 |
| 3 | Challenge | naive.CD8.T.cells | 0.996308 | 0.977444 | 1.322934e-13 |
| 4 | Challenge | memory.CD8.T.cells | 0.995849 | 0.990977 | 3.642322e-17 |
| 5 | Challenge | regulatory.T.cells | 0.991629 | 0.98797 | 4.798484e-16 |
| 6 | Challenge | monocytes | 0.985931 | 0.986466 | 1.377568e-15 |
| 7 | Challenge | NK.cells | 0.998726 | 0.995489 | 7.230721e-20 |
| 8 | Challenge | myeloid.dendritic.cells | 0.990135 | 0.98797 | 4.798484e-16 |
| 9 | Challenge | neutrophils | 0.995891 | 0.981955 | 1.804935e-14 |
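
For readers who want to reproduce this kind of evaluation on their own data, the sketch below computes Pearson and Spearman correlations per cell type with pandas alone. It is not YADA's `calc_corr`: the function `correlation_per_cell_type` and the toy tables are hypothetical, and p-values are omitted. In the output above, rows with identical Spearman values also share identical p-values, which suggests that the p column belongs to the Spearman test.

```python
import pandas as pd

def correlation_per_cell_type(true_df, pred_df, dataset="toy"):
    """Per-cell-type Pearson and Spearman correlation (hypothetical helper, not YADA's calc_corr)."""
    # Compare only the samples and cell types present in both tables.
    samples = true_df.index.intersection(pred_df.index)
    cell_types = true_df.columns.intersection(pred_df.columns)

    rows = []
    for cell_type in cell_types:
        truth = true_df.loc[samples, cell_type]
        pred = pred_df.loc[samples, cell_type]
        rows.append({
            "dataset": dataset,
            "celltype": cell_type,
            "Pearson": truth.corr(pred),
            # Spearman correlation is the Pearson correlation of the ranks.
            "Spearman": truth.rank().corr(pred.rank()),
        })
    return pd.DataFrame(rows)

# Toy proportions: rows are mixtures, columns are cell types.
idx = ["mix0", "mix1", "mix2", "mix3"]
true_props = pd.DataFrame({"a": [0.1, 0.2, 0.3, 0.4], "b": [0.4, 0.3, 0.2, 0.1]}, index=idx)
pred_props = pd.DataFrame({"a": [0.12, 0.18, 0.33, 0.37], "b": [0.1, 0.2, 0.3, 0.4]}, index=idx)

toy_res = correlation_per_cell_type(true_props, pred_props)
print(toy_res)
```

In the toy data, cell type `a` is predicted in the right order but with small errors, so its Spearman correlation is 1 while its Pearson correlation is slightly lower; cell type `b` is predicted in reverse order, so both correlations are -1.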
