# Yada Deconvolution

---



Run the following cells for deconvolution using Yada.

## 1 - Import Prerequisites.

In [None]:
#On Colab.
!pip install -q tslearn gseapy similaritymeasures
!git clone https://github.com/zurkin1/Yada.git
!mv Yada/* .

In [7]:
%load_ext autoreload
%autoreload 2

from IPython.display import FileLink, FileLinks
import pandas as pd
from yada import *

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)



## 2 - Configure Input Files.

Example input files are in the ./data/ folder. We demonstrate with input files from RNA-seq sequencing.


In [8]:
#Reference (pure) matrix file. It should be normalized the same way as the mix data.
pure = './data/Challenge/pure-107019_RNASeq.csv'

#This is the mixture file in the format: columns: mix1, mix2, ..., rows: gene names.
mix = './data/Challenge/mix-107019_RNASeq.csv'

#True cell type proportions file.
labels = 'Challenge'
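
To make the expected layout concrete, here is a minimal sketch that writes and reloads toy input files: genes in rows, mixes in columns for the mixture file, and cell types in columns for the reference file. The gene names, values, file names, and the use of `index_col=0` when reading are illustrative assumptions, not something taken from the Yada repository.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data: rows are gene names, columns are mixes (mix1, mix2, ...).
toy_mix = pd.DataFrame(
    {"mix1": [5.0, 0.0, 12.5], "mix2": [3.2, 7.1, 0.4]},
    index=pd.Index(["cd19", "cd3e", "cd14"], name="gene"),
)

# Hypothetical reference matrix: rows are gene names, columns are cell types.
toy_pure = pd.DataFrame(
    {"B.cells": [9.0, 0.1, 0.2], "T.cells": [0.3, 8.5, 0.1]},
    index=pd.Index(["cd19", "cd3e", "cd14"], name="gene"),
)

# Write both tables to a temporary folder as CSV files.
tmp_dir = tempfile.mkdtemp()
mix_path = os.path.join(tmp_dir, "toy_mix.csv")
pure_path = os.path.join(tmp_dir, "toy_pure.csv")
toy_mix.to_csv(mix_path)
toy_pure.to_csv(pure_path)

# Read them back with the gene names as the index (assumed convention).
loaded_mix = pd.read_csv(mix_path, index_col=0)
loaded_pure = pd.read_csv(pure_path, index_col=0)
print(loaded_mix)
print(loaded_pure)
```
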

## 3 - Preprocess Data.
Yada preprocessing does the following:
- Fills missing values with 0.
- If the maximum value over all genes is less than 20, raises all values by a power of two.
- Keeps only the genes that appear in both pure and mix.
- Standardizes by column, i.e. cell type (subtract the minimum and divide by the mean).
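
The steps listed above can be sketched in pandas as follows. This is an illustrative re-implementation under stated assumptions, not the code of Yada's `preprocess`: the name `preprocess_sketch` is hypothetical, the "power of two" step is interpreted as `2 ** x` applied to each matrix separately, and the mean used for scaling is the column mean before the minimum is subtracted.

```python
import numpy as np
import pandas as pd


def preprocess_sketch(pure_df, mix_df):
    """Illustrative version of the four listed steps (hypothetical helper)."""
    # 1. Fill missing values with 0.
    pure_df = pure_df.fillna(0)
    mix_df = mix_df.fillna(0)

    # 2. ASSUMPTION: a small maximum is handled by computing 2 ** x,
    #    checked separately for each matrix.
    if pure_df.max().max() < 20:
        pure_df = 2.0 ** pure_df
    if mix_df.max().max() < 20:
        mix_df = 2.0 ** mix_df

    # 3. Keep only the genes present in both matrices.
    common = pure_df.index.intersection(mix_df.index)
    pure_df = pure_df.loc[common]
    mix_df = mix_df.loc[common]

    # 4. Standardize each column: subtract its minimum, divide by its mean.
    #    ASSUMPTION: the mean is taken before the minimum is subtracted.
    pure_df = (pure_df - pure_df.min()) / pure_df.mean()
    mix_df = (mix_df - mix_df.min()) / mix_df.mean()
    return pure_df, mix_df


# Toy data with one missing value and partially overlapping genes.
toy_pure = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0], "B": [2.0, 4.0, 0.0]}, index=["g1", "g2", "g3"]
)
toy_mix = pd.DataFrame({"mix1": [1.0, 2.0, 3.0]}, index=["g2", "g3", "g4"])

toy_pure_pp, toy_mix_pp = preprocess_sketch(toy_pure, toy_mix)
print(toy_pure_pp)
print(toy_mix_pp)
```
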

In [9]:
pure, mix = preprocess(pure, mix)

## 4 - If Needed, Run Gene Differentiation Algorithm. 

In [10]:
# Gene differentiation algorithm.
gene_list_df = gene_diff(pure, mix)

Yada doesn't need the full pure gene expression matrix. It only needs the marker gene list for each cell type, as the following table demonstrates. Given a complete reference table, it can deduce this table with the gene_diff function, but in most cases only marker gene lists are available. In that case they can be provided by creating a gene_list_df dataframe.

In [18]:
gene_list_df

| | naive.B.cells | memory.B.cells | naive.CD4.T.cells | naive.CD8.T.cells | memory.CD8.T.cells | regulatory.T.cells | monocytes | NK.cells | myeloid.dendritic.cells | neutrophils |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | linc01013 | rp4-809f18.1 | fst | hsfy2 | rapgef4-as1 | mtnd1p23 | rp11-73m18.2 | trdj1 | cd1e | fcgr3b |
| 1 | kcnh8 | mir568 | phf2p2 | rp11-20p5.2 | rp11-347c18.1 | krt1 | rna5sp154 | spon2 | bx255923.2 | tnfrsf10c |
| 2 | bmp3 | ac007003.1 | ctd-2358c21.4 | frg2b | rp13-1032i1.10 | rp11-47i22.4 | rp11-747h12.5 | klrf1 | fcer1a | kcnj15 |
| 3 | mybpc2 | borcs7-asmt | pin4p1 | nr1i2 | glra2 | golga6l6 | ch17-125a10.1 | sh2d1b | rp11-290h9.2 | mme |
| 4 | snord84 | rn7sl152p | cnn2p8 | rp11-677m14.6 | ac015849.2 | kynup3 | adamts5 | s1pr5 | znf366 | cmtm2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | st6galnac4p1 | rp11-512m8.11 | linc00933 | ctd-2036p10.6 | rp1-95l4.3 | znf75bp | rps3ap43 | nuak1 | slc2a12 | cdh2 |
| 76 | prelid3bp6 | prdx2p1 | tmem256-plscr3 | rp11-89n17.2 | pgam4 | ccr8 | rp11-280o1.2 | ttc38 | ppargc1a | gp1bb |
| 77 | ctb-179i1.1 | rps10p14 | ranp8 | fcf1p1 | rp11-112l6.3 | tnfrsf4 | smarce1p6 | copz2 | wnt5a | cxcr2 |
| 78 | rpl3p1 | kynup2 | or7e36p | hmgn2p17 | znf536 | rnu6-1091p | cd300e | rnf165 | spns3 | kcnh7 |
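
When only marker gene lists are available, a dataframe with the same layout as the table above (one column per cell type, one marker gene per row) can be built by hand. The sketch below is a minimal illustration: the variable name `custom_gene_list_df` is hypothetical, and the three short lists reuse the first genes shown in the table. The lists are kept the same length here, since it is not confirmed how Yada treats columns padded with missing values.

```python
import pandas as pd

# Marker genes per cell type; the names are copied from the first rows of the
# table above, and the selection of three cell types is only for illustration.
markers = {
    "monocytes": ["rp11-73m18.2", "rna5sp154", "rp11-747h12.5"],
    "NK.cells": ["trdj1", "spon2", "klrf1"],
    "neutrophils": ["fcgr3b", "tnfrsf10c", "kcnj15"],
}

# One column per cell type, one marker gene per row (hypothetical variable name).
custom_gene_list_df = pd.DataFrame(markers)
print(custom_gene_list_df)
```
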


## 5 - Run Deconvolution.

In [11]:
result = run_dtw_deconv_ensemble(pure, mix, gene_list_df)
result

#Download Result.
#FileLink('data/results.csv')
#from google.colab import files
#files.download('data/results.csv') 


| | naive.B.cells | memory.B.cells | naive.CD4.T.cells | naive.CD8.T.cells | memory.CD8.T.cells | regulatory.T.cells | monocytes | NK.cells | myeloid.dendritic.cells | neutrophils |
|---|---|---|---|---|---|---|---|---|---|---|
| mix0 | 0.060501 | 0.151439 | 0.025116 | 0.089964 | 0.056428 | 0.020611 | 0.176676 | 0.01386709 | 0.111372 | 0.237821 |
| mix1 | 0.124796 | 0.064892 | 0.029363 | 0.01304 | 0.024912 | 0.123619 | 0.266058 | 0.1067014 | 0.232955 | 0.00529 |
| mix2 | 0.036633 | 0.159805 | 0.069276 | 0.072243 | 0.201479 | 0.112636 | 0.165271 | 0.06124802 | 0.018187 | 0.146782 |
| mix3 | 0.202183 | 0.002709 | 0.08993 | 0.018216 | 0.107872 | 0.059484 | 0.184237 | 0.1218215 | 0.03312 | 0.160889 |
| mix4 | 0.058828 | 0.001955 | 0.000897 | 0.026604 | 0.209261 | 0.036342 | 0.316475 | 0.3141248 | 0.007267 | 0.000983 |
| mix5 | 0.190025 | 0.080601 | 0.08701 | 0.092504 | 0.006435 | 0.17368 | 0.13725 | 0.03673775 | 0.114871 | 0.042159 |
| mix6 | 0.029115 | 0.031874 | 0.031375 | 0.014749 | 0.108412 | 0.041168 | 0.37988 | 0.3025387 | 0.019716 | 0.034385 |
| mix7 | 0.215427 | 0.143223 | 0.312161 | 0.008168 | 0.008352 | 0.159585 | 0.057039 | 0.01280165 | 0.037336 | 0.074376 |
| mix8 | 0.00051 | 0.22685 | 0.086694 | 0.339974 | 0.001486 | 0.137029 | 0.022194 | 0.0610131 | 0.076182 | 0.037009 |
| mix9 | 0.085372 | 0.008859 | 0.190433 | 0.086595 | 0.085866 | 0.037795 | 0.209612 | 0.007880094 | 0.246846 | 0.054207 |


## 6 - Evaluate Results.

If the true proportions are available, compare them with the estimated proportions.

In [12]:
res = pd.DataFrame(calc_corr(labels, result), columns=['dataset', 'celltype', 'pearson', 'spearman', 'p'])
res

| | dataset | celltype | pearson | spearman | p |
|---|---|---|---|---|---|
| 0 | Challenge | naive.B.cells | 0.986265 | 0.969925 | 1.714356e-12 |
| 1 | Challenge | memory.B.cells | 0.991853 | 0.972932 | 6.714944e-13 |
| 2 | Challenge | naive.CD4.T.cells | 0.986423 | 0.954887 | 6.237876e-11 |
| 3 | Challenge | naive.CD8.T.cells | 0.990962 | 0.960902 | 1.759049e-11 |
| 4 | Challenge | memory.CD8.T.cells | 0.989054 | 0.980451 | 3.689359e-14 |
| 5 | Challenge | regulatory.T.cells | 0.98571 | 0.97594 | 2.35192e-13 |
| 6 | Challenge | monocytes | 0.978056 | 0.986466 | 1.377568e-15 |
| 7 | Challenge | NK.cells | 0.996966 | 0.993985 | 9.577980999999999e-19 |
| 8 | Challenge | myeloid.dendritic.cells | 0.986988 | 0.986466 | 1.377568e-15 |
| 9 | Challenge | neutrophils | 0.992528 | 0.969925 | 1.714356e-12 |
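
For readers who want to see what a per-cell-type comparison looks like, the following is a rough sketch of computing Pearson and Spearman correlations between true and estimated proportions with pandas. It is not the implementation of `calc_corr`: the function name `correlation_sketch` and the toy values are hypothetical, both tables are assumed to have mixes in rows and cell types in columns (as in the deconvolution result above), and no p-value is computed. Spearman correlation is obtained as the Pearson correlation of the ranks.

```python
import pandas as pd


def correlation_sketch(true_df, pred_df, dataset="toy"):
    """Per-cell-type Pearson and Spearman correlation (hypothetical helper)."""
    rows = []
    for celltype in true_df.columns.intersection(pred_df.columns):
        truth = true_df[celltype]
        # Align the estimates to the same mixes as the true proportions.
        estimate = pred_df[celltype].reindex(truth.index)
        pearson = truth.corr(estimate)
        # Spearman correlation is the Pearson correlation of the ranks.
        spearman = truth.rank().corr(estimate.rank())
        rows.append((dataset, celltype, pearson, spearman))
    return pd.DataFrame(rows, columns=["dataset", "celltype", "pearson", "spearman"])


# Toy values, made up for illustration only.
mixes = ["mix0", "mix1", "mix2", "mix3"]
toy_true = pd.DataFrame(
    {"NK.cells": [0.1, 0.2, 0.3, 0.4], "monocytes": [0.4, 0.3, 0.2, 0.1]},
    index=mixes,
)
toy_pred = pd.DataFrame(
    {"NK.cells": [0.12, 0.18, 0.35, 0.5], "monocytes": [0.1, 0.2, 0.3, 0.4]},
    index=mixes,
)

toy_scores = correlation_sketch(toy_true, toy_pred)
print(toy_scores)
```

In this toy case the NK.cells estimates preserve the ordering of the true values, while the monocytes estimates reverse it, so the two rows show the best and worst possible rank agreement.
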
