# Yada Deconvolution

---


In this notebook we demonstrate running Yada when the pure matrix contains only lists of marker genes, i.e. no RNA counts are available for the relevant cell types.

## 1 - Import Prerequisites.

In [None]:
#On Colab.
!pip install -q tslearn gseapy similaritymeasures
!git clone https://github.com/zurkin1/Yada.git
!mv Yada/* .

In [1]:
%load_ext autoreload
%autoreload 2

from IPython.display import FileLink, FileLinks
import pandas as pd
from yada import *

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)

## 2 - Configure Input Files.

Example input files are in the ./data/ folder. We demonstrate with input files from the xCell deconvolution method.


In [2]:
#Reference (pure) matrix file. It should be normalized the same way as the mix data.
pure = './data/xCell/pure.csv'

#This is the mixture file in the format: columns: mix1, mix2, ..., rows: gene names.
mix = './data/xCell/mix.csv'

#True cell type proportions file.
labels = 'xCell'

## 3 - Preprocess Data.
Yada preprocessing does the following:
- Fill missing values with 0.
- If the maximum value over all genes is less than 20, we raise all values by a power of two.
- We consider only genes that appear in both pure and mix.
- We standardize by column, i.e. cell type (subtract the minimum and divide by the mean).
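
The four steps above can be sketched in pandas as follows. This is only an illustration for two numeric, gene-indexed matrices, not Yada's own implementation: the function name `preprocess_sketch` is hypothetical, the power-of-two step is read here as undoing a log2 transform (`2 ** x`), and the standardization divides by the mean of the unshifted column. Both of those readings are assumptions.

```python
import numpy as np
import pandas as pd

def preprocess_sketch(pure: pd.DataFrame, mix: pd.DataFrame):
    """Illustrative version of the preprocessing steps (hypothetical helper)."""
    def prepare(df: pd.DataFrame) -> pd.DataFrame:
        # Step 1: fill missing values with 0.
        df = df.fillna(0)
        # Step 2 (assumption): a maximum below 20 suggests log2-scaled data,
        # so map every value x to 2 ** x.
        if df.max().max() < 20:
            df = 2 ** df
        return df

    def standardize(df: pd.DataFrame) -> pd.DataFrame:
        # Step 4: per column, subtract the minimum and divide by the mean.
        return (df - df.min()) / df.mean()

    pure, mix = prepare(pure), prepare(mix)
    # Step 3: keep only genes present in both matrices.
    shared = pure.index.intersection(mix.index)
    return standardize(pure.loc[shared]), standardize(mix.loc[shared])

# Toy data, made up for illustration only.
toy_pure = pd.DataFrame({"A": [1.0, np.nan, 3.0]}, index=["g1", "g2", "g3"])
toy_mix = pd.DataFrame({"m1": [2.0, 4.0]}, index=["g1", "g3"])
toy_pure_out, toy_mix_out = preprocess_sketch(toy_pure, toy_mix)
```

When the pure file holds only marker gene names, as in this notebook, the numeric steps can only apply to the mix matrix.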

In [3]:
pure, mix = preprocess_only_marker('./data/xCell/pure.csv', './data/xCell/mix.csv')

## 4 - No Need to Run Gene Differentiation Algorithm Since We Have The Pure Matrix.
Previously we ran the gene differentiation algorithm in order to get a marker gene list for each cell type. Now we have exactly this list in the pure matrix. We just need to assign it to the gene_list_df variable.

In [4]:
pure.head()

Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
0,adh1b,acta2,tnfrsf17,ceacam8,bad,cd5,tnfsf8,cd3g,cd2,cd8a,cd8b,abcd2,cd8a,cox6c,azu1,adra1d,flt3,acvrl1,agtr2,nqo1,cenpa,arf4,ceacam8,cd34,aadac,adam8,bub1b,abo,htr7,acadvl,acp2,acp2,,arhgap6,abl2,art1,cdh6,asgr2,evc,gzmh,phkg1,abca3,clc,arcn1,atp5j,alpi,adcy8,adh5,,,copa,abcd2,ifng,gpr15,ccr4,flt4,acvrl1,rere,azu1
1,dlat,cnn1,cd19,clk1,apbb1,cd40lg,cd2,aamp,cd3g,apbb1,abcd2,slc25a20,,calm1,adss,,ache,angpt2,ccr3,flnb,alas2,adh5,clc,alas2,acadl,,fxn,adcy3,dctd,acp2,abcd1,adra2b,anxa1,anxa3,,blk,acta2,ap1g1,cdh15,il2rb,ambn,aldoc,bmx,bmpr1a,adcy3,tnfrsf17,alox12,arcn1,entpd3,ache,ccng1,,chd4,il5,ctla4,gja4,actg1,cd1a,adarb2
2,,cbr3,actn2,actn2,krit1,bmpr1a,rpn2,adsl,cd4,cd3d,bmpr1a,abcf1,gpr15,dntt,tor1a,arl1,cd1b,tie1,c3ar1,adm,aplnr,bmpr1a,atp5j,crhbp,abat,dsg3,atic,avp,atp6v1c1,atox1,adra2b,clcn7,anxa11,,acacb,tnfrsf17,cd70,aif1,,faslg,casp5,epha3,ca4,rhoa,col10a1,,,adcyap1r1,csf2,cav3,adh1b,bub1,cox10,gzmk,cd5,angpt2,adra1b,cxcr5,
3,,,blk,scgb2a2,abcd2,ccr4,araf,cd6,cd3e,casp8,adcyap1r1,faslg,ccr8,igll1,alox15,ccnb1,alcam,bmx,adora3,bik,epb42,adh1b,aplnr,,,bdkrb2,ahcy,amd1,col10a1,arsb,,fgr,atp6v1c1,,dct,acrv1,cdkn1c,csnk1a1,alpl,gzmb,rara,,apaf1,,flt1,bmp8b,apoa1,arhgap6,,,add1,,cstf1,gzma,ccr3,acvrl1,cetp,blk,blk
4,slc25a6,col11a1,btk,fcn1,cd5,adsl,aire,,ccr7,,cd8a,dhx8,krt1,h3f3b,ms4a3,comp,c1qa,,alox15,f3,gata1,add1,arhgap6,crygd,acads,,ca1,azu1,cyc1,,alcam,dnase1l3,,,mlana,,,abcb7,copb1,bad,,,ceacam3,,slc31a1,avp,,fgf7,gjb5,actn2,cdk4,cd2,,,cd28,,angpt2,bmp3,arg1


Notice the lowercase gene names and the empty cells in the table. Lowercase is recommended but not mandatory; Yada can handle either. The empty cells are not an issue either. Try to make sure that gene names are unique within every column.
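
To check the uniqueness recommendation before running Yada, a small helper along these lines may be useful. It is a sketch, not part of Yada: `duplicated_markers` is a hypothetical name, and the toy table is made up for illustration.

```python
import pandas as pd

def duplicated_markers(gene_list_df: pd.DataFrame) -> dict:
    """Return {cell_type: [genes listed more than once]}, ignoring case."""
    report = {}
    for cell_type in gene_list_df.columns:
        # Drop empty cells, then compare names in lowercase.
        genes = gene_list_df[cell_type].dropna().astype(str).str.strip().str.lower()
        genes = genes[genes != ""]
        dupes = sorted(genes[genes.duplicated()].unique())
        if dupes:
            report[cell_type] = dupes
    return report

# Toy marker table: 'BLK' and 'blk' collide once case is ignored.
toy_markers = pd.DataFrame({
    "B-cells": ["cd19", "BLK", "blk", None],
    "NK cells": ["gzmh", "il2rb", "gzmb", ""],
})
dupe_report = duplicated_markers(toy_markers)
print(dupe_report)
```

An empty dictionary means every column already has unique gene names.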

In [5]:
gene_list_df = pure

Yada doesn't need the full pure gene expression matrix. It only needs the marker gene list for each cell type, as the table above demonstrates. Given a complete reference table, Yada can deduce this list with the gene_diff function, but in most cases only marker gene lists are available. They can be provided by creating a gene_list_df dataframe.
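
If the marker lists exist only as plain Python lists, a table with the same layout as the one shown by `pure.head()` (one column per cell type, one gene per row, empty cells where a list is shorter) can be assembled as sketched below. The gene names are a few entries copied from that table for illustration, and whether Yada accepts a dataframe built this way without further changes is an assumption.

```python
import pandas as pd

# Illustrative marker lists of unequal length.
markers = {
    "B-cells": ["tnfrsf17", "cd19", "blk", "btk"],
    "CD8+ T-cells": ["cd8a", "cd3d"],
    "NK cells": ["gzmh", "il2rb", "gzmb"],
}

# Wrapping each list in a Series lets pandas pad shorter columns with NaN,
# which mirrors the empty cells in the pure table.
example_gene_list_df = pd.DataFrame(
    {cell_type: pd.Series(genes) for cell_type, genes in markers.items()}
)
print(example_gene_list_df)
```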

## 5 - Run Deconvolution.

In [6]:
result = run_dtw_deconv_ensemble(pure, mix, gene_list_df)
result

#Download Result.
#FileLink('data/results.csv')
#from google.colab import files
#files.download('data/results.csv') 


Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
SUB134264,0.007918,0.006877,0.040754,0.020015,0.019394,0.017914,0.017658,0.017857,0.008732,0.020282,0.023482,0.013992,0.043545,0.014501,0.027284,0.036842,0.009844,0.015252,0.002516,0.010760,0.013033,0.026820,0.008928,0.005702,0.007724,0.009526,0.009334,0.032517,0.008416,0.019074,0.032892,0.012409,0.005615,0.023570,0.019812,0.034887,0.018355,0.030016,0.009701,0.010878,0.007287,0.003761,0.039308,0.010005,0.021363,0.021722,0.004084,0.032317,0.039645,0.011451,0.017964,0.000293,0.005570,0.008226,0.001847,0.007384,0.019625,0.024608,0.019555
SUB134282,0.009368,0.013783,0.039041,0.014590,0.014737,0.009712,0.010935,0.011717,0.007054,0.011793,0.013220,0.016774,0.028944,0.016553,0.017594,0.035601,0.011797,0.009285,0.009812,0.014802,0.017059,0.038260,0.007169,0.007431,0.005545,0.017411,0.012332,0.033496,0.007507,0.026599,0.030586,0.016332,0.007367,0.019503,0.029415,0.045349,0.034337,0.031945,0.012373,0.012781,0.006744,0.006752,0.043876,0.008516,0.021959,0.014278,0.002290,0.046215,0.029937,0.012316,0.007521,0.000388,0.005532,0.012516,0.001768,0.009642,0.014150,0.029752,0.050042
SUB134283,0.004416,0.009838,0.044314,0.019696,0.007788,0.007751,0.015869,0.006253,0.001404,0.003994,0.006220,0.008177,0.021373,0.014659,0.021976,0.031082,0.012529,0.013859,0.009090,0.010431,0.020190,0.038179,0.008267,0.009262,0.006100,0.012120,0.020521,0.026844,0.005555,0.025371,0.022338,0.012788,0.006183,0.027138,0.013754,0.066347,0.028150,0.031763,0.008798,0.004324,0.007795,0.002330,0.048152,0.006425,0.015872,0.015050,0.007115,0.028441,0.031777,0.007994,0.006851,0.000095,0.007404,0.004204,0.001023,0.007479,0.011253,0.025073,0.027621
SUB134259,0.010005,0.016220,0.031336,0.015152,0.039827,0.016881,0.022542,0.025274,0.010017,0.042079,0.031430,0.031116,0.064895,0.018101,0.016620,0.029868,0.007186,0.029289,0.000864,0.004874,0.009680,0.020732,0.008250,0.004397,0.004277,0.005208,0.004926,0.020016,0.011152,0.013690,0.030214,0.020833,0.003184,0.007489,0.017069,0.015914,0.024610,0.017948,0.010642,0.023963,0.004796,0.003588,0.008530,0.007822,0.018420,0.017458,0.000120,0.019033,0.022698,0.008839,0.020540,0.000730,0.008218,0.031492,0.003119,0.003385,0.019622,0.014853,0.007328
SUB134285,0.013803,0.014288,0.019108,0.018246,0.019295,0.014707,0.013958,0.022381,0.009029,0.015923,0.021986,0.016868,0.054030,0.015175,0.025055,0.024423,0.010611,0.019154,0.005935,0.006756,0.020801,0.035399,0.010002,0.010141,0.005541,0.014245,0.018222,0.030939,0.007830,0.018240,0.018874,0.016444,0.006303,0.009304,0.029263,0.026515,0.034173,0.032713,0.008185,0.008750,0.009312,0.004420,0.037838,0.009018,0.021892,0.020554,0.003760,0.023389,0.045019,0.010333,0.016363,0.000469,0.008474,0.011114,0.001997,0.005696,0.017167,0.015526,0.023194
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SUB134309.1,0.013557,0.005929,0.021068,0.010492,0.026045,0.021990,0.019091,0.021679,0.015446,0.022948,0.037542,0.016689,0.078790,0.022094,0.018873,0.017873,0.007182,0.029425,0.004504,0.008101,0.011570,0.023731,0.009619,0.003376,0.001584,0.009461,0.011115,0.025243,0.009104,0.012724,0.024341,0.016149,0.003813,0.002560,0.012246,0.030982,0.015706,0.016243,0.008261,0.010672,0.002700,0.003711,0.018699,0.004814,0.024542,0.030732,0.001703,0.020687,0.047742,0.008153,0.021806,0.000552,0.008475,0.016885,0.001602,0.005170,0.016462,0.030954,0.006662
SUB134308.1,0.005735,0.010832,0.045843,0.011298,0.017160,0.016160,0.009467,0.021321,0.013388,0.009962,0.023920,0.014240,0.082449,0.016924,0.028008,0.013036,0.010580,0.035774,0.003337,0.007532,0.017162,0.017132,0.005883,0.007291,0.005072,0.013951,0.013071,0.027090,0.012960,0.013551,0.028960,0.008042,0.005881,0.011810,0.020444,0.037292,0.023032,0.020980,0.007545,0.005783,0.007341,0.007456,0.026567,0.006762,0.022761,0.019516,0.003024,0.018492,0.056700,0.007558,0.018417,0.000501,0.007559,0.013019,0.002950,0.008062,0.020578,0.043285,0.016529
SUB134296.1,0.006786,0.011375,0.030840,0.016252,0.010080,0.010100,0.008426,0.011843,0.007471,0.006297,0.014818,0.012602,0.050952,0.014257,0.036218,0.018709,0.011151,0.018764,0.004275,0.010531,0.022369,0.029797,0.007618,0.007859,0.003009,0.013684,0.028536,0.029116,0.007310,0.018240,0.031603,0.009608,0.007486,0.010625,0.020565,0.029329,0.034620,0.029877,0.008438,0.004590,0.009160,0.005084,0.047117,0.006445,0.023283,0.015580,0.001537,0.021947,0.062578,0.011119,0.010047,0.000199,0.005178,0.005956,0.001002,0.008371,0.015093,0.014476,0.016246
SUB134295.1,0.017412,0.005799,0.034735,0.011065,0.015465,0.014370,0.014743,0.019250,0.013527,0.009060,0.021271,0.015554,0.085003,0.027294,0.017301,0.022555,0.012030,0.026869,0.004272,0.007305,0.011966,0.019176,0.006654,0.005105,0.002349,0.012362,0.013253,0.024916,0.011562,0.013992,0.033562,0.008672,0.008959,0.012323,0.014254,0.024877,0.028895,0.028615,0.006085,0.007490,0.006693,0.005218,0.034959,0.005021,0.022649,0.028001,0.004718,0.020330,0.017227,0.011125,0.025769,0.000542,0.009914,0.011469,0.001596,0.009508,0.018096,0.017420,0.020981
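
With 59 cell types per sample the table is hard to read at a glance. One way to summarize it is to list the highest-scoring cell types for each sample, as sketched below. The helper `top_cell_types` is hypothetical, and the toy frame holds only the first three columns of the first two rows shown above; it assumes samples are in the index and cell types in the columns.

```python
import pandas as pd

def top_cell_types(result: pd.DataFrame, k: int = 3) -> dict:
    """Map each sample to its k highest-scoring cell types (hypothetical helper)."""
    return {
        sample: row.nlargest(k).index.tolist()
        for sample, row in result.iterrows()
    }

# First three columns of the first two result rows, copied from the output above.
toy_result = pd.DataFrame(
    {
        "Adipocytes": [0.007918, 0.009368],
        "Astrocytes": [0.006877, 0.013783],
        "B-cells": [0.040754, 0.039041],
    },
    index=["SUB134264", "SUB134282"],
)
toy_top = top_cell_types(toy_result, k=2)
print(toy_top)
```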


## 6 - Evaluate Results.

This step applies only when the true cell type proportions are available.

In [7]:
res = pd.DataFrame(calc_corr(labels, result), columns=['dataset', 'celltype', 'pearson', 'spearman', 'p'])
res

Unnamed: 0,dataset,celltype,pearson,spearman,p
0,xCell,B-cells,0.313979,0.349755,0.004956676
1,xCell,CD4+ T-cells,0.299104,0.287446,0.02235487
2,xCell,CD8+ T-cells,0.746598,0.708717,8.099728e-11
3,xCell,CD4+ Tem,-0.046975,-0.037014,0.7733433
4,xCell,CD8+ Tem,0.342915,0.374701,0.002481369
5,xCell,Tgd cells,0.019891,0.246717,0.05126046
6,xCell,Memory B-cells,0.202424,0.305368,0.0149461
7,xCell,Monocytes,0.266271,0.218801,0.084917
8,xCell,naive B-cells,0.461955,0.616694,7.381078e-08
9,xCell,CD4+ naive T-cells,0.341965,0.324865,0.009383056
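
For readers who want to see what a per-cell-type comparison of this kind involves, here is a minimal pandas sketch. It is not the code behind `calc_corr`: the function name `correlation_table`, the toy data, and the layout (samples as rows, cell types as columns) are assumptions, and it omits the p-value column. Spearman correlation is computed as the Pearson correlation of the ranks.

```python
import pandas as pd

def correlation_table(truth: pd.DataFrame, predicted: pd.DataFrame,
                      dataset: str = "example") -> pd.DataFrame:
    """Pearson and Spearman correlation per shared cell type (hypothetical helper)."""
    samples = truth.index.intersection(predicted.index)
    rows = []
    for cell_type in truth.columns.intersection(predicted.columns):
        t = truth.loc[samples, cell_type]
        p = predicted.loc[samples, cell_type]
        pearson = t.corr(p)
        # Spearman is the Pearson correlation of the ranked values.
        spearman = t.rank().corr(p.rank())
        rows.append((dataset, cell_type, pearson, spearman))
    return pd.DataFrame(rows, columns=["dataset", "celltype", "pearson", "spearman"])

# Toy proportions, made up for illustration only.
toy_samples = ["s1", "s2", "s3", "s4"]
toy_truth = pd.DataFrame(
    {"B-cells": [0.1, 0.2, 0.3, 0.4], "NK cells": [0.4, 0.3, 0.2, 0.1]},
    index=toy_samples,
)
toy_pred = pd.DataFrame(
    {"B-cells": [0.2, 0.4, 0.6, 0.8], "NK cells": [0.1, 0.2, 0.3, 0.5]},
    index=toy_samples,
)
toy_corr = correlation_table(toy_truth, toy_pred)
print(toy_corr)
```

If p-values are needed, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return them alongside the coefficients.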
