# YADA Deconvolution

---

In this notebook, we demonstrate running YADA with a matrix that contains only lists of marker genes, i.e., without RNA counts for the relevant cell types.
We recommend cloning the repository with:
!git clone https://github.com/zurkin1/Yada.git
and then running this notebook in Jupyter.

## 1 - Import Prerequisites.

In [4]:
%load_ext autoreload
%autoreload 2

from IPython.display import FileLink, FileLinks
import pandas as pd
from YADA import *

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)

## 2 - Configure Input Files.

Example input files are located in the "./data/" folder. We demonstrate using input files from the xCell deconvolution method.


In [10]:
#Marker gene list.
pure_file_path = '../data/xCell/pure.csv'

#This is the mixture file in the format: columns: mix1, mix2, ..., rows: gene names.
mix_file_path = '../data/xCell/mix.csv'

#True cell type proportions file.
labels_file_path = 'xCell'

## 3 - Data Preprocessing
YADA implements the following data preprocessing steps:

- Missing values are imputed with 0.
- If the maximum expression value across all genes is less than 20, a power transformation (raising values to the power of 2) is applied.
- Only genes common to both the marker gene list and the mixture dataset are considered for deconvolution.
- Standardization is performed column-wise (i.e., per cell type) by subtracting the minimum value and dividing by the mean.
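
The steps above can be sketched in pandas as follows. This is a minimal illustration, not the code behind preprocess_only_marker: the function names filter_and_transform and standardize_columns and the toy data are hypothetical. It assumes gene names already match in case between the two tables, and that the mean used for scaling is the column mean before the minimum is subtracted, which the description leaves open.

```python
import pandas as pd

def filter_and_transform(markers, mix):
    """Hypothetical sketch of the first three steps; not YADA's implementation."""
    # Step 1: impute missing expression values with 0.
    mix = mix.fillna(0)
    # Step 2: if the overall maximum is below 20, raise values to the power of 2.
    if mix.max().max() < 20:
        mix = mix ** 2
    # Step 3: keep only genes found in both the marker table and the mixture.
    marker_genes = {g for g in markers.to_numpy().ravel() if isinstance(g, str)}
    mix = mix.loc[mix.index.isin(marker_genes)]
    markers = markers.where(markers.isin(list(mix.index)))
    return markers, mix

def standardize_columns(df):
    """Step 4 (assumed form): subtract each column's minimum, divide by its mean."""
    return (df - df.min()) / df.mean()

# Toy inputs: marker lists per cell type, and a mixture with genes as rows.
toy_markers = pd.DataFrame({"A": ["g1", "g2"], "B": ["g3", None]})
toy_mix = pd.DataFrame(
    {"mix1": [1.0, None, 3.0], "mix2": [2.0, 4.0, 5.0]},
    index=["g1", "g3", "g9"],
)

markers_out, mix_out = filter_and_transform(toy_markers, toy_mix)
standardized = standardize_columns(mix_out)
print(markers_out)
print(standardized)
```

In the toy data, g9 is dropped from the mixture because no cell type lists it, and g2 is blanked in the marker table because it is absent from the mixture.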

In [6]:
pure, mix = preprocess_only_marker(pure_file_path, mix_file_path)

## 4 - No Need to Run Gene Differentiation Algorithm.

In [7]:
pure.head()

Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
0,adh1b,acta2,tnfrsf17,ceacam8,bad,cd5,tnfsf8,cd3g,cd2,cd8a,cd8b,abcd2,cd8a,cox6c,azu1,adra1d,flt3,acvrl1,agtr2,nqo1,cenpa,arf4,ceacam8,cd34,aadac,adam8,bub1b,abo,htr7,acadvl,acp2,acp2,,arhgap6,abl2,art1,cdh6,asgr2,evc,gzmh,phkg1,abca3,clc,arcn1,atp5j,alpi,adcy8,adh5,,,copa,abcd2,ifng,gpr15,ccr4,flt4,acvrl1,rere,azu1
1,dlat,cnn1,cd19,clk1,apbb1,cd40lg,cd2,aamp,cd3g,apbb1,abcd2,slc25a20,,calm1,adss,,ache,angpt2,ccr3,flnb,alas2,adh5,clc,alas2,acadl,,fxn,adcy3,dctd,acp2,abcd1,adra2b,anxa1,anxa3,,blk,acta2,ap1g1,cdh15,il2rb,ambn,aldoc,bmx,bmpr1a,adcy3,tnfrsf17,alox12,arcn1,entpd3,ache,ccng1,,chd4,il5,ctla4,gja4,actg1,cd1a,adarb2
2,,cbr3,actn2,actn2,krit1,bmpr1a,rpn2,adsl,cd4,cd3d,bmpr1a,abcf1,gpr15,dntt,tor1a,arl1,cd1b,tie1,c3ar1,adm,aplnr,bmpr1a,atp5j,crhbp,abat,dsg3,atic,avp,atp6v1c1,atox1,adra2b,clcn7,anxa11,,acacb,tnfrsf17,cd70,aif1,,faslg,casp5,epha3,ca4,rhoa,col10a1,,,adcyap1r1,csf2,cav3,adh1b,bub1,cox10,gzmk,cd5,angpt2,adra1b,cxcr5,
3,,,blk,scgb2a2,abcd2,ccr4,araf,cd6,cd3e,casp8,adcyap1r1,faslg,ccr8,igll1,alox15,ccnb1,alcam,bmx,adora3,bik,epb42,adh1b,aplnr,,,bdkrb2,ahcy,amd1,col10a1,arsb,,fgr,atp6v1c1,,dct,acrv1,cdkn1c,csnk1a1,alpl,gzmb,rara,,apaf1,,flt1,bmp8b,apoa1,arhgap6,,,add1,,cstf1,gzma,ccr3,acvrl1,cetp,blk,blk
4,slc25a6,col11a1,btk,fcn1,cd5,adsl,aire,,ccr7,,cd8a,dhx8,krt1,h3f3b,ms4a3,comp,c1qa,,alox15,f3,gata1,add1,arhgap6,crygd,acads,,ca1,azu1,cyc1,,alcam,dnase1l3,,,mlana,,,abcb7,copb1,bad,,,ceacam3,,slc31a1,avp,,fgf7,gjb5,actn2,cdk4,cd2,,,cd28,,angpt2,bmp3,arg1


Lowercase gene names are recommended but not mandatory. Empty cells in the table are not a concern, since YADA handles them. It is advisable to ensure that gene names are unique within each column.

YADA does not require the entire pure reference gene expression matrix; it only needs the marker gene list for each cell type, as shown in the previous table. A complete reference table can be used to derive these lists with the run_gene_diff function, but in most cases only marker gene lists are available.
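
If your marker genes are stored as plain lists of different lengths, a table shaped like the one above can be assembled with pandas. The sketch below is illustrative only: build_marker_table is a hypothetical helper, not part of YADA, and the two short lists reuse a few gene symbols from the table above.

```python
import pandas as pd

# Hypothetical input: one list of marker genes per cell type.
marker_lists = {
    "B-cells": ["TNFRSF17", "CD19", "BLK", "CD19"],
    "CD8+ T-cells": ["CD8A", "CD3D"],
}

def build_marker_table(lists):
    """Hypothetical helper: one column per cell type, padded with empty cells."""
    columns = {}
    for cell_type, genes in lists.items():
        # Lowercase the names and drop duplicates while preserving order.
        cleaned = list(dict.fromkeys(g.strip().lower() for g in genes))
        columns[cell_type] = pd.Series(cleaned, dtype="object")
    # Series of unequal length are aligned on their index and padded with NaN.
    return pd.DataFrame(columns)

marker_table = build_marker_table(marker_lists)
print(marker_table)

# To save the table for later use (the path is an assumption):
# marker_table.to_csv("pure.csv")
```

Deduplicating per column follows the advice that gene names be unique within each cell type; the same gene may still appear under several cell types.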

## 5 - Run Deconvolution.

In [8]:
result = run_dtw_deconv_ensemble(pure, mix)
result

#Download Result.
#FileLink('data/results.csv')
#from google.colab import files
#files.download('data/results.csv') 


Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
SUB134264,0.007789,0.008487,0.038599,0.026411,0.023513,0.010699,0.008411,0.034101,0.007389,0.021843,0.023050,0.008415,0.033004,0.015648,0.029699,0.041060,0.005295,0.010944,0.002313,0.006778,0.010750,0.024395,0.008448,0.004709,0.000130,0.011417,0.008915,0.029538,0.008282,0.021483,0.031123,0.007266,0.011347,0.030473,0.016076,0.037913,0.020523,0.041282,0.003360,0.011911,0.004526,0.003439,0.057275,0.013794,0.009683,0.022679,0.003035,0.034622,0.049356,0.012666,0.016717,0.000607,0.006156,0.012566,0.002004,0.007278,0.009540,0.028158,0.010827
SUB134282,0.008963,0.015398,0.033371,0.019283,0.018325,0.005654,0.004579,0.023147,0.007252,0.011464,0.013439,0.010462,0.014471,0.018009,0.017931,0.039991,0.006662,0.007955,0.009896,0.013068,0.013558,0.039235,0.005447,0.006293,0.000113,0.021560,0.010528,0.033330,0.006921,0.026654,0.026554,0.009481,0.013161,0.025215,0.025576,0.049717,0.038779,0.054211,0.004440,0.015458,0.004245,0.006292,0.073768,0.011343,0.009243,0.017408,0.001702,0.049113,0.037270,0.013164,0.007833,0.000802,0.005858,0.018361,0.001939,0.009138,0.006997,0.032986,0.032749
SUB134283,0.004301,0.011055,0.044704,0.025911,0.009799,0.002570,0.005848,0.011183,0.000571,0.003194,0.004400,0.003882,0.013882,0.016410,0.023425,0.034322,0.007898,0.010417,0.008173,0.007109,0.015961,0.038745,0.008059,0.007685,0.000101,0.013800,0.017189,0.021927,0.004675,0.024567,0.019327,0.006417,0.008630,0.035086,0.007782,0.070076,0.032179,0.044592,0.002566,0.004978,0.005269,0.001923,0.066494,0.008691,0.004569,0.015295,0.005288,0.029119,0.039560,0.008440,0.004897,0.000197,0.007797,0.006792,0.000955,0.007115,0.004510,0.029535,0.015038
SUB134259,0.010103,0.018422,0.031011,0.018846,0.047928,0.014650,0.011470,0.047239,0.010905,0.039095,0.039758,0.022233,0.071186,0.017752,0.016114,0.032947,0.005255,0.021000,0.000910,0.002268,0.007639,0.018702,0.010431,0.003688,0.000050,0.005828,0.005268,0.013600,0.011611,0.014227,0.021470,0.009546,0.002713,0.009682,0.016542,0.016255,0.030161,0.016677,0.003833,0.026174,0.003023,0.003400,0.009277,0.010849,0.007650,0.026630,0.000089,0.017954,0.028258,0.009786,0.018701,0.001511,0.009447,0.044811,0.003342,0.003490,0.012971,0.017255,0.003323
SUB134285,0.014237,0.014924,0.018411,0.024549,0.023491,0.009200,0.008851,0.041724,0.008406,0.015011,0.023076,0.010460,0.048243,0.015980,0.026461,0.025370,0.005882,0.014313,0.007565,0.004317,0.017141,0.034478,0.011332,0.008039,0.000097,0.016820,0.015759,0.026647,0.008097,0.019915,0.018528,0.008998,0.009843,0.012029,0.022987,0.026851,0.038935,0.040962,0.002043,0.008710,0.005293,0.003798,0.056188,0.012030,0.008885,0.024851,0.002795,0.023895,0.056046,0.011417,0.012078,0.000970,0.008340,0.017052,0.002130,0.005274,0.009899,0.017308,0.015933
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SUB134309.1,0.014216,0.007286,0.024262,0.013450,0.031678,0.014825,0.008218,0.039860,0.013936,0.025003,0.034967,0.010630,0.063955,0.024662,0.016918,0.019307,0.004036,0.027635,0.005831,0.006731,0.009326,0.020500,0.008126,0.002779,0.000011,0.011731,0.011486,0.019831,0.010243,0.011972,0.016192,0.005829,0.000911,0.003310,0.008853,0.040463,0.017014,0.011789,0.002612,0.010888,0.001684,0.003755,0.026642,0.006770,0.009349,0.036205,0.001265,0.021737,0.059437,0.008751,0.019143,0.001142,0.010231,0.024954,0.001559,0.004652,0.012641,0.037665,0.003239
SUB134308.1,0.006255,0.010363,0.045047,0.015623,0.020540,0.010027,0.004778,0.040223,0.012027,0.010803,0.021542,0.008229,0.055790,0.019037,0.031791,0.013493,0.006429,0.026994,0.003065,0.006761,0.013827,0.018316,0.003652,0.006186,0.000114,0.017234,0.013118,0.025796,0.014864,0.013907,0.021191,0.003611,0.006259,0.015269,0.017071,0.054146,0.026643,0.022134,0.002229,0.005813,0.004423,0.006872,0.040521,0.009560,0.009177,0.027252,0.002247,0.019789,0.070588,0.007859,0.015769,0.001037,0.008223,0.019914,0.003068,0.007492,0.014809,0.050048,0.012661
SUB134296.1,0.007005,0.013132,0.027097,0.020838,0.012344,0.004882,0.000929,0.021553,0.006459,0.005290,0.013234,0.007212,0.034612,0.016552,0.039904,0.021014,0.006674,0.014262,0.005045,0.006376,0.018469,0.031153,0.007142,0.006480,0.000039,0.018078,0.029237,0.031312,0.006163,0.019304,0.022383,0.005143,0.011721,0.013737,0.014341,0.029838,0.038733,0.035979,0.002867,0.004662,0.006035,0.004731,0.069248,0.009014,0.006696,0.016382,0.001142,0.020416,0.077906,0.012258,0.007225,0.000413,0.005908,0.010027,0.001078,0.007326,0.008113,0.016884,0.011497
SUB134295.1,0.017754,0.007285,0.034306,0.012825,0.018251,0.009921,0.006734,0.036271,0.012445,0.009021,0.018937,0.009144,0.057213,0.031513,0.018306,0.024996,0.006651,0.023368,0.004937,0.006377,0.010403,0.018918,0.005402,0.004366,0.000049,0.015647,0.012656,0.021313,0.013456,0.013803,0.023035,0.004805,0.012367,0.015932,0.010310,0.027909,0.032844,0.033660,0.001663,0.009094,0.004218,0.004589,0.052563,0.006823,0.010565,0.041643,0.003506,0.019433,0.021446,0.012357,0.022549,0.001121,0.011425,0.017101,0.001701,0.009354,0.014397,0.020906,0.014033


## 6 - Evaluate Results.

If the true proportions are available, compare them with the estimated proportions.

In [11]:
res = calc_corr(labels_file_path, result) #, columns=['dataset', 'celltype', 'pearson', 'spearman', 'p'])
res

Unnamed: 0,xCell,celltype,Pearson,Spearman,p
0,xCell,B-cells,0.304382,0.278301,0.02720637
1,xCell,CD4+ T-cells,0.317562,0.297335,0.01795447
2,xCell,CD8+ T-cells,0.712214,0.693692,2.92716e-10
3,xCell,CD4+ Tem,-0.02539,-0.044431,0.7295157
4,xCell,CD8+ Tem,0.360605,0.385791,0.001792348
5,xCell,Tgd cells,0.019891,0.246717,0.05126046
6,xCell,Memory B-cells,0.318338,0.37431,0.002509527
7,xCell,Monocytes,0.389303,0.426824,0.000485807
8,xCell,naive B-cells,0.470334,0.614917,8.240545e-08
9,xCell,CD4+ naive T-cells,0.275138,0.283414,0.02439523
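
For readers who want to see what the Pearson and Spearman columns measure, here is a minimal sketch of a per-cell-type comparison. It is not the implementation of calc_corr: correlation_sketch and the toy proportions are hypothetical, both inputs are assumed to be DataFrames with samples as rows and cell types as columns, and p-values are omitted.

```python
import pandas as pd

def correlation_sketch(true_props, est_props):
    """Hypothetical per-cell-type comparison; not YADA's calc_corr."""
    rows = []
    # Only cell types present in both tables can be compared.
    for cell_type in true_props.columns.intersection(est_props.columns):
        # Align the two columns on shared sample names.
        t, e = true_props[cell_type].align(est_props[cell_type], join="inner")
        pearson = t.corr(e)  # Series.corr defaults to Pearson
        spearman = t.rank().corr(e.rank())  # Spearman is Pearson on the ranks
        rows.append({"celltype": cell_type, "Pearson": pearson, "Spearman": spearman})
    return pd.DataFrame(rows)

samples = ["s1", "s2", "s3", "s4"]
toy_true = pd.DataFrame(
    {"B-cells": [0.1, 0.2, 0.3, 0.4], "Monocytes": [0.4, 0.3, 0.2, 0.1]},
    index=samples,
)
toy_est = pd.DataFrame(
    {"B-cells": [0.2, 0.4, 0.6, 0.8], "Monocytes": [0.1, 0.2, 0.3, 0.5],
     "NK cells": [0.7, 0.4, 0.1, 0.1]},
    index=samples,
)

toy_corr = correlation_sketch(toy_true, toy_est).set_index("celltype")
print(toy_corr)
```

"NK cells" is skipped because it has no true proportions in the toy data. If p-values are needed, scipy.stats.pearsonr and scipy.stats.spearmanr return them alongside the coefficient.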
