# YADA Deconvolution

---

This notebook demonstrates running YADA with a matrix that contains only lists of marker genes, i.e., without RNA counts for the relevant cell types.
We recommend cloning the repository and installing its requirements (the `!` prefix runs the `pip` command from a notebook cell):
```
git clone https://github.com/zurkin1/Yada.git
!pip install -r ../requirements.txt
```
Then run this notebook in Jupyter.

## 1 - Import Prerequisites

In [1]:
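%load_ext autoreload
%autoreload 2

import pandas as pd  # made explicit; used for the display options below
from YADA import *

# Widen the pandas display so that wide tables are not truncated.
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)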

## 2 - Configure Input Files

Example input files are located in the repository's `data/` folder (referenced below as `../data/`, relative to this notebook). We demonstrate using input files from the xCell deconvolution method.


In [2]:
# Marker gene list: one column per cell type, each listing that cell type's marker genes.
pure_file_path = '../data/xCell/pure.csv'

# Mixture file. Format: one column per mixture (mix1, mix2, ...), one row per gene name.
mix_file_path = '../data/xCell/mix.csv'

# True cell type proportions file. Optional: if available, it can be used to evaluate the results.
labels_file_path = 'xCell'
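
To make the mixture layout concrete, here is a minimal sketch that reads a tiny inline CSV in the format described above. The gene names and values are made up for illustration, and `example_mix` is a hypothetical variable, not part of YADA.

```python
import io

import pandas as pd

# Made-up mixture data: one row per gene, one column per mixture.
example_csv = io.StringIO(
    "gene,mix1,mix2\n"
    "cd19,5.1,0.3\n"
    "cd8a,0.0,7.2\n"
)

# Using the first column as the index puts the gene names on the rows.
example_mix = pd.read_csv(example_csv, index_col=0)
print(example_mix)
```

Loading your own mixture file the same way is a quick check that gene names ended up in the index and mixtures in the columns before passing the path to YADA.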

## 3 - Data Preprocessing

YADA implements the following data preprocessing steps:

- Missing values are imputed with 0.
- If the maximum expression value across all genes is less than 20, a power transformation (raising values to the power of 2) is applied.
- Only genes common to both the marker gene list and the mixture dataset are considered for deconvolution.
- Standardization is performed column-wise (i.e., per cell type) by subtracting the minimum value and dividing by the mean.
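
The steps above can be sketched with plain pandas. This is an illustration under stated assumptions, not YADA's own code: `sketch_preprocess` is a hypothetical helper, the standardization is applied to the mixture columns, and "dividing by the mean" is read as the mean of each column before the minimum is subtracted.

```python
import pandas as pd


def sketch_preprocess(markers: pd.DataFrame, mix: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mirrors the four listed steps (not YADA's implementation)."""
    # 1. Impute missing expression values with 0.
    mix = mix.fillna(0)

    # 2. If the maximum expression value is below 20, raise values to the power of 2.
    if mix.to_numpy().max() < 20:
        mix = mix ** 2

    # 3. Keep only genes present in both the marker lists and the mixture.
    #    Names are lowercased on both sides so the comparison ignores case.
    marker_genes = {
        str(gene).lower() for gene in markers.to_numpy().ravel() if pd.notna(gene)
    }
    mix.index = mix.index.str.lower()
    mix = mix.loc[mix.index.isin(marker_genes)]

    # 4. Column-wise standardization: subtract the column minimum,
    #    then divide by the column mean (assumption: mean of the unshifted column).
    return (mix - mix.min()) / mix.mean()
```

A column that is constant across the retained genes becomes all zeros under this reading, since every value equals the column minimum.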

In [3]:
pure, mix = preprocess_files(pure_file_path, mix_file_path)

In [11]:
# An example of a valid pure marker gene list file.
pure.head()

Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
0,adh1b,acta2,tnfrsf17,ceacam8,bad,cd5,tnfsf8,cd3g,cd2,cd8a,cd8b,abcd2,cd8a,cox6c,azu1,adra1d,flt3,acvrl1,agtr2,nqo1,cenpa,arf4,ceacam8,cd34,aadac,adam8,bub1b,abo,htr7,acadvl,acp2,acp2,,arhgap6,abl2,art1,cdh6,asgr2,evc,gzmh,phkg1,abca3,clc,arcn1,atp5j,alpi,adcy8,adh5,,,copa,abcd2,ifng,gpr15,ccr4,flt4,acvrl1,rere,azu1
1,dlat,cnn1,cd19,clk1,apbb1,cd40lg,cd2,aamp,cd3g,apbb1,abcd2,slc25a20,,calm1,adss,,ache,angpt2,ccr3,flnb,alas2,adh5,clc,alas2,acadl,,fxn,adcy3,dctd,acp2,abcd1,adra2b,anxa1,anxa3,,blk,acta2,ap1g1,cdh15,il2rb,ambn,aldoc,bmx,bmpr1a,adcy3,tnfrsf17,alox12,arcn1,entpd3,ache,ccng1,,chd4,il5,ctla4,gja4,actg1,cd1a,adarb2
2,,cbr3,actn2,actn2,krit1,bmpr1a,rpn2,adsl,cd4,cd3d,bmpr1a,abcf1,gpr15,dntt,tor1a,arl1,cd1b,tie1,c3ar1,adm,aplnr,bmpr1a,atp5j,crhbp,abat,dsg3,atic,avp,atp6v1c1,atox1,adra2b,clcn7,anxa11,,acacb,tnfrsf17,cd70,aif1,,faslg,casp5,epha3,ca4,rhoa,col10a1,,,adcyap1r1,csf2,cav3,adh1b,bub1,cox10,gzmk,cd5,angpt2,adra1b,cxcr5,
3,,,blk,scgb2a2,abcd2,ccr4,araf,cd6,cd3e,casp8,adcyap1r1,faslg,ccr8,igll1,alox15,ccnb1,alcam,bmx,adora3,bik,epb42,adh1b,aplnr,,,bdkrb2,ahcy,amd1,col10a1,arsb,,fgr,atp6v1c1,,dct,acrv1,cdkn1c,csnk1a1,alpl,gzmb,rara,,apaf1,,flt1,bmp8b,apoa1,arhgap6,,,add1,,cstf1,gzma,ccr3,acvrl1,cetp,blk,blk
4,slc25a6,col11a1,btk,fcn1,cd5,adsl,aire,,ccr7,,cd8a,dhx8,krt1,h3f3b,ms4a3,comp,c1qa,,alox15,f3,gata1,add1,arhgap6,crygd,acads,,ca1,azu1,cyc1,,alcam,dnase1l3,,,mlana,,,abcb7,copb1,bad,,,ceacam3,,slc31a1,avp,,fgf7,gjb5,actn2,cdk4,cd2,,,cd28,,angpt2,bmp3,arg1


Lowercase gene names are recommended but not mandatory, and empty cells in the table are not a concern, as YADA can handle both. It is advisable to ensure that gene names are unique within each column.
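
One way to check the uniqueness recommendation before running the deconvolution is sketched below. `duplicated_markers` is a hypothetical helper written for this example, not a YADA function; it compares names case-insensitively and ignores empty cells.

```python
import pandas as pd


def duplicated_markers(markers: pd.DataFrame) -> dict:
    """Return {cell_type: [gene names listed more than once]} for a marker table."""
    report = {}
    for cell_type in markers.columns:
        # Drop empty cells, then normalize whitespace and case.
        genes = markers[cell_type].dropna().astype(str).str.strip().str.lower()
        genes = genes[genes != ""]
        # Collect every name that repeats an earlier entry in the same column.
        repeated = sorted(genes[genes.duplicated()].unique())
        if repeated:
            report[cell_type] = repeated
    return report
```

Calling `duplicated_markers(pure)` on the table loaded above would return an empty dictionary when every column is free of repeated names.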

## 4 - Run YADA Deconvolution

In [5]:
result = run_yada(pure, mix)
result

Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
SUB134264,0.008675,0.008326,0.039983,0.026444,0.023705,0.009125,0.011993,0.034954,0.007035,0.022181,0.025292,0.007354,0.033974,0.016178,0.031748,0.048181,0.005231,0.011596,0.002359,0.006872,0.010937,0.023893,0.008225,0.004340,0.000371,0.010849,0.007205,0.026724,0.008928,0.022964,0.029166,0.008148,0.011670,0.030560,0.016198,0.036689,0.018802,0.037317,0.003220,0.010555,0.004236,0.003212,0.057848,0.014893,0.009799,0.020054,0.002943,0.034374,0.049670,0.012623,0.017020,0.000770,0.006712,0.011947,0.002491,0.008839,0.008670,0.026514,0.010098
SUB134282,0.010076,0.016535,0.035105,0.019306,0.018462,0.004697,0.007360,0.022263,0.006854,0.011037,0.014596,0.009404,0.014896,0.018304,0.018516,0.046756,0.006464,0.008328,0.010922,0.012972,0.014435,0.037622,0.005372,0.005947,0.000321,0.019275,0.008668,0.028770,0.007533,0.028934,0.025260,0.011223,0.013536,0.025288,0.027412,0.048095,0.034902,0.051550,0.004402,0.013967,0.003924,0.005663,0.076498,0.012389,0.009162,0.014672,0.001650,0.051442,0.037506,0.012944,0.007948,0.001017,0.006027,0.017453,0.002512,0.011803,0.006398,0.031181,0.030794
SUB134283,0.004831,0.011782,0.049674,0.025246,0.009873,0.002162,0.008666,0.012301,0.000579,0.003189,0.004984,0.004019,0.014290,0.016776,0.024843,0.041768,0.007277,0.011199,0.009328,0.006869,0.017331,0.035469,0.007886,0.007257,0.000289,0.012783,0.014342,0.019955,0.004915,0.026086,0.018520,0.007064,0.008876,0.035186,0.008744,0.066448,0.029289,0.042785,0.002425,0.003787,0.004757,0.001853,0.067409,0.009489,0.004313,0.011868,0.005128,0.029361,0.039812,0.008412,0.005199,0.000250,0.008037,0.006638,0.001257,0.008629,0.004073,0.028365,0.013645
SUB134259,0.011360,0.019919,0.033740,0.018866,0.048925,0.011618,0.014585,0.047619,0.010226,0.039656,0.040007,0.018077,0.073277,0.018664,0.016509,0.038255,0.005032,0.020438,0.001057,0.002072,0.007705,0.019896,0.010185,0.003585,0.000143,0.004534,0.003901,0.011290,0.012581,0.014710,0.021862,0.011187,0.002791,0.009710,0.015857,0.016120,0.026102,0.017001,0.003290,0.025728,0.002872,0.003139,0.009533,0.011681,0.007204,0.022179,0.000086,0.016409,0.028438,0.009524,0.019133,0.001915,0.009936,0.042040,0.004996,0.003949,0.011735,0.015747,0.003500
SUB134285,0.015485,0.016781,0.019630,0.023296,0.024045,0.007967,0.011904,0.041483,0.007965,0.015360,0.025854,0.009334,0.049661,0.015959,0.028401,0.030436,0.005572,0.014589,0.007664,0.003875,0.018264,0.032413,0.010979,0.007788,0.000276,0.016267,0.012748,0.024862,0.009177,0.020706,0.017326,0.009842,0.010124,0.012063,0.024989,0.026203,0.035201,0.038563,0.001817,0.008282,0.005365,0.003532,0.057321,0.012865,0.008170,0.021610,0.002710,0.023905,0.056401,0.011366,0.012983,0.001230,0.008475,0.015963,0.002426,0.006214,0.008835,0.016540,0.015808
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SUB134309.1,0.015569,0.007028,0.027848,0.011883,0.032224,0.012127,0.011816,0.041276,0.013235,0.025777,0.038827,0.009073,0.065834,0.024641,0.017682,0.022060,0.004227,0.025917,0.005993,0.006308,0.010117,0.021045,0.007805,0.002669,0.000032,0.011063,0.008980,0.017255,0.011129,0.013014,0.016768,0.007261,0.000937,0.003319,0.008726,0.038624,0.016247,0.010509,0.002542,0.010619,0.001590,0.003769,0.027084,0.006931,0.009161,0.031519,0.001227,0.020963,0.059814,0.008507,0.019550,0.001448,0.010824,0.023292,0.001967,0.006199,0.011187,0.034570,0.003213
SUB134308.1,0.006957,0.012288,0.047892,0.015038,0.020555,0.008144,0.008281,0.039561,0.011618,0.011324,0.023731,0.007921,0.057430,0.018027,0.035853,0.015489,0.006233,0.026878,0.003466,0.005986,0.014337,0.018896,0.003477,0.005924,0.000327,0.016619,0.010820,0.020740,0.016419,0.014941,0.021003,0.004169,0.006437,0.015312,0.018906,0.052654,0.023796,0.020316,0.002114,0.005134,0.004522,0.006615,0.041464,0.010267,0.009644,0.022956,0.002179,0.019327,0.071037,0.007226,0.016245,0.001315,0.008735,0.019036,0.004575,0.008830,0.013164,0.046711,0.013002
SUB134296.1,0.007484,0.014159,0.027445,0.019559,0.012304,0.004052,0.001568,0.021284,0.006206,0.005287,0.014821,0.007529,0.035629,0.016306,0.044777,0.025288,0.006301,0.014602,0.005064,0.006083,0.020038,0.030427,0.006732,0.006213,0.000110,0.017215,0.024400,0.026365,0.006463,0.020615,0.021939,0.006202,0.012055,0.013776,0.015249,0.028892,0.035157,0.033400,0.002823,0.003767,0.005577,0.004242,0.070313,0.009504,0.007363,0.014311,0.001107,0.020940,0.078400,0.012094,0.007514,0.000523,0.005925,0.009413,0.001530,0.009759,0.007121,0.015789,0.011515
SUB134295.1,0.019656,0.007328,0.037481,0.013006,0.018012,0.008093,0.010485,0.035141,0.011967,0.009458,0.020832,0.008743,0.058895,0.030654,0.018969,0.029606,0.006406,0.023143,0.005003,0.006561,0.010514,0.018746,0.005331,0.004116,0.000139,0.015010,0.010584,0.017671,0.014671,0.015090,0.023631,0.005790,0.012720,0.015977,0.011039,0.026901,0.029765,0.032976,0.001585,0.008334,0.004128,0.004440,0.053328,0.007392,0.009647,0.036044,0.003400,0.020701,0.021583,0.012001,0.023634,0.001421,0.012414,0.016439,0.002165,0.010959,0.012522,0.019071,0.013893


## 5 - Evaluate Results

If the true cell type proportions are available, the estimated proportions can be compared against them per cell type.

In [23]:
res = calc_corr(labels_file_path, result)
res

|   | xCell | celltype | Pearson | Spearman | p |
|---|-------|----------|---------|----------|---|
| 0 | xCell | B-cells | 0.295542 | 0.256027 | 0.04283091 |
| 1 | xCell | CD4+ T-cells | 0.318565 | 0.299927 | 0.01693168 |
| 2 | xCell | CD8+ T-cells | 0.729928 | 0.695949 | 2.425657e-10 |
| 3 | xCell | CD4+ Tem | -0.062578 | -0.082837 | 0.5186391 |
| 4 | xCell | CD8+ Tem | 0.34959 | 0.369229 | 0.002901571 |
| 5 | xCell | Tgd cells | 0.019891 | 0.246717 | 0.05126046 |
| 6 | xCell | Memory B-cells | 0.307397 | 0.370205 | 0.002822278 |
| 7 | xCell | Monocytes | 0.387703 | 0.405919 | 0.0009644279 |
| 8 | xCell | naive B-cells | 0.478908 | 0.607549 | 1.292082e-07 |
| 9 | xCell | CD4+ naive T-cells | 0.296888 | 0.303311 | 0.01567174 |
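
For readers who want to see how per-cell-type Pearson and Spearman coefficients like those above can be computed, here is a minimal sketch. It is not the implementation of `calc_corr`: `per_cell_type_correlation` is a hypothetical helper, and it assumes the true proportions are in a DataFrame with the same layout as `result` (samples as rows, cell types as columns).

```python
import pandas as pd


def per_cell_type_correlation(estimated: pd.DataFrame, truth: pd.DataFrame) -> pd.DataFrame:
    """Correlate estimated and true proportions for each shared cell type."""
    # Compare only samples and cell types present in both tables.
    shared_samples = estimated.index.intersection(truth.index)
    shared_types = estimated.columns.intersection(truth.columns)

    rows = []
    for cell_type in shared_types:
        est = estimated.loc[shared_samples, cell_type]
        ref = truth.loc[shared_samples, cell_type]
        rows.append(
            {
                "celltype": cell_type,
                # Pearson: linear correlation of the raw proportions.
                "Pearson": est.corr(ref),
                # Spearman: Pearson correlation of the ranks.
                "Spearman": est.rank().corr(ref.rank()),
            }
        )
    return pd.DataFrame(rows)
```

The sketch omits p-values; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return them alongside the coefficients. In the table above, the `p` column appears to track the Spearman coefficient rather than the Pearson one (for example, Tgd cells have a Pearson value near zero but `p` close to 0.05), though the source does not state this.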
