# YADA Deconvolution

---

In this notebook, we demonstrate running YADA with a matrix that contains only lists of marker genes, i.e., without RNA counts for the relevant cell types.
We recommend cloning the repository with:
!git clone https://github.com/zurkin1/Yada.git
and then running this notebook in Jupyter.

## 1 - Import Prerequisites.

In [1]:
%load_ext autoreload
%autoreload 2

from IPython.display import FileLink, FileLinks
import pandas as pd
from YADA import *

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 10000)

## 2 - Configure Input Files.

Example input files are located in the "./data/" folder. We demonstrate using input files from the xCell deconvolution method.


In [2]:
#Marker gene list.
pure = '../data/xCell/pure.csv'

#Mixture file. Format: columns are samples (mix1, mix2, ...), rows are gene names.
mix = '../data/xCell/mix.csv'

#True cell type proportions file.
labels = 'xCell'

## 3 - Preprocess Data.
YADA preprocessing involves the following steps:
- Filling missing values with 0.
- If the maximum value across all genes is less than 20, all values are raised to the power of two.
- Only genes present in both the marker gene list and the mixture dataset are kept.
- Standardization is performed by column (i.e., by cell type) by subtracting the minimum value and dividing by the mean.
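The steps above can be sketched in pandas as follows. This is a minimal illustration, not YADA's actual `preprocess_only_marker` implementation. The function name `sketch_preprocess` and the toy data are hypothetical. Two readings are assumptions as well: "power of two" is taken to mean `2 ** x` (undoing a log2 transform), and the mean used for scaling is taken to be the mean of the column after the minimum has been subtracted.

```python
import numpy as np
import pandas as pd


def sketch_preprocess(markers: pd.DataFrame, mix: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing of a mixture matrix (hypothetical helper)."""
    # 1. Fill missing expression values with 0 (returns a new frame).
    mix = mix.fillna(0)

    # 2. Assumption: a small maximum suggests log2 data, so apply 2 ** x.
    if mix.max().max() < 20:
        mix = 2.0 ** mix

    # 3. Keep only genes that appear in at least one marker list.
    mix.index = mix.index.astype(str).str.lower()
    marker_genes = {
        str(g).lower() for g in markers.to_numpy().ravel() if pd.notna(g)
    }
    mix = mix.loc[mix.index.isin(list(marker_genes))]

    # 4. Per column: subtract the minimum, then divide by the mean
    #    (assumption: the mean of the shifted column).
    shifted = mix - mix.min()
    return shifted / shifted.mean()


# Toy data, for illustration only.
toy_markers = pd.DataFrame({"B-cells": ["cd19", "blk"], "NK cells": ["gzmb", None]})
toy_mix = pd.DataFrame(
    {"mix1": [1.0, 2.0, np.nan, 5.0], "mix2": [3.0, 0.0, 4.0, 1.0]},
    index=["CD19", "BLK", "GZMB", "ACTB"],
)
toy_out = sketch_preprocess(toy_markers, toy_mix)
print(toy_out)
```

With this scaling, every column of the output has a minimum of 0 and a mean of 1, which puts samples with different overall expression levels on a comparable scale.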

In [3]:
pure, mix = preprocess_only_marker('../data/xCell/pure.csv', '../data/xCell/mix.csv')

## 4 - No Need to Run Gene Differentiation Algorithm.

In [4]:
pure.head()

Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
0,adh1b,acta2,tnfrsf17,ceacam8,bad,cd5,tnfsf8,cd3g,cd2,cd8a,cd8b,abcd2,cd8a,cox6c,azu1,adra1d,flt3,acvrl1,agtr2,nqo1,cenpa,arf4,ceacam8,cd34,aadac,adam8,bub1b,abo,htr7,acadvl,acp2,acp2,,arhgap6,abl2,art1,cdh6,asgr2,evc,gzmh,phkg1,abca3,clc,arcn1,atp5j,alpi,adcy8,adh5,,,copa,abcd2,ifng,gpr15,ccr4,flt4,acvrl1,rere,azu1
1,dlat,cnn1,cd19,clk1,apbb1,cd40lg,cd2,aamp,cd3g,apbb1,abcd2,slc25a20,,calm1,adss,,ache,angpt2,ccr3,flnb,alas2,adh5,clc,alas2,acadl,,fxn,adcy3,dctd,acp2,abcd1,adra2b,anxa1,anxa3,,blk,acta2,ap1g1,cdh15,il2rb,ambn,aldoc,bmx,bmpr1a,adcy3,tnfrsf17,alox12,arcn1,entpd3,ache,ccng1,,chd4,il5,ctla4,gja4,actg1,cd1a,adarb2
2,,cbr3,actn2,actn2,krit1,bmpr1a,rpn2,adsl,cd4,cd3d,bmpr1a,abcf1,gpr15,dntt,tor1a,arl1,cd1b,tie1,c3ar1,adm,aplnr,bmpr1a,atp5j,crhbp,abat,dsg3,atic,avp,atp6v1c1,atox1,adra2b,clcn7,anxa11,,acacb,tnfrsf17,cd70,aif1,,faslg,casp5,epha3,ca4,rhoa,col10a1,,,adcyap1r1,csf2,cav3,adh1b,bub1,cox10,gzmk,cd5,angpt2,adra1b,cxcr5,
3,,,blk,scgb2a2,abcd2,ccr4,araf,cd6,cd3e,casp8,adcyap1r1,faslg,ccr8,igll1,alox15,ccnb1,alcam,bmx,adora3,bik,epb42,adh1b,aplnr,,,bdkrb2,ahcy,amd1,col10a1,arsb,,fgr,atp6v1c1,,dct,acrv1,cdkn1c,csnk1a1,alpl,gzmb,rara,,apaf1,,flt1,bmp8b,apoa1,arhgap6,,,add1,,cstf1,gzma,ccr3,acvrl1,cetp,blk,blk
4,slc25a6,col11a1,btk,fcn1,cd5,adsl,aire,,ccr7,,cd8a,dhx8,krt1,h3f3b,ms4a3,comp,c1qa,,alox15,f3,gata1,add1,arhgap6,crygd,acads,,ca1,azu1,cyc1,,alcam,dnase1l3,,,mlana,,,abcb7,copb1,bad,,,ceacam3,,slc31a1,avp,,fgf7,gjb5,actn2,cdk4,cd2,,,cd28,,angpt2,bmp3,arg1


Please note that lowercase gene names are recommended but not mandatory. Empty cells in the table are not a concern, as YADA can handle them. It is advisable to ensure that gene names are unique within each column.
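The uniqueness recommendation can be checked before running YADA. The sketch below is not part of YADA: `find_duplicate_markers` is a hypothetical helper, and the toy table is made up for illustration. It lowercases gene names, ignores empty cells, and reports any gene that occurs more than once in the same column.

```python
import pandas as pd


def find_duplicate_markers(markers: pd.DataFrame) -> dict:
    """Return {cell_type: [duplicated gene names]} (hypothetical helper)."""
    duplicates = {}
    for cell_type in markers.columns:
        # Drop empty cells, then normalize whitespace and case.
        genes = markers[cell_type].dropna().astype(str).str.strip().str.lower()
        genes = genes[genes != ""]
        # duplicated() flags the second and later occurrences of a name.
        repeated = sorted(set(genes[genes.duplicated()]))
        if repeated:
            duplicates[cell_type] = repeated
    return duplicates


# Toy marker table, for illustration only.
toy_table = pd.DataFrame(
    {"B-cells": ["cd19", "CD19", "blk"], "NK cells": ["gzmb", None, "gzmh"]}
)
dups = find_duplicate_markers(toy_table)
print(dups)
```

An empty dictionary means that no column contains a repeated gene name.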

In [5]:
gene_list_df = pure

YADA does not require the full pure reference gene expression matrix; it needs only the marker gene list for each cell type, as shown in the table above. A complete reference table can be used to derive these lists with the run_gene_diff function, but in most cases only marker gene lists are available.
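When marker genes are available as plain lists of different lengths, they can be arranged into the column-per-cell-type layout shown above. The following is a minimal pandas sketch, not a YADA function; the variable names are hypothetical, and the gene names are taken from the first rows of the table above.

```python
import pandas as pd

# Marker lists of unequal length, one list per cell type.
marker_lists = {
    "B-cells": ["tnfrsf17", "cd19"],
    "CD8+ T-cells": ["cd8a", "apbb1", "cd3d"],
}

# Wrapping each list in a Series lets pandas pad shorter columns with NaN,
# which produces the empty cells seen in the marker table.
marker_table = pd.DataFrame(
    {cell_type: pd.Series(genes) for cell_type, genes in marker_lists.items()}
)
print(marker_table)
```

Passing the lists directly to `pd.DataFrame` would fail because the columns have different lengths; the `pd.Series` wrapper avoids that.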

## 5 - Run Deconvolution.

In [7]:
result = run_dtw_deconv_ensemble(pure, mix, gene_list_df)
result

#Download Result.
#FileLink('data/results.csv')
#from google.colab import files
#files.download('data/results.csv') 


Unnamed: 0,Adipocytes,Astrocytes,B-cells,Basophils,CD4+ T-cells,CD4+ Tcm,CD4+ Tem,CD4+ memory T-cells,CD4+ naive T-cells,CD8+ T-cells,CD8+ Tcm,CD8+ Tem,CD8+ naive T-cells,CLP,CMP,Chondrocytes,DC,Endothelial cells,Eosinophils,Epithelial cells,Erythrocytes,Fibroblasts,GMP,HSC,Hepatocytes,Keratinocytes,MEP,MPP,MSC,Macrophages,Macrophages M1,Macrophages M2,Mast cells,Megakaryocytes,Melanocytes,Memory B-cells,Mesangial cells,Monocytes,Myocytes,NK cells,NKT,Neurons,Neutrophils,Osteoblast,Pericytes,Plasma cells,Platelets,Preadipocytes,Sebocytes,Skeletal muscle,Smooth muscle,Tgd cells,Th1 cells,Th2 cells,Tregs,ly Endothelial cells,mv Endothelial cells,naive B-cells,pro B-cells
SUB134264,0.010743,0.007973,0.039565,0.028794,0.024682,0.011269,0.009332,0.031918,0.007105,0.022543,0.021900,0.008414,0.031110,0.016125,0.031054,0.044359,0.004681,0.011775,0.001856,0.005430,0.011370,0.024191,0.008743,0.004481,0.000177,0.011120,0.008385,0.032281,0.009015,0.022962,0.027795,0.006811,0.011593,0.029178,0.013700,0.037884,0.020325,0.041325,0.003896,0.011416,0.005168,0.003953,0.057601,0.012607,0.009142,0.022960,0.002982,0.035042,0.050361,0.011959,0.016727,0.000710,0.006566,0.012723,0.002404,0.009359,0.007384,0.025917,0.009851
SUB134282,0.012237,0.015493,0.034592,0.020334,0.018441,0.005822,0.005776,0.020395,0.007044,0.011910,0.012263,0.010605,0.013641,0.018461,0.018967,0.042812,0.005641,0.009086,0.008704,0.010482,0.014505,0.038109,0.005627,0.005807,0.000154,0.019756,0.010040,0.034470,0.007618,0.028943,0.024056,0.009006,0.013446,0.024144,0.022891,0.049437,0.037772,0.055059,0.005177,0.014293,0.004956,0.007405,0.073633,0.010624,0.008475,0.017237,0.001672,0.054173,0.038028,0.012706,0.007647,0.000939,0.006349,0.018449,0.002470,0.011467,0.005150,0.030197,0.028715
SUB134283,0.005849,0.011022,0.049877,0.027283,0.009689,0.002943,0.006785,0.010481,0.000513,0.002991,0.004000,0.004327,0.013086,0.017152,0.025283,0.038569,0.007055,0.011032,0.007242,0.005831,0.016885,0.035190,0.007868,0.007124,0.000138,0.013199,0.016711,0.022739,0.005348,0.026227,0.017291,0.005695,0.008817,0.033596,0.007068,0.068666,0.031540,0.044605,0.003082,0.004414,0.005838,0.002305,0.065170,0.007928,0.003773,0.015016,0.005196,0.029648,0.040366,0.008071,0.005110,0.000231,0.008035,0.007148,0.001065,0.009331,0.003232,0.027054,0.013302
SUB134259,0.014153,0.018963,0.033407,0.020599,0.051577,0.014278,0.010830,0.045223,0.010402,0.041516,0.037893,0.021499,0.067100,0.018894,0.016706,0.035084,0.004531,0.022432,0.000819,0.001771,0.008347,0.020573,0.010871,0.003177,0.000068,0.005269,0.004722,0.012609,0.012644,0.014874,0.019280,0.009870,0.002772,0.009271,0.014029,0.016180,0.028186,0.017520,0.004215,0.025635,0.003394,0.004119,0.009061,0.009919,0.006629,0.025002,0.000087,0.015405,0.028833,0.009111,0.018650,0.001767,0.010371,0.044489,0.005018,0.004039,0.010102,0.015292,0.003364
SUB134285,0.019483,0.015069,0.019214,0.024805,0.025418,0.009832,0.009425,0.039228,0.008090,0.015891,0.022163,0.010643,0.045475,0.017490,0.028649,0.028999,0.005085,0.015289,0.006447,0.003484,0.018271,0.032182,0.011614,0.006967,0.000132,0.016341,0.014936,0.027892,0.009351,0.020422,0.016415,0.008749,0.010057,0.011518,0.020728,0.026045,0.038149,0.040978,0.002335,0.008338,0.006388,0.004295,0.056472,0.010962,0.007917,0.023874,0.002746,0.024502,0.057186,0.010866,0.012742,0.001135,0.008668,0.017603,0.002554,0.007101,0.007788,0.015394,0.014299
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SUB134309.1,0.020052,0.006988,0.027038,0.013028,0.033955,0.015116,0.008877,0.038037,0.013539,0.026381,0.034191,0.010629,0.060284,0.026209,0.017750,0.020697,0.003231,0.030728,0.005091,0.005485,0.009511,0.021067,0.008384,0.002477,0.000015,0.010549,0.010638,0.019186,0.010535,0.013130,0.014825,0.005777,0.000931,0.003169,0.007045,0.040378,0.017630,0.011161,0.003095,0.011118,0.002029,0.004903,0.026824,0.005960,0.008854,0.034194,0.001243,0.020073,0.060646,0.008235,0.019214,0.001336,0.011021,0.024742,0.002028,0.006240,0.010159,0.033780,0.003327
SUB134308.1,0.008994,0.010750,0.047691,0.016218,0.022023,0.010121,0.006164,0.037385,0.011749,0.011331,0.020740,0.008429,0.052589,0.020222,0.033218,0.014947,0.005474,0.029538,0.002737,0.005361,0.014795,0.019657,0.003518,0.005613,0.000156,0.016889,0.012371,0.025622,0.015519,0.014954,0.018900,0.003242,0.006394,0.014620,0.015879,0.055685,0.026438,0.021533,0.002588,0.006021,0.005340,0.008474,0.041103,0.008917,0.008773,0.024722,0.002208,0.018251,0.072025,0.007354,0.015556,0.001213,0.008768,0.020937,0.004442,0.009489,0.011690,0.045710,0.011994
SUB134296.1,0.009554,0.013164,0.027434,0.021018,0.012341,0.005096,0.000996,0.019988,0.006279,0.005441,0.012719,0.007850,0.032626,0.017826,0.042267,0.023009,0.005484,0.015429,0.004300,0.004974,0.018842,0.030571,0.007063,0.005907,0.000053,0.018280,0.028131,0.031512,0.006647,0.020564,0.019376,0.004425,0.011975,0.013153,0.012866,0.029891,0.037526,0.035332,0.003390,0.004128,0.007063,0.005510,0.068851,0.008064,0.006706,0.015773,0.001122,0.021814,0.079491,0.011304,0.007712,0.000483,0.006173,0.010388,0.001352,0.009911,0.006171,0.015338,0.010757
SUB134295.1,0.025171,0.007043,0.037026,0.013766,0.019586,0.009951,0.008024,0.033177,0.012212,0.009695,0.017992,0.009459,0.053930,0.033476,0.020192,0.026682,0.005507,0.025803,0.004132,0.005191,0.011230,0.018890,0.005260,0.004195,0.000067,0.015315,0.012194,0.021148,0.013827,0.014984,0.020562,0.004045,0.012636,0.015255,0.009338,0.027962,0.031486,0.034828,0.002011,0.008732,0.004836,0.005671,0.052031,0.006271,0.009605,0.038776,0.003445,0.021242,0.021883,0.011758,0.023194,0.001312,0.011986,0.017185,0.002107,0.012008,0.011197,0.019056,0.013304


## 6 - Evaluate Results.

This step applies only when the true cell type proportions are available.

In [17]:
res = calc_corr(labels, result) #, columns=['dataset', 'celltype', 'pearson', 'spearman', 'p'])
res

Unnamed: 0,xCell,celltype,Pearson,Spearman,p
0,xCell,B-cells,0.326551,0.307319,0.01428427
1,xCell,CD4+ T-cells,0.311335,0.288598,0.02179919
2,xCell,CD8+ T-cells,0.741032,0.714142,4.992748e-11
3,xCell,CD4+ Tem,-0.060774,-0.090086,0.4825858
4,xCell,CD8+ Tem,0.353798,0.388743,0.001640534
5,xCell,Tgd cells,0.019891,0.246717,0.05126046
6,xCell,Memory B-cells,0.326477,0.37503,0.002457964
7,xCell,Monocytes,0.363641,0.353355,0.004500505
8,xCell,naive B-cells,0.47487,0.61967,6.127644e-08
9,xCell,CD4+ naive T-cells,0.272285,0.278325,0.02719256
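For readers who want to see what such an evaluation involves, here is a minimal sketch of per-cell-type Pearson and Spearman correlations between true and estimated proportions. It is not YADA's `calc_corr`: the function `sketch_corr` and the toy data are hypothetical. Spearman is computed as Pearson on ranks, and p-values are omitted; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return them if needed.

```python
import pandas as pd


def sketch_corr(truth: pd.DataFrame, estimate: pd.DataFrame) -> pd.DataFrame:
    """Per-cell-type correlations over shared samples (hypothetical helper)."""
    samples = truth.index.intersection(estimate.index)
    cell_types = truth.columns.intersection(estimate.columns)
    rows = []
    for cell_type in cell_types:
        t = truth.loc[samples, cell_type]
        e = estimate.loc[samples, cell_type]
        rows.append(
            {
                "celltype": cell_type,
                "Pearson": t.corr(e),
                # Spearman correlation equals Pearson correlation of the ranks.
                "Spearman": t.rank().corr(e.rank()),
            }
        )
    return pd.DataFrame(rows)


# Toy proportions, for illustration only (rows: samples, columns: cell types).
idx = ["s1", "s2", "s3", "s4"]
toy_truth = pd.DataFrame(
    {"B-cells": [0.1, 0.2, 0.3, 0.4], "Monocytes": [0.4, 0.3, 0.2, 0.1]}, index=idx
)
toy_estimate = pd.DataFrame(
    {
        "B-cells": [0.12, 0.18, 0.35, 0.50],
        "Monocytes": [0.1, 0.2, 0.3, 0.4],
        "NK cells": [0.2, 0.2, 0.1, 0.1],
    },
    index=idx,
)
toy_res = sketch_corr(toy_truth, toy_estimate)
print(toy_res)
```

Only cell types and samples present in both tables are compared, so extra columns in the estimate (such as `NK cells` here) are ignored.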
