CytoBackBone: an R package for merging of phenotype information from different cytometric profiles
Single-cell technologies are the most suitable techniques for characterizing cells by the differential expression of the molecules that define their roles and functions in tissues. Among these techniques, mass cytometry represents a leap forward by increasing the number of available measurements to approximately 40 cell markers. Thanks to this technology, detailed immune responses have been described in several diseases. However, the study of immune responses, such as those due to viral infections or auto-immune diseases, could be further improved by increasing the number of simultaneously measurable markers.
Thus, we designed an algorithm, named CytoBackBone, which combines phenotypic information of cells from different cytometric profiles obtained from different cytometry panels. This computational approach is based on the principle that each cell has its own phenotypic and functional characteristics, which can be used as an identification card. CytoBackBone uses a set of predefined markers, which we call the backbone, to define this identification card. The phenotypic information of cells with similar identification cards in the different cytometric profiles is then merged. Our approach can extend the number of measurable markers without technical modification of mass or flow cytometers.
CytoBackBone uses a k-nearest-neighbor distance algorithm, based on k-d tree space partitioning, to identify cells that are acceptable and non-ambiguous nearest neighbors.
To be acceptable neighbors, the phenotypic distance between two cells must be lower than a specific threshold defined by the user. This distance is a Euclidean distance, computed as the square root of the sum of the squared expression differences over the markers common to both profiles.
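The acceptability test described above can be sketched in a few lines of R. This is only an illustration: the helper `backbone_distance`, the expression values, and the threshold `th = 0.1` are made up for the example and are not part of the CytoBackBone API.

```r
# Hypothetical helper (not a CytoBackBone function): Euclidean distance
# between two cells, restricted to the markers shared by both panels.
backbone_distance <- function(cell_a, cell_b, bb_markers) {
  sqrt(sum((cell_a[bb_markers] - cell_b[bb_markers])^2))
}

# Toy expression values, invented for illustration only.
cell_a <- c(CD11c = 0.20, CD123 = 0.50, CD14 = 0.10, CCR5 = 0.90)
cell_b <- c(CD11c = 0.23, CD123 = 0.46, CD14 = 0.10, CD28 = 0.30)
bb <- c("CD11c", "CD123", "CD14")

d <- backbone_distance(cell_a, cell_b, bb)

# The two cells are acceptable neighbors only if d is below the threshold.
th <- 0.1  # assumed value for the example
acceptable <- d < th
```

Markers that are specific to one panel (here `CCR5` and `CD28`) play no role in the distance; only the backbone markers are compared.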
To be defined as non-ambiguous nearest neighbors, two cells from two different profiles must be reciprocally the closest neighbors. First, the algorithm identifies the closest cells in cytometric profile #2 for each cell from cytometric profile #1. Then, the algorithm identifies the closest cells in cytometric profile #1 for each cell from cytometric profile #2. Merging is possible if two cells from different cytometric profiles are identified as mutual non-ambiguous nearest neighbors.
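The reciprocity check can be illustrated with a brute-force sketch in base R. The function names `nearest_index` and `mutual_pairs` and the toy coordinates are assumptions made for this example; the sketch also uses an exhaustive search instead of a k-d tree and leaves out the distance threshold.

```r
# Hypothetical helper: for each row (cell) of `from`, return the index of the
# closest row of `to`. Brute-force search, for illustration only.
nearest_index <- function(from, to) {
  unname(apply(from, 1, function(cell) {
    which.min(colSums((t(to) - cell)^2))
  }))
}

# Keep only the pairs where each cell is the other's closest neighbor.
mutual_pairs <- function(p1, p2) {
  nn12 <- nearest_index(p1, p2)  # closest cell in profile 2 for each cell of profile 1
  nn21 <- nearest_index(p2, p1)  # closest cell in profile 1 for each cell of profile 2
  i <- seq_len(nrow(p1))
  keep <- nn21[nn12] == i
  cbind(profile1 = i[keep], profile2 = nn12[keep])
}

# Toy profiles: rows are cells, columns are two backbone markers.
p1 <- matrix(c(0.0, 0.0,
               1.0, 1.0,
               5.0, 5.0), ncol = 2, byrow = TRUE)
p2 <- matrix(c(0.1, 0.0,
               1.2, 1.0,
               0.9, 1.1), ncol = 2, byrow = TRUE)

pairs <- mutual_pairs(p1, p2)
```

In this toy example, the third cell of profile #1 has a closest neighbor in profile #2, but that neighbor is itself closer to another cell of profile #1, so the pair is ambiguous and is not retained.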
More precisely, the algorithm works as follows: (i) cells of each cytometric profile with no acceptable neighbors are first excluded from the two input profiles; (ii) all acceptable and non-ambiguous nearest neighbor cells remaining in the two profiles are merged into a new profile and are discarded from the two input profiles; (iii) the previous step is repeated until no more acceptable and non-ambiguous neighbor cells can be found; and (iv) finally, all excluded and remaining cells are stored in supplementary cytometric profiles for information purposes. These successive iterations allow the algorithm to find a new set of acceptable and non-ambiguous neighbor cells at each step. The number of iterations performed by the algorithm varies depending on the dataset used, but usually ranges from 2 to 30.
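Steps (i) to (iv) can be sketched as the following base R loop. It is a simplified illustration, not the package's implementation: the names `cross_dist` and `iterative_merge` are hypothetical, all pairwise distances are computed exhaustively rather than with a k-d tree, and only cell indices are returned instead of merged profiles.

```r
# Hypothetical helper: Euclidean distances between rows of p1 and rows of p2.
cross_dist <- function(p1, p2) {
  n1 <- nrow(p1); n2 <- nrow(p2)
  d <- as.matrix(dist(rbind(p1, p2)))
  unname(d[seq_len(n1), n1 + seq_len(n2), drop = FALSE])
}

iterative_merge <- function(p1, p2, th) {
  d <- cross_dist(p1, p2)
  # (i) exclude cells that have no acceptable neighbor in the other profile
  live1 <- which(rowSums(d < th) > 0)
  live2 <- which(colSums(d < th) > 0)
  excluded1 <- setdiff(seq_len(nrow(p1)), live1)
  excluded2 <- setdiff(seq_len(nrow(p2)), live2)
  m1 <- integer(0); m2 <- integer(0); steps <- 0
  repeat {
    if (length(live1) == 0 || length(live2) == 0) break
    sub <- d[live1, live2, drop = FALSE]
    nn12 <- unname(apply(sub, 1, which.min))  # positions within live2
    nn21 <- unname(apply(sub, 2, which.min))  # positions within live1
    i <- seq_along(live1)
    # (ii) acceptable (below th) and non-ambiguous (reciprocal) pairs
    ok <- nn21[nn12] == i & sub[cbind(i, nn12)] < th
    if (!any(ok)) break                       # (iii) stop when nothing is left to match
    steps <- steps + 1
    m1 <- c(m1, live1[ok]); m2 <- c(m2, live2[nn12[ok]])
    live2 <- live2[-nn12[ok]]                 # matched cells leave both profiles
    live1 <- live1[!ok]
  }
  # (iv) excluded and remaining cells are reported separately
  list(matched = data.frame(profile1 = m1, profile2 = m2), steps = steps,
       remaining1 = live1, remaining2 = live2,
       excluded1 = excluded1, excluded2 = excluded2)
}

# Toy profiles (two backbone markers, values invented for illustration).
p1 <- matrix(c(0.00, 0, 0.10, 0, 5, 5), ncol = 2, byrow = TRUE)
p2 <- matrix(c(0.04, 0, 0.20, 0, 9, 9), ncol = 2, byrow = TRUE)
res <- iterative_merge(p1, p2, th = 0.15)
```

In this toy run, the second cell of profile #1 is first in competition with the first cell for the same neighbor. Once the first pair is merged and removed, the second cell finds a new reciprocal neighbor at the next step, which is why several iterations are needed.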
A distance threshold defining acceptable nearest neighbors needs to be specified to avoid merging data between two cells that have very different levels of the backbone markers: the lower the threshold, the more accurate the merging. However, the threshold must not be too stringent as, by definition, cells cannot have exactly the same phenotype. An overly stringent threshold can result in a significant loss of cells. Thus, the threshold must be carefully set to allow CytoBackBone to merge data from cells with similar levels of the backbone markers within a reasonable error range. The Euclidean distance between cells increases with the number of backbone markers. Therefore, the threshold value must be adjusted according to the length of this backbone.
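The effect of the backbone length on the distance can be checked numerically. The sketch below assumes two cells that differ by the same small amount (0.01, an arbitrary value) on every backbone marker; the variable names are illustrative only.

```r
# Assumed per-marker expression difference between two cells.
per_marker_diff <- 0.01

# Euclidean distance between the two cells for backbones of increasing length.
n_markers <- c(1, 5, 10, 20)
dist_by_length <- sapply(n_markers, function(n) {
  sqrt(sum(rep(per_marker_diff, n)^2))
})

# One possible convention, matching the usage example in this document
# (th = 0.01 * 5 for five backbone markers): scale a per-marker tolerance
# by the number of backbone markers.
bb_markers <- c("CD11c", "CD123", "CD14", "CD141", "CD16")
th <- 0.01 * length(bb_markers)
```

With a constant per-marker difference, the distance grows as the square root of the number of markers, so a threshold chosen for a short backbone becomes more stringent when markers are added.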
The `FNN` and `ggplot2` R packages as well as the `flowCore`, `flowUtils`, and `preprocessCore` Bioconductor packages are required for CytoBackBone. These packages can be installed using the following commands:
install.packages('FNN')
install.packages('ggplot2')
source("http://bioconductor.org/biocLite.R")
biocLite(suppressUpdates=TRUE)
biocLite("flowCore",suppressUpdates=TRUE)
biocLite("flowUtils",suppressUpdates=TRUE)
biocLite("preprocessCore",suppressUpdates=TRUE)

install.packages('devtools')
library("devtools")
install_github('tchitchek-lab/CytoBackBone')
Once installed, CytoBackBone can be loaded using the following command:
library("CytoBackBone")
Cytometry profiles to merge can be imported from FCS files using the `import.FCS` function. This function returns an object of class `FCS`.
FCS1 <- import.FCS("./sample1-panelA.fcs")
# select only some specific markers from the `FCS` object
FCS1 <- FCS1[,c("CD11c","CD123","CD14","CD141","CD16","CCR5","CD11a","CD23","CD83","IFN-a","IL-1b","IL6","MCP-1","TNF-a")]
FCS2 <- import.FCS("./sample1-panelB.fcs")
# select only some specific markers from the `FCS` object
FCS2 <- FCS2[,c("CD11c","CD123","CD14","CD141","CD16","CD11b","CD28","CXCR4","IFN-g","IL-12","IL-1RA","IL-8","IP-10","MIP-1b","TLR2")]

## Object class: FCS
## Object name: sample1-panelA.fcs
## Number of cells: 396,240
## Number of markers: 11
## Markers: CD11c; CD123; CD14; CD141; CD16; CCR5; CD11a; CD23; CD83; IL6; MCP-1

## Object class: FCS
## Object name: sample1-panelB.fcs
## Number of cells: 381,679
## Number of markers: 9
## Markers: CD11c; CD123; CD14; CD141; CD16; CD11b; CD28; CXCR4; TLR2
The `plot` function can be used to display the phenotype of an `FCS` object.
`FCS` objects can be merged using the `merge` function. The markers to use as backbone markers must be specified using the `BBmarkers` parameter. The `th` parameter indicates the global distance threshold below which two cells are considered acceptable neighbors.
FCS_merged <- merge(FCS1,FCS2,BBmarkers=c("CD11c","CD123","CD14","CD141","CD16"),th=0.01*5)
## === CytoBackBone ===
## Normalizing profiles
## processing marker: CD11c
## processing marker: CD123
## processing marker: CD14
## processing marker: CD141
## processing marker: CD16
## profile 1 contains 396,240 cells
## profile 2 contains 381,679 cells
## ===
## profile 1 has 136,908 cells can be potentialy matched
## profile 2 has 131,841 cells can be potentialy matched
## ===
## maximum of number of cells that can be matched by CytoBackBone = 131,841
## ===
## step #1: 77,371 cells matched (54,470 cells remaining)
## step #2: 99,051 cells matched (32,790 cells remaining)
## step #3: 108,064 cells matched (23,777 cells remaining)
## step #4: 112,149 cells matched (19,692 cells remaining)
## step #5: 114,082 cells matched (17,759 cells remaining)
## step #6: 115,018 cells matched (16,823 cells remaining)
## step #7: 115,494 cells matched (16,347 cells remaining)
## step #8: 115,707 cells matched (16,134 cells remaining)
## step #9: 115,772 cells matched (16,069 cells remaining)
## step #10: 115,786 cells matched (16,055 cells remaining)
## step #11: 115,788 cells matched (16,053 cells remaining)
## ====================

## Object class: FCS
## Object name: FCS+FCS
## Number of cells: 115,788
## Number of markers: 15
## Markers: CD11c; CD123; CD14; CD141; CD16; CCR5; CD11a; CD11b; CD23; CD28; CD83; CXCR4; IL6; MCP-1; TLR2
`FCS` objects can be saved as FCS files using the `export` function.
export(FCS_merged,filename = "merged.fcs")