The source code was written in R and can be run on any Linux and Windows systems that support R. It was tested with R version 4.0.1. R packages dependencies include: Seurat (version 3.0.2), SingleR (version: 1.0.6), DESeq2 (version: 1.28.1), clusterProfiler (version: 3.16.0), immunarch (version: 0.6.5).
Please find the installation guide of R and R packages in their home page.
- R: https://cran.r-project.org/
- Seurat: https://satijalab.org/seurat/
- SingleR: https://bioconductor.org/packages/release/bioc/html/SingleR.html
- DESeq2: https://bioconductor.org/packages/release/bioc/html/DESeq2.html
- clusterProfiler: https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
- immunarch: https://immunarch.com/
The installation is usually straightforward following the guide of the packages and Bioconductor.
-
Seurat analysis includes quality control, normalization, sample aggregating, dimension reduction, clustering and visualization. The expected outputs include cell clusters, gene expression matrix and various plots for visualization (UMAP, marker gene expression). You can find the demo of standard workflow in the home page of Seurat: https://satijalab.org/seurat/vignettes.html
-
SingleR performs cell type annotation from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types. You can follow the demo of Bioconductor: https://bioconductor.org/packages/release/bioc/vignettes/SingleR/inst/doc/SingleR.html
-
DESeq2 can perform differential expression analysis based on the pseudo-bulk expression profiles aggregated for each cell type. It will output the DEG list and related statistics such as FDR. You can follow the demo of Bioconductor: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
-
clusterProfiler performs functional enrichment analyses including over-representation of GO and KEGG, as well as GSEA. It will output the enriched functions and related statistics. You can follow the demo here: http://yulab-smu.top/clusterProfiler-book/
-
immunarch performs clonotype analyses of BCRs and TCRs, including clonotype diversity, clonotype abundance, clonotype tracking across samples and gene usages. You can find the demo in the home page of immunarch: https://immunarch.com/
Note: the input and output files are specified in the scripts and should be carefully checked for reuse.
This script is used to integrate samples collected in Batch 1. To run the script on the real data, first prepare the cellranger output files for each sample. For scRNA-seq data, three files will be generated by cellranger count
(barcodes.tsv.gz
, features.tsv.gz
, matrix.mtx.gz
). Put these files in a directory with corresponding sample name.
-
samples
vector of sample/directory names of input data -
low_cutoff
high_cutoff
vector of UMI cutoffs for QC
-
umap_0.1.pdf
umap_1.2.pdf
UMAP plot of cell clusters with different resolutions -
markers.pdf
visualization of cell markers across cell clusters -
batch1_cellcount.txt
cell counts across cell clusters -
batch1_UMIcount.txt
UMI count matrix of genes by cell clusters (pseudo-bulk expression profile) -
batch1.Rdata
R data file containing the integrated data object (comb_data
)
This script transfers cell labels from Batch 1 to Batch 2 samples. The input and output files are similar to batch1_integrate.R
.
-
samples
vector of sample/directory names of input data -
low_cutoff
high_cutoff
vector of UMI cutoffs for QC
-
batch2_cellcount.txt
cell counts across cell clusters -
batch2_UMIcount.txt
UMI count matrix of genes by cell clusters (pseudo-bulk expression profile) -
batch2.Rdata
R data file containing the merged data object (merge_data
)
This script perform cell type annotation for Seurat clusters using SingleR. Five reference datasets (human primary cell atlas, Blueprint/ENCODE, Database of Immune Cell Expression, Novershtern hematopoietic data and Monaco immune data) are used for annotation, respectively.
batch1.Rdata
R data file containing the integrated data object (comb_data
)
SingleR.ref.pdf
visualization of annotated clusters with each reference dataset
This script summarizes cell percentage by types for visualization and statistical test.
-
batch1_cellcount.txt
batch2_cellcount.txt
cell counts across cell clusters for each sample -
cell_anno.txt
manually refined cell type annotations for the cell clusters
-
celltype.abundance.pdf
boxplot of cell type percentage across conditions -
p.value
P-values of statistical test to compare the percentage across conditions
This script performs differential expression analysis for each cell compartment with DESeq2 based on the pseudo-bulk expression profile.
batch1_UMIcount.txt
batch2_UMIcount.txt
UMI count matrix of genes by cell clusters (pseudo-bulk expression profile)cluster
vectors of cell clusters for DEG analysissample1
vector of sample names for comparison between pre- and post-treatmentmeta1
list of meta information forsample1
, includingpatient
andtreatment
sample2
vector of sample names for comparison between pre-treatment and healthy controlsmeta2
list of meta information forsample2
CPM1.txt
normalized pseudo-bulk expression profile with CPM (count per million mapped reads) between pre- and post-treatment. This file is formated as the GSEA input.stat1.txt
result table of statistical test for each gene, including P-value and FDR between pre- and post-treatmentCPM2.txt
normalized pseudo-bulk expression profile with CPM between pre-treatment and healthy controls. This file is formated as the GSEA input.stat2.txt
result table of statistical test for each gene, including P-value and FDR between pre-treatment and healthy controls
This script performs GO, KEGG and MSigDB hallmark gene sets enrichment analysis with clusterProfiler.
-
stat1.txt
stat2.txt
result tables of statistical test for each gene with DESeq2 -
genecut
FDR cutoff for DEGs -
enrichcut
FDR cutoff for enrichment analysis
ego_MF.txt
result table of enrichment analysis for GO MFego_BP.txt
result table of enrichment analysis for GO BPekegg.txt
result table of enrichment analysis for KEGGegmt.txt
result table of enrichment analysis for MSigDB hallmark gene sets
This script is used to filter clonotypes of IGs for each sample. Only clonotypes with productive and paired IG chains (IGH and IGL/IGK) will be preserved. Please note clonotypes of TCRs can also be analyzed with the script by changing IGH
and IGL
to TRA
and TRB
. This script requires two input files created by cellranger vdj
: consensus_annotations.csv
and filtered_contig_annotations.csv
.
consensus_annotations.csv
annotations of consensus clonotype sequencesfiltered_contig_annotations.csv
annotations of contig sequences for each cell
filtered_contig_annotations_productive_pair.csv
filtered clonotype annotations
This script compares the clonotypes between two paired samples with immunarch.
immdata
load the directory containing two paired samples (i.e., twofiltered_contig_annotations_productive_pair.csv
files)
-
repDiversity
calculate the Gini coefficient of the two samples -
trackClonotypes
visualize clonotype tracking between the two samples -
clone_select.csv
abundance change of individual clonotypes between the two samples with the fisher's test