zhenwang100/scKD

Single-cell sequencing of peripheral blood mononuclear cells in acute Kawasaki disease

System requirements

The source code was written in R and can be run on any Linux or Windows system that supports R. It was tested with R version 4.0.1. R package dependencies include Seurat (version 3.0.2), SingleR (version 1.0.6), DESeq2 (version 1.28.1), clusterProfiler (version 3.16.0) and immunarch (version 0.6.5).

Installation guide

Installation guides for R and for each R package are available on their respective home pages.

Installation is usually straightforward when following the guides of the individual packages and of Bioconductor.

Demo

  • Seurat analysis includes quality control, normalization, sample aggregation, dimensionality reduction, clustering and visualization. The expected outputs include cell clusters, a gene expression matrix and various plots for visualization (UMAP, marker gene expression). A demo of the standard workflow is available on the Seurat home page: https://satijalab.org/seurat/vignettes.html

  • SingleR performs cell type annotation of single-cell RNA sequencing data by leveraging reference transcriptomic datasets of pure cell types. You can follow the Bioconductor vignette: https://bioconductor.org/packages/release/bioc/vignettes/SingleR/inst/doc/SingleR.html

  • DESeq2 performs differential expression analysis on the pseudo-bulk expression profiles aggregated for each cell type. It outputs the list of differentially expressed genes (DEGs) with related statistics such as the FDR. You can follow the Bioconductor vignette: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

  • clusterProfiler performs functional enrichment analyses, including over-representation analysis of GO terms and KEGG pathways as well as gene set enrichment analysis (GSEA). It outputs the enriched functions with related statistics. A demo is available here: http://yulab-smu.top/clusterProfiler-book/

  • immunarch performs clonotype analyses of BCRs and TCRs, including clonotype diversity, clonotype abundance, clonotype tracking across samples and gene usage. A demo is available on the immunarch home page: https://immunarch.com/

Instructions for use

Note: the input and output files are specified within the scripts; check them carefully before reusing the scripts.

batch1_integrate.R

This script integrates the samples collected in Batch 1. To run it on real data, first prepare the cellranger output files for each sample. For scRNA-seq data, cellranger count generates three files (barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz). Place these files in a directory named after the corresponding sample.
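Before running the script, you can check the input layout programmatically. The Python sketch below is illustrative only, since the repository scripts are written in R. The helper `missing_inputs` and its `root` argument are hypothetical, and the check covers only whether each sample directory contains the three files named above.

```python
from pathlib import Path

# The three files written by `cellranger count` that are expected per sample.
REQUIRED_FILES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")


def missing_inputs(root, samples):
    """Return {sample: [missing file names]} for every incomplete sample directory.

    root: directory holding one sub-directory per sample (hypothetical layout root).
    samples: sample/directory names, analogous to the script's `samples` vector.
    """
    missing = {}
    for sample in samples:
        sample_dir = Path(root) / sample
        absent = [name for name in REQUIRED_FILES if not (sample_dir / name).is_file()]
        if absent:
            missing[sample] = absent
    return missing
```

An empty dictionary means every listed sample directory is complete.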

Input

  • samples: vector of sample (directory) names of the input data

  • low_cutoff, high_cutoff: vectors of lower and upper UMI cutoffs used for QC
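The Python sketch below shows one common way to apply such cutoffs. It rests on two assumptions that the script does not confirm: the i-th low/high pair belongs to the i-th sample, and a cell is kept when its total UMI count lies strictly between the two cutoffs. In Seurat this step would typically be a subset on the `nCount_RNA` metadata column. The function names are hypothetical.

```python
def passes_umi_qc(umi_totals, low_cutoff, high_cutoff):
    """Return barcodes whose total UMI count lies strictly between the cutoffs.

    umi_totals: {cell barcode: total UMI count} for one sample.
    Assumption: cells at or beyond either cutoff are discarded.
    """
    return [bc for bc, n in umi_totals.items() if low_cutoff < n < high_cutoff]


def qc_all(samples_umi, samples, low_cutoff, high_cutoff):
    """Apply the i-th (low, high) cutoff pair to the i-th sample (assumed pairing)."""
    return {
        s: passes_umi_qc(samples_umi[s], lo, hi)
        for s, lo, hi in zip(samples, low_cutoff, high_cutoff)
    }
```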

Output

  • umap_0.1.pdf, umap_1.2.pdf: UMAP plots of cell clusters at different clustering resolutions

  • markers.pdf: visualization of cell marker expression across cell clusters

  • batch1_cellcount.txt: cell counts across cell clusters

  • batch1_UMIcount.txt: UMI count matrix of genes by cell clusters (pseudo-bulk expression profile)

  • batch1.Rdata: R data file containing the integrated data object (comb_data)
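The pseudo-bulk profile in batch1_UMIcount.txt aggregates single-cell UMI counts within each cluster. The Python sketch below uses the usual approach of summing counts, for a single sample, and also returns the per-cluster cell counts. The dictionary inputs are hypothetical stand-ins for the Seurat count matrix and cluster identities, not the script's actual data structures.

```python
from collections import Counter, defaultdict


def pseudo_bulk(cell_counts, cell_cluster):
    """Sum UMI counts over the cells of each cluster.

    cell_counts: {cell barcode: {gene: UMI count}}
    cell_cluster: {cell barcode: cluster label}
    Returns (cells per cluster, {cluster: {gene: summed UMI count}}).
    """
    n_cells = Counter(cell_cluster[c] for c in cell_counts)
    profile = defaultdict(Counter)
    for cell, genes in cell_counts.items():
        # Counter.update adds counts gene by gene into the cluster's total.
        profile[cell_cluster[cell]].update(genes)
    return dict(n_cells), {k: dict(v) for k, v in profile.items()}
```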

batch2_transfer.R

This script transfers cell labels from Batch 1 to the Batch 2 samples. Its input and output files are similar to those of batch1_integrate.R.
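Seurat transfers labels through anchors (`FindTransferAnchors` followed by `TransferData`), which is considerably more involved than what is shown here. To illustrate the idea only, the Python sketch below substitutes a simpler technique: a k-nearest-neighbour majority vote in a shared embedding. It is not the script's method. `knn_transfer`, the coordinates and `k=3` are hypothetical.

```python
import math
from collections import Counter


def knn_transfer(ref_coords, ref_labels, query_coords, k=3):
    """Give each query cell the majority label of its k nearest reference cells.

    ref_coords / query_coords: lists of equal-length coordinate tuples, e.g.
    cell embeddings in a shared reduced space. Ties go to the label first seen
    among the neighbours, i.e. the one held by the closer cells.
    """
    labels = []
    for q in query_coords:
        nearest = sorted(range(len(ref_coords)),
                         key=lambda i: math.dist(q, ref_coords[i]))[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels
```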

Input

  • samples: vector of sample (directory) names of the input data

  • low_cutoff, high_cutoff: vectors of lower and upper UMI cutoffs used for QC

Output

  • batch2_cellcount.txt: cell counts across cell clusters

  • batch2_UMIcount.txt: UMI count matrix of genes by cell clusters (pseudo-bulk expression profile)

  • batch2.Rdata: R data file containing the merged data object (merge_data)

cell_annotation.R

This script performs cell type annotation of the Seurat clusters using SingleR. Five reference datasets (Human Primary Cell Atlas, Blueprint/ENCODE, Database of Immune Cell Expression, Novershtern hematopoietic data and Monaco immune data) are each used separately for annotation.

Input

  • batch1.Rdata: R data file containing the integrated data object (comb_data)

Output

  • SingleR.ref.pdf: visualization of the annotated clusters for each reference dataset

cell_abundance.R

This script summarizes cell type percentages for visualization and statistical testing.

Input

  • batch1_cellcount.txt, batch2_cellcount.txt: cell counts across cell clusters for each sample

  • cell_anno.txt: manually refined cell type annotations for the cell clusters

Output

  • celltype.abundance.pdf: boxplots of cell type percentages across conditions

  • p.value: P-values of the statistical tests comparing the percentages across conditions
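To derive the underlying percentages, map clusters to cell types with cell_anno.txt and normalize within each sample. The Python sketch below uses in-memory dictionaries in place of the two text files, whose exact column layout is not described here. It does not reproduce the statistical test, because the README does not name it. `celltype_percentages` is a hypothetical helper.

```python
from collections import defaultdict


def celltype_percentages(cluster_counts, cluster_to_type):
    """Convert per-sample cluster cell counts into cell type percentages.

    cluster_counts: {sample: {cluster: n_cells}}  (cf. the *_cellcount.txt files)
    cluster_to_type: {cluster: cell type}          (cf. cell_anno.txt)
    Clusters annotated with the same cell type are summed before normalizing.
    """
    result = {}
    for sample, counts in cluster_counts.items():
        by_type = defaultdict(int)
        for cluster, n in counts.items():
            by_type[cluster_to_type[cluster]] += n
        total = sum(by_type.values())
        result[sample] = {t: 100 * n / total for t, n in by_type.items()}
    return result
```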

DEG_analysis.R

This script performs differential expression analysis with DESeq2 for each cell compartment, based on the pseudo-bulk expression profiles.

Input

  • batch1_UMIcount.txt, batch2_UMIcount.txt: UMI count matrices of genes by cell clusters (pseudo-bulk expression profiles)
  • cluster: vector of cell clusters to include in the DEG analysis
  • sample1: vector of sample names for the comparison between pre- and post-treatment
  • meta1: list of metadata for sample1, including patient and treatment
  • sample2: vector of sample names for the comparison between pre-treatment samples and healthy controls
  • meta2: list of metadata for sample2

Output

  • CPM1.txt: pseudo-bulk expression profiles normalized as CPM (counts per million mapped reads) for the comparison between pre- and post-treatment, formatted as GSEA input
  • stat1.txt: table of per-gene statistical test results, including P-value and FDR, for the comparison between pre- and post-treatment
  • CPM2.txt: CPM-normalized pseudo-bulk expression profiles for the comparison between pre-treatment samples and healthy controls, formatted as GSEA input
  • stat2.txt: table of per-gene statistical test results, including P-value and FDR, for the comparison between pre-treatment samples and healthy controls
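CPM normalization divides each count by the total count of its sample and multiplies by one million. The Python sketch below applies it to one pseudo-bulk sample. It then renders the result in the tab-delimited GSEA expression format: a NAME column, a DESCRIPTION column, then one column per sample. The helper names, the `na` description placeholder and the four-decimal formatting are assumptions and may differ from what the script writes.

```python
def cpm(counts):
    """Counts-per-million normalization of one pseudo-bulk sample {gene: count}."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def gsea_txt(profiles, samples):
    """Render {sample: {gene: value}} as GSEA's tab-delimited TXT expression table."""
    genes = sorted(set().union(*(profiles[s] for s in samples)))
    lines = ["\t".join(["NAME", "DESCRIPTION", *samples])]
    for g in genes:
        # Genes absent from a sample are written as 0.
        values = [f"{profiles[s].get(g, 0.0):.4f}" for s in samples]
        lines.append("\t".join([g, "na", *values]))
    return "\n".join(lines) + "\n"
```

Example: `gsea_txt({"pre": cpm({"A": 1, "B": 3}), "post": cpm({"A": 1, "B": 1})}, ["pre", "post"])` produces a header line followed by one line per gene.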

function_enrich.R

This script performs enrichment analysis of GO terms, KEGG pathways and MSigDB hallmark gene sets with clusterProfiler.

Input

  • stat1.txt, stat2.txt: tables of per-gene statistical test results from DESeq2

  • genecut: FDR cutoff for DEGs

  • enrichcut: FDR cutoff for the enrichment analysis
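Both cutoffs apply to FDR values. DESeq2 reports Benjamini-Hochberg adjusted p-values (`padj`) by default, and clusterProfiler's default `pAdjustMethod` is also `"BH"`. The Python sketch below shows the BH adjustment and how a cutoff such as `genecut` selects DEGs. `bh_adjust` and `select_degs` are hypothetical helpers, and the strict `<` comparison is an assumption.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(fdr_by_gene, genecut):
    """Genes with FDR below genecut; missing FDRs (DESeq2 NA) are given as None."""
    return sorted(g for g, fdr in fdr_by_gene.items() if fdr is not None and fdr < genecut)
```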

Output

  • ego_MF.txt: enrichment results for GO molecular function (MF)
  • ego_BP.txt: enrichment results for GO biological process (BP)
  • ekegg.txt: enrichment results for KEGG pathways
  • egmt.txt: enrichment results for MSigDB hallmark gene sets
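If the script runs over-representation analysis (clusterProfiler functions such as `enrichGO`, `enrichKEGG` and `enricher`), each term is tested with a one-sided hypergeometric test. The Python sketch below computes that p-value for a single term. The argument names are hypothetical. The resulting p-values would then be BH-adjusted and compared with `enrichcut`.

```python
from math import comb


def ora_pvalue(k, n, M, N):
    """One-sided hypergeometric p-value P(X >= k) for one term.

    N: genes in the background universe
    M: universe genes annotated to the term
    n: DEGs in the universe
    k: DEGs annotated to the term
    """
    upper = min(n, M)
    # Exact integer sums; divide once at the end.
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

clusterProfiler reports the same quantities as GeneRatio (k/n) and BgRatio (M/N).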

clonotype_filter.R

This script filters the immunoglobulin (IG) clonotypes of each sample. Only clonotypes with productive, paired IG chains (IGH plus IGL or IGK) are retained. TCR clonotypes can also be analyzed with this script by replacing the IG chain names (IGH, IGL) with the TCR chain names (TRA, TRB). The script requires two input files generated by cellranger vdj: consensus_annotations.csv and filtered_contig_annotations.csv.

Input

  • consensus_annotations.csv: annotations of consensus clonotype sequences
  • filtered_contig_annotations.csv: annotations of contig sequences for each cell

Output

  • filtered_contig_annotations_productive_pair.csv: filtered clonotype annotations
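The filtering rule can be sketched in Python as follows. The sketch assumes the cellranger vdj columns `barcode`, `chain` and `productive` in filtered_contig_annotations.csv. The capitalization of `productive` values differs between cellranger versions, so the comparison is case-insensitive. This is an illustration, not the script: it ignores consensus_annotations.csv, and keeping only the productive contigs of passing cells is an assumption.

```python
import csv

HEAVY = {"IGH"}
LIGHT = {"IGK", "IGL"}


def _is_productive(row):
    return row["productive"].strip().lower() == "true"


def productive_pair_barcodes(rows):
    """Barcodes with at least one productive heavy and one productive light chain.

    rows: dicts with the cellranger vdj columns 'barcode', 'chain', 'productive'.
    """
    chains = {}
    for row in rows:
        if _is_productive(row):
            chains.setdefault(row["barcode"], set()).add(row["chain"])
    return {bc for bc, seen in chains.items() if seen & HEAVY and seen & LIGHT}


def filter_contigs(in_path, out_path):
    """Write the productive contigs of cells that pass the pairing check."""
    with open(in_path, newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)
    keep = productive_pair_barcodes(rows)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(r for r in rows if r["barcode"] in keep and _is_productive(r))
```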

Clonotype_compare.R

This script compares the clonotypes between two paired samples with immunarch.

Input

  • immdata: the two paired samples loaded from one directory (i.e., two filtered_contig_annotations_productive_pair.csv files)

Output

  • repDiversity: calculates the Gini coefficient of each of the two samples

  • trackClonotypes: visualizes clonotype tracking between the two samples

  • clone_select.csv: abundance changes of individual clonotypes between the two samples, tested with Fisher's exact test
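To build intuition for the last two outputs, the Python sketch below computes the standard (uncorrected) Gini coefficient of clonotype abundances and a two-sided Fisher's exact test for one clonotype. The script itself relies on immunarch's `repDiversity`. The 2x2 layout used here (one clonotype versus all remaining cells, sample A versus sample B) is an assumption about how clone_select.csv is built, and the helper names are hypothetical. The two-sided rule sums the probabilities of all tables that are no more likely than the observed one, which mirrors the convention of R's `fisher.test`.

```python
from math import comb


def gini(abundances):
    """Uncorrected Gini coefficient: 0 = all clonotypes equally abundant."""
    x = sorted(abundances)
    n, total = len(x), sum(x)
    weighted = sum((i + 1) * v for i, v in enumerate(x))
    return (2 * weighted / total - (n + 1)) / n


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


def clone_change(clone_a, total_a, clone_b, total_b):
    """Test one clonotype: its cell count vs all other cells, sample A vs sample B."""
    return fisher_two_sided(clone_a, total_a - clone_a, clone_b, total_b - clone_b)
```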