What is scTPA
scTPA is a web tool for single-cell transcriptome analysis and annotation based on biological pathway activation in human and mouse. It includes a large collection of biological pathways with different functional and taxonomic classifications, which helps identify key pathway signatures for cell type annotation and interpretation.
- Calculating the pathway activity score (PAS) of single cells
- Dimension reduction
- Clustering of cell populations by different methods
- Identifying significantly activated pathways in cell clusters
- Comparative analysis of the gene expression profiles associated with pathways
- step1 Download scTPA
scTPA can be downloaded directly from the GitHub repository page (https://github.com/sulab-wmu/scTPA) using the "Download ZIP" button. Alternatively, scTPA can be installed with git: enter the directory where you would like to install scTPA and run
git clone https://github.com/sulab-wmu/scTPA.git
cd scTPA/
- step2 Install dependent R packages
To use scTPA, the following packages must be installed:
Seurat foreach bigstatsr data.table dplyr scales ggplot2 cowplot pheatmap
To install these packages, start "R" and enter:
pkgs <- c("Seurat","bigstatsr","data.table","foreach","dplyr","scales","ggplot2","cowplot","pheatmap")
# requireNamespace() checks one package at a time, so test each name separately
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
- step3 Install optional R packages
Some specialized methods in scTPA require the following additional R packages:
  - scran: for the "scran" normalization method
  - scImpute: for the "scImpute" imputation method
  - SIMLR: for the "simlr" clustering method
  - dbscan: for the "dbscan" clustering method
scran
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!requireNamespace("scran", quietly = TRUE))
    BiocManager::install("scran")
scImpute
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
if (!requireNamespace("scImpute", quietly = TRUE))
    devtools::install_github("Vivianstats/scImpute")
SIMLR
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!requireNamespace("SIMLR", quietly = TRUE))
    BiocManager::install("SIMLR")
dbscan
if (!requireNamespace("dbscan", quietly = TRUE))
    install.packages("dbscan")
To check the installation, run scTPA on the test data shipped in the test folder:
Rscript /path/to/your/scTPA-master/R/scTPA.R -f /path/to/your/scTPA-master/test/expression.csv --cellType /path/to/your/scTPA-master/test/cell_type.csv --work_dir /path/to/your/scTPA-master/ --species homo --pathway_database kegg --para_size 1 -o /path/to/your/scTPA-master/results/
Once the program has run successfully, a series of result files and folders will appear in the results folder.
To list all available options, run:
Rscript /path/to/your/scTPA/scTPA.R -h
Options:
    -f FILE, --file=FILE
        gene expression profile, genes X cells
    --cellType=CELLTYPE
        cell type file. First column is cell name (same as the colnames of gene expression profile), second column is cell type. No header names.[default= NULL]
    --work_dir=WORK_DIR
        Workshop direction. [default= ./]
    --normalize=NORMALIZE_METHOD
        methods used for normalization. "log", "CLR", "RC" or "scran"[default= none]
    --min_cells=MIN_CELLS
        genes must be in a minimum number of cells. Used for filtering genes[default= 3]
    --min_features=MIN_FEATURES
        cells must have at least the minimum number of genes. Used for filtering cells[default= 200]
    --species=SPECIES
        species. "homo" or "mus"[default= homo]
    --imputation=IMPUTATION
        Imputation method. "scImpute" or "none"[default= none]
    --data_type=FILE
        data type of gene expression profile,"TPM" or "count"[default= TPM]
    --pathway_database=PATHWAY_DATABASE
        pathway database, detials see https://github.com/sulab-wmu/scTPA[default= kegg]
    --user_pathway=USER_PATHWAY
        user defined pathway file,only for gmt format[default = NULL]
    --pas_method=PAS_METHOD
        method for calculating PAS. "gsva", "ssgsea", "zscore" or "plage"[default= ssgsea]
    --para_size=PARA_SIZE
        number of kernels used for parallel[default= 4]
    --cluster_method=CLUSTER_METHOD
        clustering method. "seurat", "hclust", "simlr", "kmedoids", "kmeans" or "dbscan"[default= seurat]
    --seurat_dims=SEURAT_DIMS
        dimensions used in Seurat clustering[default= 8]
    --seurat_resolution=SEURAT_RESOLUTION
        resolution used for Seurat clustering[default= 0.5]
    --k_cluster=K_CLUSTER
        number of clusters, useless if clustering method is Seurat or dbscan[default= 5]
    --min_pts=MIN_PTS
        parameter in DBSCAN[default= 3]
    --dims=DIMS
        number of PCA dimensions used for TSNE or UMAP[default= 20]
    --marker_method=FIND_MAKER_METHOD
        method of finding siginificant markers[default= wilcox]
    --logFC_thre=THRESHOLD_LOGFC
        threshold of logFC (Detail see Seurat)[default= 0.25]
    --min_pct=MIN_PCT
        only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.[default= 0.1]
    --pic_type=PIC_TYPE
        type of picture, png or pdf [default= png]
    -o OUT_DIR, --out_dir=OUT_DIR
        output folder[default= NULL]
    -h, --help
        Show this help message and exit
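The --cellType description suggests a check worth running before a long job: every cell name in the cell type file should match a column name of the expression matrix. The following is a minimal shell sketch of that check, not part of scTPA. It assumes both files are plain comma-separated text without quoted fields and that the first header field of the expression file labels the gene column; the file names and contents are made up for illustration.

```shell
# Toy inputs (hypothetical names and values, for illustration only).
workdir=$(mktemp -d)
cat > "$workdir/expression.csv" <<'EOF'
gene,cell1,cell2,cell3
GeneA,5,0,2
GeneB,1,3,0
EOF
cat > "$workdir/cell_type.csv" <<'EOF'
cell1,T cell
cell2,B cell
cell4,T cell
EOF

# Cell names from the expression header (skip the first, gene-name field).
head -n 1 "$workdir/expression.csv" | tr ',' '\n' | tail -n +2 | sort > "$workdir/expr_cells.txt"

# Cell names from the first column of the headerless cell type file.
cut -d ',' -f 1 "$workdir/cell_type.csv" | sort > "$workdir/annot_cells.txt"

# Names that are annotated but absent from the expression matrix.
unmatched=$(comm -13 "$workdir/expr_cells.txt" "$workdir/annot_cells.txt")
echo "unmatched cell names: ${unmatched:-none}"
```

In this toy case the check reports cell4, which is annotated but has no column in the expression matrix.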
--normalize:
- log: Log transform. Feature counts for each cell are divided by the total counts for that cell and multiplied by 1e4. The result is then natural-log transformed.
- CLR: Centered log ratio. A commonly used Compositional Data Analysis (CoDA) transformation method.
- RC: Relative counts. Feature counts for each cell are divided by the total counts for that cell and multiplied by 1e4 (for TPM/CPM/FPKM/RPKM this value is 1e6).
- scran: Normalization for scRNA-seq based on deconvolution size factors, as implemented in the scran R package. For details, see scran.
- none: No normalization is performed.
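To make the "log" description concrete, here is a small awk sketch of the arithmetic on a toy genes-by-cells matrix. It is not scTPA's own code. The file contents are invented, the scale factor of 1e4 comes from the description above, and the use of log(1 + x) instead of log(x) is an assumption borrowed from Seurat's LogNormalize, which keeps zero counts at zero.

```shell
# Toy count matrix: rows are genes, columns are cells (made-up values).
matrix=$(mktemp)
cat > "$matrix" <<'EOF'
gene,cell1,cell2
GeneA,1,0
GeneB,3,10
EOF

# The file is read twice: pass 1 sums each cell column, pass 2 rescales it.
normalized=$(awk -F ',' -v OFS=',' -v scale=10000 '
  FNR == 1  { if (NR != FNR) print; next }                      # header: print on pass 2 only
  NR == FNR { for (i = 2; i <= NF; i++) total[i] += $i; next }  # pass 1: per-cell totals
  {                                                             # pass 2: scale, then log(1 + x)
    for (i = 2; i <= NF; i++)
      $i = sprintf("%.4f", total[i] > 0 ? log(1 + $i / total[i] * scale) : 0)
    print
  }
' "$matrix" "$matrix")
echo "$normalized"
```

Dropping the log(1 + ...) step leaves the relative-count ("RC") value for the same scale factor.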
--imputation:
- scImpute: Imputes missing values in the data matrix after the filtering and normalization steps, using the scImpute R package.
- none: No imputation is performed.
--data_type:
- count: Discrete data.
- TPM: Continuous data.
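The options described so far can be combined in a single call. As an illustration only, the sketch below assembles a command for a raw count matrix with log normalization and no imputation, then prints it instead of running it. Both paths are placeholders, the counts.csv file is hypothetical, and whether this combination suits a given dataset is left to the user.

```shell
# Placeholder locations (assumptions); replace with real paths before running.
SCTPA_DIR="/path/to/your/scTPA-master"
COUNTS="/path/to/your/counts.csv"   # hypothetical raw count matrix, genes x cells

# Every flag below appears in the help text; the combination is only an example.
set -- Rscript "$SCTPA_DIR/R/scTPA.R" \
  -f "$COUNTS" \
  --data_type count \
  --normalize log \
  --imputation none \
  --species homo \
  --pathway_database kegg \
  -o "$SCTPA_DIR/results/"

# Dry run: show the assembled command instead of executing it.
cmd_line="$*"
echo "$cmd_line"

# To execute for real, replace the echo with:  "$@"
```

Keeping the arguments in the positional parameters, rather than in one long string, means paths containing spaces survive intact when the command is finally run with "$@".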
--pathway_database:
When "--species" is "homo", "--pathway_database" can be one of the following:
- kegg: An encyclopedia of gene reactions and regulation. KEGG.
- reactome: A curated database of biomolecular pathways. Reactome.
- biocarta: A pathway database for gene regulation. BioCarta.
- smpdb: A small molecule pathway database. SMPDB.
- humancyc: A curated pathway database of human metabolism. HumanCyc.
- panther: A curated pathway database for protein annotation through evolutionary relationships. PANTHER.
- pharmgkb: A curated pathway database for pharmacogenomics. PharmGKB.
- acsn2: A web-based resource depicting signalling and regulatory molecular processes in cancer cells and the tumor microenvironment. ACSN v2.0.
- rb: A curated map of molecular interactions involving the retinoblastoma protein (RB/RB1). RB-Pathways.
- h.all: Hallmark gene sets. MSigDB.
- c2.cgp: Chemical and genetic perturbations. MSigDB.
- c2.cp: Canonical pathways. MSigDB.
- c4.cgn: Cancer gene neighborhoods. MSigDB.
- c4.cm: Cancer modules. MSigDB.
- c5.bp: GO biological process. MSigDB.
- c5.mf: GO molecular function. MSigDB.
- c5.cc: GO cellular component. MSigDB.
- c6.all: Oncogenic signatures. MSigDB.
- c7.all: Immunologic signatures. MSigDB.
When "--species" is "mus", "--pathway_database" can be one of the following:
- kegg: An encyclopedia of gene reactions and regulation. KEGG.
- reactome: A curated database of biomolecular pathways. Reactome.
- smpdb: A small molecule pathway database. SMPDB.
- c5.bp: GO biological process. GSKB.
- c5.mf: GO molecular function. GSKB.
- c5.cc: GO cellular component. GSKB.
- other: Includes "Location", "HPO", "STITCH", "MPO", "T3DB", "PID", "MethyCancer" and "MethCancerDB"; for details, see the table. GSKB.
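When none of the built-in databases fits, the --user_pathway option listed in the help accepts a pathway file in GMT format. A GMT file is tab-separated, with one pathway per line: a name, a description, and then the member genes. The sketch below writes a tiny GMT file and checks its shape. The pathway names and descriptions are invented, and the use of gene symbols as identifiers is an assumption; the identifiers should match those in the expression matrix.

```shell
# Write a two-pathway GMT file (hypothetical pathway names, toy gene lists).
gmt_file=$(mktemp)
# Columns: pathway name, description, then one gene per tab-separated field.
printf 'MY_PATHWAY_A\tcustom toy pathway\tTP53\tMDM2\tCDKN1A\n' >  "$gmt_file"
printf 'MY_PATHWAY_B\tcustom toy pathway\tCD3E\tCD4\n'          >> "$gmt_file"

# Sanity check: every line needs a name, a description and at least one gene.
bad_lines=$(awk -F '\t' 'NF < 3' "$gmt_file" | wc -l | tr -d ' ')
echo "lines with fewer than 3 fields: $bad_lines"
```

The resulting file path would then be passed to scTPA as the value of --user_pathway.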