Tabula_Muris_Processing_single_cell_RNAseq

Pipeline for processing and analyzing single cell RNAseq data

The key steps of our analysis are explained below. Please refer to our manuscript for analyses that are not available online. For technical details, please contact Tianyuan Lu (tianyuan.lu@mail.mcgill.ca) or Dr. Jessica C. Mar (j.mar@uq.edu.au).

Preprocessing

Raw counts data can be retrieved from the Tabula Muris study at https://github.com/czbiohub/tabula-muris.

These data can be preprocessed by

	Rscript preprocessing.R celltype_annot.tsv

where celltype_annot.tsv can be found in the "CellTypeAnnotation" folder. Change "tissue" in line 5 of the script to "Brain"/"Heart" or any other desired tissue.

The mm10 gene model ("TM_gene_genomic_info.tsv") is based on the UCSC Genome Browser and is supplied to the script.

Imputation

Imputation of the filtered data can be performed by the "scImpute" R package.

	library(scImpute)
	scimpute(count_path = "/path/to/filteredfile",
	         infile = "csv",
	         outfile = "csv",
	         out_dir = "/path/to/output/directory",
	         drop_thre = 0.5,                 # threshold set on dropout probability
	         Kcluster = number_of_clusters,   # number of cell populations; in our study, 4 for brain cells and 6 for heart cells
	         ncores = 10)

Please refer to https://github.com/Vivianstats/scImpute for more instructions. Imputed data have been provided in rds format.

Normalization

Normalization of imputed data can be performed using the following code in R:

	library(Seurat)
	heart <- readRDS("HeartImputed.rds")
	brain <- readRDS("BrainImputed.rds")
	# Assumption: the Seurat objects are created from the imputed matrices (this step was not shown)
	seuseth <- CreateSeuratObject(heart)
	seusetb <- CreateSeuratObject(brain)
	# Log-normalize each cell
	Seuseth <- NormalizeData(object=seuseth, normalization.method="LogNormalize")
	Seusetb <- NormalizeData(object=seusetb, normalization.method="LogNormalize")
	# RunFastMNN is provided by the SeuratWrappers package
	Seuseth <- RunFastMNN(object.list = SplitObject(Seuseth, split.by = "sample"))
	Seusetb <- RunFastMNN(object.list = SplitObject(Seusetb, split.by = "sample"))
	# Select highly variable genes
	SEUSETH <- FindVariableGenes(object=Seuseth, mean.function=ExpMean, dispersion.function=LogVMR, x.low.cutoff=0.0125, x.high.cutoff=3, y.cutoff=0.5)
	SEUSETB <- FindVariableGenes(object=Seusetb, mean.function=ExpMean, dispersion.function=LogVMR, x.low.cutoff=0.0125, x.high.cutoff=3, y.cutoff=0.5)
	heartHVG <- heart[SEUSETH@var.genes,]
	brainHVG <- brain[SEUSETB@var.genes,]

Loading the "Seurat" R package is necessary. Normalized data are stored in the Seuseth and Seusetb objects.
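
For readers who want to see what the "LogNormalize" method computes, here is a minimal Python sketch. It relies on Seurat's documented behaviour: counts for each cell are divided by that cell's total, multiplied by a scale factor (10,000 by default) and transformed with log1p. The function name log_normalize and the toy matrix are illustrative assumptions and are not part of this repository.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize a genes x cells count matrix, one cell (column) at a time.

    counts: list of rows (one per gene), each a list of per-cell counts.
    Returns log1p(count / cell_total * scale_factor) for every entry.
    """
    n_cells = len(counts[0])
    # Total counts per cell (column sums)
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [
            math.log1p(row[j] / totals[j] * scale_factor) if totals[j] > 0 else 0.0
            for j in range(n_cells)
        ]
        for row in counts
    ]

# Toy matrix (hypothetical): 3 genes x 2 cells
toy = [[1, 0], [3, 5], [6, 5]]
normalized = log_normalize(toy)
```

Zero counts remain zero after the transformation, so the sparsity pattern of the matrix is preserved.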

Identification of differential distribution

Genes having differentially distributed expression patterns can be identified using the "scDD" R package.

	Rscript scDD.R

This script takes in the annotation files in the "CellTypeAnnotation" folder and the imputed and normalized read counts obtained in the previous step. Change "celltype" in lines 7 and 19 and "tissue" in line 8 accordingly. The outputs are cell type-specific results.

Gene set variation analysis

GSVA can be performed using the R package "GSVA":

	library(GSVA)
	library(GSEABase)   # provides getGmt()
	exp <- as.matrix(read.table("expression_matrix", header=TRUE, row.names=1))
	gset <- getGmt("Mus.gmt")
	GSVAresult <- gsva(exp, gset)

where "Mus.gmt", which describes the gene sets, has been provided.
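
To inspect the gene sets outside R, a GMT file can be read with a few lines of Python. This is a minimal sketch that assumes the standard GMT layout (tab-separated: set name, description, then gene symbols); the function name parse_gmt and the example sets are illustrative, not taken from "Mus.gmt".

```python
import io

def parse_gmt(handle):
    """Parse GMT-formatted lines into a dict mapping set name -> list of genes."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

# Hypothetical two-set example; for a real file use: open("Mus.gmt")
example = io.StringIO("SET_A\tdemo\tActb\tGapdh\nSET_B\tdemo\tSox2\n")
gene_sets = parse_gmt(example)
```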

PANDA network

PANDA network ensembles can be constructed in the following steps.

Remove genes on sex chromosomes and get genes documented in TF-gene motifs.
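
The filtering step above can be sketched as follows. This is only an illustration under stated assumptions: chromosome names are assumed to follow the UCSC style ("chrX", "chrY"), and the function filter_genes, the gene-to-chromosome mapping and the motif gene set are hypothetical stand-ins for the gene model and motif files.

```python
def filter_genes(gene_chrom, motif_genes, sex_chromosomes=("chrX", "chrY")):
    """Keep genes that are not on a sex chromosome and that appear in the motif.

    gene_chrom: dict mapping gene symbol -> chromosome name.
    motif_genes: set of gene symbols present in the TF-gene motif.
    """
    return [
        gene
        for gene, chrom in gene_chrom.items()
        if chrom not in sex_chromosomes and gene in motif_genes
    ]

# Hypothetical inputs for illustration
gene_chrom = {"Actb": "chr5", "Xist": "chrX", "Gapdh": "chr6", "Ddx3y": "chrY"}
motif_genes = {"Actb", "Xist"}
kept = filter_genes(gene_chrom, motif_genes)
```

Here Xist is dropped because it is on chrX, and Gapdh is dropped because it is absent from the example motif set.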

Separate samples by condition, which in our study is gender:

	python3 heart_separate_gender.py
	python3 brain_separate_gender.py

Before running these scripts, please prepare the required counts data (imputed, normalized, with sex-chromosome genes removed and matched to the motif) and the corresponding annotation files.
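
As a rough illustration of what separating samples by condition involves, the sketch below splits the columns of a counts matrix according to a per-cell label. The data layout (a header of cell IDs plus one row per gene) and the function split_by_condition are assumptions made for this example; the provided scripts may organise the data differently.

```python
def split_by_condition(header, rows, condition_of):
    """Split a counts matrix column-wise by condition.

    header: list of cell IDs (column names).
    rows: list of (gene, values) tuples, values aligned with header.
    condition_of: dict mapping cell ID -> condition label.
    Returns a dict: condition -> (cell IDs, rows restricted to those cells).
    """
    groups = {}
    for idx, cell in enumerate(header):
        groups.setdefault(condition_of[cell], []).append(idx)
    return {
        cond: (
            [header[i] for i in idxs],
            [(gene, [values[i] for i in idxs]) for gene, values in rows],
        )
        for cond, idxs in groups.items()
    }

# Hypothetical toy data
header = ["c1", "c2", "c3"]
rows = [("Actb", [5, 0, 2]), ("Gapdh", [1, 3, 4])]
condition_of = {"c1": "male", "c2": "female", "c3": "male"}
by_condition = split_by_condition(header, rows, condition_of)
```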

Prepare samples for the PANDA network ensemble using the jackknife method:

	python3 heart_random_split.py
	python3 brain_random_split.py

Required files have been provided.
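
The idea behind jackknife resampling for a network ensemble can be sketched as below. This is a delete-a-group variant chosen for illustration: cells are shuffled, divided into groups, and each subsample omits one group. The exact scheme used by the provided scripts may differ, and the function jackknife_subsamples, the group count and the seed are assumptions.

```python
import random

def jackknife_subsamples(cells, n_groups, seed=0):
    """Return n_groups subsamples, each omitting one random group of cells."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(cells)
    rng.shuffle(shuffled)
    # Deal the shuffled cells into n_groups roughly equal folds
    folds = [shuffled[i::n_groups] for i in range(n_groups)]
    return [
        [cell for j, fold in enumerate(folds) if j != i for cell in fold]
        for i in range(n_groups)
    ]

# Hypothetical cell IDs
cells = [f"cell{i}" for i in range(10)]
subsamples = jackknife_subsamples(cells, n_groups=5)
```

One network would then be built from each subsample, giving an ensemble whose edge weights can be compared between conditions.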

Construct PANDA networks using the PANDA scripts in MATLAB.

Please refer to https://sites.google.com/a/channing.harvard.edu/kimberlyglass/tools/panda for exhaustive instructions.

Motif files and PPI files have been provided in the "forPANDA" folder.

Differentially represented edges, differentially targeting TFs and differentially targeted genes can be tested using

	python3 Edge_t_test.py edges.pairs male_output female_output
	python3 get_out_degree.py PANDA_output
	python3 get_in_degree.py PANDA_output

The edges.pairs files are provided in the "forPANDA" folder for heart and brain respectively. Note that the t-test results produced by the first command are not adjusted for multiple testing; this adjustment should be applied separately.
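
One common way to adjust for multiple testing is the Benjamini-Hochberg procedure, sketched below in plain Python. The choice of this procedure is an assumption made for illustration, since no specific correction is prescribed here; the function benjamini_hochberg and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, e.g. one per edge
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
```

In R, the equivalent adjustment is available as p.adjust(p, method = "BH").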
