Skip to content

tarot0410/BREMSC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BREMSC

BREMSC is an R package (with core functions jointDIMMSC and BREMSC) for joint clustering droplet-based scRNA-seq and CITE-seq data. jointDIMMSC is developed as a direct extension of DIMMSC, which assumes full indenpendency between single cell RNA and surface protein data. To take the correlation between two data sources into consideration, we further develop BREMSC, which uses random effects to incorporate the two data sources. This package can directly work on raw count data from droplet-based scRNA-seq and CITE-seq experiments without any data transformation, and it can provide clustering uncertainty for each cell.

Version: 0.2.0 (Date: 2020-03-02)

See Homepage @ Wei Chen's Lab

Installation

Install BREMSC from Github

install.packages("devtools")
library(devtools)
install_github("tarot0410/BREMSC")

Or terminal command (first download BREMSC source file from Wei Chen's Lab website)

R CMD INSTALL BREMSC_0.1.0.tar

Function jointDIMMSC

Introduction

jointDIMMSC is developed as an extension of DIMMSC, which assumes full indenpendency between single cell RNA and surface protein data. We construct the joint likelihood of the two data sources as their product, and use EM algorithm for parameter inference. In practice, the computational speed for jointDIMMSC is much faster than BREMSC, but the model assumption is more stringent.

Usage

jointDIMMSC(dataProtein, dataRNA, K, useGene = 100, maxiter = 100, tol = 1e-04, lik.tol = 0.01)

Arguments

  • dataProtein : a D*C matrix with D proteins and C cells
  • dataRNA : a G*C matrix with G genes and C cells
  • K : number of desired clusters
  • useGene : number of genes to keep
  • maxiter : maximum number of iterations
  • tol : a convergence tolerance for the difference of vector pie between iterations
  • lik.tol : a convergence tolerance for the difference of log-likelihoods between iterations

Values

jointDIMMSC returns a list object containing:

  • clusterID : estimated label for each cell
  • posteriorProb : a C*K matrix with probability that each cell belongs to each cluster
  • logLik : estimated log likelihood
  • alphaMtxProtein : a K*D matrix of alpha estimates for protein source
  • alphaMtxRNA : a K*G matrix of alpha estimates for RNA source

Example:

# First load BREMSC R package
library(BREMSC)

# Next load the example simulated data (dataADT: protein data; dataRNA: RNA data)
data("dataADT")
data("dataRNA")

# Test run of jointDIMMSC
testRun <- jointDIMMSC(dataADT, dataRNA, K=4)

Function BREMSC

Introduction

Similar to jointDIMMSC, BREMSC uses separate Dirichlet mixture priors to characterize variations across cell types for each data source, but it further uses random effects to incorporate the two data sources. A Bayesian framework with Gibbs-sampling is used for parameter estimation. The computational speed for BREMSC is much slower than jointDIMMSC. In practice, nMCMC>500 is suggested in real application, and running with more than 3 chains (set as a parameter) are strongly recommended for better stability. Also, a prescreening of RNA features is necessary to help reduce time and noise for BREM-SC. It is recommended to use less than 1000 RNA features.

Usage

BREMSC(dataProtein, dataRNA, K, nChains = 3, nMCMC = 1000, sd_alpha = c(0.5, 1.5), sd_b = c(0.2, 1), sigmaB = 0.8)

Arguments

  • dataProtein : a D*C matrix with D proteins and C cells
  • dataRNA : a G*C matrix with G genes and C cells, recommended to do prescreening of RNA features (<=1000) before running BREM-SC
  • K : number of desired clusters
  • nChains : number of chains to run in parallel (need to have >= nChains number of threads available), 3-5 is recommended, default 3
  • nMCMC : number of MCMCs, default 1000
  • sd_alpha_range : range of random walk depth for alpha, each chain uses a different depth, default c(0.5, 1.5)
  • sd_b_range : range of random walk depth for b, each chain uses a different depth, default c(0.2, 1)
  • sigmaB : initial estimate of SD of random effects, default 0.8

Values

BREMSC returns a list object containing:

  • clusterID : estimated label for each cell
  • posteriorProb : a C*K matrix with probability that each cell belongs to each cluster
  • sdRF_final : estimated SD of random effects from last MCMC
  • logLik_final : estimated log likelihood from last MCMC
  • vecB : estimated random effects for each cell
  • alphaMtxProtein : a K*D matrix of alpha estimates for protein source
  • alphaMtxRNA : a K*G matrix of alpha estimates for RNA source
  • vecLogLik : a vector of log likelihood for each MCMC (used for convergence check!)
  • vecSDRF : a vector of estimated SD of random effects from each MCMC

Example:

# First load BREMSC R package
library(BREMSC)

# Next load the example simulated data (dataADT: protein data; dataRNA: RNA data)
data("dataADT")
data("dataRNA")

# Test run of BREMSC (using small number of MCMC here to save time)
testRun <- BREMSC(dataADT, dataRNA, K=4, nChains=2, nMCMC=100)

# Check convergence of log likelihoos
plot(testRun$vecLogLik, type = "l", xlab = "MCMC Iterations", ylab = "Log likelihood") # consider to increase the number of MCMCs if the log likelihood doesn't look like converged

Publications

  • Xinjun Wang, Zhe Sun, Yanfu Zhang, Zhongli Xu, Hongyi Xin, Heng Huang, Richard H Duerr, Kong Chen, Ying Ding, Wei Chen. BREM-SC: A Bayesian Random Effects Mixture Model for Joint Clustering Single Cell Multi-omics Data. Nucleic Acids Research 2020.

Contact

Xinjun Wang (xiw119@pitt.edu), Wei Chen.

About

Novel joint clustering method with scRNA-seq and CITE-seq data

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages