scDHMap

Understanding the developmental process is a critical step in single-cell analysis. This repo proposes scDHMap, a model-based deep learning approach to visualize the complex hierarchical structures of single-cell sequencing data in a low-dimensional hyperbolic space. scDHMap can be used for various dimensionality reduction tasks, including revealing trajectory branches, batch correction, and denoising counts with high dropout rates.

Network diagram

Requirements

Python: 3.9.5
PyTorch: 1.9.1 (https://pytorch.org)
Scanpy: 1.7.2 (https://scanpy.readthedocs.io/en/stable)
NumPy: 1.21.2 (https://numpy.org)
scikit-learn (sklearn): 0.24.2 (https://scikit-learn.org/stable)
SciPy: 1.6.3 (https://scipy.org)
Pandas: 1.2.5 (https://pandas.pydata.org)
h5py: 3.2.1 (https://pypi.org/project/h5py)
Optional: harmonypy (https://github.com/slowkow/harmonypy)

Usage

For single-cell count data:

python run_scDHMap.py --data_file data.h5

For single-cell count data from multiple batches (requires the harmonypy package):

python run_scDHMap_batch.py --data_file data.h5

The real single-cell datasets used in this study are available at: https://figshare.com/s/64694120e3d2b87e21c3

In the data.h5 file, the cell-by-gene count matrix is stored in "X". For datasets with batches, the batch IDs are stored in "Y" as a one-hot encoded matrix.
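A minimal sketch for preparing such a file with NumPy and h5py follows. It assumes that "X" holds raw counts with cells as rows and genes as columns, and that the one-hot columns of "Y" follow the sorted order of the batch labels. The helper names are hypothetical, and whether the loader expects additional datasets (e.g. gene or cell names) or particular dtypes should be checked against run_scDHMap.py.

```python
import numpy as np


def one_hot_batches(batch_labels):
    """Encode batch labels as a one-hot matrix of shape (n_cells, n_batches).

    Columns follow the sorted order of the unique labels (an assumption;
    any consistent order should work for one-hot batch IDs).
    """
    labels = np.asarray(batch_labels)
    categories, codes = np.unique(labels, return_inverse=True)
    Y = np.zeros((labels.shape[0], categories.shape[0]), dtype=np.float64)
    Y[np.arange(labels.shape[0]), codes] = 1.0
    return Y


def write_scdhmap_h5(path, counts, batch_labels=None):
    """Hypothetical helper: store counts in "X" and, optionally, one-hot batch IDs in "Y"."""
    import h5py  # imported lazily so one_hot_batches works without h5py installed

    counts = np.asarray(counts)
    with h5py.File(path, "w") as f:
        f.create_dataset("X", data=counts)  # cells x genes count matrix
        if batch_labels is not None:
            f.create_dataset("Y", data=one_hot_batches(batch_labels))
```

For example, write_scdhmap_h5("data.h5", counts, batch_labels) writes both datasets, and the resulting file can then be passed to run_scDHMap_batch.py through --data_file.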

Parameters

--batch_size: batch size, default = 512.
--data_file: data file name.
--select_genes: number of genes selected for embedding analysis, default = 1000. Informative genes are selected based on the mean-variance relationship.
--n_PCA: number of principal components for the t-SNE part, default = 50.
--pretrain_iter: number of pretraining iterations, default = 400.
--maxiter: maximum number of iterations in the training stage, default = 5000.
--patience: patience for early stopping in the training stage, default = 150.
--lr: learning rate in the Adam optimizer, default = 0.001.
--alpha: coefficient of the t-SNE regularization, default = 1000. alpha is chosen to balance the t-SNE term against the ZINB reconstruction loss, whose magnitude grows with the number of genes.
--beta: coefficient of the KL divergence (KLD) loss of the wrapped normal distribution, default = 10. If the embedded points are all crowded near the boundary of the Poincare disk, try a larger beta value.
--gamma: coefficient of the Cauchy kernel, default = 1. A larger gamma produces a stronger repulsive force between non-neighboring points. Note that larger gamma values also push points toward the boundary of the Poincare ball, so for better visualization we recommend choosing a larger beta when using a larger gamma. In our experience, keeping the KLD loss below 10 during the training stage results in good visualizations. The effect of different gamma values is shown in Supplementary Figure S23 of our manuscript.
--prob: dropout probability in encoder and decoder layers, default = 0.
--perplexity: perplexity of the t-SNE regularization, default = 30.
--final_latent_file: output file name for the final latent Poincare representations, default = final_latent.txt.
--final_mean_file: output file name for the denoised counts, default = denoised_mean.txt.
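--n_PCA and --perplexity control the input side of the t-SNE regularization. The sketch below shows the standard t-SNE recipe these parameters usually refer to: project the data onto its top principal components, then calibrate a Gaussian bandwidth per cell by binary search so that each cell's neighbor distribution has the requested perplexity. This is textbook t-SNE written in NumPy for illustration; the function names are hypothetical, and scDHMap's own implementation in src may differ in details such as preprocessing before PCA or the search tolerance.

```python
import numpy as np


def pca_reduce(X, n_components=50):
    """Project the rows of X onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return Xc @ Vt[:k].T


def conditional_affinities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Standard t-SNE p_{j|i}: per-point Gaussian precision found by binary
    search so that the entropy of row i equals log(perplexity)."""
    n = X.shape[0]
    sq_norms = (X ** 2).sum(axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    target = np.log(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)  # squared distances to all other points
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i^2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shift by min for numerical stability
            p = w / w.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(H - target) < tol:
                break
            if H > target:  # distribution too flat: narrow the kernel
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # distribution too peaked: widen the kernel
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


def joint_affinities(P):
    """Symmetrize conditional affinities into the joint t-SNE matrix."""
    n = P.shape[0]
    return (P + P.T) / (2.0 * n)
```

The perplexity must be smaller than the number of cells minus one for the binary search to reach its target, which is one reason the default of 30 is modest.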

Outputs

final_latent: 2-dimensional embedding of the single-cell data in the Poincare disk, shape (n_cells, 2).
final_mean: denoised (decoded) gene counts, shape (n_cells, n_genes).
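Because final_latent lives in the Poincare disk, Euclidean distances between its rows understate separations near the boundary. The sketch below computes the standard Poincare-ball geodesic distance d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))) for downstream analysis. The function names are hypothetical, and the comma delimiter in load_latent is an assumption about the output file format.

```python
import numpy as np


def load_latent(path="final_latent.txt", delimiter=","):
    """Load the (n_cells, 2) embedding; the delimiter is an assumption."""
    return np.loadtxt(path, delimiter=delimiter)


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points inside the unit Poincare ball.

    d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2)))
    Works row-wise on arrays of shape (..., dim).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def pairwise_poincare(Z):
    """All-pairs hyperbolic distances for an (n_cells, dim) embedding."""
    Z = np.asarray(Z, dtype=float)
    return poincare_distance(Z[:, None, :], Z[None, :, :])
```

Every row of final_latent has Euclidean norm below 1; rows with norm close to 1 sit near the boundary of the disk, which is the situation the --beta description above addresses.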

Folders

Paul_Analysis: code for the analysis of the Paul dataset, including isometric transformation and branch assignment.
competing_methods: code for running competing methods.
scATAC_seq_analysis: code for computing gene activity scores from scATAC-seq data.
src: source code of the scDHMap model.
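Paul_Analysis applies an isometric transformation to the embedding. One standard isometry of the Poincare disk is the Möbius map f(z) = (z - a) / (1 - conj(a) z), which moves a chosen point a (for example, a root cell) to the center of the disk without distorting hyperbolic distances. The sketch below implements that map with complex arithmetic; whether Paul_Analysis uses exactly this transformation is an assumption, and the function names are hypothetical.

```python
import numpy as np


def mobius_recenter(Z, center):
    """Map `center` to the origin with f(z) = (z - a) / (1 - conj(a) z).

    This disk automorphism is an isometry of the Poincare disk.
    Z: (n, 2) array of points with norm < 1; center: 2-vector with norm < 1.
    """
    Z = np.asarray(Z, dtype=float)
    z = Z[:, 0] + 1j * Z[:, 1]  # treat 2-D points as complex numbers
    a = complex(center[0], center[1])
    w = (z - a) / (1.0 - np.conj(a) * z)
    return np.column_stack([w.real, w.imag])


def poincare_distance(u, v):
    """Hyperbolic distance in the Poincare disk, row-wise."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)
```

Because the map preserves all pairwise hyperbolic distances, recentering changes only how the tree is drawn, not the hierarchy the embedding encodes.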

Reference

Tian T., Cheng Z., Xiang L., Zhi W., & Hakon H. (2023). Complex hierarchical structures in single-cell genomics data unveiled by deep hyperbolic manifold learning. Genome Research 33 (2), 232-246. https://doi.org/10.1101/gr.277068.122

Visualization demo

Visualization demo of the Paul data (Credit: Joshua Ortiga)

https://hosua.github.io/scDHMap-visual/article/2022/11/09/paul-data-visualization.html

Contact

Tian Tian tiantianwhu@163.com
