# **scQCkit**: **s**ingle-**c**ell **Q**uality **C**ontrol **kit**

## Outline
- Why scQCkit is needed?
- How to build scQCkit?
- Citation

## Why scQCkit is needed?


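1. Importance of data quality control: QC determines what biological information can be recovered and whether downstream analyses can be carried out.
2. Diversity of single-cell sequencing data: species, tissue, and sequencing technology all affect the characteristics of the data (sparsity, robustness, ...), so a suitable method has to be chosen to balance quality control against preserving biological signal.
3. Lack of a workflow: although quite a few QC tools have appeared in recent years, there is still no program that integrates multiple methods and is reproducible, evaluable, and user-friendly.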

- **QC context**
  - Filtering low quality cells
  - (Filtering low quality genes)
  - Correction of ambient RNA
  - Doublet Detection
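
As an illustration of the first item, filtering low-quality cells, the sketch below flags cells whose per-cell metrics are outliers. It is a minimal example, not scQCkit's implementation: the median-absolute-deviation (MAD) rule, the `n_mads=5` cutoff, the 20% mitochondrial cap, the function names, and the toy barcodes are all assumptions made for the example.

```python
import math
from statistics import median


def is_outlier(values, n_mads=5.0):
    """Flag values lying more than n_mads MADs away from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) > n_mads * mad for v in values]


def filter_low_quality_cells(cells, n_mads=5.0, max_pct_mt=20.0):
    """Return the barcodes that pass all three checks.

    cells maps barcode -> {"total_counts", "n_genes", "pct_mt"}.
    Counts and gene numbers are compared on a log1p scale.
    """
    barcodes = list(cells)
    log_counts = [math.log1p(cells[b]["total_counts"]) for b in barcodes]
    log_genes = [math.log1p(cells[b]["n_genes"]) for b in barcodes]
    bad_counts = is_outlier(log_counts, n_mads)
    bad_genes = is_outlier(log_genes, n_mads)
    return [
        b
        for i, b in enumerate(barcodes)
        if not bad_counts[i]
        and not bad_genes[i]
        and cells[b]["pct_mt"] <= max_pct_mt
    ]


# Toy per-cell metrics (hypothetical values).
cells = {
    "AAAC": {"total_counts": 5200, "n_genes": 1800, "pct_mt": 3.1},
    "AAAG": {"total_counts": 4800, "n_genes": 1700, "pct_mt": 4.0},
    "AACC": {"total_counts": 5100, "n_genes": 1750, "pct_mt": 2.5},
    "AACG": {"total_counts": 5000, "n_genes": 1720, "pct_mt": 3.6},
    "AAGT": {"total_counts": 4900, "n_genes": 1690, "pct_mt": 35.0},  # high mito
    "ACGT": {"total_counts": 40, "n_genes": 25, "pct_mt": 5.0},  # near-empty
}

kept = filter_low_quality_cells(cells)
print(kept)  # ['AAAC', 'AAAG', 'AACC', 'AACG']
```

A data-driven rule such as MAD adapts to each dataset, which matters when species, tissue, and technology differ; a fixed cutoff may still be preferable when a metric has a well-understood biological limit.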

- **Reference**
  - [https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html](https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html)
  
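  <details> <summary> Notes </summary>
  Quality control
  Motivation
  Single-cell sequencing data raise two issues that deserve particular attention during analysis. First, single-cell RNA-seq data suffer from drop-out, meaning that the data contain many zero values, i.e. they are sparse. Second, correcting and quality-controlling the data has potential limitations, because these steps can confound true biological signal. It is therefore important to preprocess with methods suited to the data at hand, i.e. methods that neither over-correct nor discard biological signal.
  Tools for analyzing single-cell RNA-seq data are developing rapidly, driven by new sequencing technologies and by growth in the number of cells captured, genes sequenced, and cell populations identified. Some of these tools are dedicated to preprocessing, which serves the subsequent analysis steps: doublet detection, quality control, normalization, feature selection, and dimensionality reduction. The choice of tools strongly affects downstream analysis and the interpretation of the data. For example, if too many cells are filtered out during QC, rare cell subpopulations and the biological information of interest may be lost. Conversely, if QC is too permissive and low-quality cells are not excluded, cell annotation may become difficult. It is therefore important to choose tools that perform well and whose output benefits downstream analysis. In many cases the preprocessing has to be re-evaluated repeatedly and adjusted, for example the filtering strategy.
  The data are aligned to obtain a matrix of molecular counts, called a count matrix or a read matrix. The difference between count and read matrices is whether UMIs (unique molecular identifiers) are part of the single-cell library construction protocol. The matrix is barcodes x transcripts. What we ultimately want is a cell x gene matrix; the two differ because a barcode may be empty and therefore not represent a cell, so the true cell x gene matrix is only obtained after QC.
  </details>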

## How to build scQCkit?

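- **Program**
  - Option 1: call each tool through its API (convenient for building a workflow, but the software environment is complex to configure and performance may be poor)
  - Option 2: wrap the algorithms in a unified Python package (requires rewriting open-source code in Python; makes it easy to unify the preprocessing of the different methods and helps achieve high performance)
- **Design**
  - Estimate: assess the characteristics of the data
  - QC: data quality control (filter low-quality cells and genes, choose suitable doublet-removal and ambient-RNA-removal methods)
  - QC-benchmark: quantify the results of different QC schemes and generate an optimized result report for the user (modeled on the scoring used by scIB)
- **Outcome**: user-friendly, supports multiple QC methods, can quantify QC methods, high performance (accuracy, short runtime, low resource usage)
- **Timeline**
  - One month: literature survey (which tools exist; collect published preprocessing schemes for animals and plants)
  - Two months: write the pipeline code and test the program
  - Two months: test on case studies
  - One month: debugging and submitting the conda package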
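
The three design stages (Estimate, QC, QC-benchmark) could be wired together roughly as follows. This is only a sketch under stated assumptions: every function name, the list-of-lists matrix, the two threshold-based QC methods, and the "fraction kept" summary are hypothetical placeholders. In particular, the summary is not the scIB score, only a stand-in for whatever metric the benchmark ends up using.

```python
from typing import Callable, Dict, List

Matrix = List[List[int]]  # rows = barcodes, columns = genes
QCMethod = Callable[[Matrix], List[int]]  # returns indices of rows to keep


def estimate(matrix: Matrix) -> Dict[str, float]:
    """Estimate stage: summarize basic characteristics of the data."""
    n_cells = len(matrix)
    n_genes = len(matrix[0]) if matrix else 0
    total = n_cells * n_genes
    zeros = sum(row.count(0) for row in matrix)
    return {
        "n_cells": n_cells,
        "n_genes": n_genes,
        "sparsity": zeros / total if total else 0.0,
    }


def min_counts(threshold: int) -> QCMethod:
    """QC stage (hypothetical method): keep barcodes with enough total counts."""
    def keep(matrix: Matrix) -> List[int]:
        return [i for i, row in enumerate(matrix) if sum(row) >= threshold]
    return keep


def min_genes(threshold: int) -> QCMethod:
    """QC stage (hypothetical method): keep barcodes with enough detected genes."""
    def keep(matrix: Matrix) -> List[int]:
        return [
            i for i, row in enumerate(matrix)
            if sum(1 for x in row if x > 0) >= threshold
        ]
    return keep


def benchmark(matrix: Matrix, methods: Dict[str, QCMethod]) -> Dict[str, Dict[str, float]]:
    """QC-benchmark stage: run every method and collect a comparable summary."""
    report = {}
    for name, method in methods.items():
        kept = method(matrix)
        fraction = len(kept) / len(matrix) if matrix else 0.0
        report[name] = {"n_kept": len(kept), "fraction_kept": fraction}
    return report


# Toy barcode x gene count matrix (hypothetical values).
counts = [
    [5, 0, 3, 1],
    [0, 0, 1, 0],
    [2, 4, 0, 6],
    [0, 0, 0, 0],  # empty barcode
]

summary = estimate(counts)
report = benchmark(counts, {"min_counts>=5": min_counts(5), "min_genes>=1": min_genes(1)})
print(summary["sparsity"])  # 0.5625
print(report)
```

Giving every QC method the same signature is what lets the benchmark stage compare them without knowing how each one works, which is the main argument for the unified Python packaging in Option 2.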

## Citation

|Package|Year|Article|
|-|-|-|
|mcRigor|2025|Liu, P., & Li, J. J. (2025). **mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis**. Nature communications, 16(1), 8602. https://doi.org/10.1038/s41467-025-63626-5|
||2025|Fu, Y., Youness, M., Virzì, A., Song, X., Tubeeckx, M. R. L., De Keulenaer, G. W., Heidbuchel, H., Segers, V. F. M., Sipido, K. R., Thienpont, B., & Roderick, H. L. (2025). **Benchmarking of computational demultiplexing methods for single-nucleus RNA sequencing data**. Briefings in bioinformatics, 26(4), bbaf371. https://doi.org/10.1093/bib/bbaf371|
|OmniDoublet|2025|Liu, L., Ren, J., Zhou, X., Cheng, X., Pan, X., Zhou, L., Lu, Y., & Liu, P. (2025). **OmniDoublet: a method for doublet detection in multimodal single-cell sequencing data**. Briefings in bioinformatics, 26(5), bbaf538. https://doi.org/10.1093/bib/bbaf538|
||2025|Ttoouli, D., & Hoffmann, D. (2025). **Multiplets in scRNA-seq data: Extent of the problem and efficacy of methods for removal**. PloS one, 20(10), e0333687. https://doi.org/10.1371/journal.pone.0333687|
|ImageDoubler|2025|Deng, K., Xu, X., Zhou, M., Li, H., Keller, E. T., Shelley, G., Lu, A., Garmire, L., & Guan, Y. (2025). **ImageDoubler: image-based doublet identification in single-cell sequencing**. Nature communications, 16(1), 21. https://doi.org/10.1038/s41467-024-55434-0|
|MRDR|2025|She, Y., Wang, C., & Zhao, Q. (2025). **Improving doublet cell removal efficiency through multiple algorithm runs**. Computational and structural biotechnology journal, 27, 451–460. https://doi.org/10.1016/j.csbj.2025.01.009|

# **scAnno**: **s**ingle-**c**ell **Anno**tation

A program that integrates multiple cell annotation methods and weights their results to produce a quantified final annotation.
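
One simple way to combine the labels from several annotation methods by weight is a weighted vote per cell. The sketch below is an assumption about how this could work, not scAnno's actual algorithm: the function name, the method names, the weights, and the toy labels are all hypothetical, and weights are assumed to be positive.

```python
from collections import defaultdict


def weighted_consensus(predictions, weights):
    """Combine per-method labels into one label and a support score per cell.

    predictions: method name -> {cell id: predicted label}
    weights:     method name -> positive weight (defaults to 1.0 if missing)
    Returns:     cell id -> (winning label, winning weight / total weight)
    """
    scores = defaultdict(lambda: defaultdict(float))
    for method, labels in predictions.items():
        w = weights.get(method, 1.0)
        for cell, label in labels.items():
            scores[cell][label] += w

    consensus = {}
    for cell, label_scores in scores.items():
        total = sum(label_scores.values())
        best = max(label_scores, key=label_scores.get)
        consensus[cell] = (best, label_scores[best] / total)
    return consensus


# Hypothetical predictions from three kinds of methods.
predictions = {
    "marker_based": {"cell1": "T cell", "cell2": "B cell"},
    "reference_mapping": {"cell1": "T cell", "cell2": "Monocyte"},
    "classifier": {"cell1": "NK cell", "cell2": "B cell"},
}
weights = {"marker_based": 1.0, "reference_mapping": 3.0, "classifier": 1.0}

result = weighted_consensus(predictions, weights)
print(result["cell1"])  # ('T cell', 0.8)
print(result["cell2"])  # ('Monocyte', 0.6)
```

In `cell2` the heavily weighted method overrides the simple majority, which shows why the choice of weights needs its own justification, for example from benchmark performance of each method.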

cell type; cell identity; cell state; cell subtype; cell phenotype


Each of these methods is ultimately based on the expression of specific genes or gene sets, or on general transcriptomic similarity between cells.

- **Method**
  - [Manual annotation](https://www.sc-best-practices.org/cellular_structure/annotation.html#manual-annotation)
    - From markers to cluster annotation
    - From cluster differentially expressed genes to cluster annotation
  - [Automated annotation](https://www.sc-best-practices.org/cellular_structure/annotation.html#automated-annotation)
    - Marker gene-based classifiers (CellAssign, sctype)
    - Classifiers based on a wider set of genes (CellTypist, Clustifyr, sctype)
    - Annotation by mapping to a reference (scArches [Lotfollahi et al., 2022], Symphony [Kang et al., 2021], Azimuth (Seurat) [Hao et al., 2021], SingleR)
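
The "from markers to cluster annotation" idea can be sketched as scoring each cluster by the mean expression of each cell type's marker genes and picking the best-scoring type. This is a toy illustration only, not the procedure of any tool listed above: the function name and the expression values are made up, and the marker lists are a small, commonly used subset chosen for the example.

```python
def annotate_clusters(cluster_means, markers):
    """Assign each cluster the cell type with the highest mean marker expression.

    cluster_means: cluster id -> {gene: mean expression in that cluster}
    markers:       cell type  -> list of marker genes
    """
    annotation = {}
    for cluster, expr in cluster_means.items():
        scores = {
            cell_type: sum(expr.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        annotation[cluster] = max(scores, key=scores.get)
    return annotation


# Small marker lists and toy cluster-level mean expression (hypothetical values).
markers = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
}
cluster_means = {
    "0": {"CD3D": 2.1, "CD3E": 1.8, "MS4A1": 0.1, "CD79A": 0.0},
    "1": {"CD3D": 0.0, "CD3E": 0.1, "MS4A1": 2.4, "CD79A": 1.9},
}

labels = annotate_clusters(cluster_means, markers)
print(labels)  # {'0': 'T cell', '1': 'B cell'}
```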

- **Reference**
  - [Hongkui Zeng. **What is a cell type and how to define it?** Cell, 185(15):2739–2755, 2022.](https://www.sciencedirect.com/science/article/pii/S0092867422007838)

|Package|Year|Article|
|-|-|-|
|scHDeepInsight|2025|Jia, S., Lysenko, A., Boroevich, K. A., Sharma, A., & Tsunoda, T. (2025). **scHDeepInsight: a hierarchical deep learning framework for precise immune cell annotation in single-cell RNA-seq data**. Briefings in bioinformatics, 26(5), bbaf523. https://doi.org/10.1093/bib/bbaf523|
|HiCat|2025|Bi, C., Bai, K., & Zhang, X. (2025). **HiCat: a semi-supervised approach for cell type annotation**. Briefings in bioinformatics, 26(4), bbaf428. https://doi.org/10.1093/bib/bbaf428|
|scMapNet|2025|Yu, Z., Ye, Y., & Pan, J. (2025). **scMapNet: Marker-based cell type annotation of scRNA-seq data via vision transfer learning with tabular-to-image transformations**. Journal of advanced research, S2090-1232(25)00850-1. Advance online publication. https://doi.org/10.1016/j.jare.2025.10.056|
|stTransfer|2025|Zhou, T., Xiang, L., Liao, K., He, Y., Zhuang, Z., & Liu, S. (2025). **stTransfer enables transfer of single-cell annotations to spatial transcriptomics with single-cell resolution**. Cell reports methods, 101205. Advance online publication. https://doi.org/10.1016/j.crmeth.2025.101205|
|scGPT|2025|Ding, S., Li, J., Luo, R., Cui, H., Wang, B., & Chen, R. (2025). **scGPT: end-to-end protocol for fine-tuned retinal cell type annotation**. Nature protocols, 10.1038/s41596-025-01220-1. Advance online publication. https://doi.org/10.1038/s41596-025-01220-1|
|scTab|2024|Fischer, F., Fischer, D. S., Mukhin, R., Isaev, A., Biederstedt, E., Villani, A. C., & Theis, F. J. (2024). **scTab: Scaling cross-tissue single-cell annotation models**. Nature communications, 15(1), 6611. https://doi.org/10.1038/s41467-024-51059-5|
|GPT-4|2024|Hou, W., & Ji, Z. (2024). **Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis**. Nature methods, 21(8), 1462–1465. https://doi.org/10.1038/s41592-024-02235-4|
|CAME|2023|Liu, X., Shen, Q., & Zhang, S. (2023). **Cross-species cell-type assignment from single-cell RNA-seq data by a heterogeneous graph neural network**. Genome research, 33(1), 96–111. https://doi.org/10.1101/gr.276868.122|
|scATAnno|2024|Jiang, Y., Hu, Z., Lynch, A. W., Jiang, J., Zhu, A., Zeng, Z., Zhang, Y., Wu, G., Xie, Y., Li, R., Zhou, N., Meyer, C., Cejas, P., Brown, M., Long, H. W., & Qiu, X. (2024). **scATAnno: Automated Cell Type Annotation for single-cell ATAC Sequencing Data**. bioRxiv : the preprint server for biology, 2023.06.01.543296. https://doi.org/10.1101/2023.06.01.543296|
|scTransSort|2023|Jiao, L., Wang, G., Dai, H., Li, X., Wang, S., & Song, T. (2023). **scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings**. Biomolecules, 13(4), 611. https://doi.org/10.3390/biom13040611|
|TOSICA|2023|Chen, J., Xu, H., Tao, W., Chen, Z., Zhao, Y., & Han, J. J. (2023). **Transformer for one stop interpretable cell type annotation**. Nature communications, 14(1), 223. https://doi.org/10.1038/s41467-023-35923-4|
|scBalance|2023|**A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data**|
|scDeepInsight|2023|Jia, S., Lysenko, A., Boroevich, K. A., Sharma, A., & Tsunoda, T. (2023). **scDeepInsight: a supervised cell-type identification method for scRNA-seq data with deep learning**. Briefings in bioinformatics, 24(5), bbad266. https://doi.org/10.1093/bib/bbad266|
|Geneformer|2023|Theodoris, C. V., Xiao, L., Chopra, A., Chaffin, M. D., Al Sayed, Z. R., Hill, M. C., Mantineo, H., Brydon, E. M., Zeng, Z., Liu, X. S., & Ellinor, P. T. (2023). **Transfer learning enables predictions in network biology**. Nature, 618(7965), 616–624. https://doi.org/10.1038/s41586-023-06139-9|
|sctype|2022|Ianevski, A., Giri, A. K., & Aittokallio, T. (2022). **Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data**. Nature communications, 13(1), 1246. https://doi.org/10.1038/s41467-022-28803-w|
|scArches|2022|Lotfollahi, M., Naghipourfar, M., Luecken, M. D., Khajavi, M., Büttner, M., Wagenstetter, M., Avsec, Ž., Gayoso, A., Yosef, N., Interlandi, M., Rybakov, S., Misharin, A. V., & Theis, F. J. (2022). **Mapping single-cell data to reference atlases by transfer learning**. Nature biotechnology, 40(1), 121–130. https://doi.org/10.1038/s41587-021-01001-7|
|scBERT|2022|**scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data**. https://www.nature.com/articles/s42256-022-00534-z|
|scMRMA|2022|Li, J., Sheng, Q., Shyr, Y., & Liu, Q. (2022). **scMRMA: single cell multiresolution marker-based annotation**. Nucleic acids research, 50(2), e7. https://doi.org/10.1093/nar/gkab931|
||2021|Clarke, Z. A., Andrews, T. S., Atif, J., Pouyabahar, D., Innes, B. T., MacParland, S. A., & Bader, G. D. (2021). **Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods**. Nature protocols, 16(6), 2749–2764. https://doi.org/10.1038/s41596-021-00534-0|
||2021|Xu, C., Lopez, R., Mehlman, E., Regier, J., Jordan, M. I., & Yosef, N. (2021). **Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models**. Molecular systems biology, 17(1), e9620. https://doi.org/10.15252/msb.20209620|
|scPred|2019|Alquicira-Hernandez, J., Sathe, A., Ji, H. P., Nguyen, Q., & Powell, J. E. (2019). **scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data**. Genome biology, 20(1), 264. https://doi.org/10.1186/s13059-019-1862-5|
|SingleCellNet|2019|Tan, Y., & Cahan, P. (2019). **SingleCellNet: A Computational Tool to Classify Single Cell RNA-Seq Data Across Platforms and Across Species**. Cell systems, 9(2), 207–213.e2. https://doi.org/10.1016/j.cels.2019.06.004|
|Garnett|2019|Pliner, H. A., Shendure, J., & Trapnell, C. (2019). **Supervised classification enables rapid annotation of cell atlases**. Nature methods, 16(10), 983–986. https://doi.org/10.1038/s41592-019-0535-3|
|CaSTLe|2018|Lieberman, Y., Rokach, L., & Shay, T. (2018). **CaSTLe - Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments**. PloS one, 13(10), e0205499. https://doi.org/10.1371/journal.pone.0205499|
|scmap|2018|Kiselev, V. Y., Yiu, A., & Hemberg, M. (2018). **scmap: projection of single-cell RNA-seq data across data sets**. Nature methods, 15(5), 359–362. https://doi.org/10.1038/nmeth.4644|
|?|2014|Jaitin, D. A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I., Mildner, A., Cohen, N., Jung, S., Tanay, A., & Amit, I. (2014). **Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types**. Science (New York, N.Y.), 343(6172), 776–779. https://doi.org/10.1126/science.1247651|


# Integration of multiple datasets

- Zhong, H., Han, W., Gomez-Cabrero, D., Tegner, J., Gao, X., Cui, G., & Aranda, M. (2025). **Benchmarking cross-species single-cell RNA-seq data integration methods: towards a cell type tree of life**. Nucleic acids research, 53(1), gkae1316. https://doi.org/10.1093/nar/gkae1316



|Package|Year|Article|
|-|-|-|
|STACAS|2024|Andreatta, M., Hérault, L., Gueguen, P., Gfeller, D., Berenstein, A. J., & Carmona, S. J. (2024). **Semi-supervised integration of single-cell transcriptomics data**. Nature communications, 15(1), 872. https://doi.org/10.1038/s41467-024-45240-z|