Models and datasets for perturbational single-cell omics

theislab/sc-pert

sc-pert - Machine learning for perturbational single-cell omics

This repository provides a community-maintained summary of models and datasets. It was initially curated for (Cell Systems, 2021).

External annotations

Various resources exist for evaluating single-cell perturbation models. The publication discusses five tasks, which can be supported by the following publicly available annotations:

  • GDSC provides a collection of cell viability measurements for many compounds and cell lines. We provide a code snippet to conveniently load GDSC-provided z-score compound response rankings per cell line.
  • Additional viability data can be obtained from DepMap's PRISM dataset.
  • Therapeutics Data Commons provides access to a number of compound databases as part of its cheminformatics tasks. (In the same vein, OpenProblems provides a framework for single-cell tasks, which can also support perturbation modeling tasks in a longer-term format than was previously seen in the DREAM challenges.)
  • PubChem contains a comprehensive record of compounds ranging from experimental entities to non-proprietary small molecules. It is queryable via PubChemPy.
  • DrugBank provides annotations for a relatively small number of small molecules in a standardized format.

Current modeling approaches

We maintain a list of perturbation-related tools at scrna-tools. Please consider further updating and tagging tools there.

For the basis of the table in the article, see this spreadsheet of a subset of perturbation models which includes more details.

Datasets

Below is a curated table of perturbation datasets based on Svensson et al. (2020).

We also offer some datasets in a curated .h5ad format via the download links in the table below. A raw h5ad is a version of the dataset that has not been filtered, normalized, or standardized.

Files marked processed h5ad have an accompanying processing notebook and have been preprocessed in a similar way. These datasets have the following standardized fields in .obs:

  • perturbation_name -- Human-readable compound names (International non-proprietary naming where possible) for small molecules and gene names for genetic perturbations.
  • perturbation_type -- small molecule or genetic
  • perturbation_value -- A continuous covariate quantity, such as the dosage concentration or the number of hours since treatment.
  • perturbation_unit -- Describes perturbation_value, such as 'ug' or 'hrs'.
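
The standardized fields above lend themselves to a quick sanity check before a dataset is used for modeling. The following is a minimal sketch, not code from this repository: the helper name `check_obs`, the exact category spellings in `ALLOWED_TYPES`, and the toy values are assumptions made for illustration. It uses only the standard library and accepts any column-indexed table, so a dict of lists works, and a pandas DataFrame such as the `.obs` of a loaded h5ad should work as well.

```python
# Hypothetical helper: the names and checks below are illustrative
# assumptions, not part of the sc-pert repository.
REQUIRED_FIELDS = (
    "perturbation_name",
    "perturbation_type",
    "perturbation_value",
    "perturbation_unit",
)
# Assumed spelling of the two perturbation_type categories listed above.
ALLOWED_TYPES = {"small molecule", "genetic"}


def check_obs(obs):
    """Return a list of problems found in `obs` (empty if none).

    `obs` only needs to support `column in obs` and `obs[column]`,
    e.g. a dict of lists or a pandas DataFrame.
    """
    problems = []
    # Every standardized field must be present as a column.
    for field in REQUIRED_FIELDS:
        if field not in obs:
            problems.append(f"missing column: {field}")
    # perturbation_type should only contain the expected categories.
    if "perturbation_type" in obs:
        unexpected = set(obs["perturbation_type"]) - ALLOWED_TYPES
        if unexpected:
            names = sorted(str(u) for u in unexpected)
            problems.append(f"unexpected perturbation_type: {names}")
    # perturbation_value is a continuous covariate, so it must be numeric.
    if "perturbation_value" in obs:
        for value in obs["perturbation_value"]:
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append(f"non-numeric perturbation_value: {value!r}")
                break  # one report is enough
    return problems


# Toy table with made-up cells (not taken from any dataset listed here).
toy_obs = {
    "perturbation_name": ["dexamethasone", "GATA1"],
    "perturbation_type": ["small molecule", "genetic"],
    "perturbation_value": [0.5, 24.0],
    "perturbation_unit": ["ug", "hrs"],
}
print(check_obs(toy_obs))
```

An empty list means all four fields are present and well-formed under these assumptions; the category spellings should be adjusted if a given processed file encodes perturbation_type differently.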

| Shorthand | Title | .h5ad availability | Treatment | # perturbations | # cell types | # doses | # timepoints | Reported cells total | Organism | Tissue | Technique | Data location | Panel size | Measurement | Cell source | Disease | Contrasts | Developmental stage | Number of reported cell types or clusters | Cell clustering | Pseudotime | RNA Velocity | PCA | tSNE | H5AD location | Isolation | BC --> Cell ID OR BC --> Cluster ID | Number individuals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jaitin et al. Science | Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types | | CRISPR | 8-22 | 1 | - | 1 | 4,468 | Mouse | Spleen | MARS-seq | GSE54006 | nan | RNA-seq | CD11c+ enriched splenocytes | nan | nan | nan | 9 | Yes | No | nan | No | No | nan | Sorting (FACS) | nan | nan |
| Dixit et al. Cell | Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens | [raw h5ad] [processed h5ad] [processing nb] | CRISPR | 10,24 | 1 | - | 1-2 | 200,000 | Human, Mouse | Culture | Perturb-seq | GSE90063 | nan | RNA-seq | BMDCs, K562 | nan | nan | nan | nan | nan | nan | nan | nan | No | nan | Nanodroplet dilution | nan | nan |
| Datlinger et al. NMeth | Pooled CRISPR screening with single-cell transcriptome readout | | CRISPR | 29 | 1 | - | 1 | 5,905 | Human, Mouse | Culture | CROP-seq | GSE92872 | nan | RNA-seq | HEK293T, 3T3, Jurkat | nan | nan | nan | nan | nan | nan | nan | nan | No | nan | nan | nan | nan |
| Hill et al. NMethods | On the design of CRISPR-based single-cell molecular screens | | CRISPR | 32 | 1 | 2 | 1 | 5,879 | Human | Culture | CROP-seq | GSE108699 | nan | RNA-seq | MCF10a cells | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | https://github.com/shendurelab/single-cell-ko-screens#result-files | nan |
| Ursu et al. bioRxiv | Massively parallel phenotyping of variant impact in cancer with Perturb-seq reveals a shift in the spectrum of cell states induced by somatic mutations | | CRISPR | 200 | 1 | - | 1 | 162,314 | Human | Lung | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Jin et al. Science | In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with autism risk genes | | CRISPR | 35 | - | - | 1 | 46,770 | Mouse | Brain | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Frangieh et al. NGenetics | Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion | [raw h5ad] [processed h5ad] [processing nb] | CRISPR | 248 | 1 | - | 1 | 218,331 | Human | Culture | Perturb-CITE-seq | SCP1064 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Papalexi et al. NGenetics | Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens | | CRISPR | 111 (sgRNA) | 1 | 2 | - | 28,295 | Human | Culture | CITE-seq & ECCITE-seq | GSE153056 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Datlinger et al. NMethods | Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing | | CRISPR KO + antibody | 96 | 1 | 1 | 1 | nan | Human, Mouse | nan | scifi-RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Alda-Catalinas et al. CSystems | A Single-Cell Transcriptomics CRISPR-Activation Screen Identifies Epigenetic Regulators of the Zygotic Genome Activation Program | | CRISPRa | 230 | 1 | - | - | 203,894 | Mouse | Culture | Chromium | nan | nan | RNA-seq | mESCs | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Norman et al. (2019) | nan | [raw h5ad] [processed h5ad] [curation nb] [processing nb] | CRISPRa | 287 | 1 | - | 1 | nan | nan | nan | Perturb-seq | nan | nan | RNA-seq | induction of gene pair targets+single gene controls in K562 cells after screening 112 genes (2x gRNA per) and their combinations | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Adamson et al. Cell | A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response | | CRISPRi | 9-93 (sgRNA) | 1 | - | 1 | 86,000 | Human | Culture | Perturb-seq | GSE90546 | nan | RNA-seq | K562 | nan | nan | nan | nan | nan | nan | nan | nan | Yes | nan | nan | nan | nan |
| Gasperini et al. Cell | A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens | | CRISPRi | 1119, 5779 | 1 | - | 1 | 207,324 | Human | Culture | CROP-seq | GSE120861 | nan | RNA-seq | K562 Cells | nan | CRISPR Screen | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Jost et al. NBT | Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs | | CRISPRi | 25 | 2 | - | 1 | 19,587 | Human | Culture | Perturb-seq | GSE132080 | nan | RNA-seq | K562 cells | nan | 25 gene screen | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Schraivogel et al. NMethods | Targeted Perturb-seq enables genome-scale genetic screens in single cells | [processing nb] | CRISPRi | 1778 (enhancers) | 1 | - | 1 | 231,667 | Human, Mouse | Bone marrow, Culture | TAP-seq | GSE135497 | 1,000 | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | Yes | nan | nan | nan | nan |
| Leng et al. bioRxiv | CRISPRi screens in human astrocytes elucidate regulators of distinct inflammatory reactive states | | CRISPRi | 30 | 1 | 2 | - | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Replogle et al. (2020) | nan | | genetic targets | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Replogle et al. (2021) | nan | | genetic targets | >10000 | 2 | - | - | nan | nan | nan | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Shin et al. SAdvances | Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations | | small molecules | 45 | 2 | 1 | 1 | 3,091 | Mouse, Human | Culture | Drop-seq | PRJNA493658 | nan | RNA-seq | HEK293T, NIIH3T3, A375, SW480, K562 | nan | 45 perturbations | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Srivatsan et al. Science | Massively multiplex chemical transcriptomics at single-cell resolution | [raw h5ad] [curation nb] [curation nb] [processing nb] | small molecules | 188 | 3 | 4 | 2 | 650,000 | Human | Culture | sci-Plex | GSE139944 | nan | RNA-seq | Cancer cell lines A549, K562, and MCF7 | nan | 5,000 drug conditions | nan | 3 | Yes | Yes | No | Yes | No | nan | nan | nan | nan |
| Zhao et al. bioRxiv | Deconvolution of Cell Type-Specific Drug Responses in Human Tumor Tissue with Single-Cell RNA-seq | | small molecules | 2,6 | 6,1 | - | - | 48,404 | Human | Brain, Tumor | SCRB-seq (microwell) | GSE148842 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 6 |
| McFarland et al. NCommunications | Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action | [curation nb] [processing nb] | small molecules | 1-13 | 24-99 | 1 | 1-5 | nan | Human | Culture | MIX-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Chen et al. (2020) | nan | | small molecules | 300 | 1 | 1 | 1 | nan | nan | nan | CyTOF | nan | nan | protein | breast cancer cells undergoing TGF-β-induced EMT | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |