UCLouvain-CBIO/2022-scp-data-analysis

This repository contains the scripts needed to reproduce the figures from:

Christophe Vanderaa and Laurent Gatto, The current state of single-cell proteomics data analysis, 2022 arXiv, doi:10.48550/ARXIV.2210.01020.

Key messages

The main messages of the paper are:

  • Data analysis plays a central role in science. The field of single-cell proteomics (SCP) has experienced an explosion of technical workflows, but its data analysis practices are lagging behind.
  • Too little attention is given to computational analyses and the current efforts are not focused on quantitative data processing.
  • Quantitative data processing workflows are a sequence of processing steps, and this sequence influences the analysis.
  • Every SCP publication comes with its own computational pipeline, and these pipelines are fundamentally different from one another.
  • There is a lack of consensus, artefacts are propagated to the protein data, and current methods are taken from bulk proteomics.
  • The solution is benchmarking, but it requires standardized computational tools and well-designed data.
  • scp and sceptre are the only two computational environments that are flexible and designed for SCP.
  • scpdata offers access to published data, but rigorous benchmark data is still missing.
  • Replication of current SCP studies increases the trust in the results, highlights issues and remaining challenges, and provides didactic material.
  • Different workflows lead to different results, but controlled designs are needed for measuring the performance.
  • Lessons learned from replication of SCP data analysis: (1) A good analysis requires good tools. (2) Consistent input formats facilitate data analysis. (3) Beware of confounding effects - because SCP requires large datasets, confounding effects become inevitable and need to be correctly modelled.
  • We need to harmonize the workflows now before more analyses propagate current bad practices.

The results are based on the SCP replications documented in SCP.Replication using the scp R/Bioconductor package and the data in scpdata.

Figures

The figures were generated in R.

  • Figure 1: Overview of quantitative data processing workflows. This figure is generated by scripts/make_figure1.R
  • Figure 2: Impact of quantitative data processing workflows on the Schoof et al. 2021 dataset. This figure is generated by scripts/make_figure2and3.R
  • Figure 3: Impact of quantitative data processing workflows on the Specht et al. 2021 dataset. This figure is generated by scripts/make_figure2and3.R
  • Figure 4: Confounding effects cause undesired variability. This figure is generated by scripts/make_figure4.R. Note that you must first run scripts/make_figure2and3.R, because Figure 4 relies on results generated for Figure 3.
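
Because Figure 4 depends on results produced by the Figure 2 and 3 script, the order in which the scripts run matters. The following is a minimal sketch of a shell helper that runs the three scripts in a valid order and stops at the first failure. It assumes that the scripts can be run non-interactively with Rscript from the root of the repository clone, which this README does not state. The function name make_all_figures and the RSCRIPT variable are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the figure scripts in dependency order.
# Assumption: each script can be run non-interactively with Rscript
# from the root of the repository clone.
make_all_figures() {
    # RSCRIPT is a hypothetical override for the interpreter command
    local rscript="${RSCRIPT:-Rscript}"
    local script
    # make_figure2and3.R must run before make_figure4.R
    for script in scripts/make_figure1.R \
                  scripts/make_figure2and3.R \
                  scripts/make_figure4.R; do
        echo "Running ${script}"
        "${rscript}" "${script}" || {
            echo "Failed: ${script}" >&2
            return 1
        }
    done
}
```

Calling make_all_figures from the repository root would then run the three scripts in turn. Stopping at the first failure avoids running scripts/make_figure4.R against missing or stale Figure 3 results.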

Reproduction with Docker

The figures can be reproduced using the bioconductor/bioconductor_docker:RELEASE_3_16 image. First, pull the image:

docker pull bioconductor/bioconductor_docker:RELEASE_3_16

Change directory (cd) to your local clone of this repository and start the container using:

docker run -e PASSWORD=scp \
    -v "$(pwd)":/home/rstudio/paper/ \
    -it bioconductor/bioconductor_docker:RELEASE_3_16 \
    bash

The -v option mounts the current directory at /home/rstudio/paper/ inside the container, so you can access the scripts from within the container and store the generated images locally.

In order to run the scripts to reproduce the figures, you'll need to run the following steps:

  • Install and configure python dependencies (for running sceptre):
## Refresh the package index
apt-get update &&
	## Install the python package sceptre
	pip install git+https://github.com/bfurtwa/Sceptre.git@818a8914fe87788642f9b0dcdb49991ba8a4506a &&
	## Install a specific version of NumPy to solve dependency issues, and install
	## other python dependencies
	pip install NumPy==1.22 IPython leidenalg &&
	## Remove packages in '/var/cache/' and '/var/lib'
	## to remove side-effects of apt-get update
	apt-get clean &&
	rm -rf /var/lib/apt/lists/* &&
	## Switch to libblas for better integration of python in reticulate.
	ARCH=$(uname -m) &&
	update-alternatives --set "libblas.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/blas/libblas.so.3" &&
	update-alternatives --set "liblapack.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/lapack/liblapack.so.3"
  • Change directory and open an R session:
cd /home/rstudio/paper/
R
  • Finally, install the following R packages:
BiocManager::install(c("tidyverse", "patchwork", "RColorBrewer", 
                       "scpdata", "scp", "biomaRt", "scater", "scran",
                       "igraph", "viridis", "cluster", "scuttle", 
                       "reticulate", "zellkonverter", 
                       "UCLouvain-CBIO/SCP.replication"))
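
Before running the figure scripts, it may be worth checking that the Python modules installed in the first step can be imported. The following is a minimal sketch, to be run from the container's shell rather than from the R session. It assumes that the interpreter is available as python3 and that the sceptre package is imported under the name sceptre; neither is stated in this README, and reticulate may be configured to use a different Python. The function name check_python_modules and the PYTHON variable are hypothetical.

```shell
# Hypothetical helper: report which Python modules cannot be imported.
# Returns a non-zero status if at least one module is missing.
check_python_modules() {
    # PYTHON is a hypothetical override for the interpreter command
    local python="${PYTHON:-python3}"
    local module
    local missing=0
    for module in "$@"; do
        if "${python}" -c "import ${module}" >/dev/null 2>&1; then
            echo "ok: ${module}"
        else
            echo "missing: ${module}"
            missing=1
        fi
    done
    return "${missing}"
}
```

For the dependencies listed above, the call would be check_python_modules sceptre numpy IPython leidenalg, where the import names are assumed to match the package names.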

Licence

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
