This repository contains the scripts needed to reproduce the figures from
Christophe Vanderaa and Laurent Gatto, The current state of single-cell proteomics data analysis, 2022 arXiv, doi:10.48550/ARXIV.2210.01020.
The main messages of the paper are:
- Data analysis plays a central role in science. The field of single-cell proteomics (SCP) has experienced an explosion of technical workflows, but its data analysis practices are lagging behind.
- Too little attention is given to computational analyses and the current efforts are not focused on quantitative data processing.
- Quantitative data processing workflows are a sequence of processing steps, and this sequence influences the analysis.
- Every SCP publication comes with its own computational pipeline, and these pipelines are fundamentally different.
- There is a lack of consensus, artefacts are propagated to the protein data, and current methods are taken from bulk proteomics.
- The solution is benchmarking, but it requires standardized computational tools and well-designed data.
  scp and sceptre are the only two computational environments that are flexible and designed for SCP. scpdata offers access to published data, but rigorous benchmark data is still missing.
- Replication of current SCP studies increases the trust in the results, highlights issues and remaining challenges, and provides didactic material.
- Different workflows lead to different results, but controlled designs are needed for measuring the performance.
- Lessons learned from replication of SCP data analysis: (1) A good analysis requires good tools. (2) Consistent input formats facilitate data analysis. (3) Beware of confounding effects - because SCP requires large datasets, confounding effects become inevitable and need to be correctly modelled.
- We need to harmonize the workflows now before more analyses propagate current bad practices.
The results are based on the SCP replications documented in SCP.replication, using the scp R/Bioconductor package and the data in scpdata.
The figures were generated in R.
- Figure 1: Overview of quantitative data processing workflows. This figure is generated by scripts/make_figure1.R.
- Figure 2: Impact of quantitative data processing workflows on the Schoof et al. 2021 dataset. This figure is generated by scripts/make_figure2and3.R.
- Figure 3: Impact of quantitative data processing workflows on the Specht et al. 2021 dataset. This figure is generated by scripts/make_figure2and3.R.
- Figure 4: Confounding effects cause undesired variability. This figure is generated by scripts/make_figure4.R. Note that you must first run scripts/make_figure2and3.R, as Figure 4 relies on results generated for Figure 3.
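The ordering constraint between the scripts can be captured in a small shell helper. This is a minimal sketch, not part of the repository: the function name run_figures and the DRY_RUN variable are hypothetical, and it assumes that the scripts can be executed non-interactively with Rscript from the root of the repository once all dependencies are installed.

```shell
# Hypothetical helper: run the figure scripts in dependency order.
# Assumption: Rscript is on the PATH and the current directory is the
# root of the repository clone.
run_figures() {
    # make_figure2and3.R must come before make_figure4.R, because
    # Figure 4 relies on results generated for Figure 3.
    for script in scripts/make_figure1.R \
                  scripts/make_figure2and3.R \
                  scripts/make_figure4.R; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of executing it.
            echo "Rscript $script"
        else
            # Stop at the first failure so that later scripts are not
            # attempted after an earlier step has failed.
            Rscript "$script" || return 1
        fi
    done
}
```

Calling run_figures with DRY_RUN=1 set prints the three Rscript commands without executing them, which is a cheap way to check the order before starting a long run.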
The figures can be reproduced using the bioconductor/bioconductor_docker:RELEASE_3_16 image. Get the image by running:
docker pull bioconductor/bioconductor_docker:RELEASE_3_16
Change directory (cd) to your local clone of this repository and start the container using:
docker run -e PASSWORD=scp \
    -v "$(pwd)":/home/rstudio/paper/ \
    -it bioconductor/bioconductor_docker:RELEASE_3_16 \
    bash
The -v option mounts your local clone inside the container, which gives you access to the scripts from within the container and lets you store the generated figures locally.
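Once inside the container, a quick check that the command-line tools used by the installation steps are available can save time. The helper below is a hypothetical sketch (missing_tools is not part of the repository); it only checks that each command is found on the PATH, not that any Python or R package is installed correctly.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH. Prints nothing when all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, missing_tools R pip apt-get update-alternatives lists whichever of these four commands is not available; an empty output means all of them were found.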
To run the scripts that reproduce the figures, complete the following steps inside the container:
- Install and configure the Python dependencies (for running sceptre):
# Refresh the package index
apt-get update
# Install the Python package sceptre at a pinned commit
pip install git+https://github.com/bfurtwa/Sceptre.git@818a8914fe87788642f9b0dcdb49991ba8a4506a
# Install a specific version of NumPy to solve dependency issues, and install
# the other Python dependencies
pip install NumPy==1.22 IPython leidenalg
# Remove packages in '/var/cache/' and '/var/lib/'
# to remove side effects of apt-get update
apt-get clean
rm -rf /var/lib/apt/lists/*
# Switch to libblas for better integration of Python in reticulate
ARCH=$(uname -m)
update-alternatives --set "libblas.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/blas/libblas.so.3"
update-alternatives --set "liblapack.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/lapack/liblapack.so.3"
- Change directory and open an R session:
cd /home/rstudio/paper/
R
- Finally, install the following R packages:
BiocManager::install(c("tidyverse", "patchwork", "RColorBrewer",
                       "scpdata", "scp", "biomaRt", "scater", "scran",
                       "igraph", "viridis", "cluster", "scuttle",
                       "reticulate", "zellkonverter",
                       "UCLouvain-CBIO/SCP.replication"))
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.