This is an example Snakemake workflow that I put together for the Noble lab in late 2021. It analyses proteomics data from four different beers, using data from "The 2020 ABRF Beer Study: beer proteomics at the global scale" (MassIVE accession MSV000088080).
It specifically performs the following steps:
- Downloads the four mass spectrometry data files from MassIVE using ppx.
- Downloads an appropriate beer FASTA file consisting of the yeast, barley, wheat, and hops verified UniProt proteomes.
- Converts the raw mass spectrometry data files to an open format (mzML) using ThermoRawFileParser.
- Searches each of the data files against the beer FASTA file using Comet.
- Refines the search results with mokapot using a joint model.
- Creates a plot showing the number of PSMs, peptides, and proteins detected in each beer.
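The final step boils down to counting confident detections in each mokapot result table. Here is a minimal sketch of that counting logic; the 1% q-value cutoff and the `mokapot q-value` column name follow mokapot's defaults, and the table below is a synthetic stand-in, not real search results:

```python
import pandas as pd

def count_detections(results: pd.DataFrame, threshold: float = 0.01) -> int:
    """Count rows accepted at the given mokapot q-value threshold."""
    return int((results["mokapot q-value"] <= threshold).sum())

# Synthetic stand-in for one beer's mokapot PSM table:
psms = pd.DataFrame({"mokapot q-value": [0.001, 0.009, 0.02, 0.5]})
print(count_detections(psms))  # 2 PSMs pass at 1% FDR
```

The same function applied to the peptide and protein tables yields the other two bar heights in the figure.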
This repository includes a conda environment that is compatible with macOS and Linux systems. First, you'll need a working conda installation; if you need to install one, I recommend Miniconda. You'll also need git to clone this repository, which can be installed using conda:
conda install git
With conda installed, you should first clone this repository:
git clone https://github.com/wfondrie/snakemake-beer-proteomics.git
Then enter it:
cd snakemake-beer-proteomics
Create the conda environment:
conda env create --prefix ./envs -f environment.yaml
Activate the conda environment:
conda activate ./envs
To run this workflow on your local machine using all available cores:
snakemake --cores all
When you run this workflow for the first time, Snakemake organizes the jobs into the directed acyclic graph (DAG) below. During execution, independent jobs run in parallel, while dependent jobs wait for their inputs to become available.
To run this workflow on the Noble lab SGE cluster:
snakemake --cores all --profile sge --use-conda
Note that you should ideally encapsulate this command in its own job, rather than running it on the head node.
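The repository's `job.sh` provides one way to do this. A submission script along these lines would work; note that the job name, wall-clock limit, and log paths below are illustrative placeholders, not the actual contents of `job.sh`:

```shell
#!/bin/bash
#$ -N beer-proteomics     # job name (placeholder)
#$ -cwd                   # run from the current directory
#$ -l h_rt=24:00:00       # wall-clock limit (placeholder)
#$ -o logs/job.out
#$ -e logs/job.err

# Activate the project environment, then launch Snakemake, which
# submits the individual workflow jobs to the cluster itself:
conda activate ./envs
snakemake --cores all --profile sge --use-conda
```

Submitting this script (e.g. with `qsub job.sh`) keeps the long-running Snakemake coordinator process off the head node.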
Once the workflow has completed, you should find that it created results/figures/detections.png. The figure should look like this:
This is an overview of how this repository is organized after the workflow has been executed.
snakemake-beer-proteomics
|- Snakefile # The instructions for Snakemake
|
|- data # The downloaded data.
| |- raw # The Thermo raw files.
| |- mzML # The mzML files
| `- fasta # The FASTA files.
|
|- results # Results from Comet, mokapot, and the final figure.
| |- comet # The comet results.
| |- mokapot # The mokapot results.
| `- figures # The final figure.
|
|- scripts # The scripts used during the analysis.
| `- make_figure.py # The script to create the final figure.
|
|- profiles # Profiles for cluster jobs.
| `- sge # A basic SGE profile, tailored for UWGS.
| `- config.yaml # The configuration file that tells snakemake how to
| # submit jobs to the cluster and what resources we
| # can specify.
|
|- params # Parameter files.
| `- comet.params # The Comet search parameters.
|
|- static # Static assets for things like the README
| `- dag.png # The DAG for this workflow.
|
|- logs               # Log files from the various steps of the pipeline.
|- envs               # The installed conda environment.
|- job.sh # An example SGE job script to run the workflow.
|- README.md # This file.
`- LICENSE # MIT.