A Snakemake tutorial workflow using the recent ABRF beer proteomics study.

wfondrie/snakemake-beer-proteomics

A Snakemake Example with Beer Proteomics

This is an example of a Snakemake workflow that I put together for the Noble lab in late 2021. This workflow analyzes proteomics data from four different beers, taken from "The 2020 ABRF Beer Study: beer proteomics at the global scale" (MassIVE accession MSV000088080).

It specifically performs the following steps:

  1. Downloads the four mass spectrometry data files from MassIVE using ppx.
  2. Downloads an appropriate beer FASTA file consisting of the verified UniProt proteomes for yeast, barley, wheat, and hops.
  3. Converts the raw mass spectrometry data files to an open format (mzML) using ThermoRawFileParser.
  4. Searches each of the data files against the beer FASTA file using Comet.
  5. Refines the search results with mokapot using a joint model.
  6. Creates a plot showing the number of PSMs, peptides, and proteins detected in each sample.
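
As a rough illustration of how these steps map onto the repository layout described later in this README, the Python sketch below lists the files each per-sample step might produce. The directory names and the final figure path come from this README; the sample names and file extensions are hypothetical placeholders, not the names used by the actual Snakefile.

```python
from pathlib import PurePosixPath

# Hypothetical sample names; the real workflow derives them from the
# files in the MassIVE project.
SAMPLES = ["beer_a", "beer_b", "beer_c", "beer_d"]

# The final figure path is the one stated in this README.
FINAL_TARGET = PurePosixPath("results/figures/detections.png")


def planned_files(sample: str) -> dict:
    """Map each per-sample step to the file it might produce.

    The directories follow the repository layout; the file names and
    extensions are assumptions made for illustration.
    """
    return {
        "download": PurePosixPath("data/raw") / f"{sample}.raw",
        "convert": PurePosixPath("data/mzML") / f"{sample}.mzML",
        "search": PurePosixPath("results/comet") / f"{sample}.pin",
    }


for sample in SAMPLES:
    for step, path in planned_files(sample).items():
        print(f"{step:>8}: {path}")
print(f"  figure: {FINAL_TARGET}")
```

Snakemake works backwards from the final target in a similar spirit: it asks which files are needed to build the figure, then which files are needed to build those, and so on.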

Setup

1. Prerequisites

This repository includes a conda environment that is compatible with macOS and Linux systems. First, you'll need a working conda installation. If you need to install one, I recommend miniconda. You'll also need git to clone this repository, which can be installed using conda:

conda install git

2. Clone this repository

With conda and git installed, clone this repository:

git clone https://github.com/wfondrie/snakemake-beer-proteomics.git

Then enter it:

cd snakemake-beer-proteomics

3. Create and activate the conda environment

Create the conda environment:

conda env create --prefix ./envs -f environment.yaml

Activate the conda environment:

conda activate ./envs
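
If you want to confirm that the setup worked before running anything, a minimal sketch like the one below checks that the command-line tools mentioned in this README can be found on your PATH. The tool list is an assumption based on the commands shown here; the environment may install additional executables under names not covered by this check.

```python
import shutil

# Tools named in this README; extend the list for your own setup.
REQUIRED_TOOLS = ["git", "conda", "snakemake"]


def missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```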

Run the workflow

To run this workflow on your local machine using all available cores:

snakemake --cores all

When you run this workflow for the first time, Snakemake organizes the jobs into the directed acyclic graph (DAG) below. During execution, independent jobs run in parallel, while dependent jobs wait for their dependencies to finish.

[Figure: The DAG for this workflow (static/dag.png)]
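
To make the idea of a job DAG concrete, here is a small Python sketch that groups jobs into batches that could run in parallel, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The job names are hypothetical and cover only two of the four samples. This is an illustration of dependency ordering in general, not a description of how Snakemake's scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
dependencies = {
    "download_fasta": set(),
    "download_raw_A": set(),
    "download_raw_B": set(),
    "convert_A": {"download_raw_A"},
    "convert_B": {"download_raw_B"},
    "comet_A": {"convert_A", "download_fasta"},
    "comet_B": {"convert_B", "download_fasta"},
    "mokapot": {"comet_A", "comet_B"},
    "figure": {"mokapot"},
}


def parallel_batches(graph):
    """Group jobs into batches whose dependencies are all satisfied."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every job returned here has no unfinished dependencies,
        # so the jobs in one batch are independent of each other.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(dependencies), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

In this toy graph the downloads form the first batch, the conversions the second, and the searches the third, while the joint mokapot model and the figure each have to wait for everything before them.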

To run this workflow on the Noble lab SGE cluster:

snakemake --cores all --profile sge --use-conda

Note: ideally, you should wrap this command in its own cluster job (see job.sh for an example) rather than running it on the head node.

Expected results

Once the workflow has completed, you should find that it created results/figures/detections.png. The figure should look like this:

[Figure: The expected results]
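
A quick way to confirm that the run finished is to check for the figure file. The sketch below assumes it is run from the repository root; the helper name `figure_was_created` is made up for this example.

```python
from pathlib import Path

# Output path stated in this README, relative to the repository root.
FIGURE = Path("results/figures/detections.png")


def figure_was_created(repo_root: Path) -> bool:
    """Return True if the final figure exists and is not empty."""
    figure = repo_root / FIGURE
    return figure.is_file() and figure.stat().st_size > 0


if __name__ == "__main__":
    if figure_was_created(Path.cwd()):
        print("Figure found:", FIGURE)
    else:
        print("Figure missing; check the logs directory for errors.")
```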

Repository organization

This is an overview of how this repository is organized after the workflow has been executed.

snakemake-beer-proteomics
|- Snakefile          # The instructions for Snakemake
|
|- data               # The downloaded data.
|  |- raw             # The Thermo raw files.
|  |- mzML            # The mzML files
|  `- fasta           # The FASTA files.
|
|- results            # Results from Comet, mokapot, and the final figure.
|  |- comet           # The comet results.
|  |- mokapot         # The mokapot results.
|  `- figures         # The final figure.
|
|- scripts            # The scripts used during the analysis.
|  `- make_figure.py  # The script to create the final figure.
|
|- profiles           # Profiles for cluster jobs.
|  `- sge             # A basic SGE profile, tailored for UWGS.
|     `- config.yaml  # The configuration file that tells snakemake how to 
|                     #   submit jobs to the cluster and what resources we
|                     #   can specify.
|
|- params             # Parameter files.
|  `- comet.params    # The Comet search parameters. 
|
|- static             # Static assets for things like the README
|  `- dag.png         # The DAG for this workflow.
|
|- logs               # Log files from the various steps of the pipeline.
|- envs               # The installed conda environment.
|- job.sh             # An example SGE job script to run the workflow.
|- README.md          # This file.
`- LICENSE            # MIT.
