Pathogen Identification Direct from Polymicrobial Specimens Using Membrane Glycolipids

William E Fondrie, January 18, 2018


This repository contains the code required to reproduce the analysis presented in Pathogen Identification Direct from Polymicrobial Specimens Using Membrane Glycolipids. The data required for this analysis ( below) can be downloaded from here. We are currently in the process of working with the UMB Office of Technology Transfer to provide a download link. In the interim, please contact and the data will be provided to you promptly.

WARNING: Rmd/simulateComplexSpectra.Rmd requires approximately 36 GB of memory to execute. Additionally, the entire analysis may take several hours to run, depending on the hardware.

Order of analysis (see the Rmd subdirectory for details):

  1. modelTraining.Rmd
  2. simulateComplexSpectra.Rmd
  3. mixtureAnalysis.Rmd
  4. baselineClassifiers.Rmd
  5. evaluateModels.Rmd
  6. makeMiscFigures.Rmd
  7. illustrations.Rmd
  8. writeTables.Rmd

Directory structure

To reproduce this analysis, the project directory must be structured as follows:

|- data
|  `-
|- R
|  |- createNewFeatureTbl.R
|  |- ggplotTheme.R
|  |- utilityFunctions.R
|  |- prepareData.R
|  |- extract.R
|  `- preProcessSpec.R
`- Rmd
   |- baselineClassifiers.Rmd
   |- evaluateModels.Rmd
   |- illustrations.Rmd
   |- makeMiscFigures.Rmd
   |- mixtureAnalysis.Rmd
   |- modelTraining.Rmd
   |- simulateComplexSpectra.Rmd
   `- writeTables.Rmd
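
Before running anything, it can help to confirm that this layout is in place. The following is a minimal sketch and not part of the original repository: checkProjectLayout() is a hypothetical helper, and it checks only the R and Rmd files listed above. The data file name is not reproduced here, so only the presence of the data directory itself is checked.

```r
# Hypothetical helper (not part of the repository): returns the expected
# paths that are missing under `root`.
checkProjectLayout <- function(root = ".") {
    rScripts <- c("createNewFeatureTbl", "ggplotTheme", "utilityFunctions",
                  "prepareData", "extract", "preProcessSpec")
    rmdFiles <- c("baselineClassifiers", "evaluateModels", "illustrations",
                  "makeMiscFigures", "mixtureAnalysis", "modelTraining",
                  "simulateComplexSpectra", "writeTables")

    expected <- c("data",
                  file.path("R", paste0(rScripts, ".R")),
                  file.path("Rmd", paste0(rmdFiles, ".Rmd")))

    # file.exists() is TRUE for both files and directories
    expected[!file.exists(file.path(root, expected))]
}

# Report anything missing from the current working directory
missingPaths <- checkProjectLayout(".")
if (length(missingPaths) > 0) {
    message("Missing from project directory:\n  ",
            paste(missingPaths, collapse = "\n  "))
}
```

An empty result means every listed script and the data directory were found; it says nothing about whether the data itself has been downloaded or unzipped.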

Install necessary packages

This analysis uses a number of R packages. To ensure that all of them are installed on your machine, you can execute the following:
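
The original package list is not reproduced in this copy of the README. As a stand-in, the sketch below shows one common way to find which packages are missing. The pkgs vector is an assumption: it contains only the two packages named elsewhere in this document (rmarkdown and xgboost) and should be extended with whatever the .Rmd files load. missingPackages() is a hypothetical helper.

```r
# Assumed, incomplete package list: only rmarkdown and xgboost are named in
# this README. Extend it with the packages loaded in the .Rmd files.
pkgs <- c("rmarkdown", "xgboost")

# Hypothetical helper: return the packages in `pkgs` that are not installed
missingPackages <- function(pkgs) {
    installed <- rownames(installed.packages())
    setdiff(pkgs, installed)
}

toInstall <- missingPackages(pkgs)
if (length(toInstall) > 0) {
    message("Missing packages: ", paste(toInstall, collapse = ", "))
    # Uncomment to install the missing packages from CRAN:
    # install.packages(toInstall)
}
```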


Code to run the analysis

With a correctly prepared directory, the entire analysis can be run with a single line of code. If it is your first time running the analysis, you can unzip the data/ into the data directory manually, or change unzipData to TRUE in the next section.

To run the analysis:

rmarkdown::render("path/to/README.Rmd", envir = new.env())

When this rmarkdown file is rendered, the following code is executed:

Prepare workspace

unzipData <- FALSE # change to "TRUE" to unzip "" in the data directory

if(unzipData) {
    unzip("data/", overwrite = TRUE, exdir = "../data")
}

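# Make results, temp and mdOutput directories
# (paths inferred from their use below; "results" is assumed to sit in the project root):
dir.create("results", showWarnings = FALSE)
dir.create("temp", showWarnings = FALSE)
dir.create("Rmd/mdOutput", showWarnings = FALSE, recursive = TRUE)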

# Set global parameters for analysis
mzTol <- 1.5 # m/z tolerance for feature extraction in Da
saveRDS(mzTol, "temp/mzTol.rds")
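
Saving mzTol to temp/mzTol.rds lets each separately rendered .Rmd file read the same tolerance back. As a rough illustration only, the sketch below round-trips the value through a temporary file and applies it to made-up m/z values. matchesFeature() and the numbers are hypothetical and are not taken from the analysis code.

```r
mzTol <- 1.5 # m/z tolerance in Da, as set above

# Round-trip through an .rds file. A temporary file is used here so nothing
# in the project is overwritten; the analysis itself uses "temp/mzTol.rds".
rdsPath <- tempfile(fileext = ".rds")
saveRDS(mzTol, rdsPath)
tol <- readRDS(rdsPath)

# Hypothetical helper: TRUE where an observed peak lies within `tol` Da
# of the feature's m/z
matchesFeature <- function(peakMz, featureMz, tol) {
    abs(peakMz - featureMz) <= tol
}

# Made-up peak list and feature m/z, for illustration only
peaks <- c(1795.2, 1797.0, 1824.9)
matchesFeature(peaks, featureMz = 1796.2, tol = tol)
```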

Run Analysis Scripts

The runAnalysis() function takes a vector of .Rmd file paths and knits each one into a GitHub-compatible markdown file in Rmd/mdOutput.

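runAnalysis <- function(rmd) {
    # Render each .Rmd file in turn, writing the output to Rmd/mdOutput
    lapply(rmd, rmarkdown::render,
           envir = new.env(),
           output_dir = "Rmd/mdOutput")
}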

files <- c("modelTraining", # Trains xgboost models
           "simulateComplexSpectra", # Simulates polymicrobial spectra
           "mixtureAnalysis", # Imports spectra from experimental two-species mixtures
           "baselineClassifiers", # Implements simple baseline classifiers
           "evaluateModels", # Calculates performance metrics and makes figures
           "makeMiscFigures", # Creates additional figures
           "illustrations", # Creates additional figures for illustrations
           "writeTables") # Creates supplementary tables

files <- paste0("Rmd/", files, ".Rmd")

Now run the analysis:

runAnalysis(files)

