Mass Spectrometry analysis using Empirical and Replicate based statistics.
MS-EmpiRe is an R package for quantitative analyses of Mass Spectrometry proteomics data. It allows highly sensitive and specific identification of differentially abundant proteins between different experimental conditions.
MS-EmpiRe requires the R package Biobase from Bioconductor.
Biobase can be installed from the R command line using the following
You can install MS-EmpiRe directly from github using the R package
example.R shows an example analysis workflow for simple table input data. The first column of the table contains the peptide/protein ID, which is encoded as proteinID.peptideID. The remaining columns contain the measurements for replicate samples from two conditions.
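To make the expected layout concrete, the following sketch builds a tiny input table by hand in base R. The protein and peptide names, the column names (id, A.rep1, ...) and the intensity values are made up for illustration and are not taken from example.R.

```r
# Hypothetical toy table: one row per peptide, IDs encoded as proteinID.peptideID.
toy <- data.frame(
  id     = c("P1.pepA", "P1.pepB", "P2.pepA"),
  A.rep1 = c(1200, 980, 450),
  A.rep2 = c(1150, 1010, 470),
  B.rep1 = c(2300, 2100, 440),
  B.rep2 = c(2250, 1990, 465),
  stringsAsFactors = FALSE
)

# The first column holds the IDs; the remaining columns hold the measurements
# for two replicates of each of the two conditions.
measurement.cols <- colnames(toy)[-1]
print(toy)
```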
MS-EmpiRe currently offers the following two functions to read the data of your quantitative proteomics setup:
read.standard(table, sample.mapping, signal_pattern, prot.id.col, prot.id.generator) for simple tables
read.MaxQuant(peptides, sample.mapping) for output generated by MaxQuant
Both functions return an ExpressionSet object (part of the Biobase package) which can be used for further analysis.
table has to be a table containing one row per peptide. Each row has to contain at least the measured signals for each sample/replicate. Any additional columns will be stored in the feature data slot of the ExpressionSet.
signal_pattern has to be a regular expression that matches only the columns that contain measurements. Either prot.id.col or prot.id.generator can be used to determine the peptide to protein mapping.
prot.id.generator should be a function (lambda expression) that extracts the protein ID from the peptide ID column (e.g. if the peptide IDs follow the pattern proteinID.peptideID, as in the example).
prot.id.col has to be a column that already contains the protein ID for each peptide.
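As an illustration of signal_pattern and prot.id.generator, the sketch below uses base R only and does not call MS-EmpiRe itself. The column names and the pattern "^intensity\\." are hypothetical. The generator assumes that protein IDs contain no dot, so that everything before the first dot is the protein ID, and it is applied to one ID at a time; how the package invokes it internally is not covered here.

```r
# Hypothetical column names of an input table.
cols <- c("id", "intensity.A.rep1", "intensity.A.rep2",
          "intensity.B.rep1", "intensity.B.rep2", "charge")

# A signal_pattern should match the measurement columns and nothing else.
signal_pattern <- "^intensity\\."
signal.cols <- grep(signal_pattern, cols, value = TRUE)

# A possible prot.id.generator: keep everything before the first dot.
# Assumption: protein IDs themselves contain no dot.
prot.id.generator <- function(pep) unlist(strsplit(pep, ".", fixed = TRUE))[1]

# Apply the generator to a few made-up peptide IDs, one at a time.
pep.ids <- c("P1.pepA", "P1.pepB", "P2.pepA")
prot.ids <- vapply(pep.ids, prot.id.generator, character(1), USE.NAMES = FALSE)
print(prot.ids)
```

If protein IDs can contain dots themselves, a generator that strips only the last dot-separated field would be the safer choice.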
sample.mapping has to be a table containing two columns, one of which is named condition. It is used to determine which samples are replicates of which condition.
read.MaxQuant currently does not generate a peptide to protein mapping since we want to use the mapping from the proteinGroups.txt MaxQuant output (see Data filtering).
MS-EmpiRe draws its power from replicate measurements. We therefore suggest removing peptides that were not measured in multiple replicates per condition. With the function filter_detection_rate(data, rate=2) one can remove all peptides that were detected in fewer than rate replicates per condition.
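The idea behind this filter can be sketched in base R as follows. This is not the package's implementation: it assumes that undetected peptides are encoded as NA and that a peptide must reach the threshold in every condition, and all names and values are made up.

```r
# Toy intensity matrix: rows are peptides, columns are samples; NA = not detected.
x <- rbind(
  pep1 = c(10, 11, NA, 20, 21, 22),
  pep2 = c(10, NA, NA, 20, 21, 22),
  pep3 = c(10, 11, 12, NA, NA, 22)
)
condition <- c("A", "A", "A", "B", "B", "B")
rate <- 2

# Count detections per peptide within each condition.
detected <- sapply(unique(condition), function(cond) {
  rowSums(!is.na(x[, condition == cond, drop = FALSE]))
})

# Keep peptides detected in at least `rate` replicates of every condition
# (assumption: the threshold applies to each condition separately).
keep <- apply(detected >= rate, 1, all)
x.filtered <- x[keep, , drop = FALSE]
print(rownames(x.filtered))
```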
If the input data was generated by MaxQuant and read by read.MaxQuant, we suggest additionally using the function filter_MaxQuant(data, proteinGroups). It requires the proteinGroups.txt file which is usually generated by MaxQuant. Based on this file, the peptide to protein mapping is created. Furthermore, proteins with undesired features like "reverse" or "contaminant" are removed.
To correct for sample-specific biases, MS-EmpiRe ships with a normalization method that minimizes the changes between replicate measurements for each peptide. A more detailed description of the method can be found in Ammar et al. (2019). It can be accessed using the function
data has to be an ExpressionSet type object (preferably from one of the two input generation functions, see section Reading Input). If out.dir has a value different from NULL, MS-EmpiRe will create detailed plots for the data (before and after normalization) inside out.dir. The returned object is an ExpressionSet that contains the normalized values in the
For the detection of differential proteins, run the function
data is the ExpressionSet after filtering and normalization.
de.ana returns a data frame with one row per protein. Protein IDs can be accessed from the column prot.id. The p-value after outlier correction (see Ammar et al., 2019) is named p.val, the corresponding value after multiple testing correction (Benjamini-Hochberg) is named p.adj. The columns
prot.p.adj contain the respective values before outlier correction. The protein level (log2) fold change estimate is named
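For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following base R sketch shows the relationship between raw and adjusted p-values using p.adjust from the stats package. The p-values and the 0.05 cut-off are made up, and the sketch does not claim to reproduce how de.ana computes its columns internally.

```r
# Hypothetical raw p-values for five proteins.
p.val <- c(0.001, 0.01, 0.02, 0.2, 0.5)

# Benjamini-Hochberg adjustment as implemented in base R.
p.adj <- p.adjust(p.val, method = "BH")

# Flag proteins below an illustrative false discovery rate cut-off of 0.05.
significant <- p.adj < 0.05
print(data.frame(p.val, p.adj, significant))
```

The same kind of thresholding can be applied to the p.adj column of the data frame returned by de.ana to select differentially abundant proteins.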
MS-EmpiRe is released under the GNU Affero General Public License. See LICENSE for further details.
 Ammar, C.*, Gruber, M.*, Csaba, G.*, Zimmer, R. (2019). MS-EmpiRe Utilizes Peptide-level Noise Distributions for Ultra-sensitive Detection of Differentially Expressed Proteins. Mol. Cell Proteomics, 18(9), 1880-92. doi:10.1074/mcp.RA119.001509