
Python API

The Python API enables maximum flexibility when using mokapot. It also aids in making analyses reproducible by easily integrating into Jupyter notebooks and Python scripts.

Read PSMs using the :py:func:`~mokapot.read_pin()` or :py:func:`~mokapot.read_pepxml()` functions for files in the Percolator tab-delimited format or PepXML format, respectively. Once a collection of PSMs has been read, the :py:func:`~mokapot.brew()` function will apply the mokapot algorithm to learn models from the PSMs and assign confidence estimates based on their new scores. Alternatively, the :py:meth:`~mokapot.dataset.LinearPsmDataset.assign_confidence()` method will assign confidence estimates to PSMs based on the best feature, which is often the primary score from the database search engine.
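
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the function name ``run_mokapot``, the ``pin_path`` argument, and the ``dest_dir`` default are hypothetical, and the call signatures are assumed to match the mokapot release that this page documents.

```python
def run_mokapot(pin_path, dest_dir="."):
    """Read a Percolator .pin file, score the PSMs, and write the results.

    `pin_path` and `dest_dir` are hypothetical placeholders for this sketch.
    """
    # Imported inside the function so that merely defining this sketch
    # does not require mokapot to be installed.
    import mokapot

    # Parse the PSMs from a Percolator tab-delimited file.
    psms = mokapot.read_pin(pin_path)

    # Baseline: confidence estimates from the best single feature.
    baseline = psms.assign_confidence()

    # Learn models and assign confidence estimates from the new scores
    # (assumed here to return the results together with the fitted models).
    results, models = mokapot.brew(psms)

    # Save the confidence estimates as tab-delimited text files.
    results.to_txt(dest_dir=dest_dir)
    return baseline, results, models
```

Comparing ``baseline`` with ``results`` is one way to see how much the learned models improve on the search engine's primary score.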

PSMs that are already represented in a :py:class:`pandas.DataFrame` can also be used directly to create a :py:class:`~mokapot.dataset.LinearPsmDataset`.
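
As a rough sketch of the DataFrame route: the rows, column names, and the helper ``dataset_from_rows`` below are invented for illustration, and the keyword arguments passed to ``LinearPsmDataset`` are assumptions about its constructor that should be checked against the installed mokapot version.

```python
# Hypothetical toy PSMs; real data would come from a search engine.
EXAMPLE_ROWS = [
    {"scan": 1, "peptide": "PEPTIDEK", "is_target": True, "score": 4.2, "delta_mass": 0.01},
    {"scan": 2, "peptide": "KEDITPEP", "is_target": False, "score": 1.1, "delta_mass": 0.30},
    {"scan": 3, "peptide": "SAMPLER", "is_target": True, "score": 3.7, "delta_mass": 0.02},
    {"scan": 4, "peptide": "RELPMAS", "is_target": False, "score": 0.9, "delta_mass": 0.25},
]


def dataset_from_rows(rows=EXAMPLE_ROWS):
    """Build a LinearPsmDataset from a list of dict rows (sketch only)."""
    # Lazy imports: defining this function needs neither pandas nor mokapot.
    import pandas as pd
    from mokapot.dataset import LinearPsmDataset

    df = pd.DataFrame(rows)
    return LinearPsmDataset(
        psms=df,
        target_column="is_target",      # True for targets, False for decoys
        spectrum_columns="scan",        # column(s) identifying a spectrum
        peptide_column="peptide",
        feature_columns=["score", "delta_mass"],
    )
```

A real analysis needs far more PSMs than this toy table for model training to be meaningful.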

Finally, custom machine learning models can be created using the :py:class:`mokapot.model.Model` class.

.. toctree::

   Overview <self>
   functions.rst
   model.rst
   dataset.rst
   confidence.rst
   proteins.rst

Functions

mokapot

Primary Functions

read_pin read_pepxml read_fasta brew to_txt to_flashlfq

Utility Functions

save_model load_model read_percolator plot_qvalues make_decoys digest

Machine Learning Models

Use a model that emulates the linear support vector machine used by Percolator, or create a custom model from any estimator with a scikit-learn interface.

mokapot.model

PercolatorModel Model
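
A custom model might be wired in along these lines. This is a sketch under assumptions: the helper name ``brew_with_custom_model`` is hypothetical, logistic regression is chosen arbitrarily as an example of a scikit-learn classifier, and passing the model as the second argument to ``brew()`` is assumed to match the installed mokapot version.

```python
def brew_with_custom_model(psms):
    """Run mokapot with a scikit-learn classifier instead of the default model.

    `psms` is expected to be a collection of PSMs, e.g. from mokapot.read_pin().
    """
    # Lazy imports so the sketch can be defined without these packages.
    import mokapot
    from mokapot.model import Model
    from sklearn.linear_model import LogisticRegression

    # Wrap any estimator that follows the scikit-learn classifier interface.
    estimator = LogisticRegression(max_iter=1000)
    model = Model(estimator)

    # Use the custom model in place of the Percolator-like default.
    return mokapot.brew(psms, model)
```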

Collections of PSMs

PSMs can be parsed from Percolator tab-delimited files, PepXML files, or directly from a :py:class:`pandas.DataFrame`.

mokapot.dataset

LinearPsmDataset .. CrossLinkedPsmDataset

Confidence Estimates

An analysis with mokapot yields two forms of confidence estimates, q-values and posterior error probabilities (PEPs), at various levels: PSMs, peptides, and optionally, proteins.

mokapot.confidence

LinearConfidence .. CrossLinkedConfidence
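
To make the notion of a q-value concrete, here is a small, self-contained sketch of a common target-decoy estimate. It is not mokapot's implementation: the function name ``qvalues`` is hypothetical, ties between scores are ignored, and the conservative estimate FDR = (decoys + 1) / targets is an assumption made for this example.

```python
def qvalues(scores, is_target):
    """Return one q-value per PSM, in input order (higher score = better)."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimate the FDR when accepting everything at or above each rank.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # The q-value is the smallest FDR at this threshold or any more
    # permissive one, which makes q-values monotone in the score.
    qvals = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals


# Six toy PSMs, already sorted by score; two of them are decoys.
example = qvalues(
    [10, 9, 8, 7, 6, 5],
    [True, True, True, False, True, False],
)
```

The same idea applies at the peptide and protein levels once a single best score has been chosen for each peptide or protein.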

Protein Sequences

To calculate protein-level confidence estimates, mokapot needs the original protein sequences and the digestion parameters used for the database search. Both are supplied through the :py:func:`mokapot.read_fasta()` function, which returns a :py:class:`Proteins` object. A :py:class:`Proteins` object stores the mapping of peptides to the proteins that may have generated them and the mapping of target protein sequences to their corresponding decoys.

mokapot.proteins

Proteins
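
The two mappings described above can be illustrated with a small standalone sketch. It does not use mokapot and is not how the ``Proteins`` class is implemented; the helper names, the toy sequences, the ``decoy_`` prefix, and the simplified digestion rule (cleave after every K or R, no missed cleavages) are all assumptions for illustration.

```python
import re
from collections import defaultdict


def digest(sequence, enzyme_regex="[KR]", min_length=2):
    """Cleave after every enzyme_regex match; no missed cleavages."""
    cuts = [0] + [m.end() for m in re.finditer(enzyme_regex, sequence)]
    cuts.append(len(sequence))
    peptides = {sequence[a:b] for a, b in zip(cuts, cuts[1:])}
    # Drop empty fragments and peptides shorter than min_length.
    return {p for p in peptides if len(p) >= max(min_length, 1)}


def build_maps(proteins, decoy_prefix="decoy_"):
    """Return (peptide -> protein names, target name -> decoy name)."""
    peptide_map = defaultdict(set)
    for name, seq in proteins.items():
        for pep in digest(seq):
            peptide_map[pep].add(name)

    # Pair each target with the decoy that carries the prefixed name.
    target_to_decoy = {
        name: decoy_prefix + name
        for name in proteins
        if not name.startswith(decoy_prefix) and decoy_prefix + name in proteins
    }
    return dict(peptide_map), target_to_decoy


# Hypothetical toy database: two targets and their reversed decoys.
PROTEINS = {
    "P1": "AAAKCCCR",
    "P2": "AAAKDDDR",
    "decoy_P1": "RCCCKAAA",
    "decoy_P2": "RDDDKAAA",
}
peptide_map, target_to_decoy = build_maps(PROTEINS)
```

Here the peptide ``AAAK`` maps to both ``P1`` and ``P2``, which is the kind of shared-peptide ambiguity that protein-level inference has to resolve.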