I received the following question as a Twitter DM:
Hi Will,
I get the same error on macOS and on Windows 10. I made the FASTA with concatenated reversed decoys using the function in mokapot, and I see accessions that start with “decoy_” in the pin file. I am using a clean Python 3.9 environment in Anaconda. Any idea what might be up? Since this is my first try, I did not want to post on the discussion site; it might be something dumb that I am doing. I also noticed that the command-line interface allows specifying a decoy prefix (I usually have “REV_” in my FASTA files), but I did not see an option to set that in the Python API.
Replies: 3 comments
Followup question from Twitter:
Alas, it may be necessary when users are not using data that originates from a protein FASTA file, for instance data from a spectral library search or a database of peptides. The necessary columns for files in the "Percolator input format" are described in the mokapot docs and in the Percolator docs.
When reading the PepXML file, the
It looks like all values of the "Label" column in the pin file are 1, indicating all of the PSMs are targets. This is the column that is used by mokapot (and Percolator) to distinguish between target and decoy PSMs. Please indicate decoy PSMs with a -1 and it should work.
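If you want to confirm this for your own file, a quick tally of the "Label" column is usually enough. The snippet below is only a minimal sketch: `count_labels` is a hypothetical helper (not part of mokapot), and it assumes a tab-delimited pin file whose header row contains a column named "Label". It uses the `csv` module rather than a data frame because the protein column at the end of a row may span several tab-separated fields.

```python
import csv
from collections import Counter


def count_labels(pin_lines):
    """Tally the values found in the "Label" column of a pin file.

    `pin_lines` is any iterable of text lines, e.g. an open file handle.
    This is a hypothetical helper for illustration, not a mokapot function.
    """
    reader = csv.reader(pin_lines, delimiter="\t")
    header = next(reader)
    # Assumption: the column is named "Label"; match case-insensitively.
    label_idx = [name.lower() for name in header].index("label")
    # Rows may be ragged because several proteins can trail the line,
    # so only the label field is read from each row.
    return Counter(row[label_idx] for row in reader if row)


# Tiny in-memory example; with a real file, pass an open handle instead:
#     with open("psms.pin", newline="") as handle:  # hypothetical path
#         print(count_labels(handle))
example = [
    "SpecId\tLabel\tScanNr\tscore\tPeptide\tProteins",
    "t1\t1\t1\t2.5\tK.PEPTIDEK.A\tsp|P1|A",
    "d1\t-1\t2\t1.1\tK.EDITPEPK.A\tdecoy_sp|P1|A",
]
print(count_labels(example))
```

If the tally contains only the key "1", every PSM is marked as a target, and the decoy rows need to be relabeled with -1.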
As for the decoy prefix, this is only used for protein-level FDR estimates. To do the "picked-protein approach", mokapot needs to be able to pair target proteins with their decoy counterparts in the FASTA file, which is where the prefix comes into play. It has no effect on PSM-level or peptide-level FDR estimates.
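To make the pairing idea concrete, here is a rough sanity check you could run on a FASTA file. It is an illustration only and not how mokapot implements the pairing internally: `pair_report` is a hypothetical helper, and it assumes that each decoy accession is simply the prefix followed by the target accession, and that the accession is the first whitespace-delimited token of the header line.

```python
def pair_report(fasta_lines, decoy_prefix="decoy_"):
    """Return (number of decoy entries, decoys without a matching target).

    Hypothetical helper for illustration; not part of mokapot.
    """
    targets, decoys = set(), set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # header with no accession
        accession = fields[0]
        if accession.startswith(decoy_prefix):
            # Strip the prefix to recover the expected target accession.
            decoys.add(accession[len(decoy_prefix):])
        else:
            targets.add(accession)
    return len(decoys), sorted(decoys - targets)


# Made-up entries using the "REV_" prefix mentioned in the question.
fasta = [
    ">sp|P1|A first protein",
    "PEPTIDEK",
    ">REV_sp|P1|A first protein",
    "KEDITPEP",
    ">REV_sp|P2|B second protein",
    "AAAK",
]
print(pair_report(fasta, decoy_prefix="REV_"))
print(pair_report(fasta, decoy_prefix="decoy_"))
```

A decoy count of zero suggests the prefix does not match the file, and any accessions in the second element are decoys whose target could not be found under these assumptions.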
To specify a different decoy prefix in the Python API, use the `decoy_prefix` argument in `mokapot.read_fasta()`.
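Something along these lines should work. Treat it as a sketch rather than a tested recipe: the wrapper function `brew_with_prefix` and the file names are placeholders, "REV_" is the prefix from the question, and the other `read_fasta()` digestion settings (enzyme, missed cleavages, and so on) are left at their defaults here even though they should match your search.

```python
def brew_with_prefix(pin_file, fasta_file, decoy_prefix="REV_"):
    """Run mokapot with protein-level FDR using a custom decoy prefix.

    Hypothetical wrapper for illustration; the file paths are placeholders.
    """
    # Imported inside the function so this sketch can be loaded
    # even where mokapot is not installed.
    import mokapot

    psms = mokapot.read_pin(pin_file)
    # decoy_prefix tells mokapot how decoy proteins are named in the FASTA.
    proteins = mokapot.read_fasta(fasta_file, decoy_prefix=decoy_prefix)
    psms.add_proteins(proteins)
    results, models = mokapot.brew(psms)
    return results


# Example call with placeholder paths:
#     results = brew_with_prefix("psms.pin", "proteins.fasta", decoy_prefix="REV_")
```

Keep in mind that this only affects the protein-level results; the "Label" column still has to mark decoy PSMs with -1.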