'X' is not a valid unambiguous letter for protein #556

Closed
lydiayliu opened this issue Aug 25, 2022 · 3 comments · Fixed by #559
lydiayliu (Collaborator) commented Aug 25, 2022

I feel like we've seen this error before but I can't believe I'm seeing it again under this circumstance lol

I am trying to use the new lysN enzyme to call noncoding peptides

moPepGen callNoncoding \
    --genome-fasta /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/GRCh38.p13.genome.fa \
    --annotation-gtf /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/gencode.v34.chr_patch_hapl_scaff.annotation.gtf \
    --proteome-fasta /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/gencode.v34.pc_translations.fa \
    --output-path /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/noncoding_peptides/lysn_m0_peptides.fa \
    --cleavage-rule lysn \
    --miscleavage 0 \
    --min-mw 0 \
    --min-tx-length 0 \
    --min-length 0 \
    --max-length 99

Got the following error

[ 2022-08-25 21:44:26 ] moPepGen callNoncoding started
[ 2022-08-25 21:46:08 ] Annotation GTF loaded.
[ 2022-08-25 21:46:09 ] Proteome FASTA loaded.
[ 2022-08-25 21:46:28 ] Genome assembly FASTA loaded.
[ 2022-08-25 21:46:50 ] canonical peptide pool generated.
[ 2022-08-25 21:48:04 ] 5000 transcripts processed.
[ 2022-08-25 21:49:29 ] 10000 transcripts processed.
[ 2022-08-25 21:50:52 ] 15000 transcripts processed.
[ 2022-08-25 21:52:21 ] 20000 transcripts processed.
[ 2022-08-25 21:53:55 ] 25000 transcripts processed.
[ 2022-08-25 21:55:29 ] 30000 transcripts processed.
[ 2022-08-25 21:57:03 ] 35000 transcripts processed.
[ 2022-08-25 21:58:19 ] 40000 transcripts processed.
[ 2022-08-25 21:59:56 ] 45000 transcripts processed.
[ 2022-08-25 22:01:39 ] 50000 transcripts processed.
[ 2022-08-25 22:02:58 ] 55000 transcripts processed.
[ 2022-08-25 22:03:01 ] Exception raised from ENST00000625083.1
Traceback (most recent call last):
  File "/usr/local/bin/moPepGen", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 89, in main
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_noncoding_peptide.py", line 115, in call_noncoding_peptide
    peptides, orfs = call_noncoding_peptide_main(tx_id, tx_model, genome,
  File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_noncoding_peptide.py", line 171, in call_noncoding_peptide_main
    peptides = pgraph.call_variant_peptides(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 763, in call_variant_peptides
    self.call_and_stage_unknown_orf(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/PeptideVariantGraph.py", line 1028, in call_and_stage_unknown_orf
    traversal.pool.add_miscleaved_sequences(
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/VariantPeptideDict.py", line 336, in add_miscleaved_sequences
    for seq, metadata in seqs:
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/VariantPeptideDict.py", line 213, in join_miscleaved_peptides
    is_valid = self.is_valid_seq(seq, blacklist)
  File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/VariantPeptideDict.py", line 158, in is_valid_seq
    SeqUtils.molecular_weight(seq, 'protein') >= min_mw
  File "/usr/local/lib/python3.8/site-packages/Bio/SeqUtils/__init__.py", line 379, in molecular_weight
    raise ValueError(
ValueError: 'X' is not a valid unambiguous letter for protein

I'm using mopepgen:0.9.2, as it is the first release that introduced lysn.

lydiayliu added the "priority: now" (Issue to be fixed immediately) label Aug 25, 2022
zhuchcn self-assigned this Aug 26, 2022
zhuchcn (Member) commented Aug 26, 2022

The DNA sequence of the gene has an 'N' in the genome, so it is translated to 'X' by biopython. Should we just skip any peptides that have an 'X'?
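To illustrate how an 'N' in the DNA can end up as an 'X' in the peptide, here is a minimal pure-Python sketch. It is not biopython's or moPepGen's implementation; `translate_codon` and `translate` are hypothetical helpers written for this example, and they only handle the letters A, C, G, T and N. The sketch assumes the standard genetic code and treats a codon as ambiguous when the possible values of 'N' lead to different amino acids.

```python
from itertools import product

# Standard genetic code in the usual compact layout (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one codon; return 'X' if an 'N' makes the residue ambiguous."""
    # Expand every 'N' into all four bases and collect the possible residues.
    options = [BASES if base == "N" else base for base in codon.upper()]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


print(translate("ATGGCNAANTGA"))  # MAX*
```

In this example `GCN` still resolves to 'A' because all four expansions encode alanine, while `AAN` could be either 'N' or 'K' and so becomes 'X'.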

lydiayliu (Collaborator, Author) commented

Yeah, let's do that; it's useless for the search algos anyways. I retried this with version 0.6.0 and didn't have this error. What changed between 0.6.0 and 0.9.2 that introduced it?
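A rough sketch of the "skip any peptide with an 'X'" idea discussed above, assuming the check runs before the molecular weight is computed. This is not the actual fix; `has_unknown_residue`, `keep_valid_peptides` and `toy_weight` are hypothetical names for illustration. The weight function is passed in as a parameter so the sketch runs without biopython; in practice it could wrap the `SeqUtils.molecular_weight(seq, 'protein')` call shown in the traceback.

```python
def has_unknown_residue(peptide):
    """Return True if the peptide contains the unknown-residue letter 'X'."""
    return "X" in peptide.upper()


def keep_valid_peptides(peptides, min_mw, weight_fn):
    """Drop peptides containing 'X', then apply the minimum-weight filter."""
    kept = []
    for peptide in peptides:
        # Check for 'X' first so weight_fn never sees an ambiguous letter.
        if has_unknown_residue(peptide):
            continue
        if weight_fn(peptide) >= min_mw:
            kept.append(peptide)
    return kept


def toy_weight(peptide):
    """Stand-in weight function (assumption): rejects 'X', ~110 Da per residue."""
    if "X" in peptide:
        raise ValueError("'X' is not a valid unambiguous letter for protein")
    return 110.0 * len(peptide)


print(keep_valid_peptides(["MAK", "MAXK", "K"], 200.0, toy_weight))  # ['MAK']
```

Doing the 'X' check first means the weight function never raises, even with `--min-mw 0`, since the exception comes from the weight calculation itself and not from the comparison.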

zhuchcn (Member) commented Aug 26, 2022

I can't recall any change made since 0.6.0 that should affect this, so I have no idea.
