Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

callVariant index out of range #454

Closed
lydiayliu opened this issue May 13, 2022 · 5 comments · Fixed by #455
Closed

callVariant index out of range #454

lydiayliu opened this issue May 13, 2022 · 5 comments · Fixed by #455
Assignees
Labels
effort: 8 Needs 8 points of effect to fix priority: now Issue to be fixed immediately

Comments

@lydiayliu
Copy link
Collaborator

    [ 2022-05-12 22:47:18 ] moPepGen callVariant started
    [ 2022-05-12 22:48:55 ] Reference indices loaded.
    [ 2022-05-12 22:48:55 ] Using GVF file: ACH-000800_Fusion_STARFusion.gvf
    [ 2022-05-12 22:48:55 ] Using GVF file: ACH-000800_Mutation_VEP.gvf
    [ 2022-05-12 22:48:56 ] Variants sorted
    [ 2022-05-12 22:51:09 ] 1000 transcripts processed.
    [ 2022-05-12 22:58:27 ] 2000 transcripts processed.
    [ 2022-05-12 23:01:20 ] 3000 transcripts processed.
    [ 2022-05-12 23:01:45 ] Exception raised from fusion FUSION-ENST00000589441.5:9405-ENST00000624697.4:7056, donor:ENST00000589441.5, accepter: ENST00000624697.4
    An error has occured during the function execution
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
        __result = __f(*__args)
      File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 208, in wrapper
        return call_variant_peptides_wrapper(*dispatch)
      File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 174, in call_variant_peptides_wrapper
        _peptides = call_peptide_fusion(
      File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 385, in call_peptide_fusion
        dgraph.create_variant_graph(
      File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 906, in create_variant_graph
        variant.to_end_inclusion(self.seq)
      File "/usr/local/lib/python3.8/site-packages/moPepGen/seqvar/VariantRecord.py", line 411, in to_end_inclusion
        ref = self.ref[1:] + str(seq.seq[self.location.end])
      File "/usr/local/lib/python3.8/site-packages/Bio/Seq.py", line 430, in __getitem__
        return chr(self._data[index])
    IndexError: index out of range

Ran as part of call-noncanonical peptides

Caused by:
  Process `call_NonCanonicalPeptide (2)` terminated with an error exit status (1)

Command executed:

  set -euo pipefail
  
  NXF_WORK=$(pwd)/work \
  nextflow run /hot/users/yiyangliu/project-MissingPeptides-Method/pipelines/pipeline-meta-call-NonCanonicalPeptide/modules/call_NonCanonicalPeptide/../../external/pipeline-call-NonCanonicalPeptide/main.nf \
      --sample_name ACH-000800 \
      --input_csv input_3.csv \
      --filterFasta null \
      --output_dir $(pwd)/ACH-000800 \
      --entrypoint parser \
      --call_variant_ncpus 16 \
      --call_variant_memory_GB 32 GB \
      --variant_fasta NO_VARIANT_FASTA \
      -params-file config.json \
      -c /hot/users/yiyangliu/project-MissingPeptides-Method/pipelines/pipeline-meta-call-NonCanonicalPeptide/modules/call_NonCanonicalPeptide/template.config

I actually don't know how to reproduce this cuz the GVFs were not published XD

@lydiayliu lydiayliu added the priority: now Issue to be fixed immediately label May 13, 2022
@zhuchcn zhuchcn self-assigned this May 13, 2022
@zhuchcn zhuchcn added the effort: 5 Needs 5 points of effect to fix label May 13, 2022
@zhuchcn
Copy link
Member

zhuchcn commented May 13, 2022

This is a new edge case. The donor transcript ENST00000589441.5, is mRNA_start_NF, with the known CDS starts at position 1 (2nd nucleotide). The fusion donor breakpoint is 3, which is right before the end of the first three nucleotides of CDS, and then the function to_end_inclusion is called. What to_end_inclusion does is to change the variant from start inclusion to end inclusion for "insertion" like variants. So consider the variant 3-A-ACCCC, the end-inclusion version of it is 4-T-CCCCT, with the transcript sequence being CGATCCT... The to_end_inclusion does not work for fusion at all, because for fusion, the donor transcript is truncated at the breakpoint, so there is no next nucleotide. My personal thought is, those mRNA_start_NF transcripts are transcripts with imperfect annotation, so the confidence is lower. So I'm OK with just skipping those fusion events, because they are hard to deal with.

But, while figuring this out, I found another issue that would have a bigger impact. So when updating the transcriptional variant graph (TVG), if the current node has only one outgoing node, they are simply merged, and the subgraph ID of the merged node is the outgoing node. So then, if there is a fusion, and there is no other variants before it, then the first node of the TVG will have an only outgoing node, being the fusion accepter node. And then after merging, the first node from root will become a subgraph. Then in this case, the start codon could not be found correctly, because start codon is only searched on the nodes of the main graph (not subgraph). So the short story is, no variant peptides are called from coding transcripts with the first variants being a fusion. Fixing this is easy, we just have to, very very unfortunately and sadly, rerun everything 😭

@zhuchcn zhuchcn added effort: 8 Needs 8 points of effect to fix and removed effort: 5 Needs 5 points of effect to fix labels May 13, 2022
@lydiayliu
Copy link
Collaborator Author

lydiayliu commented May 13, 2022

Ha so to_end_inclusion should have never been called on a fusion. I think I'm ok with skipping these mRNA_start_NF, don't we currently already skip things if there is something wrong with the first 3 nucleotides? Maybe just output a warning so we can see how many cases there are?

And then after merging, the first node from root will become a subgraph. Then in this case, the start codon could not be found correctly, because start codon is only searched on the nodes of the main graph (not subgraph).

Interesting, then which part of the graph is the main graph? Is there even a main graph? XD

So the short story is, no variant peptides are called from coding transcripts with the first variants being a fusion. Fixing this is easy, we just have to, very very unfortunately and sadly, rerun everything 😭

Lol no worries! maxRetries = 15 will do the magic!

I just thought of something, since for a Fusion of 2 noncoding transcripts, it is persumable that many of the peptides overlap with just a 3 frame translation of the noncoding transcripts themselves. When we do splitFasta with the fusion database and the noncoding database, and put --order_source Fusion,Noncoding, where do those peptides go?

@zhuchcn
Copy link
Member

zhuchcn commented May 13, 2022

Maybe just output a warning so we can see how many cases there are?

That's a good idea!

Interesting, then which part of the graph is the main graph? Is there even a main graph? XD

Yeah then there isn't a "main" branch in the graph any more.

I just thought of something, since for a Fusion of 2 noncoding transcripts, it is persumable that many of the peptides overlap with just a 3 frame translation of the noncoding transcripts themselves.

Peptides called using callVariant should all include at least 1 variant. Oh are you saying that the variant peptides could overlap with the 3 frame translated peptides?

@lydiayliu
Copy link
Collaborator Author

Peptides called using callVariant should all include at least 1 variant. Oh are you saying that the variant peptides could overlap with the 3 frame translated peptides?

Actually if the only the node with the fusion breakpoint is the "variant" then I think we are good! If the whole thing is considered a variant (the whole fusion transcript top to bottom) then it might overlap with the 3 frame translated peptides?

@zhuchcn
Copy link
Member

zhuchcn commented May 13, 2022

That is how it is handled! Only calling peptides with at least one actual variant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: 8 Needs 8 points of effect to fix priority: now Issue to be fixed immediately
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants