Ensure that vg rna produces unique HST names #4156

Merged
merged 1 commit into from
Nov 12, 2023

jeizenga
Contributor

Changelog Entry

To be copied to the draft changelog by merger:

  • vg rna no longer projects transcripts twice onto a reference given by RS tag in a GFA
  • vg rna assigns unique names to twice-projected transcripts on cyclic haplotypes

Description

There were a few overlapping bugs in vg rna's projection algorithm when it was applied with the new haplotype naming conventions. First, we were projecting reference transcripts back onto the reference whenever the reference was given as a W line with an RS tag. Second, when a transcript projected multiple times to the same haplotype (which can happen in cyclic, copy number variant genes), every projection was given the same identifier, which violated rpvg's assumptions downstream and led to a crash.
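To illustrate the second bug, here is a minimal Python sketch of one way to disambiguate repeated projections. It is not the code in this PR: the function name `assign_unique_names`, the `(transcript_id, haplotype)` input shape, and the `_copyN` suffix are all assumptions made for illustration, and vg rna's actual HST naming scheme may differ.

```python
from collections import defaultdict

def assign_unique_names(projections):
    """Give each projection a distinct name.

    `projections` is a list of (transcript_id, haplotype) pairs, one per
    projection. Both the input shape and the "_copyN" suffix are
    hypothetical; they are not vg rna's real naming convention.
    """
    # First pass: count how often each (transcript, haplotype) pair occurs.
    totals = defaultdict(int)
    for key in projections:
        totals[key] += 1

    # Second pass: add a running copy index only to pairs that repeat.
    seen = defaultdict(int)
    names = []
    for transcript_id, haplotype in projections:
        key = (transcript_id, haplotype)
        seen[key] += 1
        base = f"{transcript_id}_{haplotype}"
        if totals[key] > 1:
            names.append(f"{base}_copy{seen[key]}")
        else:
            names.append(base)
    return names
```

With this scheme, a transcript that projects twice onto one cyclic haplotype yields two different identifiers, so a downstream tool that expects unique names would not see a duplicate.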

@jeizenga
Contributor Author

Also, developing this fix highlighted that some of the default VCF to GBWT behavior can be problematic for vg rna. Currently, haplotypes around unphased variants are extended into the region between variants from both sides. This leads to two haplotype blocks from the same haplotype covering the region, which can confuse vg rna into projecting onto the haplotype twice. This will only really be a problem in VCFs of quite low variant density, because the region between variants needs to cover a full transcript, so it probably won't show up in most use cases.
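The condition described above can be sketched as a simple interval check. This is only an illustration under assumed inputs: the function name `doubly_covered_haplotypes` and the `(haplotype_id, start, end)` block tuples are hypothetical and do not correspond to any GBWT or vg API.

```python
def doubly_covered_haplotypes(transcript_start, transcript_end, blocks):
    """Return haplotypes with more than one block spanning the transcript.

    `blocks` is a list of (haplotype_id, start, end) half-open intervals.
    The representation is an assumption for illustration only.
    """
    counts = {}
    for haplotype_id, start, end in blocks:
        # A block matters only if it contains the entire transcript.
        if start <= transcript_start and transcript_end <= end:
            counts[haplotype_id] = counts.get(haplotype_id, 0) + 1
    return {hap: n for hap, n in counts.items() if n > 1}
```

For example, with blocks `("sample1#1", 0, 900)` and `("sample1#1", 100, 2000)` and a transcript at 200 to 800, both blocks contain the whole transcript, which is the situation where a double projection could occur.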
