VG annotate can't add BED records across the junctions of circular genomes #1775

Closed
JervenBolleman opened this issue Jul 9, 2018 · 8 comments · Fixed by #1781
@JervenBolleman
Contributor

JervenBolleman commented Jul 9, 2018

Please describe:

  1. What you were trying to do
    Run vg annotate on a pangenome graph of 126 E. coli genomes, using BED files from EnsemblGenomes.
  2. What you wanted to happen
    Get a humongous vg GAM file.
  3. What actually happened
vg: src/alignment.cpp:1450: void vg::parse_bed_regions(std::istream&, xg::XG*, std::vector<vg::Alignment>*): Assertion `sbuf < ebuf' failed.
ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace (most recent call last):
#9    Object "/scratch/uuw_sparql/jerventest/vg", at 0x4cc079, in _start
#8    Object "/scratch/uuw_sparql/jerventest/vg", at 0x16ab208, in __libc_start_main
#7    Object "/scratch/uuw_sparql/jerventest/vg", at 0x406510, in main
#6    Object "/scratch/uuw_sparql/jerventest/vg", at 0x8371c7, in vg::subcommand::Subcommand::operator()(int, char**) const
#5    Object "/scratch/uuw_sparql/jerventest/vg", at 0x7b4355, in main_annotate(int, char**)
#4    Object "/scratch/uuw_sparql/jerventest/vg", at 0xb427a9, in vg::parse_bed_regions(std::istream&, xg::XG*, std::vector<vg::Alignment, std::allocator<vg::Alignment> >*)
#3    Object "/scratch/uuw_sparql/jerventest/vg", at 0x16af121, in __assert_fail
#2    Object "/scratch/uuw_sparql/jerventest/vg", at 0x16af0ab, in __assert_fail_base
#1    Object "/scratch/uuw_sparql/jerventest/vg", at 0x16bba50, in abort
#0    Object "/scratch/uuw_sparql/jerventest/vg", at 0xdf9c07, in raise
  4. What data and command line to use to make the problem recur, if applicable
    These are the xz-compressed vg file and BED file.
    Please build the xg index from the vg file locally.
    vg file
    bed file
vg annotate -p -x ecoli.xg -b ecoli-all.bed > ecoli.all.gam

The vg version is a statically compiled vg 1.8.0.

@ekg
Member

ekg commented Jul 9, 2018

You should try:

vg annotate -p -x ecoli.xg -b ecoli-all.bed > ecoli.all.gam

Note the change (.xg instead of .xz).

@ekg ekg closed this as completed Jul 9, 2018
@JervenBolleman
Contributor Author

@ekg that xz was a typo in the command as written in the issue, not in the command I ran. The command fails with the correct xg file as well.

@JervenBolleman JervenBolleman reopened this Jul 9, 2018
@ekg
Member

ekg commented Jul 9, 2018 via email

@JervenBolleman
Contributor Author

@ekg the links to the dataset (vg file and BED file) are in my first comment.

You will just need to build your own xg index, but that is quick.
The files are xz-compressed, so you will need to unxz them first.

@6br
Collaborator

6br commented Jul 10, 2018

I found the cause of this error.

E. coli has a circular genome. If the BED file includes an annotation that crosses the junction between the end and the start of an E. coli genome, the start position of that annotation is greater than its end position.

Escherichia_coli_gca_001420955.ASM142095v1.dna.chromosome.Chromosome:Chromosome	5154494	47	MJ49_00005	1000	-
Escherichia_coli_gca_001420955.ASM142095v1.dna.chromosome.Chromosome:Chromosome	5154494	47	MJ49_00005	1000	-
Escherichia_coli_gca_001612475.ASM161247v1.dna.chromosome.Chromosome:Chromosome	5560283	592	ARC77_00005	1000	+
Escherichia_coli_gca_001612475.ASM161247v1.dna.chromosome.Chromosome:Chromosome	5560283	592	ARC77_00005	1000	+
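
To find every such record before running vg annotate, a BED file can be scanned for lines whose start exceeds their end. The following is a minimal sketch, not part of vg; the function name `find_wraparound_records` and the second example record (`some_path`, `ordinary_gene`) are made up for illustration, and it assumes tab-separated BED with the start and end in columns 2 and 3.

```python
import io


def find_wraparound_records(bed_lines):
    """Yield (line_number, name, start, end) for BED records with start > end."""
    for line_number, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            name = fields[3] if len(fields) > 3 else ""
            yield line_number, name, start, end


# First record is taken from the lines above; the second is a hypothetical
# ordinary record added only to show that it is not flagged.
example = io.StringIO(
    "Escherichia_coli_gca_001420955.ASM142095v1.dna.chromosome.Chromosome:Chromosome"
    "\t5154494\t47\tMJ49_00005\t1000\t-\n"
    "some_path\t10\t20\tordinary_gene\t1000\t+\n"
)
flagged = list(find_wraparound_records(example))
for line_number, name, start, end in flagged:
    print(f"line {line_number}: {name} start={start} end={end}")
```

On a real file, pass an open file handle instead of the `io.StringIO` example.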

I am not sure whether the target_alignment function, which the vg annotate subcommand calls, can accept annotations with a circular range, so such records are currently rejected by the assertion start_point < end_point.
The assertion sbuf < ebuf can be removed. I removed it in pull request #1779, but that does not solve the main problem of handling annotations with a circular range.

@ekg
Member

ekg commented Jul 10, 2018

It looks like we'll have to teach target_alignment to handle this case.

@6br
Collaborator

6br commented Jul 10, 2018

When a BED file contains a line with start_point > end_point, I think we cannot distinguish between the following two cases:

  1. start_point is larger than end_point, either by mistake or on purpose
  2. the annotation wraps around the origin of a circular path

Even if the genome is circular, should we treat every line with start_point > end_point as an annotation that wraps around the origin of a circular path?

@adamnovak
Member

We should know if the path is circular, and accept start_point > end_point annotations only in that case.
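
The rule proposed here can be sketched as a small function that splits a wrap-around interval into two linear pieces only when the path is known to be circular, and rejects it otherwise. This is an illustrative sketch, not the code from the fix in #1781; the name `split_bed_interval` and its parameters are hypothetical, and it assumes BED's 0-based, half-open coordinates and that the caller already knows the path length and whether the path is circular.

```python
def split_bed_interval(start, end, path_length, is_circular):
    """Return the linear (start, end) half-open pieces covered by a BED interval.

    An ordinary interval yields one piece. An interval with start > end is
    accepted only on a circular path, where it yields the piece up to the end
    of the path followed by the piece from the origin.
    """
    if not (0 <= start <= path_length and 0 <= end <= path_length):
        raise ValueError("coordinates fall outside the path")
    if start < end:
        return [(start, end)]
    if start > end and is_circular:
        # Drop a piece if it is empty, e.g. when the interval ends at the origin.
        pieces = [(s, e) for s, e in ((start, path_length), (0, end)) if s < e]
        if pieces:
            return pieces
    # Empty intervals, and start > end on a linear path, are treated as errors.
    raise ValueError(
        f"invalid interval {start}-{end} (circular={is_circular})"
    )


# Toy numbers for a path of length 100, chosen only for illustration.
print(split_bed_interval(90, 10, 100, is_circular=True))
print(split_bed_interval(10, 90, 100, is_circular=False))
```

Raising an error on a linear path keeps case 1 above (a mistaken record) visible to the user instead of silently reinterpreting it.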

@adamnovak adamnovak changed the title VG annotate failed with a stacktrace VG annotate can't add BED records across the junctions of circular genomes Jul 10, 2018
@adamnovak adamnovak self-assigned this Jul 10, 2018
@ekg ekg closed this as completed in #1781 Jul 12, 2018
@ghost ghost removed the in progress label Jul 12, 2018