
get_path_count() returns 0 when run on a GBZ, breaking vg paths #4792

@faithokamoto


@adamnovak, here is the bug I mentioned in the vg meeting today. It is present in v1.71.0.

On December 27 I merged #4780, which added a check to `vg paths`: if the input graph has no paths, the program now reports an error and refuses to continue. My reasoning was that, since the command fundamentally operates on paths, a graph with no paths at all would make it useless.

```cpp
if (graph->get_path_count() == 0) {
    logger.error() << "graph does not contain any paths" << std::endl;
}
```

I tested this on a few kinds of graphs and it looked fine; notably, it passed the CI tests. Today I tried running it on a GBZ input. The graph definitely has paths, yet to my surprise the command errored. To my bigger surprise, commenting out the check made it work fine. This means the paths are there; they just aren't being detected by `get_path_count()`.
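The mismatch described above can be checked directly by comparing what the count reports against how many paths enumeration actually visits. The following is a minimal, self-contained sketch under stated assumptions: `PathGraph`, `for_each_path_name`, `PartialCountGraph`, `enumerated_path_count`, and `count_is_inconsistent` are hypothetical names, not vg or libhandlegraph APIs, and the toy backend only mimics the reported symptom. It makes no claim about how the GBZ backend is actually implemented.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a path-handle graph; NOT vg's real interface.
struct PathGraph {
    virtual ~PathGraph() = default;
    // What a count-based guard such as `get_path_count() == 0` trusts.
    virtual std::size_t get_path_count() const = 0;
    // What a path listing relies on: visiting every path by name.
    virtual void for_each_path_name(
        const std::function<void(const std::string&)>& visit) const = 0;
};

// Hypothetical backend whose count covers only part of what it can enumerate,
// mimicking the reported symptom (count is 0, paths are still listable).
struct PartialCountGraph : PathGraph {
    std::vector<std::string> counted;    // paths included in get_path_count()
    std::vector<std::string> uncounted;  // paths visible only through iteration

    std::size_t get_path_count() const override { return counted.size(); }

    void for_each_path_name(
        const std::function<void(const std::string&)>& visit) const override {
        for (const auto& name : counted) visit(name);
        for (const auto& name : uncounted) visit(name);
    }
};

// Count paths by enumeration, independent of the backend's own count.
std::size_t enumerated_path_count(const PathGraph& graph) {
    std::size_t n = 0;
    graph.for_each_path_name([&n](const std::string&) { ++n; });
    return n;
}

// True when the backend's reported count disagrees with what enumeration finds.
bool count_is_inconsistent(const PathGraph& graph) {
    return graph.get_path_count() != enumerated_path_count(graph);
}
```

For example, a `PartialCountGraph` whose two paths both sit in `uncounted` reports a count of 0 while enumeration finds 2, the same shape as the GBZ behavior reported here.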

From `vg/src/vg.cpp`, lines 441 to 443 in 010e9ed:

```cpp
size_t VG::get_path_count() const {
    return paths._paths.size();
}
```
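Since this implementation just returns the size of the `VG` class's own path map, each graph class answers `get_path_count()` in its own way, and the GBZ result depends on code not shown here. One way to make the guard independent of any backend's count is to ask whether enumeration yields at least one path and stop there. A hedged sketch: `PathGraph`, `VectorPathGraph`, and `has_any_path` are hypothetical illustrations, not vg APIs, and whether vg's real enumeration visits every kind of path stored in a GBZ is not established in this issue.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical interface, NOT vg's or libhandlegraph's real API.
// The callback returns false to stop the iteration early.
struct PathGraph {
    virtual ~PathGraph() = default;
    virtual void for_each_path_name(
        const std::function<bool(const std::string&)>& visit) const = 0;
};

// Hypothetical in-memory backend used only for illustration.
struct VectorPathGraph : PathGraph {
    std::vector<std::string> names;

    void for_each_path_name(
        const std::function<bool(const std::string&)>& visit) const override {
        for (const auto& name : names) {
            if (!visit(name)) return;  // caller asked to stop
        }
    }
};

// Existence check: stops at the first path instead of trusting a count.
bool has_any_path(const PathGraph& graph) {
    bool found = false;
    graph.for_each_path_name([&found](const std::string&) {
        found = true;
        return false;  // one path is enough; stop iterating
    });
    return found;
}
```

Under these assumptions the guard would read `if (!has_any_path(*graph))`, and it stops after the first path it sees rather than counting all of them.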

GBZ creation, followed by the path counts from each format:

```shell
GRAPH=/private/groups/patenlab/fokamoto/centrolign/graph/unsampled/chr12

# PG format needs nodes with length <= 1024bp for distance indexing
vg convert --gfa-in $GRAPH.gfa | vg mod --chop 1024 - > $GRAPH.pg
# Convert to GBZ format by smooshing GBWT and PG together
vg gbwt --index-paths -x $GRAPH.pg -o $GRAPH.gbwt
vg gbwt --gbz-format -x $GRAPH.pg $GRAPH.gbwt -g $GRAPH.gbz

vg paths --list -x $GRAPH.gfa | wc -l # 373
vg paths --list -x $GRAPH.pg | wc -l # 373
vg paths --list -g $GRAPH.gbwt | wc -l # 373
vg paths --list -x $GRAPH.gbz | wc -l # error
```

Again, though, if I comment out the check on line 535, `vg paths` works fine and all 373 paths are found.
