
vg index forever at 100% cpu #511

@NickSto

I'm trying to index a small graph, but vg index just keeps running at 100% CPU and never finishes. At least, I assume it never will: vg construct generated the graph in a few seconds, but vg index has now spent over an hour on it.

The command I'm using is:
$ vg index -k 11 -x nvc-filt.xg -g nvc-filt.gcsa nvc-filt.vg
This is with the v1.4.0-1347-g846bf0b static binary, but it has also happened with a non-static version run from Docker. The .xg file is created instantly and then never changes size, and the .gcsa file never appears. I've also tried making a RocksDB index, but that never completes either.

I tried editing the original VCF to include only a few of the indels I'm interested in, and that graph (nvc-curated.vg) indexes successfully in about a second:

$ grep -v '^##' nvc-curated.vcf | cut -f -7
#CHROM  POS ID  REF ALT QUAL    FILTER
SC8-ch  8268    .   TAGC    T   .   .
SC8-ch  8271    .   C   CACCCCCTCT,CT,CCCCTCT   .   .
SC8-ch  8289    .   TAGA    TA,T,TGA    .   .
SC8-ch  8290    .   AGAG    A,AAG   .   .

Maybe my graph is unusually complex. It's short, with the non-linear sequence confined to a 1 kb region, but there are over 1,000 nodes in that space. And some areas are really complicated:
nvc-filt.vg excerpt
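
One rough way to compare how complex the two VCFs are is to multiply the allele counts (REF plus each ALT) across all records. This is only a sketch: `allele_combos` is a hypothetical helper, not part of vg, and it treats every site as independent, so overlapping records like the ones above are overcounted. It also assumes, without confirmation, that the indexing time is related to how many allele combinations the graph encodes.

```shell
# Hypothetical helper (not a vg command): reads VCF text on stdin and prints
# the product of per-site allele counts, as a crude upper bound on the number
# of allele combinations. Assumes tab-separated VCF with ALT in column 5.
allele_combos() {
  awk -F'\t' '
    BEGIN { product = 1 }
    /^#/ { next }                   # skip ## meta lines and the #CHROM header
    NF >= 5 {
      n = split($5, alts, ",")      # number of ALT alleles in this record
      product *= (n + 1)            # +1 for the REF allele
    }
    END { printf "%.6g\n", product }  # large products print in exponent form
  '
}

# Usage (file names as in this report):
#   allele_combos < nvc-curated.vcf
#   allele_combos < nvc-filt.vcf
```

For the four records in nvc-curated.vcf shown above, this works out to 2 × 4 × 4 × 3 = 96. The number is only useful for comparing the two files against each other, not as a count of real paths in the graph.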

Supporting files:
asm.fa: reference sequence
nvc-filt.vg: problem graph
nvc-filt.vcf: VCF used to create it
nvc-curated.vg: subset which indexes fine
nvc-curated.vcf: VCF used to create it
nvc-filt.xg: incomplete xg file created early in the run of index
