I'm trying to index a small graph, but the vg index command just keeps running at 100% CPU forever. Or at least I assume it's forever: construct generated the graph in a few seconds, but index has been running on it for over an hour.
The command I'm using is:
$ vg index -k 11 -x nvc-filt.xg -g nvc-filt.gcsa nvc-filt.vg
This happens with the v1.4.0-1347-g846bf0b static binary, and also with a non-static build run from Docker. The .xg file is created instantly and then never changes size, and the .gcsa file never appears. I've also tried building a RocksDB index, but that never completes either.
I tried editing the original VCF to include only a few of the indels I'm interested in, and vg index handles that graph (nvc-curated.vg) in about a second:
$ grep -v '^##' nvc-curated.vcf | cut -f -7
#CHROM POS ID REF ALT QUAL FILTER
SC8-ch 8268 . TAGC T . .
SC8-ch 8271 . C CACCCCCTCT,CT,CCCCTCT . .
SC8-ch 8289 . TAGA TA,T,TGA . .
SC8-ch 8290 . AGAG A,AAG . .
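Several of these records overlap: the C at 8271 is the last base of the TAGC deletion at 8268, and the AGAG at 8290 starts inside the TAGA at 8289. To gauge how much more of this the full nvc-filt.vcf contains, a quick check like the following could flag every record whose POS falls inside the REF span of an earlier record and count its ALT alleles. It is a minimal Python sketch that assumes an uncompressed, position-sorted VCF; overlapping_sites is a hypothetical helper, not part of vg.

```python
import sys


def overlapping_sites(lines):
    """Yield (chrom, pos, n_alts, overlaps_previous) for each VCF record.

    A record "overlaps" when its POS falls inside the REF span of an
    earlier record on the same contig. Hypothetical helper, not part of vg.
    """
    last_chrom, last_end = None, 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        end = pos + len(ref) - 1  # last reference base covered by REF
        overlaps = chrom == last_chrom and pos <= last_end
        yield chrom, pos, len(alt.split(",")), overlaps
        if chrom != last_chrom:
            # New contig: restart the span tracking.
            last_chrom, last_end = chrom, end
        else:
            last_end = max(last_end, end)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        for chrom, pos, n_alts, overlaps in overlapping_sites(vcf):
            flag = "overlaps previous" if overlaps else ""
            print(f"{chrom}\t{pos}\t{n_alts} ALT\t{flag}")
```

Saved as, say, overlapping_sites.py (the file name is arbitrary), it would be run as python overlapping_sites.py nvc-filt.vcf.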
Maybe my graph is unusually complex. It's short, and the non-linear sequence is confined to a 1 kb region, but there are over 1,000 nodes in that region, and some areas are really complicated.
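If GCSA construction works by enumerating k-mers (paths spelling k bases) through the graph and then extending them, as I understand it does, overlapping variants would multiply the work: each bubble a k-mer crosses multiplies the number of distinct paths by that bubble's allele count. The toy Python sketch below counts k-base walks in a chain of single-base bubbles to illustrate that growth. The graph shape and the function names (count_kmer_walks, bubble_chain) are hypothetical, and this is not vg's actual indexing code.

```python
from functools import lru_cache


def count_kmer_walks(seqs, edges, k):
    """Count walks spelling exactly k bases in a DAG of labelled nodes.

    seqs:  dict node_id -> sequence string (toy graph)
    edges: dict node_id -> list of successor node_ids
    A walk may start at any offset inside any node; walks that hit a
    dead end before reaching k bases are not counted.
    """
    @lru_cache(maxsize=None)
    def from_node_start(node, remaining):
        # Ways to spell `remaining` bases starting at base 0 of `node`.
        length = len(seqs[node])
        if remaining <= length:
            return 1
        return sum(from_node_start(nxt, remaining - length)
                   for nxt in edges.get(node, []))

    total = 0
    for node, seq in seqs.items():
        for offset in range(len(seq)):
            left = len(seq) - offset  # bases available in this node from offset
            if k <= left:
                total += 1
            else:
                total += sum(from_node_start(nxt, k - left)
                             for nxt in edges.get(node, []))
    return total


def bubble_chain(n_bubbles, n_alleles):
    """Toy graph: anchor, bubble, anchor, ..., anchor, all 1 bp nodes.

    Each bubble has n_alleles parallel single-base nodes.
    """
    seqs, edges = {"a0": "A"}, {}
    prev_anchor = "a0"
    for i in range(n_bubbles):
        next_anchor = f"a{i + 1}"
        seqs[next_anchor] = "A"
        alleles = [f"b{i}_{j}" for j in range(n_alleles)]
        edges[prev_anchor] = alleles
        for allele in alleles:
            seqs[allele] = "C"
            edges[allele] = [next_anchor]
        prev_anchor = next_anchor
    return seqs, edges
```

For instance, count_kmer_walks(*bubble_chain(2, 3), k=5) counts the 3 × 3 = 9 walks that run from the first anchor to the last one. Adding bubbles or alleles makes the count grow multiplicatively rather than linearly, which is the kind of growth a densely overlapping stretch of indels could produce.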

Supporting files:
asm.fa: reference sequence
nvc-filt.vg: problem graph
nvc-filt.vcf: VCF used to create it
nvc-curated.vg: subset which indexes fine
nvc-curated.vcf: VCF used to create it
nvc-filt.xg: incomplete xg file created early in the vg index run