Vg add crashing #3054

Closed
RenzoTale88 opened this issue Oct 20, 2020 · 2 comments · Fixed by #3068

Comments

@RenzoTale88

Hello,
I'm trying to add variants to a graph generated through hal2vg and described in issue #2828. I've generated a filtered VCF file from whole-genome resequencing, and I'm trying to add it to the graph with the following command:

singularity exec --bind $PWD:/mnt vg.1.27.0.sif vg add -v VCF/Converted.filtered.lowMiss.NRAC1.vcf.gz -p -t 4 mygraph.pg > mygraph.added.pg

The run starts fine, but crashes midway with the following error:

Variant 1636205: 4 haplotypes at genome1.seq1:16079058: 261 bp vs. 764 bp haplotypes vs. graphs average
Variant 1636206: 3 haplotypes at genome1.seq1:16079084: 228 bp vs. 709 bp haplotypes vs. graphs average
Variant 1636207: 7 haplotypes at genome1.seq1:16079174: 212 bp vs. 602 bp haplotypes vs. graphs average
vg: src/path_index.cpp:77: vg::PathIndex::PathIndex(const std::__cxx11::list<vg::mapping_t>&, vg::VG&): Assertion `mapping.rank > last_rank || (mapping.rank == 0 && last_rank == 0)' failed.

Thank you for your help
Best
Andrea

@adamnovak
Member

The problem is that vg add was never designed to work on graphs where paths with variants overlap each other. It doesn't keep its internal path mapping rank data up to date as it edits the graph, because that would require looping over the whole downstream half of each path after every edit. But it also builds the PathIndex objects it needs lazily from that data, only when it first encounters a variant on a given path. If it finds a variant on a path that has already been modified because of a variant on another, overlapping path, it will try to use this stale rank information and crash.
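The failure mode described above can be sketched with a toy model. This is not vg's code: `ToyPath`, `build_index`, and `split_node` are hypothetical names, and the way ranks go stale here (split mappings inheriting the old rank) is an assumption made for illustration only. The one thing taken from the thread is the rank check from the failed assertion.

```python
class ToyPath:
    """A path as a list of [rank, node_id] mappings (toy model, not vg's types)."""

    def __init__(self, name, node_ids):
        self.name = name
        self.mappings = [[rank, nid] for rank, nid in enumerate(node_ids, start=1)]


def build_index(path):
    """Build a rank -> node_id index, applying the check from the failed assertion."""
    index = {}
    last_rank = 0
    for rank, node_id in path.mappings:
        if not (rank > last_rank or (rank == 0 and last_rank == 0)):
            raise AssertionError(f"stale ranks on path {path.name}")
        index[rank] = node_id
        last_rank = rank
    return index


def split_node(paths, node_id, new_ids):
    """Replace node_id by new_ids on every path, deliberately NOT renumbering ranks."""
    for path in paths:
        updated = []
        for rank, nid in path.mappings:
            if nid == node_id:
                # Assumption: the new mappings inherit the old rank, so ranks repeat.
                updated.extend([rank, new_id] for new_id in new_ids)
            else:
                updated.append([rank, nid])
        path.mappings = updated


ref = ToyPath("ref", [1])
ref3 = ToyPath("ref3", [1])

build_index(ref)                    # first variant seen on "ref": index built lazily
split_node([ref, ref3], 1, [2, 3])  # editing for that variant also rewrites "ref3"

try:
    build_index(ref3)               # first variant seen on "ref3": ranks are stale
    crashed = False
except AssertionError:
    crashed = True
```

In this toy, indexing `ref3` only fails because it happens after an edit made on behalf of `ref`, which matches the ordering problem described in the comment.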

I can replicate this with a small test case:

{
    "node": [
        {"id": 1, "sequence": "CTTAAAATGATCGGGACTTTTCAAATCTTATTT"}
    ],
    "edge": [
    ],
    "path": [
        {"name": "ref", "mapping": [
            {"rank": 1, "edit": [
                {"from_length": 33, "to_length": 33}
            ], "position": {"node_id": 1, "offset": 0, "is_reverse": true}}
        ]},
        {"name": "ref2", "mapping": [
            {"rank": 1, "edit": [
                {"from_length": 33, "to_length": 33}
            ], "position": {"node_id": 1, "offset": 0}}
        ]},
        {"name": "ref3", "mapping": [
            {"rank": 1, "edit": [
                {"from_length": 33, "to_length": 33}
            ], "position": {"node_id": 1, "offset": 0, "is_reverse": true}}
        ]}
    ]
}

##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE1	SAMPLE2	SAMPLE3	SAMPLE4
ref	18	.	TC	T	100	PASS	.	GT	1/0	0/0	0|0	././1
ref	21	.	CGA	GAC	100	PASS	.	GT	0/1	0/0	./1	./1/.
ref	23	.	A	AC	100	PASS	.	GT	0/0	1/0	.	./0
ref3	18	.	TC	T	100	PASS	.	GT	1/0	0/0	0|0	././1
ref3	21	.	CGA	GAC	100	PASS	.	GT	0/1	0/0	./1	./1/.
ref3	23	.	A	AC	100	PASS	.	GT	0/0	1/0	.	./0

@adamnovak
Member

OK, I've opened a PR that should solve this problem, by making all the necessary position indexes up front, before anything has been modified.
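The idea of indexing everything before the first edit can be sketched as below. This is a toy illustration of the ordering only, not the code in the PR: `build_index` and the `paths` dictionary are hypothetical, and keeping the indexes themselves current during later edits is outside the sketch.

```python
def build_index(name, mappings):
    """Build a rank -> node_id index; ranks must strictly increase (toy check)."""
    index = {}
    last_rank = 0
    for rank, node_id in mappings:
        if not (rank > last_rank or (rank == 0 and last_rank == 0)):
            raise AssertionError(f"stale ranks on path {name}")
        index[rank] = node_id
        last_rank = rank
    return index


# Hypothetical paths that overlap on node 1, as in the small test case.
paths = {"ref": [(1, 1)], "ref3": [(1, 1)]}

# Eager step: index every path that carries variants while ranks are still valid.
indexes = {name: build_index(name, mappings) for name, mappings in paths.items()}

# Simulated edit: node 1 is split into nodes 2 and 3 and ranks are not renumbered.
for name in paths:
    paths[name] = [(1, 2), (1, 3)]

# No index is built from the now-stale ranks, so the rank check is never hit
# when a variant on "ref3" is reached later.
index_for_ref3 = indexes["ref3"]
```

The trade-off in this arrangement is that all indexes are paid for up front, even for paths whose variants come late in the VCF.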

adamnovak added a commit that referenced this issue Oct 29, 2020