-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construction crash introduced with Safe SVs #1737
Comments
I was getting crashes in this same VCF a while back due to SV code
(#1488). I fixed this by making sure the SV logic was not activated unless
-S was passed (#1492). I bring this up now as being possibly related as
my fix seems to have been commented out / reverted in master:
https://github.com/vgteam/vg/blob/master/src/constructor.cpp#L1432-L1434
…On Wed, Jun 13, 2018 at 3:48 PM, Adam Novak ***@***.***> wrote:
Since the safe SVs merge commit, if you use:
s3://cgl-pipeline-inputs/vg_cgl/pop-map/graphs/wg38/v2/
ALL.chrX_GRCh38.genotypes.20170504_minaf_0.1.vcf.gz
s3://cgl-pipeline-inputs/vg_cgl/pop-map/graphs/wg38/v2/
ALL.chrX_GRCh38.genotypes.20170504_minaf_0.1.vcf.gz.tbi
s3://cgl-pipeline-inputs/vg_cgl/pop-map/input/GRCh38.fa.gz .
And you do:
gunzip GRCh38.fa.gz
vg construct -p --reference GRCh38.fa --vcf ALL.chrX_GRCh38.genotypes.20170504_minaf_0.1.vcf.gz --region X --region-is-chrom --node-max 32 --alt-paths --threads 1 >/dev/null
There is a crash because we try to go beyond the end of the chunk when
creating reference nodes, at about 55.2% done.
(gdb) bt
#0 __cxxabiv1::__cxa_throw ***@***.***=0x2489d40, tinfo=0x1970270 <typeinfo for std::out_of_range@@GLIBCXX_3.4>,
dest=0x6655b0 ***@***.***>)
at /public/home/anovak/build/gcc-8.1.0/libstdc++-v3/libsupc++/eh_throw.cc:80
#1 0x00007fe216510891 in std::__throw_out_of_range_fmt (__fmt=<optimized out>) from /public/home/anovak/.local/lib64/libstdc++.so.6
#2 0x0000000000db28b9 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_check (
__s=0x143fd77 "basic_string::substr", __pos=<optimized out>, this=<optimized out>)
at /public/home/anovak/.local/include/c++/8.1.0/bits/char_traits.h:350
#3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::substr (__n=<optimized out>,
__pos=<optimized out>, this=<optimized out>) at /public/home/anovak/.local/include/c++/8.1.0/bits/basic_string.h:2807
#4 vg::Constructor::construct_chunk(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<vcflib::Variant, std::allocator<vcflib::Variant> >, unsigned long) const::{lambda(unsigned long)#3}::operator()(unsigned long) const () at src/constructor.cpp:347
#5 0x0000000000db563b in vg::Constructor::construct_chunk(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<vcflib::Variant, std::allocator<vcflib::Variant> >, unsigned long) const () at src/constructor.cpp:946
#6 0x0000000000db852a in vg::Constructor::construct_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, FastaReference&, vg::VcfBuffer&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)>) () at /public/home/anovak/.local/include/c++/8.1.0/new:169
#7 0x0000000000db901d in vg::Constructor::construct_graph(std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::vector<vcflib::VariantCallFile*, std::allocator<vcflib::VariantCallFile*> > const&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)>) () at src/constructor.cpp:1644
#8 0x0000000000b2fa83 in main_construct(int, char**) () at /public/home/anovak/.local/include/c++/8.1.0/new:169
#9 0x0000000000b00ae8 in std::function<int (int, char**)>::operator()(int, char**) const (__args#1=<optimized out>,
__args#0=<optimized out>, this=<optimized out>) at /public/home/anovak/.local/include/c++/8.1.0/bits/std_function.h:260
#10 vg::subcommand::Subcommand::operator() (this=<optimized out>, ***@***.***=15, ***@***.***=0x7fffa20cdde8)
at src/subcommand/subcommand.cpp:72
#11 0x000000000073eeb7 in main () at src/main.cpp:61
#12 0x00007fe215c93445 in __libc_start_main () from /usr/lib64/libc.so.6
#13 0x00000000007f4eee in _start () at src/main.cpp:70
I can find the chunk position and the reference size for the chunk, and I
see the reference cursor getting beyond the end of the chunk somehow.
(gdb) p reference_sequence.size()
$2 = 521981
(gdb) p reference_cursor
$3 = 1107218
(gdb) p chunk_offset
$4 = 86107742
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1737>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA2_7l9RBHWj8vEMBic6UdOifTYd70N7ks5t8Wx8gaJpZM4Um0ua>
.
|
Dang. Sorry y'all - I'll rewire this tomorrow. I'll make sure to get it right this time. |
Here's the only SV in the failing chunk. It's a multiallelic SV, which apparently we don't support yet?
I am adding code to skip these. We will see what happens. |
This SV has its END defined on GRCh37, which is probably what threw the cursor off. Sorry to steamroll your fix, Glenn. If I remember correctly I was trying to recover the original output, maybe to pass the tests. I clearly didn't get it quite right. EDIT: It seems it's on my local branch? Maybe it didn't get pushed up. vg (before any safe-svs PRs) sees this and says it's not an SV (since it passes the allATGC check), and SV-aware (but broken) vg sees this is an SV (since it has an SVTYPE), so if we have Since the current batch of VCFs seems to prefer putting things in ALT alleles, my plan is to make those first priority and default to testing if something is an SV only after the allATGCN check fails. Does this sound alright to folks? I'll work on reducing the complexity/fragility of the SV handling code. |
OK, I think the behavior we want is:
We might want some logic that will demote some SVs (like maybe this problematic one) to ordinary variants, if they have fully specified ref and alt strings? I'm not sure about that. I will implement this and make a PR shortly. |
@edawson I think we maybe do want to handle fully-specified (and maybe short?) SVs with the normal codepath. But the vcflib is_sv() check is used in several places, and changing them all to use the allACGT check might be fiddly. Maybe we just want to delete the SV marker from the variant record? |
I'm working on this - no need for another PR, but some review might be good. I'll have it by tonight. I think we should warn if there are SV tags and ref/alts that are mismatched. Otherwise, prefer ref/alt, then SV tags, then anything external (e.g. insertion sequences in FASTA). If we don't get |
Since the safe SVs merge commit, if you use:
s3://cgl-pipeline-inputs/vg_cgl/pop-map/graphs/wg38/v2/ALL.chrX_GRCh38.genotypes.20170504_minaf_0.1.vcf.gz
s3://cgl-pipeline-inputs/vg_cgl/pop-map/graphs/wg38/v2/ALL.chrX_GRCh38.genotypes.20170504_minaf_0.1.vcf.gz.tbi
s3://cgl-pipeline-inputs/vg_cgl/pop-map/input/GRCh38.fa.gz .
And you do:
There is a crash because we try to go beyond the end of the chunk when creating reference nodes, at about 55.2% done.
I can find the chunk position and the reference size for the chunk, and I see the reference cursor getting beyond the end of the chunk somehow.
The crash happens in b5b3b2f but not in 62b2729.
The text was updated successfully, but these errors were encountered: