vg giraffe: unable to retrieve stacktrace.txt file because no access to /tmp #3882

Closed
marinak-ebi opened this issue Mar 12, 2023 · 4 comments

@marinak-ebi

1. What were you trying to do?
Running vg giraffe using the following command:

export TMPDIR=/nfs/research/[REDACTED]/graph_alingment/test_mapping/temp && /homes/[REDACTED]/vg giraffe --threads 64 -Z ../index/index.chrALL.ref+pan.p95.giraffe.gbz -d ../index/index.chrALL.ref+pan.p95.dist -m ../index/index.chrALL.ref+pan.p95.min -p -f DBA_1J_2x_CGATGT_L006_R1_040.fq -f DBA_1J_2x_CGATGT_L006_R2_040.fq > mapped.gam

I was running it on an LSF cluster. The worker node had 240 GB of RAM and 64 cores.

2. What did you want to happen?
For the command to complete successfully.

3. What actually happened?
The command crashed with the following error message:

Preparing Indexes
Loading Minimizer Index
Loading GBZ
Loading Distance Index v2
Initializing MinimizerMapper
Loading and initialization: 315.762 seconds
Mapping reads to "-" (GAM)
--max-multimaps 1
--hit-cap 10
--hard-hit-cap 500
--score-fraction 0.9
--max-min 500
--num-bp-per-min 1000
--distance-limit 200
--max-extensions 800
--max-alignments 8
--cluster-score 50
--pad-cluster-score 20
--cluster-coverage 0.3
--extension-score 1
--extension-set 20
--rescue-attempts 15
--max-fragment-length 2000
--paired-distance-limit 2
--rescue-subgraph-size 4
--rescue-seed-limit 100
--chaining-cluster-distance 80
--max-lookback-bases 80
--min-lookback-items 1
--max-chain-connection 80
--max-tail-length 100
--rescue-algorithm dozeu
ERROR: Signal 11 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_CDxdb6/stacktrace.txt
Please include the stack trace file in your bug report!

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
Here is the main problem: because I am running on an LSF worker node, and /tmp is not synchronised over NFS, I am unable to access the stack trace file. I tried overriding the path by setting $TMPDIR, but vg ignores that and still writes to /tmp.

I think the problem is happening in this part of the code:

char temp[] = "/tmp/vg_crash_XXXXXX";
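
For illustration, here is a minimal sketch of how the crash directory template could be built from $TMPDIR rather than hard-coded. The helper names crash_dir_base and make_crash_dir are hypothetical and are not taken from the vg source; only the vg_crash_XXXXXX template comes from the line above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>
#include <stdlib.h>   // mkdtemp (POSIX)
#include <unistd.h>   // rmdir, and mkdtemp on some platforms

// Hypothetical helper: choose the base directory for crash dumps.
// Uses $TMPDIR when it is set and non-empty, otherwise falls back to /tmp.
std::string crash_dir_base() {
    const char* tmpdir = std::getenv("TMPDIR");
    if (tmpdir != nullptr && tmpdir[0] != '\0') {
        return std::string(tmpdir);
    }
    return "/tmp";
}

// Hypothetical helper: create a unique directory under `base`.
// Returns the created path, or an empty string if creation failed.
std::string make_crash_dir(const std::string& base) {
    assert(!base.empty());
    std::string templ = base + "/vg_crash_XXXXXX";
    // mkdtemp() rewrites the trailing XXXXXX in place, so it needs a
    // writable, NUL-terminated buffer.
    std::vector<char> buf(templ.begin(), templ.end());
    buf.push_back('\0');
    if (mkdtemp(buf.data()) == nullptr) {
        return "";
    }
    return std::string(buf.data());
}
```

Calling make_crash_dir(crash_dir_base()) would then place the dump under the shared directory exported in the command above, assuming that directory exists and is writable.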

5. What data and command can the vg dev team use to make the problem happen?
N/A. The input files are about 30 GB in size. I cannot share them publicly, but I can send them to the development team directly if needed.

6. What does running vg version say?

vg version v1.45.0 "Alpicella"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by anovak@octagon

@adamnovak
Member

The workaround for getting the traceback in situations like this is to set the VG_FULL_TRACEBACK environment variable to 1.

We probably should make that the default, and also fix the traceback dump system to respect TMPDIR, maybe by pre-checking it at startup.
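
One way the startup pre-check could look, sketched under assumptions: getenv is not on the POSIX list of async-signal-safe functions, so the directory is resolved once, early in main(), and cached in a fixed buffer that a crash handler could read later. The names crash_base, is_writable_dir, init_crash_base and crash_base_dir are hypothetical and are not part of vg.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical cache, filled once at startup, so that a crash handler
// never has to call getenv() or allocate memory.
static char crash_base[4096] = "/tmp";

// True if `path` names an existing directory we may create entries in.
bool is_writable_dir(const char* path) {
    struct stat st;
    return path != nullptr && path[0] != '\0'
        && stat(path, &st) == 0 && S_ISDIR(st.st_mode)
        && access(path, W_OK | X_OK) == 0;
}

// Call early in main(). Prefers $TMPDIR, then /tmp. Returns false when
// neither is usable, so the caller can decide to print the trace to
// standard error instead of writing a file.
bool init_crash_base() {
    const char* candidates[] = { std::getenv("TMPDIR"), "/tmp" };
    for (const char* dir : candidates) {
        if (is_writable_dir(dir) && std::strlen(dir) < sizeof(crash_base)) {
            std::strcpy(crash_base, dir);
            return true;
        }
    }
    return false;
}

// Accessor for the crash handler: only reads the cached buffer.
const char* crash_base_dir() {
    assert(crash_base[0] != '\0');  // invariant: the cache is never empty
    return crash_base;
}
```

If init_crash_base() returns false, falling back to the behavior of VG_FULL_TRACEBACK=1 (printing the trace directly) would be one reasonable choice.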

@marinak-ebi
Author

marinak-ebi commented Mar 29, 2023

@adamnovak Thank you for your suggestions!
I've now run vg with this flag, and here is the stacktrace I got:

Crash report for vg v1.45.0 "Alpicella"
Stack trace (most recent call last):
#16   Object "/homes/marinak/vg", at 0x5e609d, in _start
#15   Object "/homes/marinak/vg", at 0x1e83eef, in __libc_start_main
#14   Object "/homes/marinak/vg", at 0x5b6a6e, in main
#13   Object "/homes/marinak/vg", at 0xcef6cb, in vg::subcommand::Subcommand::operator()(int, char**) const
#12   Object "/homes/marinak/vg", at 0xd3c86b, in main_giraffe(int, char**)
#11   Object "/homes/marinak/vg", at 0xd5290f, in std::_Function_handler<void (std::function<void ()> const&), vg::subcommand::TickChainLink::get_iterator()::{lambda(std::function<void ()> const&)#1}>::_M_invoke(std::_Any_data const&, std::function<void ()> const&)
#10   Object "/homes/marinak/vg", at 0xd40ba0, in main_giraffe(int, char**)::{lambda()#1}::operator()() const
#9    Object "/homes/marinak/vg", at 0x1039aa3, in vg::fastq_paired_two_files_for_each_parallel_after_wait(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long)
#8    Object "/homes/marinak/vg", at 0x1e67875, in GOMP_parallel
#7    Object "/homes/marinak/vg", at 0x10412e5, in unsigned long vg::io::paired_for_each_parallel_after_wait<vg::Alignment>(std::function<bool (vg::Alignment&, vg::Alignment&)>, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long) [clone ._omp_fn.0]
#6    Object "/homes/marinak/vg", at 0xd371e8, in std::_Function_handler<void (vg::Alignment&, vg::Alignment&), main_giraffe(int, char**)::{lambda()#1}::operator()() const::{lambda(vg::Alignment&, vg::Alignment&)#6}>::_M_invoke(std::_Any_data const&, vg::Alignment&, vg::Alignment&)
#5    Object "/homes/marinak/vg", at 0xf9df64, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&, std::vector<std::pair<vg::Alignment, vg::Alignment>, std::allocator<std::pair<vg::Alignment, vg::Alignment> > >&)
#4    Object "/homes/marinak/vg", at 0xf98f15, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&)
#3    Object "/homes/marinak/vg", at 0xf8cafa, in void vg::MinimizerMapper::process_until_threshold_c<double>(unsigned long, std::function<double (unsigned long)> const&, std::function<bool (unsigned long, unsigned long)> const&, double, unsigned long, unsigned long, vg::LazyRNG&, std::function<bool (unsigned long)> const&, std::function<void (unsigned long)> const&, std::function<void (unsigned long)> const&) const [clone .constprop.0]
#2    Object "/homes/marinak/vg", at 0xf86ada, in std::_Function_handler<bool (unsigned long), vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&)::{lambda(unsigned long)#20}>::_M_invoke(std::_Any_data const&, unsigned long&&)
#1    Object "/homes/marinak/vg", at 0x1b43861, in vg::Alignment::Alignment(vg::Alignment const&)
#0    Object "/homes/marinak/vg", at 0x1b4eb81, in void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<vg::Path>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)

What do you think may be causing it?

@adamnovak
Member

It looks like vg crashed with a signal 11 (segfault) while copying an Alignment object, in a callback to the one process_until_threshold_c call in the paired-end alignment code, where the callback takes a number and returns a bool.

Because the disassembly says we are near the call to the funnel's pass function with max_multimaps, I think a call to one of the other process_until_threshold functions has been optimized away and we are actually in here:

vg/src/minimizer_mapper.cpp

Lines 2296 to 2362 in 915d3bc

// This alignment makes it
// Called in score order
const std::array<read_alignment_index_t, 2>& index_pair = paired_alignments[alignment_num];
// Remember the score at its rank
scores.emplace_back(paired_scores[alignment_num]);
distances.emplace_back(fragment_distances[alignment_num]);
types.emplace_back(pair_types[alignment_num]);
better_cluster_count_by_mappings.emplace_back(better_cluster_count_by_pairs[alignment_num]);
// Remember the output alignment
for (auto r : {0, 1}) {
    mappings[r].emplace_back(index_pair[r].lookup_for_read_in(r, alignments));
}
if (mappings[0].size() == 1 && found_pair) {
    //If this is the best pair of alignments that we're going to return and we didn't attempt rescue,
    //get the group scores for mapq
    //Get the scores of this pair
    for (auto r : {0, 1}) {
        scores_group[r].push_back(paired_scores[alignment_num]);
    }
    //The indices (into paired_alignments) of pairs with the same first/second read as this
    std::array<vector<size_t>*, 2> alignment_group;
    for (auto r : {0, 1}) {
        alignment_group[r] = &index_pair[r].lookup_for_read_in(r, alignment_groups);
    }
    for (auto r : {0, 1}) {
        for (size_t other_alignment_num : *alignment_group[r]) {
            if (other_alignment_num != alignment_num) {
                scores_group[r].push_back(paired_scores[other_alignment_num]);
            }
        }
    }
}
// Flip second alignment back to input orientation
reverse_complement_alignment_in_place(&mappings[1].back(), [&](vg::id_t node_id) {
    return gbwt_graph.get_length(gbwt_graph.get_handle(node_id));
});
if (mappings[0].size() > 1) {
    // Mark pair as secondary alignments
    for (auto r : {0, 1}) {
        mappings[r].back().set_is_secondary(true);
    }
}
#ifdef print_minimizer_table
mapping_was_rescued.emplace_back(alignment_was_rescued[alignment_num]);
pair_indices.push_back(index_pair);
#endif
if (track_provenance) {
    // Tell the funnel
    for (auto r : {0, 1}) {
        funnels[r].pass("max-multimaps", alignment_num);
        funnels[r].project(alignment_num);
        funnels[r].score(funnels[r].latest(), scores.back());
    }
}
return true;

I think we are probably at this line, where we copy the alignments:

mappings[r].emplace_back(index_pair[r].lookup_for_read_in(r, alignments));

This does some fancy indexing, and it's possible there's a mistake in it. I think we can build with debug_validate_index_references defined and it will check these references and start throwing exceptions, although it might not log quite enough information to work out exactly what is going wrong.
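
As a rough sketch of the kind of reference check that could turn a bad index into an exception with a message instead of a segfault: the code below is a simplified, hypothetical stand-in, not vg's read_alignment_index_t or the code enabled by debug_validate_index_references. The AlignmentIndex type, its fields, and checked_lookup are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical two-level index: which fragment an alignment belongs to,
// and its position among that fragment's alignments.
struct AlignmentIndex {
    std::size_t fragment;
    std::size_t alignment;
};

// Look up the item an index refers to for read 0 or 1, validating both
// levels first. An out-of-range index throws std::out_of_range with the
// offending values instead of reading invalid memory.
template <typename T>
T& checked_lookup(const AlignmentIndex& idx, std::size_t read,
                  std::array<std::vector<std::vector<T>>, 2>& per_read) {
    assert(read < per_read.size());  // the read number is fixed by the caller
    auto& fragments = per_read[read];
    if (idx.fragment >= fragments.size()) {
        throw std::out_of_range(
            "read " + std::to_string(read) + ": fragment " +
            std::to_string(idx.fragment) + " >= " +
            std::to_string(fragments.size()));
    }
    auto& items = fragments[idx.fragment];
    if (idx.alignment >= items.size()) {
        throw std::out_of_range(
            "read " + std::to_string(read) + ", fragment " +
            std::to_string(idx.fragment) + ": alignment " +
            std::to_string(idx.alignment) + " >= " +
            std::to_string(items.size()));
    }
    return items[idx.alignment];
}
```

Putting the read number, both indices and both sizes in the message is meant to address the concern above about not logging enough information to see which reference went wrong.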

@marinak-ebi
Author

marinak-ebi commented Mar 31, 2023

Thank you very much for your detailed explanations!
Do I understand correctly that, as a user, I can't do anything further to try to run it without crashing?

Update: the newer version, 1.47.0, is working!
