Install Abseil failure signal handler in distributor/proton daemons

On a fatal signal, this attempts to dump a stack trace for the offending
thread to stderr. That greatly improves crash visibility for anyone
running Vespa on systems with core dumps disabled.

Signal handler chaining (`call_previous_handler`) is explicitly enabled
so that sanitizer handlers installed earlier are still called as expected.
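`call_previous_handler = true` asks the failure handler to forward the signal to whatever handler was registered before it. The sketch below shows that kind of chaining at the plain POSIX level. It illustrates the idea only and is not Abseil's implementation. It assumes handlers installed without `SA_SIGINFO` and uses `SIGUSR1` so nothing fatal is raised. `earlier_handler` and `chaining_reporter` are hypothetical names.

```cpp
#include <cassert>  // for the assert() in the usage check
#include <csignal>
#include <signal.h>

namespace {

// Flags set from signal handlers; writing a volatile sig_atomic_t is async-signal-safe.
volatile std::sig_atomic_t earlier_handler_ran = 0;
volatile std::sig_atomic_t reporter_ran = 0;

// Disposition that was in place before the reporter was installed.
struct sigaction previous_action {};

// Hypothetical stand-in for a handler installed earlier (e.g. by a sanitizer runtime).
void earlier_handler(int) { earlier_handler_ran = 1; }

// Hypothetical reporter: does its own work, then forwards to the earlier handler.
void chaining_reporter(int sig) {
    reporter_ran = 1;  // a real reporter would write a stack trace here
    bool is_plain_handler = (previous_action.sa_flags & SA_SIGINFO) == 0 &&
                            previous_action.sa_handler != SIG_DFL &&
                            previous_action.sa_handler != SIG_IGN;
    if (is_plain_handler) {
        previous_action.sa_handler(sig);
    }
}

} // namespace

// Returns true if both the reporter and the earlier handler ran for one signal.
bool chained_handlers_both_run() {
    earlier_handler_ran = 0;
    reporter_ran = 0;
    struct sigaction original {};
    struct sigaction sa {};
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = earlier_handler;
    if (sigaction(SIGUSR1, &sa, &original) != 0) return false;  // installed first

    sa.sa_handler = chaining_reporter;
    if (sigaction(SIGUSR1, &sa, &previous_action) != 0) {       // saves earlier_handler
        sigaction(SIGUSR1, &original, nullptr);
        return false;
    }

    std::raise(SIGUSR1);                     // handler runs before raise() returns
    sigaction(SIGUSR1, &original, nullptr);  // restore the original disposition
    return reporter_ran == 1 && earlier_handler_ran == 1;
}
```

Usage: `assert(chained_handlers_both_run());`. Without the forwarding call in `chaining_reporter`, only the reporter would run and the earlier handler would be silently skipped. That is the sanitizer-report problem the option avoids.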

Note that we install our own signal handlers _after_ the Abseil
handlers to avoid noisy stack dumping on `SIGTERM`. The failure
handler treats `SIGTERM` as a fatal signal, but the config sentinel
uses it as a friendly "please shut down now, or else" nudge in the
common case.
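The ordering argument rests on a basic POSIX property: `sigaction()` replaces whatever handler was previously registered for a signal. Whichever component installs a `SIGTERM` handler last is therefore the one that runs. The sketch below demonstrates this with two stand-in handlers. `fake_failure_handler` and `fake_shutdown_handler` are hypothetical placeholders, not Abseil or vespalib code. It assumes a POSIX system, where `raise()` runs the handler in the calling thread before returning.

```cpp
#include <cassert>  // for the assert() in the usage check
#include <csignal>
#include <signal.h>

namespace {

// Flags set from signal handlers; writing a volatile sig_atomic_t is async-signal-safe.
volatile std::sig_atomic_t failure_handler_ran = 0;
volatile std::sig_atomic_t shutdown_handler_ran = 0;

// Hypothetical stand-in for a failure handler that would dump a stack trace.
void fake_failure_handler(int) { failure_handler_ran = 1; }

// Hypothetical stand-in for the daemon's "please shut down" SIGTERM handler.
void fake_shutdown_handler(int) { shutdown_handler_ran = 1; }

// Registers `handler` for `sig`, optionally saving the previous disposition.
bool install(int sig, void (*handler)(int), struct sigaction* previous) {
    struct sigaction sa {};
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, previous) == 0;
}

} // namespace

// Returns true if only the handler installed last for SIGTERM is invoked.
bool last_installed_handler_wins() {
    failure_handler_ran = 0;
    shutdown_handler_ran = 0;
    struct sigaction original {};
    if (!install(SIGTERM, fake_failure_handler, &original)) return false;  // installed first
    if (!install(SIGTERM, fake_shutdown_handler, nullptr)) {              // replaces the first
        sigaction(SIGTERM, &original, nullptr);
        return false;
    }
    std::raise(SIGTERM);                     // handler runs before raise() returns
    sigaction(SIGTERM, &original, nullptr);  // restore the previous disposition
    return shutdown_handler_ran == 1 && failure_handler_ran == 0;
}
```

Usage: `assert(last_installed_handler_wins());`. Because the replaced handler is never invoked, `SIGTERM` does not reach the failure handler at all. Meanwhile, the signals nobody re-registers (such as `SIGSEGV`) stay with it.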
vekterli committed Apr 10, 2024
1 parent 29b9803 commit c70a40e
Showing 4 changed files with 25 additions and 0 deletions.
1 change: 1 addition & 0 deletions searchcore/src/apps/proton/CMakeLists.txt
@@ -23,4 +23,5 @@ vespa_add_executable(searchcore_proton_app
searchcore_grouping
searchcore_proton_metrics
storageserver_storageapp
absl::failure_signal_handler
)
15 changes: 15 additions & 0 deletions searchcore/src/apps/proton/proton.cpp
@@ -12,6 +12,7 @@
#include <vespa/config/common/configcontext.h>
#include <vespa/fnet/transport.h>
#include <vespa/fastos/file.h>
#include <absl/debugging/failure_signal_handler.h>
#include <filesystem>
#include <iostream>
#include <thread>
@@ -53,6 +54,20 @@ class App
void
App::setupSignals()
{
    absl::FailureSignalHandlerOptions opts;
    // Sanitizers set up their own signal handler, so we must ensure that the failure signal
    // handler calls this when it's done, or we won't get a proper report.
    opts.call_previous_handler = true;
    // Ideally we'd use an alternate stack to have well-defined reporting when a
    // thread runs out of stack space (infinite recursion bug etc.), but for some
    // reason this seems to negatively affect stack walking and give very incomplete
    // traces. So until this is resolved, use the thread's own stack.
    opts.use_alternate_stack = false;
    absl::InstallFailureSignalHandler(opts);

    // Install our own signal handlers _after_ the failure handler, as the sentinel uses
    // SIGTERM as a "friendly poke for shutdown" signal and the Abseil failure handler
    // always dumps stack when intercepting this signal (since it's considered fatal).
    SIG::PIPE.ignore();
    SIG::INT.hook();
    SIG::TERM.hook();
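The comment about `use_alternate_stack` refers to running the handler on a separately allocated stack, which matters when the crash is itself a stack overflow. At the POSIX level this is usually done with `sigaltstack()` plus the `SA_ONSTACK` flag. The sketch below assumes that is broadly what the option controls; this has not been checked against Abseil's source. It verifies that a handler's local variable lands inside the alternate stack buffer. The 64 KiB buffer size and the use of `SIGUSR2` are arbitrary choices for illustration.

```cpp
#include <cassert>  // for the assert() in the usage check
#include <csignal>
#include <cstdint>
#include <signal.h>

namespace {

// Memory handed to the kernel as the alternate signal stack (64 KiB, arbitrary).
alignas(16) char alt_stack_buf[64 * 1024];
volatile std::sig_atomic_t ran_on_alt_stack = 0;

// Records whether this handler's own frame lies inside alt_stack_buf.
void stack_probe_handler(int) {
    volatile char probe = 0;  // lives on whichever stack the handler runs on
    auto addr = reinterpret_cast<std::uintptr_t>(&probe);
    auto lo = reinterpret_cast<std::uintptr_t>(alt_stack_buf);
    auto hi = lo + sizeof(alt_stack_buf);
    ran_on_alt_stack = (addr >= lo && addr < hi) ? 1 : 0;
}

} // namespace

// Returns true if a handler registered with SA_ONSTACK runs on the alternate stack.
bool handler_runs_on_alternate_stack() {
    stack_t ss {};
    ss.ss_sp = alt_stack_buf;
    ss.ss_size = sizeof(alt_stack_buf);
    ss.ss_flags = 0;
    stack_t old_ss {};
    if (sigaltstack(&ss, &old_ss) != 0) return false;

    struct sigaction sa {};
    sa.sa_handler = stack_probe_handler;
    sa.sa_flags = SA_ONSTACK;  // deliver this signal on the alternate stack
    sigemptyset(&sa.sa_mask);
    struct sigaction old_sa {};
    bool ok = sigaction(SIGUSR2, &sa, &old_sa) == 0;
    if (ok) {
        ran_on_alt_stack = 0;
        std::raise(SIGUSR2);  // handler runs before raise() returns
        sigaction(SIGUSR2, &old_sa, nullptr);
    }
    sigaltstack(&old_ss, nullptr);  // restore the thread's previous alternate stack
    return ok && ran_on_alt_stack == 1;
}
```

Usage: `assert(handler_runs_on_alternate_stack());`. Note that an alternate stack is per thread, so a real daemon would need one installed on every thread that might overflow. That is one reason the trade-off described in the comment is not trivial.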
1 change: 1 addition & 0 deletions storageserver/src/apps/storaged/CMakeLists.txt
@@ -8,6 +8,7 @@ vespa_add_executable(storageserver_storaged_app
DEPENDS
storageserver_storageapp
protobuf::libprotobuf
absl::failure_signal_handler
)

vespa_add_target_package_dependency(storageserver_storaged_app Protobuf)
8 changes: 8 additions & 0 deletions storageserver/src/apps/storaged/storage.cpp
@@ -21,6 +21,7 @@
#include <vespa/config/helper/configgetter.hpp>
#include <vespa/vespalib/util/signalhandler.h>
#include <google/protobuf/message_lite.h>
#include <absl/debugging/failure_signal_handler.h>
#include <iostream>
#include <csignal>
#include <cstdlib>
@@ -213,8 +214,15 @@ int StorageApp::main(int argc, char **argv)
} // storage

int main(int argc, char **argv) {
    absl::FailureSignalHandlerOptions opts;
    // See `searchcore/src/apps/proton/proton.cpp` for parameter and handler ordering rationale.
    opts.call_previous_handler = true;
    opts.use_alternate_stack = false;
    absl::InstallFailureSignalHandler(opts);

    vespalib::SignalHandler::PIPE.ignore();
    vespalib::SignalHandler::enable_cross_thread_stack_tracing();

    storage::StorageApp app;
    storage::sigtramp = &app;
    int retval = app.main(argc,argv);