
A panicking extension can sometimes cause a double panic and bring down the db #9

Closed
ethransom opened this issue Sep 20, 2018 · 1 comment

Comments

@ethransom

I will be adding an ext/panic extension soon to make it easier to reproduce this issue. I will edit this issue when I have done so.

For now, this issue can be trivially reproduced by adding a panic!("muahaha") to the main closure of the get extension. The first handful of invocations will panic and be caught, as intended, but then invariably a double panic will occur, which causes the runtime to abort.
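For context on the intended behavior: a panic raised inside a closure can be caught with catch_unwind, which is the mechanism that normally keeps a single extension panic from killing the process (the backtrace below shows __rust_maybe_catch_panic in the container's run path). The snippet below is only a simplified stand-in for how the db invokes extension code, not the actual extension API:

use std::panic;

fn main() {
    // Stand-in for an extension's main closure; the real `get` extension
    // runs as a generator inside the db's task container.
    let extension_body = || -> u64 { panic!("muahaha") };

    // A single panic is caught here and reported instead of aborting.
    match panic::catch_unwind(extension_body) {
        Ok(val) => println!("extension returned {}", val),
        Err(_) => println!("extension panicked and was caught"),
    }
}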

This issue has been hard to debug, but here is what I do know:

  • Recall that panic handling has two phases:
    1. A call to panic!(), which calls the Rust runtime handlers present in frames 0-5 of the backtrace, as well as the panic hook.
    2. An unwinding phase, where the program traces back up through the call stack, running drop functions along the way. (And possibly other things?)
  • The second panic does not appear to occur during the call to panic!() itself. This code panics during the formatting of the panic string; notice how std::panicking::rust_panic_with_hook appears twice in its backtrace. We only see one copy of this function in our backtrace.
  • Therefore, it seems that the second panic is happening during the unwind phase. From the backtrace we can see that it appears to originate in the main generator closure of the extension. Here is an example of a panic during a drop method; as expected, there is only one set of runtime panic functions in its backtrace. (A standalone sketch of a panic during unwind follows this list.)
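
To see that failure mode in isolation, here is a minimal, self-contained Rust program (unrelated to the db's own code) in which a Drop implementation panics while the first panic is still unwinding; it aborts with "thread panicked while panicking. aborting.":

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // This runs while the first panic is unwinding the stack, so this
        // second panic forces the runtime to abort instead of unwinding.
        panic!("panic during unwind");
    }
}

fn main() {
    let _guard = Guard;
    panic!("first panic"); // unwinding begins here and runs Guard::drop
}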

Below is a backtrace of the double panic.

Here is a more complete log of a full session, including a backtrace of one of the single panics that was successfully caught:
fullPanicIssueLog.txt.

stack backtrace:
   0:     0x7f3d01b67c6b - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h8458cd77216b6cb4
                               at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x7f3d01b35b10 - std::sys_common::backtrace::print::hc884ca89c7ab7468
                               at libstd/sys_common/backtrace.rs:71
                               at libstd/sys_common/backtrace.rs:59
   2:     0x7f3d01b5b3bd - std::panicking::default_hook::{{closure}}::h4a3e30c6d4d0cba4
                               at libstd/panicking.rs:206
   3:     0x7f3d01b5b11b - std::panicking::default_hook::hea868ab86a1b7a87
                               at libstd/panicking.rs:222
   4:     0x7f3d01b5b8cf - std::panicking::rust_panic_with_hook::h2568e23a59a493fa
                               at libstd/panicking.rs:400
   5:     0x7f3d01b21065 - std::panicking::begin_panic::hce7e5a88f7ff4fa1
   6:     0x7f3d01b20ee2 - get::init::{{closure}}::h620c422872f2a80f
   7:     0x555e505f5e88 - std::panicking::try::do_call::ha34c11298de1ecc2
   8:     0x555e50709cae - __rust_maybe_catch_panic
                               at libpanic_unwind/lib.rs:102
   9:     0x555e505e2d0d - <db::container::Container as db::task::Task>::run::h29a2f1bd716d50bd
  10:     0x555e505f587f - db::sched::RoundRobin::poll::hd3a9151cfc6bbe36
  11:     0x555e50653927 - e2d2::scheduler::standalone_scheduler::StandaloneScheduler::execute_internal::h4d688f42b578547c
  12:     0x555e5065371a - e2d2::scheduler::standalone_scheduler::StandaloneScheduler::handle_request::h393b287b1d551873
  13:     0x555e50662b31 - std::sys_common::backtrace::__rust_begin_short_backtrace::h7a42e3ab2bac4c68
  14:     0x555e5066111b - std::panicking::try::do_call::h8e2568bebf30af60
  15:     0x555e50709cae - __rust_maybe_catch_panic
                               at libpanic_unwind/lib.rs:102
  16:     0x555e50659d0c - <F as alloc::boxed::FnBox<A>>::call_box::h3ddc12d9236471d3
  17:     0x555e506ff117 - std::sys_common::thread::start_thread::h441a470255b0983b
                               at /checkout/src/liballoc/boxed.rs:645
                               at libstd/sys_common/thread.rs:24
  18:     0x555e506f24d8 - std::sys::unix::thread::Thread::new::thread_start::h8246db0ba3b8ab5d
                               at libstd/sys/unix/thread.rs:90
  19:     0x7f3d1c2836b9 - start_thread
  20:     0x7f3d1bda341c - clone
  21:                0x0 - <unknown>

Log output printed concurrently by other server threads while this backtrace was being written:

WARN:server: Detected misbehaving task 84558 on core 10.
WARN:server: Detected misbehaving task 84559 on core 17.
INFO:server: Successfully added scheduler(TID 84560) with rx,tx,sibling queues (0, 0, 1) to core 10.
INFO:server: Successfully added scheduler(TID 84561) with rx,tx,sibling queues (7, 7, 0) to core 17.
thread '<unnamed>' panicked at 'explicit panic', src/lib.rs:46:13
thread '<unnamed>' panicked at 'explicit panic', src/lib.rs:46:13
thread '<unnamed>' panicked at 'explicit panic', src/lib.rs:46:13
thread '<unnamed>' panicked at 'explicit panic', src/lib.rs:46:13
@ankitbhrdwj
Member

ankitbhrdwj commented Mar 15, 2019

The reason the process was crashing with multiple panics is the following code in the Rust panic handler:

if panics > 1 {
        util::dumb_print(format_args!("thread panicked while panicking. \
                                       aborting.\n"));
        unsafe { intrinsics::abort() }
}

https://github.com/rust-lang/rust/blob/master/src/libstd/panicking.rs

The current solution is to check, inside the container, the result of executing the generator under catch_unwind; if std::thread::panicking() returns true, the scheduler is put into an infinite loop.

if let Err(_) = res {
    // The generator panicked; catch_unwind returned an error.
    self.state = COMPLETED;
    if thread::panicking() {
        // Release the request/response packets held by the task, then spin
        // forever instead of letting the unwind continue.
        if let Some((req, res)) = self.tear() {
            req.free_packet();
            res.free_packet();
        }
        loop {}
    }
}
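
For reference, std::thread::panicking() reports whether the current thread is actively unwinding; below is a small standalone sketch (independent of Splinter's code) of how it behaves around a panic caught by catch_unwind:

use std::{panic, thread};

struct Probe;

impl Drop for Probe {
    fn drop(&mut self) {
        // Runs during unwinding of the panic below, so this prints true.
        println!("panicking during drop: {}", thread::panicking());
    }
}

fn main() {
    let res = panic::catch_unwind(|| {
        let _probe = Probe;
        panic!("boom");
    });
    // The panic has been caught by this point, so this prints false.
    println!("caught: {}, panicking after catch: {}", res.is_err(), thread::panicking());
}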

The monitoring core will notice that this thread is spending a lot of time in the current extension and will move it to the bad core, so the process will never crash.

However, the limitation of this approach is that once the number of threads reaches /proc/sys/kernel/threads-max (382024 on our current machine), Splinter won't be able to spawn any more threads.
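
For reference, both the system-wide cap and the current per-process thread count can be read from /proc at runtime; a minimal sketch, assuming a Linux /proc filesystem (not part of Splinter itself):

use std::fs;

fn main() {
    // System-wide cap on the total number of threads.
    let max = fs::read_to_string("/proc/sys/kernel/threads-max")
        .expect("requires Linux with /proc mounted");

    // The "Threads:" line in /proc/self/status gives this process's thread count.
    let status = fs::read_to_string("/proc/self/status").unwrap();
    let threads = status
        .lines()
        .find(|l| l.starts_with("Threads:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .unwrap_or("?");

    println!("threads-max = {}, threads in this process = {}", max.trim(), threads);
}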
