I will be adding an ext/panic extension soon to make it easier to reproduce this issue. I will edit this issue when I have done so.
For now, this issue can be trivially reproduced by adding a panic!("muahaha") to the main closure of the get extension. The first handful of invocations will panic and be caught, as intended, but then invariably a double panic will occur, which causes the runtime to abort.
This issue has been hard to debug, but here is what I do know:
Recall that panic handling has two phases:

1. A call to panic!(), which calls the Rust runtime handlers present in frames 0-5 of the backtrace, as well as the panic hook.
2. An unwinding phase, where the program traces back up through the call stack, running drop functions along the way. (And possibly other things?)
The second panic does not appear to occur during the call to panic!(). For comparison, here is an example of code that panics during the formatting of the panic string; notice how std::panicking::rust_panic_with_hook appears twice in that backtrace. We only see one copy of this function in our backtrace.
Therefore, it seems that the second panic is happening during the unwind phase. From the backtrace, it appears to originate in the main generator closure of the extension. Here is an example of a panic during a drop method; as expected, there is only one set of runtime panic functions in its backtrace.
Below is a backtrace of the double panic.
Here is a more complete log of a full session, including a backtrace of one of the single panics that was successfully caught: fullPanicIssueLog.txt.
stack backtrace:
0: 0x7f3d01b67c6b - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h8458cd77216b6cb4
at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: 0x7f3d01b35b10 - std::sys_common::backtrace::print::hc884ca89c7ab7468
at libstd/sys_common/backtrace.rs:71
at libstd/sys_common/backtrace.rs:59
2: 0x7f3d01b5b3bd - std::panicking::default_hook::{{closure}}::h4a3e30c6d4d0cba4
at libstd/panicking.rs:206
3: 0x7f3d01b5b11b - std::panicking::default_hook::hea868ab86a1b7a87
at libstd/panicking.rs:222
4: 0x7f3d01b5b8cf - std::panicking::rust_panic_with_hook::h2568e23a59a493fa
at libstd/panicking.rs:400
5: 0x7f3d01b21065 - std::panicking::begin_panic::hce7e5a88f7ff4fa1
6: 0x7f3d01b20ee2 - get::init::{{closure}}::h620c422872f2a80f
(Log output and panic messages from other threads were printed concurrently and interleaved with the backtrace; those lines are collected here:)
WARN:server: Detected misbehaving task 84558 on core 10.
WARN:server: Detected misbehaving task 84559 on core 17.
INFO:server: Successfully added scheduler(TID 84560) with rx,tx,sibling queues (0, 0, 1) to core 10.
INFO:server: Successfully added scheduler(TID 84561) with rx,tx,sibling queues (7, 7, 0) to core 17.
thread '<unnamed>' panicked at 'explicit panic', src/lib.rs:46:13 (repeated four times)
7: 0x555e505f5e88 - std::panicking::try::do_call::ha34c11298de1ecc2
8: 0x555e50709cae - __rust_maybe_catch_panic
at libpanic_unwind/lib.rs:102
9: 0x555e505e2d0d - <db::container::Container as db::task::Task>::run::h29a2f1bd716d50bd
10: 0x555e505f587f - db::sched::RoundRobin::poll::hd3a9151cfc6bbe36
11: 0x555e50653927 - e2d2::scheduler::standalone_scheduler::StandaloneScheduler::execute_internal::h4d688f42b578547c
12: 0x555e5065371a - e2d2::scheduler::standalone_scheduler::StandaloneScheduler::handle_request::h393b287b1d551873
13: 0x555e50662b31 - std::sys_common::backtrace::__rust_begin_short_backtrace::h7a42e3ab2bac4c68
14: 0x555e5066111b - std::panicking::try::do_call::h8e2568bebf30af60
15: 0x555e50709cae - __rust_maybe_catch_panic
at libpanic_unwind/lib.rs:102
16: 0x555e50659d0c - <F as alloc::boxed::FnBox<A>>::call_box::h3ddc12d9236471d3
at /checkout/src/liballoc/boxed.rs:645
17: 0x555e506ff117 - std::sys_common::thread::start_thread::h441a470255b0983b
at libstd/sys_common/thread.rs:24
18: 0x555e506f24d8 - std::sys::unix::thread::Thread::new::thread_start::h8246db0ba3b8ab5d
at libstd/sys/unix/thread.rs:90
19: 0x7f3d1c2836b9 - start_thread
20: 0x7f3d1bda341c - clone
21: 0x0 - <unknown>
The current workaround is to check the result of executing the generator, obtained via catch_unwind in the container; if std::thread::panicking() returns true, the scheduler thread is put into an infinite loop.
if let Err(_) = res {
    self.state = COMPLETED;
    if thread::panicking() {
        // Release the request/response packets before parking this
        // thread, since it will never run another task.
        if let Some((req, res)) = self.tear() {
            req.free_packet();
            res.free_packet();
        }
        // Spin forever instead of continuing to unwind; a second panic
        // during unwinding would abort the whole process.
        loop {}
    }
}
The monitoring core will notice that this thread is spending an unusually long time in the current extension and will move it to the bad core, so the process never crashes.

However, the limitation of this approach is that looping threads accumulate: once /proc/sys/kernel/threads-max threads have been created, Splinter won't be able to spawn any more. That value is 382024 on our current machine.