rt: provide options to configure unhandled panic behavior #4516

carllerche · 2022-02-17T19:01:20Z

Currently, all panics on tasks are caught and exposed to the user via
JoinHandle. However, it is somewhat uncommon to use the JoinHandle.
Background tasks are spawned and may silently fail, causing the rest of the
application to hang. Also, in tests, a background task that panics can result in
the test hanging indefinitely, making debugging annoying.
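
The failure mode above has a close analogue in the standard library. The following is only a minimal sketch that uses std::thread instead of Tokio tasks (an assumption made so that it needs no external crates); the helper name observe_worker_panic is hypothetical. A panic inside the worker is caught at the thread boundary and becomes visible only if someone inspects the join handle.

```rust
use std::thread;

/// Spawns a worker that panics and returns the message recovered from
/// its join handle, or `None` if the worker finished normally.
fn observe_worker_panic() -> Option<String> {
    let handle: thread::JoinHandle<()> =
        thread::spawn(|| panic!("background worker failed"));

    // The panic is only observable here. Dropping `handle` instead of
    // joining it would discard the failure silently.
    match handle.join() {
        Ok(()) => None,
        Err(payload) => payload.downcast_ref::<&str>().map(|s| s.to_string()),
    }
}

fn main() {
    println!("{:?}", observe_worker_panic());
}
```

If the handle were dropped rather than joined, nothing else in the program would learn about the failure, which is the situation the proposal tries to address.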

That said, the current behavior is the correct default. Even if it weren't,
it would be too late to change it now. A task boundary is a logical boundary to
separate failure. When implementing a server, it is not desirable for an
uncommon bug in one request handler to take down the entire process.

So, because different scenarios merit different behaviors, a runtime
configuration option could provide the user with the ability to pick the
behavior best suited for their case.

There are a few ways panics could be handled:

- Forward to the JoinHandle and ignore otherwise (what happens today).
- Forward to the JoinHandle, but if the JoinHandle is dropped (the result is
  ignored), then shut down the runtime.
- Always shut down the runtime on panic.
- Pass the panic to a user-provided callback that picks which of the above
  strategies to take.

So, to expose the different options to the user:

#[non_exhaustive]
// TODO: naming?
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
    ShutdownRuntimeIfIgnored,
}

type PanicError = Box<dyn Any + Send + 'static>;

impl runtime::Builder {
    fn unhandled_panic_behavior(&mut self, behavior: UnhandledPanic) { ... }

    fn on_unhandled_panic(&mut self, f: impl Fn(PanicError) -> UnhandledPanic) { ... }
}
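
To make the callback variant more concrete, here is a minimal std-only sketch of how a per-panic policy could be consulted at the task boundary. It is not Tokio's implementation: the functions run_tasks, fail, and demo are hypothetical, tasks are modeled as plain closures rather than futures, and the enum only mirrors two of the variants proposed above.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Mirrors part of the enum proposed above; this is not Tokio's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
}

type PanicError = Box<dyn Any + Send + 'static>;

/// Hypothetical helper: panics with a `&'static str` payload.
fn fail(msg: &'static str) {
    panic::panic_any(msg)
}

/// Runs tasks in order and asks `on_panic` what to do whenever one panics.
/// Returns how many tasks were started before the queue drained or the
/// policy requested a shutdown.
fn run_tasks(
    tasks: Vec<Box<dyn FnOnce()>>,
    on_panic: impl Fn(PanicError) -> UnhandledPanic,
) -> usize {
    let mut started = 0;
    for task in tasks {
        started += 1;
        // Task boundary: the panic is caught here instead of unwinding
        // through the scheduling loop.
        if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(task)) {
            if on_panic(payload) == UnhandledPanic::ShutdownRuntime {
                break; // remaining tasks are dropped without running
            }
        }
    }
    started
}

fn demo() -> usize {
    let tasks: Vec<Box<dyn FnOnce()>> = vec![
        Box::new(|| fail("transient glitch")),
        Box::new(|| fail("fatal: corrupted state")),
        Box::new(|| println!("never runs")),
    ];
    // Shut down only for panics whose message starts with "fatal".
    run_tasks(tasks, |payload| match payload.downcast_ref::<&str>() {
        Some(msg) if msg.starts_with("fatal") => UnhandledPanic::ShutdownRuntime,
        _ => UnhandledPanic::Ignore,
    })
}

fn main() {
    println!("tasks started: {}", demo());
}
```

In this sketch the first panic is ignored, the second one triggers the shutdown path, and the third task is dropped without running.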

Runtime shutdown

What does it mean to "shut down the runtime" on an unhandled panic? First, the
current shutdown behavior is executed. All in-flight tasks are forcibly aborted
and runtime resources are disabled. The next question is how to expose the
unhandled panic.

If the user enables "shutdown runtime on unhandled panic" and a panic does get
through, it seems likely that this is a bug. The Runtime methods in question
are:

- spawn
- block_on

spawn could maintain the current behavior when called after a runtime has
shut down: immediately drop the task and complete the JoinHandle with an error.
The block_on method does not return a Result. The only option I see is for it to
panic when the runtime has seen an unhandled panic.

To compensate, we could add methods on Runtime to query the runtime state,
e.g. Runtime::status() -> Running | Shutdown | UnhandledPanic | ...
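
As a rough illustration of these semantics, the following toy model shows how spawn, block_on, and a status query could behave once an unhandled panic has been recorded. Every name here (Status, StatusRuntime, record_unhandled_panic) is hypothetical; the model runs closures synchronously and returns a Result from spawn as a stand-in for a JoinHandle completing with an error.

```rust
/// Hypothetical status values, following the
/// `Running | Shutdown | UnhandledPanic` idea above.
#[allow(dead_code)] // `Shutdown` is listed for completeness only
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Shutdown,
    UnhandledPanic,
}

/// Toy model of the proposed semantics; it does not schedule real tasks.
struct StatusRuntime {
    status: Status,
}

impl StatusRuntime {
    fn new() -> Self {
        StatusRuntime { status: Status::Running }
    }

    fn status(&self) -> Status {
        self.status
    }

    /// Records that a task panic went unhandled and the runtime shut down.
    fn record_unhandled_panic(&mut self) {
        self.status = Status::UnhandledPanic;
    }

    /// After shutdown the task is dropped and an error is returned.
    fn spawn<T>(&self, task: impl FnOnce() -> T) -> Result<T, &'static str> {
        match self.status {
            Status::Running => Ok(task()),
            _ => Err("runtime is shut down; task dropped"),
        }
    }

    /// `block_on` has no `Result` to report through, so it panics.
    fn block_on<T>(&self, work: impl FnOnce() -> T) -> T {
        if self.status == Status::UnhandledPanic {
            panic!("runtime shut down after an unhandled task panic");
        }
        work()
    }
}

fn main() {
    let mut rt = StatusRuntime::new();
    println!("{:?} {:?}", rt.status(), rt.block_on(|| 5));
    rt.record_unhandled_panic();
    println!("{:?} {:?}", rt.status(), rt.spawn(|| 1 + 1));
}
```

A caller that wants to avoid the panic in block_on could check status() first, which is the purpose of the query method suggested above.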

Initial implementation

As an initial step to get the feature going, I suggest implementing an MVP
version of the feature as an unstable API and only for the current_thread
runtime. This would let us explore the space more and try things out. The
initial implementation could also start by only letting the user pick between
the current behavior and ShutdownRuntime. So:

#[non_exhaustive]
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
}

type PanicError = Box<dyn Any + Send + 'static>;

impl runtime::Builder {
    fn unhandled_panic_behavior(&mut self, behavior: UnhandledPanic) { ... }
}

When the multi-threaded runtime is selected, this option would have no effect.
Implementing it for the multi-threaded runtime would be required before stabilizing
the API, but because that implementation is much harder, we should first gather data.

Open questions

- How should unhandled panics be propagated? Should they be sent to block_on or the JoinHandle (ref: #4516)?
- How should LocalSet and JoinSet work? Should they track their own settings or inherit from the runtime? Should there be a LocalSet::builder()?

Known issues

Switching the "current" scheduler context and then panicking (see Add LocalSet::enter #4765 (comment)). In this case, "current" does not reference the runtime that should intercept the panic.


Allows the user to configure the runtime's behavior when a spawned task panics. Currently, the panic is propagated to the JoinHandle and the runtime resumes. This patch lets the user set the runtime to shut down on an unhandled panic. So far, this is only implemented for the current-thread runtime. Refs: #4516

markus2330 · 2023-12-17T09:53:25Z

Thank you for working on the issue, it sounds great and I would like to use it.

For me, ShutdownRuntime with tokio unstable 1.35.0 does not work once I build the runtime with enable_time.

Shutdown on Panic works

E.g. with:

use tokio::runtime::UnhandledPanic;

fn main() {
	tokio::runtime::Builder::new_multi_thread()
		.unhandled_panic(UnhandledPanic::ShutdownRuntime)
		.worker_threads(2)
		.enable_io() // <--- instead of enable_all()
		.build()
		.unwrap()
		.block_on(async {
			// start() is the application's own async entry point (not shown here)
			let _ = start().await;
		})
}

The process gets terminated on a panic.

Shutdown on Panic Fails

But once I use enable_all() (see comment in code above) the process does not terminate on panics anymore.

Unfortunately, I couldn't reduce start() to a small reproducible example (actual non-trivial I/O and/or tokio::time usage seems to be required to trigger the problem). The whole code can be found at https://github.com/ElektraInitiative/opensesame/pull/131/files#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fc

Should I create an issue? Or is it obvious that it currently won't work? (As the feature is not implemented fully.)

Darksonn · 2023-12-17T10:45:53Z

Feel free to create an issue.

markus2330 · 2023-12-17T16:31:33Z

Thx for the invitation 🚀, I was able to create a minimal reproducible example: #6222

Would be great to see it fixed, soon 💞

carllerche added the C-proposal (Category: a proposal and request for comments), A-tokio (Area: The main tokio crate), M-runtime (Module: tokio/runtime), M-task (Module: tokio/task), and C-feature-request (Category: A feature request) labels on Feb 17, 2022

carllerche mentioned this issue Feb 18, 2022

rt: unhandled panic config for current thread rt #4518

Closed

carllerche mentioned this issue Jun 15, 2022

rt: unhandled panic config for current thread rt #4770

Merged

Darksonn mentioned this issue Jun 28, 2022

testing: Panic propagation for test code #3217

Closed

gftea mentioned this issue Jul 5, 2022

Add LocalSet::enter #4765

Merged

Darksonn mentioned this issue Aug 10, 2022

Detached tasks in tests #2699

Closed

sr-gi mentioned this issue Aug 24, 2022

Consider setting panic to abort in teos talaia-labs/rust-teos#57

Open

bouk mentioned this issue Nov 10, 2022

Stop runtime on task panic #2002

Open

pavel-kokolemin mentioned this issue Feb 14, 2023

Abort upon panic mintlayer/mintlayer-core#693

Merged

maminrayej mentioned this issue Mar 17, 2024

unhandled_panic(UnhandledPanic::ShutdownRuntime) not working when endless interval loop in tasks #6222

Closed

xortive mentioned this issue May 2, 2024

Allow setting unhandled_panic behavior as option on tokio::test #6527

Open
