Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rt: provide options to configure unhandled panic behavior #4516

Open
3 tasks
carllerche opened this issue Feb 17, 2022 · 3 comments
Open
3 tasks

rt: provide options to configure unhandled panic behavior #4516

carllerche opened this issue Feb 17, 2022 · 3 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. C-proposal Category: a proposal and request for comments M-runtime Module: tokio/runtime M-task Module: tokio/task

Comments

@carllerche
Copy link
Member

carllerche commented Feb 17, 2022

Currently, all panics on tasks are caught and exposed to the user via
Joinhandle. However, it is somewhat uncommon to use the JoinHandle.
Background tasks are spawned and may silently fail resulting in the rest of the
application to hang. Also, in tests, a background task that panics can result in
the test hanging indefinitely, making debugging annoying.

That said, the current behavior is the correct default. Even if it weren't,
changing it now would be too late. A task boundary is a logical boundary to
separate failure. When implementing a sever, it is not desirable to have an
uncommon bug in one request handler to take down the entire process.

So, because different scenarios merit different behaviors, a runtime
configuration option could provide the user with the ability to pick the
behavior best suited for their case.

There are a few ways panics could be handled:

  • Forward to the JoinHandle and ignore otherwise (what happens today).
  • Forward to the Joinhandle but if the JoinHandle drops (ignores the result)
    then shutdown the runtime.
  • Always shutdown the runtime on panic.
  • Pass the panic to a user provided callback to pick which of the above
    strategies to take.

So, to expose the different options to the user:

#[non_exhaustive]
// TODO: naming?
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
    ShutdownRuntimeIfIgnored,
}

type PanicError = Box<dyn Any + Send + 'static>;

impl runtime::Builder {
    fn unhandled_panic_behavior(&mut self, UnhandledPanic) { ... }

    fn on_unhandled_panic(&mut self, f: Fn(PanicError) -> UnhandledPanic) { ... }
}

Runtime shutdown

What does it mean to "shutdown the runtime" on unhandled panic. First, the
current shutdown behavior is executed. All in-flight tasks are forcibly aborted
and runtime resources are disabled. The next question is how to expose the
unhandled panic.

If the user enables "shutdown runtime on unhandled panic" and a panic does get
through, it seems likely that this is a bug. The Runtime methods in question
are:

  • spawn
  • block_on

spawn could maintain the current behavior when called after a runtime has
shutdown: immediately drop the task and complete the JoinHandle with an error.
The block_on method does not return result. The only option I see is for it to
panic when the runtime has seen an unhandled panic.

To compensate, we could add methods on Runtime to query the runtime state,
e.g. Runtime::status() -> Running | Shutdown | UnhandledPanic | ...

Initial implementation

As an initial step to get the feature going. I suggest implementing an MVP
version of the feature as an unstable API and only for the current_thread
runtime. This would let us explore the space more and try things out. The
initial implementation could also start by only letting the user pick between
the current behavior and ShutdownRuntime. So:

#[non_exhaustive]
enum UnhandledPanic {
    Ignore,
    ShutdownRuntime,
}

type PanicError = Box<dyn Any + Send + 'static>;

impl runtime::Builder {
    fn unhandled_panic_behavior(&mut self, UnhandledPanic) { ... }
}

When the multi-threaded runtime is selected, these option would have no effect.
Implementing for the multi-threaded runtime would be required before stabilizing
the API but because the implementation is much harder, we should first gather data.

Open questions

Known issues

  • Switching the "current' scheduler context then panicking (Add LocalSet::enter #4765 (comment)). In this case, "current" does not reference the runtime that should intercept the panic.
@carllerche carllerche added C-proposal Category: a proposal and request for comments A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime M-task Module: tokio/task C-feature-request Category: A feature request. labels Feb 17, 2022
carllerche added a commit that referenced this issue Feb 18, 2022
Allows the user to configure the runtime's behavior when a spawned task
panics. Currently, the panic is propagated to the JoinHandle and the
runtime resumes. This patch lets the user set the runtime to shutdown on
unhandled panic.

So far, this is only implemented for the current-thread runtime.

Refs: #4516
carllerche added a commit that referenced this issue Jun 15, 2022
Allows the user to configure the runtime's behavior when a spawned task
panics. Currently, the panic is propagated to the JoinHandle and the
runtime resumes. This patch lets the user set the runtime to shutdown on
unhandled panic.

So far, this is only implemented for the current-thread runtime.

Refs: #4516
carllerche added a commit that referenced this issue Jun 16, 2022
Allows the user to configure the runtime's behavior when a spawned task
panics. Currently, the panic is propagated to the JoinHandle and the
runtime resumes. This patch lets the user set the runtime to shutdown on
unhandled panic.

So far, this is only implemented for the current-thread runtime.

Refs: #4516
@markus2330
Copy link

markus2330 commented Dec 17, 2023

Thank you for working on the issue, it sounds great and I would like to use it.

For me, ShutdownRuntime with tokio unstable 1.35.0 does not work once I build the runtime with enable_time.

Shutdown on Panic works

E.g. with:

fn main() {
	tokio::runtime::Builder::new_multi_thread()
		.unhandled_panic(UnhandledPanic::ShutdownRuntime)
		.worker_threads(2)
		.enable_io() // <--- instead of enable_all()
		.build()
		.unwrap()
		.block_on(async {
			let _ = start().await;
		})
}

The process gets terminated on a panic.

Shutdown on Panic Fails

But once I use enable_all() (see comment in code above) the process does not terminate on panics anymore.

Unfortunately, I couldn't get start() to a small reproducible code (actual non-trivial I/O and/or tokio::time usage seems to be required to trigger the problem), the whole code can be found in https://github.com/ElektraInitiative/opensesame/pull/131/files#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fc

Should I create an issue? Or is it obvious that it currently won't work? (As the feature is not implemented fully.)

@Darksonn
Copy link
Contributor

Feel free to create an issue.

@markus2330
Copy link

Thx for the invitation 🚀, I was able to create a minimal reproducible example: #6222

Would be great to see it fixed, soon 💞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. C-proposal Category: a proposal and request for comments M-runtime Module: tokio/runtime M-task Module: tokio/task
Projects
None yet
Development

No branches or pull requests

3 participants