
Waiting for things to shut down #5585

Closed
Darksonn opened this issue Mar 28, 2023 · 24 comments · Fixed by #6033
Labels
A-tokio-util Area: The tokio-util crate C-feature-request Category: A feature request. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate M-sync Module: tokio/sync

Comments

@Darksonn
Contributor

Currently, the shutdown page of the Tokio tutorial suggests using an mpsc channel to wait for shutdown. This is a bit of a hack. We should provide a dedicated alternative similar to CancellationToken.
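The tutorial's mpsc trick relies on channel closure rather than on any message being sent: each task holds a clone of the Sender, and the Receiver's recv() completes only once every clone has been dropped. A minimal sketch of the same idea, using std's mpsc and threads so it is self-contained (the names and worker count are illustrative, not from the tutorial):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Each worker holds a clone of the sender; nothing is ever sent on it.
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();

    for i in 0..3 {
        let tx = shutdown_tx.clone();
        thread::spawn(move || {
            // ... do work ...
            println!("worker {i} done");
            drop(tx); // dropping the clone is the "I'm finished" signal
        });
    }

    // Drop the original so only the workers' clones keep the channel open.
    drop(shutdown_tx);

    // recv() errors out once every sender has been dropped,
    // i.e. once all workers have finished.
    assert!(shutdown_rx.recv().is_err());
    println!("all workers shut down");
}
```

The "hack" is that the channel is never used to carry data; only its closed state matters, which is what a dedicated type would make explicit.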

@Darksonn Darksonn added A-tokio-util Area: The tokio-util crate M-sync Module: tokio/sync C-feature-request Category: A feature request. labels Mar 28, 2023
@Darksonn Darksonn added E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate labels Mar 28, 2023
@Finomnis
Contributor

How would such a data structure behave? Basically the same as the cancellation token, just in reverse? Meaning you have a tree where you can clone each node and create children from it, and then at any point in the tree you can wait for all children to finish?

@Darksonn
Contributor Author

My idea was just to have two types, a sender and a receiver (with some better naming), and have the receiver wake up when all senders are dropped.

@Finomnis
Contributor

What would be the advantage over mpsc? Because mpsc already has this functionality. Performance?

@Darksonn
Contributor Author

The advantage is that it would be less confusing.

@Finomnis
Contributor

Would it make sense to integrate this functionality into CancellationToken? It will always be used in conjunction with it, and most of the information is already there. We could simply avoid ripping the tree apart at cancellation and then wait until no children exist any more.

@Darksonn
Contributor Author

It's an interesting idea. The API needs some sort of way to distinguish between senders and receivers, so it's not completely clear how to do that.

@Finomnis
Contributor

Why that? I thought of something like .wait_for_children_dropped() or similar. And the "sender" is simply dropping the child CancellationToken.

@Finomnis
Contributor

The assumption here is that the cancellation tokens and the joiner objects are always used in conjunction anyway, and most likely in the same tree structure, if you want to allow for partial cancellation. That would make sense, because the CancellationTokens already allow partial cancellation. So if we combine the two and enable a cancellation token to detect when all of its children are dropped, we would only need one object to perform both tasks.

@Darksonn
Contributor Author

Yes, I guess that would work.

@uds5501

uds5501 commented May 1, 2023

Hey @Finomnis , are you working on a PR for this? If not, would love to give it a try.
Is there any other ongoing discussion relevant to this issue? @Darksonn

@Finomnis
Contributor

Finomnis commented May 1, 2023

@uds5501 No, I'm quite busy at the moment. Have fun.

@uds5501

uds5501 commented May 3, 2023

Hey @Darksonn , I got a chance to go through the internals of Cancellation Token recently. It’s a well-written module!

I was going through the currently suggested methods for graceful shutdown -

  1. If the user wants shutdown to be possible from any thread, they can create an mpsc channel with a single receiver that keeps listening for a shutdown signal, while multiple senders (clones of the original) can each initiate the shutdown command.
  2. A cancellation token can be used in a similar way; it additionally allows a token B that is a child of token A to be cancelled separately, leaving the parent token A intact.

This led to a thought: why do we need a separate API with senders and receivers?

For the receiving end, we can continue listening to the cancelled event from the base token (the very first token created)

// Borrowed from documentation
let check_for_shutdown = tokio::spawn(async move {
    tokio::select! {
        _ = base_token.cancelled() => {
            // The token was cancelled, task can shut down
        }
    }
});

Now, we can pass a clone of the base token to some other function / thread which can trigger a cancellation. Jotted down the following example to illustrate -

use tokio_util::sync::CancellationToken;

fn some_task(token: CancellationToken) {
    // some computation above
    token.cancel();
}

async fn base_task() {
    let base_token = CancellationToken::new();
    let token_to_pass = base_token.clone();
    let check_for_shutdown = tokio::spawn(async move {
        tokio::select! {
            _ = base_token.cancelled() => {
                // handle shutdown
            }
        }
    });
    some_task(token_to_pass);
    check_for_shutdown.await.unwrap();
}

#[tokio::main]
async fn main() {
    println!("Hello, world!");
    base_task().await;
}

This kind of feels like how we use context.WithCancel() in Go (Ref - the context.WithCancel package example). (It's perfectly okay to use this in Go; I'm not sure whether passing the token around is an anti-pattern in Rust or not, please CMIIW.)

@Finomnis
Contributor

Finomnis commented May 3, 2023

I'm not sure what you are trying to say here :) Would you mind explaining how what you are saying is connected to the issue? I might just be a little dense...

@Finomnis
Contributor

Finomnis commented May 3, 2023

The point is that it's not always feasible to await the child task, or if spawned, its joiner. So we would like to have another mechanism to propagate the information that everything that needs to shut down is now finished.

The original proposal was to introduce a new data structure, and my counter proposal was to add that capability to shutdown token itself.

Heads up: I think it's cleaner to add it to the shutdown token, but that will also be a lot more work. It took quite a while to arrive at the current state the code of the token is in.

@Finomnis
Contributor

Finomnis commented May 3, 2023

The biggest problem we had in the original implementation was avoiding recursion, and that only works because we rip the tree apart on cancellation. So if we want to keep the tree structure during cancellation, so that we can later check whether all children got dropped, I currently don't see how we could propagate the cancellation without recursion and without holding locks for a long time. The current implementation is really elegant, and I'm afraid that adding the functionality to wait for children to be dropped will make it a lot uglier.

@uds5501

uds5501 commented May 3, 2023

@Finomnis thanks for taking the time to elaborate on the previous discussion. I wanted to clarify a couple of things regarding how async/await works.

  • If a child process triggers a token cancellation, the main thread will need to await the child process in order to receive the cancellation event on main token.
    • A small question to this - By saying not always feasible to await the child task, are you referring to the time taken to cancel all the child tokens and any other processes that child process might need to complete before completion?
  • But in mpsc's case, if a sender sends an event, the receiver can instantaneously receive the shutdown event and trigger the shutdown mechanism.

Did I understand this bit right? Thank you for your patience :D

@uds5501

uds5501 commented May 3, 2023

So if we want to keep the tree structure during cancellation, so that we can later check whether all children got dropped

@Finomnis When we rip the tree apart, we are ensuring that the children are getting cancelled and dropped right? Why would we want to keep them attached to the parent?

@Finomnis
Contributor

Finomnis commented May 3, 2023

@uds5501 I'm afraid you are missing the point here :/

  • If a child process triggers a token cancellation, the main thread will need to await the child process in order to receive the cancellation event on main token.

No, this is not true. If a child triggers a cancellation, it will be a partial cancellation that only stops the siblings and all the grand-children. The parent will never be notified - in fact, we don't want the parent to be notified in the case of a partial cancellation.

  • A small question to this - By saying not always feasible to await the child task, are you referring to the time taken to cancel all the child tokens and any other processes that child process might need to complete before completion?

No, propagating the cancellation is incredibly fast, almost instantaneous. What we are talking about here is the tasks that a child might have to perform after cancellation, like shutting down a socket or similar. We then want the parent to be notified when the child has finished those.

  • But in mpsc's case, if a sender sends an event, the receiver can instantaneously receive the shutdown event and trigger the shutdown mechanism.

We use mpsc here to propagate when the children are actually done shutting down; but not by sending a message. An mpsc has a mechanism that can detect when no more senders exist, and we simply have one sender in each child's scope, which drops when the child exits its function.

@Finomnis When we rip the tree apart, we are ensuring that the children are getting cancelled and dropped right? Why would we want to keep them attached to the parent?

No, cancellation tokens do not cancel a child or drop it or anything. They simply propagate the intent of a graceful cancellation to the child. How the child deals with that is its own decision.

The reason why keeping the tokens attached might be useful is because they exist inside of the child's scope, so their Drop implementation will be called when the child function actually finishes. So they could use that knowledge to notify the parent that the child is now finished with whatever it had to do to shut down.
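The "lives inside the child's scope" point above can be sketched with std primitives: a guard object owns the sender, and its Drop runs exactly when the child function returns, which is what unblocks the parent. (DoneGuard, the sleep, and the function names are illustrative, not tokio API.)

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// A guard held in the child's scope; its Drop runs when the child
// function actually finishes, which is what notifies the parent.
struct DoneGuard(#[allow(dead_code)] mpsc::Sender<()>);

fn child(_guard: DoneGuard) {
    // Post-cancellation cleanup, e.g. shutting down a socket.
    thread::sleep(Duration::from_millis(50));
    // `_guard` is dropped here, closing the channel from this side.
}

fn main() {
    let (tx, rx) = mpsc::channel::<()>();
    thread::spawn(move || child(DoneGuard(tx)));
    // recv() only returns Err after the child's guard has been dropped,
    // i.e. after the child has finished its shutdown work.
    assert!(rx.recv().is_err());
    println!("child finished its shutdown work");
}
```

This is the same drop-detection idea the tutorial's mpsc pattern uses; keeping the child CancellationToken attached would let the token itself play the role of the guard.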

@Finomnis
Contributor

Finomnis commented May 3, 2023

Be warned though: Implementing this inside of CancellationToken will be a major task, it might be too much in case you aren't experienced yet.

@uds5501

uds5501 commented May 3, 2023

No, this is not true. If a child triggers a cancellation, it will be a partial cancellation that only stops the siblings and all the grandchildren. The parent will never be notified - in fact, we don't want the parent to be notified in the case of a partial cancellation.

Apologies for the miscommunication: by "the child process triggering the cancellation", I meant the child process cancelling the parent token.

No, propagating the cancellation is incredibly fast, almost instantaneous. What we are talking about here is the tasks that a child might have to perform after cancellation, like shutting down a socket or similar. We then want the parent to be notified when the child has finished those.

Makes sense

We use mpsc here to propagate when the children are actually done shutting down; but not by sending a message. An mpsc has a mechanism that can detect when no more senders exist, and we simply have one sender in each child's scope, which drops when the child exits its function.

Again, makes sense.

No, cancellation tokens do not cancel a child or drop it or anything. They simply propagate the intent of a graceful cancellation to the child. How the child deals with that is its own decision.

I am a bit confused here. I was looking at this implementation of cancel in the treenode, where cancelling a token also cancels and drops the child tokens. Am I misunderstanding this code snippet?

pub(crate) fn cancel(node: &Arc<TreeNode>) {
    let mut locked_node = node.inner.lock().unwrap();

    if locked_node.is_cancelled {
        return;
    }

    // One by one, adopt grandchildren and then cancel and detach the child
    while let Some(child) = locked_node.children.pop() {
        // This can't deadlock because the mutex we are already
        // holding is the parent of child.
        let mut locked_child = child.inner.lock().unwrap();

        // Detach the child from node
        // No need to modify node.children, as the child already got removed with `.pop`
        locked_child.parent = None;
        locked_child.parent_idx = 0;

        // If child is already cancelled, detaching is enough
        if locked_child.is_cancelled {
            continue;
        }

        // Cancel or adopt grandchildren
        while let Some(grandchild) = locked_child.children.pop() {
            // This can't deadlock because the two mutexes we are already
            // holding is the parent and grandparent of grandchild.
            let mut locked_grandchild = grandchild.inner.lock().unwrap();

            // Detach the grandchild
            locked_grandchild.parent = None;
            locked_grandchild.parent_idx = 0;

            // If grandchild is already cancelled, detaching is enough
            if locked_grandchild.is_cancelled {
                continue;
            }

            // For performance reasons, only adopt grandchildren that have children.
            // Otherwise, just cancel them right away, no need for another iteration.
            if locked_grandchild.children.is_empty() {
                // Cancel the grandchild
                locked_grandchild.is_cancelled = true;
                locked_grandchild.children = Vec::new();
                drop(locked_grandchild);
                grandchild.waker.notify_waiters();
            } else {
                // Otherwise, adopt grandchild
                locked_grandchild.parent = Some(node.clone());
                // ...

Be warned though: Implementing this inside of CancellationToken will be a major task, it might be too much in case you aren't experienced yet.

I acknowledge that this will be my first contribution to this library, so the learning curve will indeed be steep. I'll probably move on to another issue if this discussion doesn't come to any conclusion. Thanks again for your patience :)

@Finomnis
Copy link
Contributor

Finomnis commented May 3, 2023

@uds5501 I think we are miscommunicating over the word "child". In this context, I meant child-task, and you meant child-token. Yes, it cancels the child token, but we want to know when the child task is done.

@Finomnis
Contributor

Finomnis commented May 3, 2023

I acknowledge that this will be my first contribution to this library, so the learning curve will indeed be steep. I'll probably move on to another issue if this discussion doesn't come to any conclusion. Thanks again for your patience :)

Feel free to pursue it :) I just wanted to make you aware of the difficulty. Didn't mean to stop you :)

@koivunej
Contributor

koivunej commented Jun 5, 2023

I remembered this issue and "independently" came to the same conclusion as @Finomnis in #5585 (comment): basically, it would be great if CancellationToken were able to wait for its children to be dropped.

As for the original idea by @Darksonn of just having these mpsc-wrapping types, I've created completion::{Completion,Barrier}, which are just:

Naive `tokio::sync::mpsc::{Sender<()>,Receiver<()>}` wrappers
use std::sync::Arc;

#[derive(Clone)]
struct Completion(tokio::sync::mpsc::Sender<()>);

#[derive(Clone)]
struct Barrier(Arc<tokio::sync::Mutex<tokio::sync::mpsc::Receiver<()>>>);

impl Barrier {
    /// Awaits until all associated Completion clones have been dropped.
    async fn wait(self) {
        self.0.lock().await.recv().await;
    }
}

fn channel() -> (Completion, Barrier) {
    let (tx, rx) = tokio::sync::mpsc::channel(1);
    let tx = Completion(tx);
    let rx = Barrier(Arc::new(tokio::sync::Mutex::new(rx)));
    (tx, rx)
}

There is probably a lot to optimize over the Arc<Mutex<Receiver<()>>> solution, but it's "good enough" while waiting for this feature to be implemented :)

I thought of something like .wait_for_children_dropped() or similar. And the "sender" is simply dropping the child CancellationToken.

Having CancellationToken::wait_for_children_dropped() would be great.

The biggest problem we had in the original implementation was avoiding recursion, and that only works because we rip the tree apart on cancellation. So if we want to keep the tree structure during cancellation, so that we can later check whether all children got dropped, I currently don't see how we could propagate the cancellation without recursion and without holding locks for a long time. The current implementation is really elegant, and I'm afraid that adding the functionality to wait for children to be dropped will make it a lot uglier.

This is a good point; however, it would be great to have the CancellationToken itself provide the waitability. Perhaps I'll experiment with this outside this repository first, if I have time. Hierarchical waitability is definitely more difficult to implement, and might be less needed.

@LambdaP

LambdaP commented Sep 29, 2023

This sounds a lot like what the recent crate tokio-graceful is doing. The code looks clean and not that complex. Might be worth a look?
