recv_async returning Disconnected in some highly-concurrent situations even though send succeeds #78

Closed
ecton opened this issue Apr 11, 2021 · 3 comments

ecton commented Apr 11, 2021

I've been running into what looks like flume misbehaving. On a single channel, I can verify that the sender's send returned Ok(()), yet the corresponding receiver gets a Disconnected error from flume. This is the smallest example I could create.

Dependencies:

[dependencies]
tokio = { version = "1", features = ["full"] }
flume = { version = "0.10" }
futures = "0.3"

src/main.rs:

use flume::Sender;

#[tokio::main]
async fn main() {
    let tasks = (0..10).map(|_| sending_loop());

    futures::future::join_all(tasks).await;
}

async fn sending_loop() {
    for _ in 0..1000usize {
        sender_test().await
    }
}

async fn sender_test() {
    let (sender, receiver) = flume::bounded(1);

    tokio::spawn(sending(sender));

    receiver.recv_async().await.unwrap()
}

async fn sending(sender: Sender<()>) {
    sender.send(()).unwrap();
}

Running this on my machine, an 8-core/16-thread Ryzen, I regularly get the output: "thread 'main' panicked at 'called Result::unwrap() on an Err value: Disconnected', src/main.rs:21:33" which corresponds to the recv_async() line.

If this test is written to spawn a lot of tasks that call sender_test in parallel all at once, it doesn't seem to trigger the behavior. However, running multiple loops that each call sender_test repeatedly does trigger it.

I couldn't figure out how to simplify this example further.
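For reference, here is a minimal sketch of the behavior I expected. It uses std::sync::mpsc as a stand-in rather than flume, and the function name send_then_drop is only something I made up for illustration. The point is that a message sent before the sender is dropped should still be delivered, and the disconnect should only show up afterwards.

```rust
use std::sync::mpsc::{sync_channel, RecvError};
use std::thread;

/// Sends one message from another thread, drops the sender,
/// then receives twice on the same channel.
/// (Hypothetical helper; std channel used as a stand-in for flume.)
fn send_then_drop() -> (Result<(), RecvError>, Result<(), RecvError>) {
    let (sender, receiver) = sync_channel::<()>(1);

    // The sender moves into the thread and is dropped once the send returns.
    let handle = thread::spawn(move || {
        sender.send(()).unwrap();
    });
    handle.join().unwrap();

    // The buffered message should still be delivered...
    let first = receiver.recv();
    // ...and only then should the channel report disconnection.
    let second = receiver.recv();
    (first, second)
}

fn main() {
    let (first, second) = send_then_drop();
    println!("first = {:?}, second = {:?}", first, second);
}
```

With the std channel the first receive succeeds and only the second one fails, which is the ordering I assumed flume would also guarantee.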

To work around the issue, you can clone the sender being passed into sending(), but this prevents actual disconnections from happening.
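The trade-off of that workaround can be sketched as follows, again with std::sync::mpsc as a stand-in rather than flume, and with after_drain being a hypothetical helper name. While a spare clone of the sender is alive, the receiver never sees the channel as disconnected, only as empty.

```rust
use std::sync::mpsc::{sync_channel, TryRecvError};

/// Reports what try_recv returns once the single message has been drained,
/// depending on whether a spare sender clone is kept alive.
/// (Hypothetical helper; std channel used as a stand-in for flume.)
fn after_drain(keep_clone: bool) -> Result<(), TryRecvError> {
    let (sender, receiver) = sync_channel::<()>(1);
    let spare = if keep_clone { Some(sender.clone()) } else { None };

    // Send one message, then drop the original sender.
    sender.send(()).unwrap();
    drop(sender);

    // Drain the one buffered message.
    receiver.try_recv().unwrap();

    // Check the channel state while the spare clone (if any) is still alive.
    let status = receiver.try_recv();
    drop(spare);
    status
}

fn main() {
    println!("without clone: {:?}", after_drain(false));
    println!("with clone:    {:?}", after_drain(true));
}
```

So keeping a clone around hides the spurious error, but it also hides the real one: the receiver can no longer tell that the sending side has gone away.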

Update 1: It appears this was introduced between 0.9.2 and 0.10.0. I'm not able to reproduce this behavior with 0.9.2.

zesterer (Owner) commented Apr 12, 2021

Aha! I've just gone through the diff between the two versions you mentioned and I've spotted a race condition in a patch that was merged some time after 0.9.2. It's not 'dangerous' (Flume doesn't use unsafe) but it is logically incorrect. Thanks for pointing this out to me!

I've pushed a change that I believe should fix this issue to the master branch. Could you give this example a run with 45edd4 to see whether it still occurs?

ecton (Author) commented Apr 12, 2021

Great work! With master, I can no longer reproduce the issue in my example, and my more complicated project that was originally showcasing the issue also passes all of its tests.

Thanks for the super-fast resolution!

zesterer (Owner) commented

Great to hear. I've released 0.10.4, which includes this patch.
