`recv_async` returning Disconnected in some highly-concurrent situations even though `send` succeeds #78

ecton · 2021-04-11T16:17:28Z

I've been running into a situation where flume appears to misbehave. On a single channel, I can verify that the sender's send returned Ok(()), yet the corresponding receiver gets a Disconnected error from flume. This is the smallest example I could create.

Dependencies:

[dependencies]
tokio = { version = "1", features = ["full"] }
flume = { version = "0.10" }
futures = "0.3"

use flume::Sender;

#[tokio::main]
async fn main() {
    let tasks = (0..10).map(|_| sending_loop());

    futures::future::join_all(tasks).await;
}

async fn sending_loop() {
    for _ in 0..1000usize {
        sender_test().await
    }
}

async fn sender_test() {
    let (sender, receiver) = flume::bounded(1);

    tokio::spawn(sending(sender));

    receiver.recv_async().await.unwrap()
}

async fn sending(sender: Sender<()>) {
    sender.send(()).unwrap();
}

Running this on my machine, an 8-core/16-thread Ryzen, I regularly get the output: "thread 'main' panicked at 'called Result::unwrap() on an Err value: Disconnected', src/main.rs:21:33" which corresponds to the recv_async() line.
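For contrast, here is a minimal sketch of the semantics I expected, where a message sent before the sender is dropped is still delivered and the disconnect is only reported afterwards. It uses std::sync::mpsc and a plain thread as stand-ins for flume and tokio (an assumption made to keep the sketch dependency-free; it does not exercise flume's code), and the helper name send_then_drop is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, RecvError};
use std::thread;

/// Hypothetical helper: send one message from another thread, let the
/// sender drop, then call recv twice and return both results.
fn send_then_drop() -> (Result<(), RecvError>, Result<(), RecvError>) {
    // Bounded channel with capacity 1, mirroring flume::bounded(1).
    let (sender, receiver) = sync_channel::<()>(1);

    // The sender is moved into the thread and dropped when the thread ends.
    let handle = thread::spawn(move || sender.send(()).unwrap());
    handle.join().unwrap();

    // The buffered message should be delivered first...
    let first = receiver.recv();
    // ...and only then should the disconnect be reported.
    let second = receiver.recv();
    (first, second)
}

fn main() {
    let (first, second) = send_then_drop();
    assert_eq!(first, Ok(()));
    assert_eq!(second, Err(RecvError));
}
```

In the failing flume example, the first receive is the one that returns the disconnect error, even though the send reported success.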

If this test is rewritten to spawn a lot of tasks that call sender_test in parallel all at once, it doesn't seem to trigger the behavior. However, running multiple loops that each call sender_test repeatedly does trigger it.

I couldn't figure out how to simplify this example further.

To work around the issue, you can clone the sender being passed into sending(), but this prevents actual disconnections from happening.
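A rough sketch of why that workaround hides real disconnections, again using std::sync::mpsc and a thread as stand-ins for flume and tokio (an assumption; in the example above the equivalent change would be passing sender.clone() into sending()). The helper name recv_with_kept_clone is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TryRecvError};
use std::thread;

/// Hypothetical helper: the worker gets a clone of the sender while the
/// original stays alive on the receiving side.
fn recv_with_kept_clone() -> [Result<(), TryRecvError>; 3] {
    let (sender, receiver) = sync_channel::<()>(1);

    // Workaround: hand a clone to the worker and keep the original.
    let worker_sender = sender.clone();
    thread::spawn(move || worker_sender.send(()).unwrap())
        .join()
        .unwrap();

    // The message sent by the worker is received as expected.
    let first = receiver.try_recv();
    // The worker's clone is gone, but the original sender is still alive,
    // so the channel reports "empty" rather than "disconnected".
    let second = receiver.try_recv();

    // Only after the last sender is dropped does the disconnect show up.
    drop(sender);
    let third = receiver.try_recv();

    [first, second, third]
}

fn main() {
    let results = recv_with_kept_clone();
    assert_eq!(results[0], Ok(()));
    assert_eq!(results[1], Err(TryRecvError::Empty));
    assert_eq!(results[2], Err(TryRecvError::Disconnected));
}
```

As long as the extra sender is held, a blocking or async receive would wait instead of failing, which is why the workaround masks genuine disconnects.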

Update 1: It appears this was introduced between 0.9.2 and 0.10.0. I'm not able to reproduce this behavior with 0.9.2.

zesterer · 2021-04-12T12:07:40Z

Aha! I've just gone through the diff between the two versions you mentioned and I've spotted a race condition in a patch that was merged some time after 0.9.2. It's not 'dangerous' (Flume doesn't use unsafe) but it is logically incorrect. Thanks for pointing this out to me!

I've pushed a change that I believe should fix this issue to the master branch. Could you give this example a run with 45edd4 to see whether it still occurs?

ecton · 2021-04-12T14:44:31Z

Great work! With master, I can no longer reproduce the issue in my example, and the more complicated project where I originally hit the issue also passes all of its tests.

Thanks for the super-fast resolution!

zesterer · 2021-04-12T15:31:12Z

Great to hear. I've released 0.10.4, which includes this patch.

zesterer closed this as completed Apr 12, 2021
