Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add resubscribe functionality to broadcast::Receiver #4607

Merged
merged 13 commits into from
May 28, 2022

Conversation

estk
Copy link
Contributor

@estk estk commented Apr 8, 2022

Add subscribe method to broadcast::Receiver with the same functionality as the clone impl removed in #3020.

@github-actions github-actions bot added the R-loom Run loom tests on this PR label Apr 8, 2022
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-sync Module: tokio/sync labels Apr 8, 2022
@Darksonn
Copy link
Contributor

Darksonn commented Apr 8, 2022

Why not just implement Clone in a way that isn't broken?

@uklotzde
Copy link
Contributor

uklotzde commented Apr 9, 2022

Though some kind of resubscribe method that produces an independent, fresh receiver as if we have subscribed at this moment at the sender is still a valid use case.

Not sure how to name such a method appropriately? resubscribe()?

@Darksonn
Copy link
Contributor

I would consider a Clone impl more important than such a method. If you want to discuss that further, you should probably open an issue to discuss, rather than discussing on this PR.

@estk
Copy link
Contributor Author

estk commented Apr 10, 2022

impl Clone it is

@estk
Copy link
Contributor Author

estk commented Apr 10, 2022

there go

@Darksonn
Copy link
Contributor

Can you add some tests for this?

@estk
Copy link
Contributor Author

estk commented Apr 11, 2022

@Darksonn wdyt?

@estk
Copy link
Contributor Author

estk commented Apr 14, 2022

@Darksonn in case you missed the last mention

@Darksonn
Copy link
Contributor

You are not currently updating this counter:

/// Slot in the buffer.
struct Slot<T> {
/// Remaining number of receivers that are expected to see this value.
///
/// When this goes to zero, the value is released.
///
/// An atomic is used as it is mutated concurrently with the slot read lock
/// acquired.
rem: AtomicUsize,

Currently that counter can never increase. It's unclear to me whether simply adding to it can result in some kind of race where the value is lost due to a concurrent call to recv. Additionally, if you manage to find a way to fix this, I would like to see a test that fails with the current code but doesn't fail with the fix.

@estk
Copy link
Contributor Author

estk commented Apr 18, 2022

@Darksonn thanks for the review, I guess I didn't realize what I was getting myself into 🤷‍♂️. I agree with @uklotzde that we should probably also have a Receiver::resubscribe() method, in practice it's not a problem since Sender: Clone.

I've added some tests and below I've laid out my reasoning why incrementing Slot::rem is safe. Lock-free algorithms is not something I have much experience with so I would greatly appreciate a careful review.

At first I came to the same conclusion as you, that incrementing slot.rem is not safe on account of the fact that a concurrent call to recv may drop the slot value since the fetch_add from impl Clone for Receiver may happen between fetch_sub and "drop value" in impl Drop for RecvGuard. However I now think the compiler prevents this.

We are concerned with a data-race on Slot::rem. The methods that manipulate it are:

  1. Sender::send2
  2. Receiver::clone
  3. RecvGuard::drop

Consider their interactions:

1 and 3 are pre-existing so I am going to ignore how they may affect each other.
1 and 2 are unable to manipulate the tail slot at the same time due to the tail lock. Sends which happen before or after result in predictable behavior.
2 and 3 are not protected by a mutex, we consider the interactions below.

We obtain a RecvGuard, drop then Receiver::clone: we don't care about the slot valued dropped.
We obtain a RecvGuard Receiver::clone then drop(guard): the rem has already been incremented, no drop.

Now, can RecvGuard::drop and Receiver::clone execute concurrently. No, consider the following: Receiver::recv_ref is the only method which produces a RecvGuard. Receiver::try_recv, Recv::poll, and Receiver::drop are the only methods which call recv_ref. These methods immediately drop or clone then drop the RecvGuard. All of these methods obtain a &mut Receiver before calling recv_ref and critically &mut 'receiver: 'recv_guard. Therefore Receiver::clone is prevented by rustc from being called concurrently with RecvGuard::drop.

@estk
Copy link
Contributor Author

estk commented Apr 19, 2022

working on fixing the loom test

@estk
Copy link
Contributor Author

estk commented Apr 19, 2022

@Darksonn Okay so I think I've fixed my loom test, glad I wrote that one haha. I was wondering, do you think I should write a compile_fail test that verifies my assertion that RecvGuard::drop and Receiver::clone may not be called concurrently? Would that address your concern around "some kind of race where the value is lost due to a concurrent call to recv"?

@Darksonn
Copy link
Contributor

That RecvGuard::drop can only be reached through a mutable references is enough. You don't have to write compile-fail tests for that. The compiler will definitely guarantee that mutable references are exclusive.

Unfortunately, I probably wont have time to review this in detail until next week, but I think I have been convinced that elements that the new receiver needs wont be dropped since the existing receiver also needs to see them, and can't consume them while the new receiver is being created.

@estk
Copy link
Contributor Author

estk commented Apr 30, 2022

@Darksonn bump

@Darksonn Darksonn changed the title Add subscribe method to broadcast::Receiver Add Clone to broadcast::Receiver Apr 30, 2022
@estk
Copy link
Contributor Author

estk commented May 2, 2022

@Darksonn fixed that

@carllerche
Copy link
Member

My hesitation w/ this PR and the reason I did not add Receiver::clone() initially is that clone() here is O(N) which is not the case for any other channel handle. This would be an unexpected behavior.

Because of this, if the clone behavior is required, I suggest providing it as an explicit method that can be documented as having the O(N) caveat.

@Darksonn
Copy link
Contributor

Darksonn commented May 2, 2022

I see. I didn't consider that. Going back to a resubscribe that doesn't keep the position might be a better option.

@estk
Copy link
Contributor Author

estk commented May 2, 2022

@carllerche @Darksonn It seems like both functionalities would be useful. But I also did not consider that the O(n)ness would be an issue here. Just let me know what makes sense and I'll be happy to adjust accordingly.

@estk estk changed the title Add Clone to broadcast::Receiver WIP: Add "clone" and resubscribe functionality to broadcast::Receiver May 3, 2022
@estk
Copy link
Contributor Author

estk commented May 3, 2022

Alright, I've just added both methods here for discussion. When you all decide if you'd like one or both or in separate PRs just let me know and I'll be more than happy to update the PR. Additionally please feel free to just copy what I have and adjust as necessary.

Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current thought is to just provide a resubscribe method. We may be able to fix it so we can have a cheap Clone in the future, in which case we would want a Clone impl instead.

@@ -881,6 +882,44 @@ impl<T> Receiver<T> {
}

impl<T: Clone> Receiver<T> {
/// Re-subscribes to the channel starting from the current tail element (the last element passed to `Sender::send`.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation here would need to be improved.

@estk estk changed the title WIP: Add "clone" and resubscribe functionality to broadcast::Receiver WIP: Add resubscribe functionality to broadcast::Receiver May 23, 2022
/// let mut rx2 = rx.resubscribe();
/// tx.send(2).unwrap();
///
/// assert_neq!(rx.recv().await.unwrap(), rx2.recv().await.unwrap()); // 1 != 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this clearer?

Suggested change
/// assert_neq!(rx.recv().await.unwrap(), rx2.recv().await.unwrap()); // 1 != 2
/// assert_eq!(rx2.recv().await.unwrap(), 2);
/// assert_eq!(rx.recv().await.unwrap(), 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya agreed

@estk
Copy link
Contributor Author

estk commented May 24, 2022

@Darksonn those bsd failures seem unrelated, any idea what's going on there?

@Darksonn
Copy link
Contributor

You need to merge master to get the fixes. It's because of the 1.61 release of Rust.

@estk
Copy link
Contributor Author

estk commented May 28, 2022

@Darksonn

Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@Darksonn Darksonn enabled auto-merge (squash) May 28, 2022 19:55
@Darksonn Darksonn merged commit f6c0405 into tokio-rs:master May 28, 2022
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Jun 11, 2022
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.18.2` -> `1.19.1` |
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.18.2` -> `1.19.1` |

---

### Release Notes

<details>
<summary>tokio-rs/tokio</summary>

### [`v1.19.1`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.19.1)

[Compare Source](tokio-rs/tokio@tokio-1.19.0...tokio-1.19.1)

##### 1.19.1 (June 5, 2022)

This release fixes a bug in `Notified::enable`. ([#&#8203;4747])

[#&#8203;4747]: tokio-rs/tokio#4747

### [`v1.19.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.19.0)

[Compare Source](tokio-rs/tokio@tokio-1.18.2...tokio-1.19.0)

##### 1.19.0 (June 3, 2022)

##### Added

-   runtime: add `is_finished` method for `JoinHandle` and `AbortHandle` ([#&#8203;4709])
-   runtime: make global queue and event polling intervals configurable ([#&#8203;4671])
-   sync: add `Notified::enable` ([#&#8203;4705])
-   sync: add `watch::Sender::send_if_modified` ([#&#8203;4591])
-   sync: add resubscribe method to broadcast::Receiver ([#&#8203;4607])
-   net: add `take_error` to `TcpSocket` and `TcpStream` ([#&#8203;4739])

##### Changed

-   io: refactor out usage of Weak in the io handle ([#&#8203;4656])

##### Fixed

-   macros: avoid starvation in `join!` and `try_join!` ([#&#8203;4624])

##### Documented

-   runtime: clarify semantics of tasks outliving `block_on` ([#&#8203;4729])
-   time: fix example for `MissedTickBehavior::Burst` ([#&#8203;4713])

##### Unstable

-   metrics: correctly update atomics in `IoDriverMetrics` ([#&#8203;4725])
-   metrics: fix compilation with unstable, process, and rt, but without net ([#&#8203;4682])
-   task: add `#[track_caller]` to `JoinSet`/`JoinMap` ([#&#8203;4697])
-   task: add `Builder::{spawn_on, spawn_local_on, spawn_blocking_on}` ([#&#8203;4683])
-   task: add `consume_budget` for cooperative scheduling ([#&#8203;4498])
-   task: add `join_set::Builder` for configuring `JoinSet` tasks ([#&#8203;4687])
-   task: update return value of `JoinSet::join_one` ([#&#8203;4726])

[#&#8203;4498]: tokio-rs/tokio#4498

[#&#8203;4591]: tokio-rs/tokio#4591

[#&#8203;4607]: tokio-rs/tokio#4607

[#&#8203;4624]: tokio-rs/tokio#4624

[#&#8203;4656]: tokio-rs/tokio#4656

[#&#8203;4671]: tokio-rs/tokio#4671

[#&#8203;4682]: tokio-rs/tokio#4682

[#&#8203;4683]: tokio-rs/tokio#4683

[#&#8203;4687]: tokio-rs/tokio#4687

[#&#8203;4697]: tokio-rs/tokio#4697

[#&#8203;4705]: tokio-rs/tokio#4705

[#&#8203;4709]: tokio-rs/tokio#4709

[#&#8203;4713]: tokio-rs/tokio#4713

[#&#8203;4725]: tokio-rs/tokio#4725

[#&#8203;4726]: tokio-rs/tokio#4726

[#&#8203;4729]: tokio-rs/tokio#4729

[#&#8203;4739]: tokio-rs/tokio#4739

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [x] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1394
Reviewed-by: crapStone <crapstone@noreply.codeberg.org>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
@jeremija
Copy link

jeremija commented Aug 10, 2023

My hesitation w/ this PR and the reason I did not add Receiver::clone() initially is that clone() here is O(N) which is not the case for any other channel handle. This would be an unexpected behavior.

Because of this, if the clone behavior is required, I suggest providing it as an explicit method that can be documented as having the O(N) caveat.

@carllerche sorry for commenting on an old thread, but would you be able to explain how would Clone be O(N) since the current implementation of resubscribe() seems to be O(1) because shared is behind an Arc:

    pub fn resubscribe(&self) -> Self {
        let shared = self.shared.clone();
        new_receiver(shared)
    }
/// Create a new `Receiver` which reads starting from the tail.
fn new_receiver<T>(shared: Arc<Shared<T>>) -> Receiver<T> {
    let mut tail = shared.tail.lock();

    if tail.rx_cnt == MAX_RECEIVERS {
        panic!("max receivers");
    }

    tail.rx_cnt = tail.rx_cnt.checked_add(1).expect("overflow");

    let next = tail.pos;

    drop(tail);

    Receiver { shared, next }
}

and I think it would be just a matter of setting next to the beginning if Clone were to be implemented. Or am I missing something here? TIA

@Darksonn
Copy link
Contributor

Each item contains a counter for the number of receivers that need to see it. You would have to increment all of those.

@jeremija
Copy link

Ah I see, thanks for explaining!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-sync Module: tokio/sync R-loom Run loom tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants