sync: refactored `PollSender<T>` to fix a subtly broken `Sink<T>` implementation #4214

tobz · 2021-11-03T03:09:46Z

Motivation

The existing PollSender<T> implementation uses a simplified design that treats poll_send_done as both poll_ready and poll_flush. However, it only ever does flushing work, i.e. driving the future that sends into the underlying channel. When used as poll_ready, it simply reports that the sender is always ready.

This leads to a subtle bug where callers could inadvertently use something like sink.send_all(..).await, which leads to a panic. Combinators like send_all assume they can keep calling start_send as long as poll_ready, called directly beforehand, returned ready. In the current design, however, if the underlying channel was full, poll_ready could claim the sender was ready while the pending send had not yet completed. That leaves self.is_sending set to true, so the next call to start_send hits the is_sending check and panics.
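A minimal toy model of the failure described above may help. It is a sketch built only from the description in this PR, not the actual tokio-util code: `Chan`, `OldPollSender` and `send_all_like` are hypothetical names, the channel is a plain capacity-limited queue, and `pending: Option<u32>` stands in for both the in-flight send future and the `is_sending` flag. `poll_ready` does the flushing work but ignores its result, which is the bug.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical toy bounded channel (not tokio's): a queue with a fixed capacity.
struct Chan {
    queue: VecDeque<u32>,
    cap: usize,
}

/// Toy model of the old design: `poll_ready` does flushing work but always reports ready.
struct OldPollSender {
    chan: Rc<RefCell<Chan>>,
    pending: Option<u32>, // `Some` plays the role of an in-flight send with `is_sending == true`
}

impl OldPollSender {
    /// Stand-in for `poll_send_done`: completes the in-flight send if the channel has room.
    /// Returns `true` when nothing is left in flight.
    fn poll_send_done(&mut self) -> bool {
        let mut chan = self.chan.borrow_mut();
        if let Some(v) = self.pending {
            if chan.queue.len() < chan.cap {
                chan.queue.push_back(v);
                self.pending = None;
            }
        }
        self.pending.is_none()
    }

    /// The bug: the result of the flushing work is ignored and "ready" is always reported.
    fn poll_ready(&mut self) -> bool {
        let _ = self.poll_send_done();
        true
    }

    fn poll_flush(&mut self) -> bool {
        self.poll_send_done()
    }

    fn start_send(&mut self, value: u32) {
        if self.pending.is_some() {
            panic!("start_send called while a send is still in progress");
        }
        self.pending = Some(value);
    }
}

/// Mimics how a combinator such as `send_all` trusts `poll_ready`: every `start_send`
/// is preceded by a `poll_ready` that returned ready. Returns the number of items in
/// the channel after a final flush.
fn send_all_like(cap: usize, items: &[u32]) -> usize {
    let chan = Rc::new(RefCell::new(Chan { queue: VecDeque::new(), cap }));
    let mut sender = OldPollSender { chan: Rc::clone(&chan), pending: None };
    for &item in items {
        assert!(sender.poll_ready());
        sender.start_send(item);
    }
    sender.poll_flush();
    let len = chan.borrow().queue.len();
    len
}
```

With plenty of capacity the loop completes. With capacity 1, the third `start_send` panics even though `poll_ready` said the sender was ready, because the second item is still stuck in flight.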

Solution

This PR explores a refactored design where, instead of driving a channel send future, we drive a future that reserves a permit for sending into the channel. This changes the sequence of operations from poll_send_done/start_send/poll_send_done to simply poll_reserve/start_send.

This allows poll_ready to actually reflect whether or not an item can be sent into the channel. Additionally, thanks to the state machine approach, we can eliminate most clones of the Sender itself by recapturing the underlying sender from the permit.
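The permit-based flow can be sketched with a similarly simplified, single-threaded model. Everything here is hypothetical (`Chan`, `Sender`, `Permit`, `State`, `channel`): there is no async `Acquiring` state, and a `false` from `poll_reserve` stands in for `Poll::Pending`. What it shows is the ordering: `poll_reserve` only reports ready once a slot is actually held, and sending through the permit hands the sender back, so no clone is needed. In tokio itself, `Sender::reserve_owned` and `OwnedPermit::send` (which returns the `Sender`) play the corresponding roles.

```rust
use std::cell::RefCell;
use std::mem;
use std::rc::Rc;

/// Hypothetical toy bounded channel: queued items plus slots reserved but not yet used.
struct Chan {
    queue: Vec<u32>,
    reserved: usize,
    cap: usize,
}

/// Toy sender handle (the role `Sender<T>` plays in tokio).
struct Sender(Rc<RefCell<Chan>>);

/// Toy permit: owns the sender while holding exactly one reserved slot.
struct Permit(Sender);

impl Permit {
    /// Uses the reserved slot and returns the sender, so no clone is needed.
    fn send(self, value: u32) -> Sender {
        {
            let mut chan = (self.0).0.borrow_mut();
            chan.reserved -= 1;
            chan.queue.push(value);
        }
        self.0
    }
}

enum State {
    Idle(Sender),
    ReadyToSend(Permit),
    Closed,
}

struct PollSender {
    state: State,
}

impl PollSender {
    /// Reports ready only when a slot is actually held, so the next `start_send` cannot fail.
    fn poll_reserve(&mut self) -> bool {
        match mem::replace(&mut self.state, State::Closed) {
            State::Idle(sender) => {
                let has_room = {
                    let chan = sender.0.borrow();
                    chan.queue.len() + chan.reserved < chan.cap
                };
                if has_room {
                    sender.0.borrow_mut().reserved += 1;
                    self.state = State::ReadyToSend(Permit(sender));
                    true
                } else {
                    // Stand-in for Poll::Pending: keep the sender and try again later.
                    self.state = State::Idle(sender);
                    false
                }
            }
            // Already holding a permit: still ready.
            State::ReadyToSend(permit) => {
                self.state = State::ReadyToSend(permit);
                true
            }
            State::Closed => false,
        }
    }

    /// Consumes the reserved slot; calling it without a successful `poll_reserve` is a bug.
    fn start_send(&mut self, value: u32) {
        match mem::replace(&mut self.state, State::Closed) {
            State::ReadyToSend(permit) => self.state = State::Idle(permit.send(value)),
            _ => panic!("`start_send` called without a successful `poll_reserve`"),
        }
    }
}

/// Builds a toy channel of the given capacity and one `PollSender` for it.
fn channel(cap: usize) -> (PollSender, Rc<RefCell<Chan>>) {
    let chan = Rc::new(RefCell::new(Chan { queue: Vec::new(), reserved: 0, cap }));
    let tx = PollSender { state: State::Idle(Sender(Rc::clone(&chan))) };
    (tx, chan)
}
```

Unlike the old model, a full channel makes `poll_reserve` report "not ready", so a combinator that honours it never reaches a failing `start_send`.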

This does change a few methods and some naming so it would be a breaking change, although I think it's worth it for the ability to correctly provide the Sink<T> implementation.

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>


tokio-util/CHANGELOG.md

Darksonn · 2021-11-03T08:31:32Z

tokio-util/src/sync/mpsc.rs

+    /// will panic.
+    pub fn start_send(&mut self, value: T) -> Result<(), PollSendError<T>> {
+        let (result, next_state) = match self.take_state() {
+            State::Idle(_) | State::Acquiring => panic!("`start_send` called without first calling `poll_ready`"),


Could we make start_send succeed in this case by attempting to use the send method instead?

The only viable approach I currently see to make this method more resilient is to attempt a send via Sender::try_send. We'd then be able to change the language in the documentation to something more like:

If poll_reserve is called prior to calling start_send, and returns Poll::Ready(Ok(())), then the call to start_send is guaranteed to succeed. If it is not called prior, then a send is attempted but may or may not succeed.

The biggest issue then would be how to document the error case: PollSendError<T> is meant to only ever indicate that the channel is closed, but in this case try_send might fail because the channel is either closed or full. It feels weird to differentiate errors based on whether or not poll_reserve was called, since there is supposed to be a formal contract of calling poll_reserve before start_send.

Long story short, I'm willing to make such a change; I'm just curious what your thoughts are on how best to document the difference in behavior depending on whether poll_reserve was called first.
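To make the documentation question concrete, here is a hypothetical sketch of the fallback being discussed, on a toy single-threaded channel rather than tokio's. `StartSendError`, with separate `Closed` and `Full` variants, is an invented type, not `PollSendError<T>`; `Chan` and `FallbackSender` are likewise illustrative. It shows that once `start_send` may fall back to a `try_send`-style attempt, the error has to be able to say "full", a case that can only arise when `poll_reserve` was skipped.

```rust
/// Hypothetical error type for a `start_send` with a `try_send`-style fallback.
#[derive(Debug, PartialEq)]
enum StartSendError<T> {
    /// The channel is closed.
    Closed(T),
    /// The channel is full; only reachable when no slot was reserved first.
    Full(T),
}

/// Hypothetical toy channel with a single sender.
struct Chan {
    queue: Vec<u32>,
    cap: usize,
    closed: bool,
}

struct FallbackSender {
    chan: Chan,
    reserved: bool,
}

impl FallbackSender {
    /// Reserves a slot if one is free; returns whether a slot is held.
    fn poll_reserve(&mut self) -> bool {
        if !self.reserved && !self.chan.closed && self.chan.queue.len() < self.chan.cap {
            self.reserved = true;
        }
        self.reserved
    }

    fn start_send(&mut self, value: u32) -> Result<(), StartSendError<u32>> {
        if self.reserved {
            // Contract honoured: a reserved slot guarantees success.
            self.reserved = false;
            self.chan.queue.push(value);
            return Ok(());
        }
        // Fallback: one non-blocking attempt, which can fail in two different ways.
        if self.chan.closed {
            Err(StartSendError::Closed(value))
        } else if self.chan.queue.len() >= self.chan.cap {
            Err(StartSendError::Full(value))
        } else {
            self.chan.queue.push(value);
            Ok(())
        }
    }
}
```

With a reservation, `start_send` cannot fail; without one, the caller has to handle `Full` as well as `Closed`, which is the split in documented behavior that this thread is weighing.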


tokio-util/src/sync/mpsc.rs

Darksonn · 2021-11-09T14:41:08Z

tokio-util/src/sync/mpsc.rs

-    pub fn clone_inner(&self) -> Option<Sender<T>> {
-        self.sender.as_ref().map(|sender| (&**sender).clone())
+    /// The underlying channel that this sender was wrapping may still be open.
+    pub fn is_closed(&self) -> bool {


You don't want to check whether the underlying channel is closed here?

is_closed only cares about whether or not this PollSender is closed, i.e. whether a user can expect to be able to execute more sends in the future.

Admittedly, there is a corner case here: if we close while already holding a permit for a send slot, a call to poll_reserve would still show as ready, and start_send could be used to execute that send.

I don't think the current state of this method is great.

I think most of the feedback has revolved around the lengths the PR goes to in order to attempt to preserve an acquired sending slot, so let me ask: should we actually bother trying to hold an acquired sending slot?

If we get rid of that functionality, things get much simpler. I don't have a strong opinion on keeping that behavior, and I'm not sure that it provides more flexibility compared to simply avoiding closing the sender before you're done actually sending.
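The distinction drawn above, between this handle being closed and the channel as a whole being closed, can be shown with a small hypothetical model (not tokio-util code): `Chan` counts live sender handles, `Handle::close` drops only this handle's share, and `Handle::is_closed` answers the handle-level question only.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical toy channel state shared by all handles: counts live senders.
struct Chan {
    senders: Cell<usize>,
}

impl Chan {
    /// The channel as a whole is closed once every sender handle is gone.
    fn is_closed(&self) -> bool {
        self.senders.get() == 0
    }
}

/// Toy `PollSender`-like handle; `None` means this handle has been closed.
struct Handle {
    chan: Option<Rc<Chan>>,
}

impl Handle {
    fn new(chan: &Rc<Chan>) -> Self {
        chan.senders.set(chan.senders.get() + 1);
        Handle { chan: Some(Rc::clone(chan)) }
    }

    /// Closes only this handle, giving up its share of the channel.
    fn close(&mut self) {
        if let Some(chan) = self.chan.take() {
            chan.senders.set(chan.senders.get() - 1);
        }
    }

    /// Handle-level: can *this* handle still send? Says nothing about other handles.
    fn is_closed(&self) -> bool {
        self.chan.is_none()
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.close();
    }
}
```

Closing one handle leaves the channel open for the others; the channel only reports closed once the last handle is closed or dropped.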

Darksonn · 2021-11-09T14:41:57Z

tokio-util/src/sync/mpsc.rs

+    /// If a slot was previously reserved by calling `poll_reserve`, then a final call can be made
+    /// to `start_send` in order to consume the reserved slot.  After that, no further sends will be
+    /// possible.  If you do not intend to send another item, you can release the reserved slot back
+    /// to the underlying sender by calling [`abort_send`].


This seems like a footgun.

Is the "footgun" aspect you're referring to specifically about how PollSender can keep a Receiver open (when PollSender is in the ReadyToSend state) even when all other Sender references are gone? I was mostly trying to follow the previous behavior where an in-flight "operation" could be finished cleanly even after closing the PollSender.

If you think we should abandon that behavior, I'm happy to make close be more forceful.

Regardless of the behavior of this method, I think that Sink::poll_close should properly close it.

As for in-flight operations, it's true that we have a similar API in Tokio, but it's on the receiver, where it's necessary because of races. That race isn't present for senders, so it's not as necessary there.

But what is the proper way then? Making any subsequent call to poll_ready/start_send/poll_flush not do anything?
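The two close behaviors under discussion can be compared in one toy model. `ToySender`, `ClosePolicy` and the string errors are hypothetical. `KeepReservedSlot` follows the draft doc comment (a final `start_send` may consume a previously reserved slot), while `Forceful` releases the slot and makes every later call fail. The first policy also reproduces the corner case raised earlier in this review, where `poll_reserve` still reports ready after `close`.

```rust
/// Hypothetical names for the two close behaviors being discussed.
#[derive(Clone, Copy)]
enum ClosePolicy {
    KeepReservedSlot,
    Forceful,
}

#[derive(Clone, Copy)]
enum State {
    Idle,
    ReadyToSend,
    Closed,
}

/// Toy single-sender channel plus sender state; not tokio-util's implementation.
struct ToySender {
    queue: Vec<u32>,
    reserved: usize,
    cap: usize,
    state: State,
    closing: bool,
    policy: ClosePolicy,
}

impl ToySender {
    fn new(cap: usize, policy: ClosePolicy) -> Self {
        ToySender { queue: Vec::new(), reserved: 0, cap, state: State::Idle, closing: false, policy }
    }

    fn poll_reserve(&mut self) -> Result<bool, &'static str> {
        let state = self.state;
        match state {
            State::ReadyToSend => Ok(true),
            State::Closed => Err("closed"),
            State::Idle if self.queue.len() + self.reserved < self.cap => {
                self.reserved += 1;
                self.state = State::ReadyToSend;
                Ok(true)
            }
            State::Idle => Ok(false), // stand-in for Poll::Pending
        }
    }

    fn start_send(&mut self, value: u32) -> Result<(), &'static str> {
        let state = self.state;
        match state {
            State::ReadyToSend => {
                self.reserved -= 1;
                self.queue.push(value);
                // After a close, the final send also completes the close.
                self.state = if self.closing { State::Closed } else { State::Idle };
                Ok(())
            }
            State::Closed => Err("closed"),
            State::Idle => panic!("`start_send` called without a successful `poll_reserve`"),
        }
    }

    fn close(&mut self) {
        self.closing = true;
        match (self.policy, self.state) {
            // Keep the slot: one final `start_send` may still use it.
            (ClosePolicy::KeepReservedSlot, State::ReadyToSend) => {}
            // Forceful: give the reserved slot back to the channel immediately.
            (ClosePolicy::Forceful, State::ReadyToSend) => {
                self.reserved -= 1;
                self.state = State::Closed;
            }
            _ => self.state = State::Closed,
        }
    }
}
```

Under the forceful policy, every call after `close` returns an error and no slot stays tied up, so a receiver is never kept waiting on a sender that will not send again.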


In v0.7 of `tokio-util`, the `PollSender` API was changed to have the semantics that most users expect (see tokio-rs/tokio#4214). The `poll_send_done` method was replaced with a `poll_reserve` method, and the implementation rewritten to drive a `Sender::reserve_owned` future that's consumed in `send_item`. Now, `PollSender` essentially implements exactly the same code that was written by hand in the consensus service. We can simplify the consensus service significantly by upgrading the `tokio-util` dependency to 0.7 and replacing the hand-written version with `PollSender`. There should be no functional change as a result of this refactor.

tobz added 2 commits November 2, 2021 22:56

sync: refactored PollSender<T> to fix a subtly broken Sink<T> implementation

72b0d08

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

small comment tweaks + more thorough assertions in one test

885b81d

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

Darksonn reviewed Nov 3, 2021


tobz added 6 commits November 3, 2021 08:26

address some feedback, add another test

105c02c

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

make poll_reserve more misuse-resistant + more tests

35d0a22

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

more cleanup + tests

ad4e690

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

pub export for PollSendError<T>

4519da8

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

formatting

3fac48a

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

drop unnecessary bound

2e8202f

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

tobz mentioned this pull request Nov 4, 2021

chore(buffers): new BufferSender<T>/BufferReceiver<T> + buffer topology builder vectordotdev/vector#9915

Merged

Darksonn reviewed Nov 9, 2021


update doc comments per PR feedback

eeb4abd

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

tobz added A-tokio-util Area: The tokio-util crate S-breaking-change A breaking change that requires manual coordination to be released. labels Dec 6, 2021

Darksonn reviewed Dec 10, 2021


tokio-util/src/sync/mpsc.rs Outdated

tobz added 2 commits December 10, 2021 10:56

Merge branch 'master' into tobz/fix-poll-sender

4905e73

slight naming tweaks

04413d6

Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com>

github-actions bot added the R-loom Run loom tests on this PR label Dec 10, 2021

Darksonn approved these changes Feb 9, 2022


Merge branch 'master' into tobz/fix-poll-sender

b584b70

tobz merged commit 52fb93d into master Feb 9, 2022

tobz deleted the tobz/fix-poll-sender branch February 9, 2022 17:09

tobz mentioned this pull request Feb 10, 2022

chore: prepare tokio-util 0.7.0 #4486

Merged

hawkw mentioned this pull request Feb 15, 2022

pd: use tokio_util::sync::PollSender in consensus svc penumbra-zone/penumbra#436

Merged
