Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync::watch: Use Acquire/Release memory ordering instead of SeqCst #6018

Merged
merged 11 commits into from Sep 24, 2023

Conversation

uklotzde
Copy link
Contributor

@uklotzde uklotzde commented Sep 19, 2023

Motivation

SeqCst is overly restrictive.

Solution

Use Release (Sender: store / single writer) and Acquire (Receiver: load / multiple readers) memory ordering for a safe handover.

@github-actions github-actions bot added the R-loom-sync Run loom sync tests on this PR label Sep 19, 2023
@uklotzde uklotzde force-pushed the sync-watch-acquire-release-state branch from b361d8f to 9f1723b Compare September 19, 2023 20:08
@uklotzde
Copy link
Contributor Author

uklotzde commented Sep 19, 2023

I am not sure why SeqCst has been chosen here. The code comments don't tell why. With my fairly limited knowledge about memory orderings and concurrency I am confident that Release/Acquire should be sufficient for this particular use case.

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-sync Module: tokio/sync labels Sep 19, 2023
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly familiar with the watch channel internals, but my assumption is that SeqCst is used here to form a total order between the state, the ref count, and the atomics used in Notify and RwLock. If the loom tests pass after making this change, though, it's possible that such a total ordering is not actually necessary to ensure correctness.

However, I would recommend also running the sync_watch.rs benchmarks before and after this change, preferably on an ARM machine (or other architecture with weak memory ordering --- on x86, all memory operations are, essentially, implicitly Acquire/Release). While SeqCst operations have a performance cost, they can also have a performance benefit in some cases --- sometimes, a more expensive SeqCst operation is actually cheaper if it succeeds on the first try and doesn't require some operation to be retried multiple times until the state of multiple atomics becomes consistent. So, it's usually worthwhile to do some before/after benchmarking when making these kind of changes.

Thanks for the PR! I hope my comment gives you some useful context for this change! :)

tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
@uklotzde
Copy link
Contributor Author

While SeqCst operations have a performance cost, they can also have a performance benefit in some cases --- sometimes, a more expensive SeqCst operation is actually cheaper if it succeeds on the first try and doesn't require some operation to be retried multiple times until the state of multiple atomics becomes consistent. So, it's usually worthwhile to do some before/after benchmarking when making these kind of changes.

Thank you for these insights! I wasn't aware of these possible side effects. The version is always incremented within an exclusive lock scope. Only the CLOSED bit could be set at any time. But for the latter the timing doesn't really matter, because for the related refcount only Relaxed ordering is used. These two are only eventually consistent, even with SeqCst ordering.

Unfortunately, I don't have access to an ARM machine locally so I can't run the benchmarks myself. I am aware that on x86 the proposed change has no effect.

I barely use SeqCst myself when it is possible to separate the reader/writer roles. Not only for performance but also to explicitly document the runtime dependencies.

@uklotzde uklotzde force-pushed the sync-watch-acquire-release-state branch from 5d67b2c to e5daae7 Compare September 19, 2023 21:12
@hawkw
Copy link
Member

hawkw commented Sep 19, 2023

Yeah, that all makes sense. I'm certainly not at all opposed to this change --- I was just curious about whether the benchmarks had been run. If loom likes it, I would happily approve this PR. But, it would be nice to get someone more familiar with the watch implementation to take a look.

tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
tokio/src/sync/watch.rs Show resolved Hide resolved
Comment on lines 443 to 446
// Use `Release` ordering to ensure that storing the version
// state is seen by the receiver side that uses `Acquire` for
// loading the state.
self.0.fetch_or(CLOSED_BIT, Ordering::Release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, it would arguably make sense to use something stronger than relaxed for ref_count_rx. Currently this assert could fail:

// thread 1
GLOBAL_BOOL.store(true, Relaxed);
drop(receiver);

// thread 2
if sender.is_closed() {
    assert!(GLOBAL_BOOL.load(Relaxed));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the sender and all receivers could be dropped independently in any order by different threads. I can't imagine how to link the sender-side closed state (refcount) with the receiver-side closed state (version bit) when using separate atomics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it's a separate question from what you're fixing here.

@Darksonn Darksonn added the R-loom Run loom tests on this PR label Sep 20, 2023
tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
uklotzde and others added 3 commits September 20, 2023 14:05
@Darksonn Darksonn changed the title sync::watch: Use Acquire/Release memory ordering instead of SeqCst sync::watch: Use Acquire/Release memory ordering instead of SeqCst Sep 21, 2023
Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just one nit.

tokio/src/sync/watch.rs Outdated Show resolved Hide resolved
@Darksonn Darksonn merged commit 453c720 into tokio-rs:master Sep 24, 2023
69 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-sync Module: tokio/sync R-loom Run loom tests on this PR R-loom-sync Run loom sync tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants