New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
timer: Improve memory ordering in Inner's increment #2107
Conversation
This commit improves the memory ordering in the implementation of Inner's increment function. The former code did a sequentially consistent load of self.num, then entered a loop with a sequentially consistent compare and swap on the same, bailing out with and Err only if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all threads must observe all relevant memory operations in the same order, implying synchronization between all CPUs. This commit adjusts the implementation in two key ways. First, the initial load of self.num is now down with Relaxed ordering. If two threads entered this code simultaneously, formerly, tokio required that one proceed before the other, negating their parallelism. Now, either thread may proceed without coordination. Second, the SeqCst compare_and_swap is changed to a Release, Relaxed compare_exchange_weak. The first memory ordering referrs to success: if the value is swapped the load of that value for comparison will be Relaxed and the store will be Release. The second memory ordering referrs to failure: if the value is not swapped the load is Relaxed. The _weak variant may spuriously fail but will generate better code. These changes mean that it is possible for more loops to be taken per call than strictly necessary but with greater parallelism available on this operation, improved energy consumption as CPUs don't have to coordinate as much.
This commit avoids an additional, unecessary load performed in my last commit by storing the result of the compare_exchange_weak. This is done at the cost of one additional loop when curr == MAX_TIMEOUTS but, considering how expensive loads are, this is the correct thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, with the caveat that it would be nice to see a loom
for this if possible.
@hawkw I'm interested in adding a loom test. I'm not familiar with the layout of tokio's tests. Do you know of a good example I could follow for testing something so internal as this? |
@blt you might want to look at the threadpool's internal |
@hawkw neat! Thanks for the tip. That does look useful. |
Hi all, quick update. I'm still working on this, just don't have a lot of spare time through the week this week, apparently. |
I'm still learning enough about tokio internals to not be sure I'm taking the right approach here. This commit introduces a small test to create an Inner instance -- runnable only with loom flagged on, but without actually doing loom tests yet -- and assert that no time has elapsed for the Inner. This fails like so: ``` tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib --features "full" -- --test-threads=1 --nocapture Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio) Finished test [unoptimized + debuginfo] target(s) in 5.43s Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5 running 1 test test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace. FAILED failures: failures: time::driver::tests::test_inner::sanity test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out error: test failed, to rerun pass '--lib' ``` That it does is a pretty fair indication I don't understand something vital about tokio internals.
Hi all, to follow on from my last comment, I've hit a little wall. I introduced a small test in 677c829 that creates an
Any thoughts are appreciated. Further, I'm not actually sure -- once I get this going -- what it is I should test. I had originally thought I'd assert some basic inequality about the value of |
This commit introduces a new test-only function into Inner that returns the value of num. This is used to synchronize two concurrent loom threads whose sole goals are to increment and decrement in equal measure. If the atomics of this changeset are correct we'll always hit zero at the end of the model. Special care has to be taken not to decrement unless there's previously been an increment. This is the purpose of the 'synchronization' on Inner's num. Note that we always assert at the end of the model with SeqCst ordering but load in the model relaxed.
I neglected to check the last commit to see if it would build in a non-loom test environment. It did not.
@udoprog @hawkw thank you both for your reviews. I've introduced a loom test for increment/decrement in fa33129. The test spawns two loom threads and performs an equal number of increment and decrement operations, asserting at the end that an additional observer always sees a consistent zero at the end of the model's run. Because of the Also, please note that |
@blt sorry for the delay, I'm taking a look now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, modulo this question:
Because of the
debug_assert
present inInner.decrement
I had to introduce a slight synchronization in the model: the decrement operation is always done after an increment. My presumption was that this was an invariant of the calling code forInner
-- hence the debug assertion -- but if that's not true then the synchronization in my model is probably not appropriate.
I'm not intimately familiar with this code, so I can't tell you for sure if this is right or not — we'd have to ask @carllerche. My guess, though is that you're correct that the debug assertion is enforcing an invariant...
Besides that, I commented on some very minor style nits. Looking good, though!
Co-Authored-By: Eliza Weisman <eliza@buoyant.io>
53b865f
to
21ef90a
Compare
Thank you for your very careful review! |
Hi folks, ping? Please do let me know if I can do anything to improve this PR. |
Motivation
This commit improves the memory ordering in the implementation of
Inner's increment function. The former code did a sequentially
consistent load of self.num, then entered a loop with a sequentially
consistent compare and swap on the same, bailing out with and Err only
if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all
threads must observe all relevant memory operations in the same order,
implying synchronization between all CPUs.
Solution
This commit adjusts the implementation in two key ways. First, the
initial load of self.num is now done with Relaxed ordering. If two
threads entered this code simultaneously, formerly, tokio required
that one proceed before the other, negating their parallelism. Now,
either thread may proceed without coordination. Second, the SeqCst
compare_and_swap is changed to a Release, Relaxed
compare_exchange_weak. The first memory ordering refers to success:
if the value is swapped the load of that value for comparison will be
Relaxed and the store will be Release. The second memory ordering
refers to failure: if the value is not swapped the load is
Relaxed. The _weak variant may spuriously fail but will generate
better code.
These changes mean that it is possible for more loops to be taken per
call than strictly necessary but with greater parallelism available on
this operation, improved energy consumption as CPUs don't have to
coordinate as much.