timer: Improve memory ordering in Inner's increment #2107

blt · 2020-01-13T04:52:26Z

Motivation

This commit improves the memory ordering in the implementation of
Inner's increment function. The former code did a sequentially
consistent load of self.num, then entered a loop with a sequentially
consistent compare and swap on the same value, bailing out with an Err only
if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all
threads must observe all relevant memory operations in the same order,
implying synchronization between all CPUs.

Solution

This commit adjusts the implementation in two key ways. First, the
initial load of self.num is now done with Relaxed ordering. Formerly, if
two threads entered this code simultaneously, tokio required that one
proceed before the other, negating their parallelism. Now, either thread
may proceed without coordination. Second, the SeqCst compare_and_swap is
changed to a compare_exchange_weak with Release (success) and Relaxed
(failure) orderings. The success ordering applies when the value is
swapped: the load half of the operation is then Relaxed and the store
half is Release. The failure ordering applies when the value is not
swapped, in which case the load is Relaxed. The _weak variant may fail
spuriously, but can compile to more efficient code on some platforms.

These changes mean that a call may take more loop iterations than
strictly necessary, but in exchange the operation allows greater
parallelism and can improve energy consumption, as CPUs don't have to
coordinate as much.
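To make the orderings described above concrete, here is a minimal sketch of an increment loop that starts from a Relaxed load and retries with `compare_exchange_weak(curr, curr + 1, Release, Relaxed)`. It is an illustration, not tokio's code: the `Inner` struct here holds only the counter, and `AtCapacity` and the value of `MAX_TIMEOUTS` are hypothetical stand-ins.

```rust
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::{Relaxed, Release};

// Hypothetical capacity limit; the real constant in tokio may differ.
const MAX_TIMEOUTS: usize = usize::MAX >> 1;

/// Hypothetical stand-in for the timer driver's `Inner`, reduced to the counter.
struct Inner {
    num: AtomicUsize,
}

/// Hypothetical error returned when the counter is full.
#[derive(Debug, PartialEq)]
struct AtCapacity;

impl Inner {
    fn increment(&self) -> Result<(), AtCapacity> {
        // Relaxed: any recent value is a fine starting guess, because the
        // compare-exchange below re-validates it before storing.
        let mut curr = self.num.load(Relaxed);
        loop {
            if curr == MAX_TIMEOUTS {
                return Err(AtCapacity);
            }
            // Success ordering Release: the store half is Release and the
            // load half is Relaxed. Failure ordering Relaxed: the load is Relaxed.
            // The weak variant may fail spuriously; the loop simply retries.
            match self.num.compare_exchange_weak(curr, curr + 1, Release, Relaxed) {
                Ok(_) => return Ok(()),
                // Retry from the value the failed attempt actually observed.
                Err(actual) => curr = actual,
            }
        }
    }
}

fn main() {
    let inner = Inner { num: AtomicUsize::new(0) };
    assert_eq!(inner.increment(), Ok(()));
    assert_eq!(inner.num.load(Relaxed), 1);

    let full = Inner { num: AtomicUsize::new(MAX_TIMEOUTS) };
    assert_eq!(full.increment(), Err(AtCapacity));
}
```

Because a failed `compare_exchange_weak` returns the value it observed, the loop can retry from that value instead of issuing a separate load; a spurious failure costs one extra iteration.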


tokio/src/time/driver/mod.rs

This commit avoids an additional, unnecessary load performed in my last commit by storing the result of the compare_exchange_weak. This comes at the cost of one additional loop iteration when curr == MAX_TIMEOUTS but, considering how expensive loads are, this is the correct thing to do.

hawkw

This looks good to me, with the caveat that it would be nice to see a loom test for this if possible.

blt · 2020-01-13T23:04:35Z

@hawkw I'm interested in adding a loom test. I'm not familiar with the layout of tokio's tests. Do you know of a good example I could follow for testing something so internal as this?

hawkw · 2020-01-13T23:44:17Z

@blt you might want to look at the threadpool's internal loom tests for its work queue implementation: https://github.com/tokio-rs/tokio/blob/8546ff826db8dba1e39b4119ad909fb6cab2492a/tokio/src/runtime/thread_pool/tests/loom_queue.rs

blt · 2020-01-13T23:45:30Z

@hawkw neat! Thanks for the tip. That does look useful.

blt · 2020-01-17T15:22:48Z

Hi all, quick update. I'm still working on this, just don't have a lot of spare time through the week this week, apparently.

I don't yet know enough about tokio internals to be sure I'm taking the right approach here. This commit introduces a small test that creates an Inner instance -- runnable only with the loom cfg flag set, but without actually doing loom tests yet -- and asserts that no time has elapsed for the Inner. This fails like so:

```
tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib --features "full" -- --test-threads=1 --nocapture
   Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio)
    Finished test [unoptimized + debuginfo] target(s) in 5.43s
     Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5

running 1 test
test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
FAILED

failures:

failures:
    time::driver::tests::test_inner::sanity

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out

error: test failed, to rerun pass '--lib'
```

That it does is a pretty fair indication I don't understand something vital about tokio internals.

blt · 2020-01-17T15:46:12Z

Hi all, to follow on from my last comment, I've hit a little wall. I introduced a small test in 677c829 that creates an Inner and then attempts to assert its elapsed value is 0, just a small check that I can construct the types I need and run the test. Well, I'm missing some vital understanding, as the test blows up in a way I can't quite figure out:

tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib  --features "full" -- --test-threads=1 --nocapture
   Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio)
    Finished test [unoptimized + debuginfo] target(s) in 5.43s
     Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5

running 1 test
test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
FAILED

failures:

failures:
    time::driver::tests::test_inner::sanity

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out

error: test failed, to rerun pass '--lib'

Any thoughts are appreciated. Further, I'm not actually sure -- once I get this going -- what it is I should test. I had originally thought I'd assert some basic inequality about the value of self.num with two loom threads concurrently calling decrement and increment but I realize that's shot now what with decrement wrapping around. Playing around with methods on Inner, then, is maybe not the right level of abstraction to be modeling? My understanding of tokio internals is still pretty poor, so pointers here would also be appreciated.
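The wrap-around mentioned above can be seen in isolation. Assuming decrement is built on an atomic `fetch_sub` (an assumption about Inner's implementation, not confirmed in this thread), subtracting from zero neither panics nor saturates: Rust's atomic `fetch_sub` wraps on underflow, even in debug builds.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    // A counter that has never been incremented.
    let num = AtomicUsize::new(0);

    // fetch_sub returns the previous value and wraps around on underflow
    // instead of panicking, regardless of overflow-check settings.
    let prev = num.fetch_sub(1, Ordering::Relaxed);
    assert_eq!(prev, 0);
    assert_eq!(num.load(Ordering::Relaxed), usize::MAX);
}
```

This is why a simple inequality on the counter breaks down if a decrement can run before its matching increment.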

tokio/src/time/driver/tests/test_inner.rs

This commit introduces a new test-only function into Inner that returns the value of num. This is used to synchronize two concurrent loom threads whose sole goals are to increment and decrement in equal measure. If the atomics of this changeset are correct we'll always hit zero at the end of the model. Special care has to be taken not to decrement unless there's previously been an increment. This is the purpose of the 'synchronization' on Inner's num. Note that we always assert at the end of the model with SeqCst ordering, but load with Relaxed ordering within the model.
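The shape of the model described in this commit can be sketched with plain `std` threads. This is an approximation, not the loom test itself: a loom version would run the body under `loom::model` and use loom's thread and atomic types, which lets loom explore many interleavings, whereas `std` threads only exercise whichever interleavings the OS happens to produce. The counter operations below are simplified stand-ins for Inner's increment and decrement, and `OPS` is a hypothetical iteration count.

```rust
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release, SeqCst};
use std::sync::Arc;
use std::thread;

// Hypothetical iteration count; loom models keep this small because the
// number of interleavings to explore grows quickly.
const OPS: usize = 3;

fn model() {
    let num = Arc::new(AtomicUsize::new(0));

    let incrementer = {
        let num = Arc::clone(&num);
        thread::spawn(move || {
            for _ in 0..OPS {
                // Simplified stand-in for Inner::increment (no capacity check).
                num.fetch_add(1, Release);
            }
        })
    };

    let decrementer = {
        let num = Arc::clone(&num);
        thread::spawn(move || {
            for _ in 0..OPS {
                // Never decrement before an increment is visible, so the counter
                // cannot wrap. This is the only decrementing thread, so a nonzero
                // value stays nonzero until this thread acts on it.
                while num.load(Relaxed) == 0 {
                    thread::yield_now();
                }
                // Simplified stand-in for Inner::decrement.
                num.fetch_sub(1, Acquire);
            }
        })
    };

    incrementer.join().unwrap();
    decrementer.join().unwrap();

    // Final observer: equal numbers of increments and decrements leave zero.
    assert_eq!(num.load(SeqCst), 0);
}

fn main() {
    // Under loom this would be something like `loom::model(model)`;
    // here the run is simply repeated.
    for _ in 0..100 {
        model();
    }
}
```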


blt · 2020-01-22T03:36:14Z

@udoprog @hawkw thank you both for your reviews. I've introduced a loom test for increment/decrement in fa33129. The test spawns two loom threads that perform an equal number of increment and decrement operations, and asserts that an additional observer always sees a count of zero at the end of the model's run. Because of the debug_assert present in Inner.decrement I had to introduce a slight synchronization in the model: the decrement operation is always done after an increment. My presumption was that this was an invariant of the calling code for Inner -- hence the debug assertion -- but if that's not true then the synchronization in my model is probably not appropriate.

Also, please note that Inner.decrement now uses Acquire ordering. Its load of self.num synchronizes with the Release store done by increment whenever it reads the value that store wrote.
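The Release/Acquire pairing can be illustrated with a small message-passing sketch: a producer writes a payload and then increments with a Release compare-exchange; the consumer's Acquire `fetch_sub` reads the value that increment stored, so the payload write is guaranteed to be visible afterwards. `Counter` and `payload` are hypothetical names for illustration only, not part of tokio.

```rust
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::Arc;
use std::thread;

/// Hypothetical counter using the orderings discussed in this PR.
struct Counter {
    num: AtomicUsize,
}

impl Counter {
    fn increment(&self) {
        let mut curr = self.num.load(Relaxed);
        // Release on success publishes this thread's earlier writes.
        while let Err(actual) =
            self.num.compare_exchange_weak(curr, curr + 1, Release, Relaxed)
        {
            curr = actual;
        }
    }

    /// Returns the value before the decrement.
    fn decrement(&self) -> usize {
        // Acquire: when this reads a value stored by a Release increment,
        // writes made before that increment become visible to this thread.
        self.num.fetch_sub(1, Acquire)
    }
}

fn main() {
    let counter = Arc::new(Counter { num: AtomicUsize::new(0) });
    let payload = Arc::new(AtomicUsize::new(0));

    let producer = {
        let (counter, payload) = (Arc::clone(&counter), Arc::clone(&payload));
        thread::spawn(move || {
            payload.store(42, Relaxed); // happens before the Release increment
            counter.increment();
        })
    };

    // Wait until the increment is visible so the decrement never wraps.
    while counter.num.load(Relaxed) == 0 {
        thread::yield_now();
    }
    assert_eq!(counter.decrement(), 1);
    // The Acquire decrement read the Release increment's value, so even a
    // Relaxed load of the payload must now observe 42.
    assert_eq!(payload.load(Relaxed), 42);

    producer.join().unwrap();
}
```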

blt · 2020-01-28T16:07:32Z

@hawkw @udoprog hi folks, just a friendly ping about this PR. Please do let me know if I can make any more changes to improve it.

hawkw · 2020-01-28T18:40:28Z

@blt sorry for the delay, I'm taking a look now!

hawkw

This looks good to me, modulo this question:

Because of the debug_assert present in Inner.decrement I had to introduce a slight synchronization in the model: the decrement operation is always done after an increment. My presumption was that this was an invariant of the calling code for Inner -- hence the debug assertion -- but if that's not true then the synchronization in my model is probably not appropriate.

I'm not intimately familiar with this code, so I can't tell you for sure whether this is right; we'd have to ask @carllerche. My guess, though, is that you're correct that the debug assertion is enforcing an invariant...

Besides that, I commented on some very minor style nits. Looking good, though!

tokio/src/time/driver/mod.rs

tokio/src/time/driver/tests/mod.rs

tokio/src/time/driver/tests/test_inner.rs


blt · 2020-01-29T05:54:35Z

Besides that, I commented on some very minor style nits. Looking good, though!

Thank you for your very careful review!

blt · 2020-02-28T16:27:37Z

Hi folks, ping? Please do let me know if I can do anything to improve this PR.

udoprog reviewed Jan 13, 2020

tokio/src/time/driver/mod.rs

hawkw approved these changes Jan 13, 2020

hawkw reviewed Jan 17, 2020

tokio/src/time/driver/tests/test_inner.rs

blt added 2 commits January 21, 2020 19:11

Address unused import build error

d7f07b9

I neglected to check the last commit to see if it would build in a non-loom test environment. It did not.

Merge branch 'master' into blt-inner_atomics

a58614e

hawkw approved these changes Jan 28, 2020

tokio/src/time/driver/mod.rs

tokio/src/time/driver/mod.rs

tokio/src/time/driver/tests/mod.rs

tokio/src/time/driver/tests/test_inner.rs

Brian L. Troutwine and others added 3 commits January 28, 2020 11:23

Update tokio/src/time/driver/mod.rs

ee9fddf

Co-Authored-By: Eliza Weisman <eliza@buoyant.io>

Unstack test/loom cfg

20d01c6

Resolves tokio-rs#2107 (comment)

Remove the rust_2018_idioms duplication

5e1fd00

Resolves tokio-rs#2107 (comment)

blt added a commit to blt/tokio that referenced this pull request Jan 29, 2020

Collapse test_inner upward

53b865f

Resolves tokio-rs#2107 (comment)

Collapse test_inner upward

21ef90a

Resolves tokio-rs#2107 (comment)

blt force-pushed the blt-inner_atomics branch from 53b865f to 21ef90a January 29, 2020 05:29

carllerche merged commit 3fb213a into tokio-rs:master Mar 26, 2020
