Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timer: Improve memory ordering in Inner's increment #2107

Merged
merged 10 commits into from Mar 26, 2020

Conversation

blt
Copy link
Contributor

@blt blt commented Jan 13, 2020

Motivation

This commit improves the memory ordering in the implementation of
Inner's increment function. The former code did a sequentially
consistent load of self.num, then entered a loop with a sequentially
consistent compare and swap on the same, bailing out with and Err only
if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all
threads must observe all relevant memory operations in the same order,
implying synchronization between all CPUs.

Solution

This commit adjusts the implementation in two key ways. First, the
initial load of self.num is now done with Relaxed ordering. If two
threads entered this code simultaneously, formerly, tokio required
that one proceed before the other, negating their parallelism. Now,
either thread may proceed without coordination. Second, the SeqCst
compare_and_swap is changed to a Release, Relaxed
compare_exchange_weak. The first memory ordering refers to success:
if the value is swapped the load of that value for comparison will be
Relaxed and the store will be Release. The second memory ordering
refers to failure: if the value is not swapped the load is
Relaxed. The _weak variant may spuriously fail but will generate
better code.

These changes mean that it is possible for more loops to be taken per
call than strictly necessary but with greater parallelism available on
this operation, improved energy consumption as CPUs don't have to
coordinate as much.

This commit improves the memory ordering in the implementation of
Inner's increment function. The former code did a sequentially
consistent load of self.num, then entered a loop with a sequentially
consistent compare and swap on the same, bailing out with and Err only
if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all
threads must observe all relevant memory operations in the same order,
implying synchronization between all CPUs.

This commit adjusts the implementation in two key ways. First, the
initial load of self.num is now down with Relaxed ordering. If two
threads entered this code simultaneously, formerly, tokio required
that one proceed before the other, negating their parallelism. Now,
either thread may proceed without coordination. Second, the SeqCst
compare_and_swap is changed to a Release, Relaxed
compare_exchange_weak. The first memory ordering referrs to success:
if the value is swapped the load of that value for comparison will be
Relaxed and the store will be Release. The second memory ordering
referrs to failure: if the value is not swapped the load is
Relaxed. The _weak variant may spuriously fail but will generate
better code.

These changes mean that it is possible for more loops to be taken per
call than strictly necessary but with greater parallelism available on
this operation, improved energy consumption as CPUs don't have to
coordinate as much.
This commit avoids an additional, unecessary load performed in my last
commit by storing the result of the compare_exchange_weak. This is
done at the cost of one additional loop when curr == MAX_TIMEOUTS but,
considering how expensive loads are, this is the correct thing to do.
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, with the caveat that it would be nice to see a loom for this if possible.

@blt
Copy link
Contributor Author

blt commented Jan 13, 2020

@hawkw I'm interested in adding a loom test. I'm not familiar with the layout of tokio's tests. Do you know of a good example I could follow for testing something so internal as this?

@hawkw
Copy link
Member

hawkw commented Jan 13, 2020

@blt you might want to look at the threadpool's internal loom tests for it's work queue implementation: https://github.com/tokio-rs/tokio/blob/8546ff826db8dba1e39b4119ad909fb6cab2492a/tokio/src/runtime/thread_pool/tests/loom_queue.rs

@blt
Copy link
Contributor Author

blt commented Jan 13, 2020

@hawkw neat! Thanks for the tip. That does look useful.

@blt
Copy link
Contributor Author

blt commented Jan 17, 2020

Hi all, quick update. I'm still working on this, just don't have a lot of spare time through the week this week, apparently.

I'm still learning enough about tokio internals to not be sure
I'm taking the right approach here. This commit introduces a small
test to create an Inner instance -- runnable only with loom flagged
on, but without actually doing loom tests yet -- and assert that
no time has elapsed for the Inner. This fails like so:

```
tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib  --features "full" -- --test-threads=1 --nocapture
   Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio)
    Finished test [unoptimized + debuginfo] target(s) in 5.43s
     Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5

running 1 test
test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
FAILED

failures:

failures:
    time::driver::tests::test_inner::sanity

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out

error: test failed, to rerun pass '--lib'
```

That it does is a pretty fair indication I don't understand something
vital about tokio internals.
@blt
Copy link
Contributor Author

blt commented Jan 17, 2020

Hi all, to follow on from my last comment, I've hit a little wall. I introduced a small test in 677c829 that creates an Inner and then attempts to assert its elapsed value is 0, just a little thing to demonstrate I can create the types I need to create and run the test. Well, I'm missing some vital understanding as the test blows up in a way I can't quite figure out:

tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib  --features "full" -- --test-threads=1 --nocapture
   Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio)
    Finished test [unoptimized + debuginfo] target(s) in 5.43s
     Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5

running 1 test
test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
FAILED

failures:

failures:
    time::driver::tests::test_inner::sanity

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out

error: test failed, to rerun pass '--lib'

Any thoughts are appreciated. Further, I'm not actually sure -- once I get this going -- what it is I should test. I had originally thought I'd assert some basic inequality about the value of self.num with two loom threads concurrently calling decrement and increment but I realize that's shot now what with decrement wrapping around. Playing around with methods on Inner, then, is maybe not the right level of abstraction to be modeling? My understanding of tokio internals is still pretty poor, so pointers here would also be appreciated.

This commit introduces a new test-only function into Inner that
returns the value of num. This is used to synchronize two concurrent
loom threads whose sole	goals are to increment and decrement in
equal measure. If the atomics of this changeset	are correct we'll
always hit zero	at the end of the model.

Special	care has to be taken not to decrement unless there's
previously been	an increment. This is the purpose of the
'synchronization' on Inner's num. Note that we always assert at the
end of the model with SeqCst ordering but load in the model relaxed.
I neglected to check the last commit to see if it would build in a
non-loom test environment. It did not.
@blt
Copy link
Contributor Author

blt commented Jan 22, 2020

@udoprog @hawkw thank you both for your reviews. I've introduced a loom test for increment/decrement in fa33129. The test spawns two loom threads and performs an equal number of increment and decrement operations, asserting at the end that an additional observer always sees a consistent zero at the end of the model's run. Because of the debug_assert present in Inner.decrement I had to introduce a slight synchronization in the model: the decrement operation is always done after an increment. My presumption was that this was an invariant of the calling code for Inner -- hence the debug assertion -- but if that's not true then the synchronization in my model is probably not appropriate.

Also, please note that Inner.decrement now operates on Acquire ordering. The load of self.num is always ordered after the Release store done by increment.

@blt
Copy link
Contributor Author

blt commented Jan 28, 2020

@hawkw @udoprog hi folks, just a friendly ping about this PR. Please do let me know if I can make any more changes to improve it.

@hawkw
Copy link
Member

hawkw commented Jan 28, 2020

@blt sorry for the delay, I'm taking a look now!

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, modulo this question:

Because of the debug_assert present in Inner.decrement I had to introduce a slight synchronization in the model: the decrement operation is always done after an increment. My presumption was that this was an invariant of the calling code for Inner -- hence the debug assertion -- but if that's not true then the synchronization in my model is probably not appropriate.

I'm not intimately familiar with this code, so I can't tell you for sure if this is right or not — we'd have to ask @carllerche. My guess, though is that you're correct that the debug assertion is enforcing an invariant...

Besides that, I commented on some very minor style nits. Looking good, though!

tokio/src/time/driver/mod.rs Outdated Show resolved Hide resolved
tokio/src/time/driver/mod.rs Outdated Show resolved Hide resolved
tokio/src/time/driver/tests/mod.rs Outdated Show resolved Hide resolved
tokio/src/time/driver/tests/test_inner.rs Outdated Show resolved Hide resolved
Brian L. Troutwine and others added 3 commits January 28, 2020 11:23
blt added a commit to blt/tokio that referenced this pull request Jan 29, 2020
@blt
Copy link
Contributor Author

blt commented Jan 29, 2020

Besides that, I commented on some very minor style nits. Looking good, though!

Thank you for your very careful review!

@blt
Copy link
Contributor Author

blt commented Feb 28, 2020

Hi folks, ping? Please do let me know if I can do anything to improve this PR.

@carllerche carllerche merged commit 3fb213a into tokio-rs:master Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants