Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timer: Improve memory ordering in Inner's increment #2107

Open
wants to merge 5 commits into
base: master
from

Conversation

@blt
Copy link

blt commented Jan 13, 2020

Motivation

This commit improves the memory ordering in the implementation of
Inner's increment function. The former code did a sequentially
consistent load of self.num, then entered a loop with a sequentially
consistent compare and swap on the same, bailing out with and Err only
if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all
threads must observe all relevant memory operations in the same order,
implying synchronization between all CPUs.

Solution

This commit adjusts the implementation in two key ways. First, the
initial load of self.num is now done with Relaxed ordering. If two
threads entered this code simultaneously, formerly, tokio required
that one proceed before the other, negating their parallelism. Now,
either thread may proceed without coordination. Second, the SeqCst
compare_and_swap is changed to a Release, Relaxed
compare_exchange_weak. The first memory ordering refers to success:
if the value is swapped the load of that value for comparison will be
Relaxed and the store will be Release. The second memory ordering
refers to failure: if the value is not swapped the load is
Relaxed. The _weak variant may spuriously fail but will generate
better code.

These changes mean that it is possible for more loops to be taken per
call than strictly necessary but with greater parallelism available on
this operation, improved energy consumption as CPUs don't have to
coordinate as much.

This commit improves the memory ordering in the implementation of
Inner's increment function. The former code did a sequentially
consistent load of self.num, then entered a loop with a sequentially
consistent compare and swap on the same, bailing out with and Err only
if the loaded value was MAX_TIMEOUTS. The use of SeqCst means that all
threads must observe all relevant memory operations in the same order,
implying synchronization between all CPUs.

This commit adjusts the implementation in two key ways. First, the
initial load of self.num is now down with Relaxed ordering. If two
threads entered this code simultaneously, formerly, tokio required
that one proceed before the other, negating their parallelism. Now,
either thread may proceed without coordination. Second, the SeqCst
compare_and_swap is changed to a Release, Relaxed
compare_exchange_weak. The first memory ordering referrs to success:
if the value is swapped the load of that value for comparison will be
Relaxed and the store will be Release. The second memory ordering
referrs to failure: if the value is not swapped the load is
Relaxed. The _weak variant may spuriously fail but will generate
better code.

These changes mean that it is possible for more loops to be taken per
call than strictly necessary but with greater parallelism available on
this operation, improved energy consumption as CPUs don't have to
coordinate as much.
tokio/src/time/driver/mod.rs Outdated Show resolved Hide resolved
This commit avoids an additional, unecessary load performed in my last
commit by storing the result of the compare_exchange_weak. This is
done at the cost of one additional loop when curr == MAX_TIMEOUTS but,
considering how expensive loads are, this is the correct thing to do.
@hawkw
hawkw approved these changes Jan 13, 2020
Copy link
Member

hawkw left a comment

This looks good to me, with the caveat that it would be nice to see a loom for this if possible.

@blt

This comment has been minimized.

Copy link
Author

blt commented Jan 13, 2020

@hawkw I'm interested in adding a loom test. I'm not familiar with the layout of tokio's tests. Do you know of a good example I could follow for testing something so internal as this?

@hawkw

This comment has been minimized.

Copy link
Member

hawkw commented Jan 13, 2020

@blt you might want to look at the threadpool's internal loom tests for it's work queue implementation: https://github.com/tokio-rs/tokio/blob/8546ff826db8dba1e39b4119ad909fb6cab2492a/tokio/src/runtime/thread_pool/tests/loom_queue.rs

@blt

This comment has been minimized.

Copy link
Author

blt commented Jan 13, 2020

@hawkw neat! Thanks for the tip. That does look useful.

@blt

This comment has been minimized.

Copy link
Author

blt commented Jan 17, 2020

Hi all, quick update. I'm still working on this, just don't have a lot of spare time through the week this week, apparently.

I'm still learning enough about tokio internals to not be sure
I'm taking the right approach here. This commit introduces a small
test to create an Inner instance -- runnable only with loom flagged
on, but without actually doing loom tests yet -- and assert that
no time has elapsed for the Inner. This fails like so:

```
tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib  --features "full" -- --test-threads=1 --nocapture
   Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio)
    Finished test [unoptimized + debuginfo] target(s) in 5.43s
     Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5

running 1 test
test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
FAILED

failures:

failures:
    time::driver::tests::test_inner::sanity

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out

error: test failed, to rerun pass '--lib'
```

That it does is a pretty fair indication I don't understand something
vital about tokio internals.
@blt

This comment has been minimized.

Copy link
Author

blt commented Jan 17, 2020

Hi all, to follow on from my last comment, I've hit a little wall. I introduced a small test in 677c829 that creates an Inner and then attempts to assert its elapsed value is 0, just a little thing to demonstrate I can create the types I need to create and run the test. Well, I'm missing some vital understanding as the test blows up in a way I can't quite figure out:

tokio > RUSTFLAGS="--cfg loom" cargo test time::driver --lib  --features "full" -- --test-threads=1 --nocapture
   Compiling tokio v0.2.9 (/Users/blt/projects/com/github/tokio/tokio)
    Finished test [unoptimized + debuginfo] target(s) in 5.43s
     Running /Users/blt/projects/com/github/tokio/target/debug/deps/tokio-16630467e4eb52a5

running 1 test
test time::driver::tests::test_inner::sanity ... thread 'main' panicked at 'cannot access a scoped thread local variable without calling `set` first', /Users/blt/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:186:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
FAILED

failures:

failures:
    time::driver::tests::test_inner::sanity

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 65 filtered out

error: test failed, to rerun pass '--lib'

Any thoughts are appreciated. Further, I'm not actually sure -- once I get this going -- what it is I should test. I had originally thought I'd assert some basic inequality about the value of self.num with two loom threads concurrently calling decrement and increment but I realize that's shot now what with decrement wrapping around. Playing around with methods on Inner, then, is maybe not the right level of abstraction to be modeling? My understanding of tokio internals is still pretty poor, so pointers here would also be appreciated.

let unpark = Box::new(MockUnpark);
let instant = Instant::now();

let inner = Inner::new(instant, unpark);

assert_eq!(inner.elapsed(), 0);
Comment on lines 15 to 20

This comment has been minimized.

Copy link
@hawkw

hawkw Jan 17, 2020

Member

I believe the reason the test is panicking for you is because this code is not inside a call to loom::model, so there's no loom execution running when the atomics are constructed.

If you replace it with

#[test]
fn sanity() {
    loom::model(|| {
        let unpark = Box::new(MockUnpark);
        let instant = Instant::now();

        let inner = Inner::new(instant, unpark);

        assert_eq!(inner.elapsed(), 0);
    })
}

I think it won't panic any longer.

That said, I don't think this really needs to be a loom test, since there's no concurrency involved. This should be pretty much equivalent to a regular Rust test. Though, if your intention was just to start learning how to write loom tests, that makes sense. :)

This comment has been minimized.

Copy link
@blt

blt Jan 17, 2020

Author

I believe the reason the test is panicking for you is because this code is not inside a call to loom::model, so there's no loom execution running when the atomics are constructed.

Hah! D'oh. Makes perfect sense.

That said, I don't think this really needs to be a loom test, since there's no concurrency involved. This should be pretty much equivalent to a regular Rust test. Though, if your intention was just to start learning how to write loom tests, that makes sense. :)

Oh, this for sure does not. The ultimate goal is to demonstrate with a loom test that I didn't obviously bungle the work done to increment. This 'test' was more just to demonstrate to myself that I could run something simple in a loom rig, which now -- thank you! -- I can.

I actually have a good sense of how to proceed from here, so this was very helpful.

This comment has been minimized.

Copy link
@hawkw

hawkw Jan 17, 2020

Member

Okay, great! Glad I could help! :)

blt added 2 commits Jan 22, 2020
This commit introduces a new test-only function into Inner that
returns the value of num. This is used to synchronize two concurrent
loom threads whose sole	goals are to increment and decrement in
equal measure. If the atomics of this changeset	are correct we'll
always hit zero	at the end of the model.

Special	care has to be taken not to decrement unless there's
previously been	an increment. This is the purpose of the
'synchronization' on Inner's num. Note that we always assert at the
end of the model with SeqCst ordering but load in the model relaxed.
I neglected to check the last commit to see if it would build in a
non-loom test environment. It did not.
@blt

This comment has been minimized.

Copy link
Author

blt commented Jan 22, 2020

@udoprog @hawkw thank you both for your reviews. I've introduced a loom test for increment/decrement in fa33129. The test spawns two loom threads and performs an equal number of increment and decrement operations, asserting at the end that an additional observer always sees a consistent zero at the end of the model's run. Because of the debug_assert present in Inner.decrement I had to introduce a slight synchronization in the model: the decrement operation is always done after an increment. My presumption was that this was an invariant of the calling code for Inner -- hence the debug assertion -- but if that's not true then the synchronization in my model is probably not appropriate.

Also, please note that Inner.decrement now operates on Acquire ordering. The load of self.num is always ordered after the Release store done by increment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.