Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another attempt at abstracting Instant::now #381

Merged
merged 9 commits into from
Jun 6, 2018
Merged

Conversation

carllerche
Copy link
Member

@carllerche carllerche commented May 30, 2018

Currently, the timer uses a Now trait to abstract the source of time.
This allows time to be mocked out. However, the current implementation
has a number of limitations as represented by #288 and #296.

The main issues are that Now requires &mut self which prevents a
value from being easily used in a concurrent environment. Also, when
wanting to write code that is abstract over the source of time, generics
get out of hand.

This patch provides an alternate solution. A new type, Clock is
provided which defaults to Instant::now as the source of time, but
allows configuring the actual source using a new iteration of the Now
trait. This time, Now is Send + Sync + 'static. Internally, Clock
stores the now value in an Arc<Now> value, which introduces dynamism
and allows Clock values to be cloned and be Sync.

Also, the current clock can be set for the current execution context
using the with_default pattern.

Because using the Instant::now will be the most common case by far, it
is special cased in order to avoid the need to allocate an Arc and use
dynamic dispatch.

Remaining

  • Write tests.
  • Write docs.
  • Integrate with the runtime.

Break out issues

  • Once timer::Now is removed, all the misc allow(deprecated) should be removed.
  • The Timer struct should no longer be generic over Now. Instead, it should take a Clock.

Currently, the timer uses a `Now` trait to abstract the source of time.
This allows time to be mocked out. However, the current implementation
has a number of limitations as represented by #288 and #296.

The main issues are that `Now` requires `&mut self` which prevents a
value from being easily used in a concurrent environment. Also, when
wanting to write code that is abstract over the source of time, generics
get out of hand.

This patch provides an alternate solution. A new type, `Clock` is
provided which defaults to `Instant::now` as the source of time, but
allows configuring the actual source using a new iteration of the `Now`
trait. This time, `Now` is `Send + Sync + 'static`. Internally, `Clock`
stores the now value in an `Arc<Now>` value, which introduces dynamism
and allows `Clock` values to be cloned and be `Sync`.

Also, the current clock can be set for the current execution context
using the `with_default` pattern.

Because using the `Instant::now` will be the most common case by far, it
is special cased in order to avoid the need to allocate an `Arc` and use
dynamic dispatch.
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is outstanding! It solves a foundational problem for testing network-oriented programs.


/// TODO: Dox
#[derive(Default, Clone)]
pub struct Clock {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this belong in tokio-timer? What do you think about keeping this in a new tokio-clock or somesuch? I can imagine tokio-dependent code benefiting from this primitive without using timers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly, I would rather avoid introducing a new crate until there is a real case driving it. The two primary reasons to split the crates are that dependencies are different or you want to issue breaking changes at different times. The way I see it:

  • tokio-clock would have the same dependencies as tokio-timer.
  • If in the future, there is a reason to split them, this can be done in a backwards compatible way.

So, I would lean towards keeping it together for now until a solid case comes up.

@carllerche
Copy link
Member Author

One additional reason to abstract the source of time is that doing so allows additional control when doing fuzzing tests. When fuzzing, the return values of now can be tracked, allowing deterministic replay when an issue is found.

@carllerche
Copy link
Member Author

/cc @bluejekyll, we are hoping you will consider using Clock in trust-dns.

@seanmonstar
Copy link
Member

seanmonstar commented Jun 1, 2018

The concept described in the initial comment sounds great! Here's some thoughts about implementation:

  • Switching to a TLS default sounds possibly confusing. I personally preferred it being a generic on Timer.
  • TLS can be kind of slow. Some cases will use the faster native version, but even then, Rust adds overhead to TLS keys by checking a constructor and destructor flag on every access. I've seen a couple percentage points improvement in hyper benchmarks if I change the cached date header from TLS to a field instead.

@bluejekyll
Copy link
Member

Is there a specific area? I haven't been following the development of this feature. I definitely have worked around mocking time in a lot of different areas for testing, I can definitely see this helping.

Though, I still get nervous about global state (thread_local, I know), from abuse in other languages...

@hawkw
Copy link
Member

hawkw commented Jun 1, 2018

@bluejekyll AFAIK, the main place the trust-dns-resolver uses Instant::now is in dns_lru.rs, for generating deadlines from TTLs. This is where I'd like to be able to inject a mocked time source for testing Conduit's code around DNS TTLs, which relies on this code in trust-dns-resolver.

If you're uncomfortable with thread local storage by default, trust-dns-resolver could probably add a feature flag for this and use Instant::now otherwise?

@seanmonstar
Copy link
Member

I mentioned this in Slack, but worth making a public linkable comment:

It'd be useful to try to reduce the amount of times actually calling Instant::now, when we don't need to know the exact now, since platforms like AWS mean getting the time is a syscall always.

We could potentially help that with the API here, since otherwise, we'd need to thread around an Instant manually to prevent calling now again. There's two real intents, I think:

  • What is the current clock time? This is usually needed when comparing elapsed time.
  • What is the recent less-precise time, just to compare with other instants.

@bluejekyll
Copy link
Member

If you're uncomfortable with thread local storage by default, trust-dns-resolver could probably add a feature flag for this and use Instant::now otherwise?

I only mention it as I see a lot of libraries using this approach, and I personally find that it can sometimes make testing harder. If you want to submit a PR, I'll happily take a look.

@olix0r
Copy link
Member

olix0r commented Jun 1, 2018

@carllerche @bluejekyll

If you're uncomfortable with thread local storage by default, trust-dns-resolver could probably add a feature flag for this and use Instant::now otherwise?

Alternatively, trust-dns-resolver could provide its own trait Now type and we could bridge into it from tokio's type to trust's type.

@olix0r
Copy link
Member

olix0r commented Jun 1, 2018

Switching to a TLS default sounds possibly confusing. I personally preferred it being a generic on Timer.

Practically, I found that generics quickly explode: everything that builds a Delay would have to be generic, for instance; and, if we think more broadly, any code that we wish to test that handles time would have to become generic over these types.

TLS can be kind of slow. Some cases will use the faster native version, but even then, Rust adds overhead to TLS keys by checking a constructor and destructor flag on every access. I've seen a couple percentage points improvement in hyper benchmarks if I change the cached date header from TLS to a field instead.

I would definitely be curious to know more about the performance impact of the TLS, especially since time tends to be accessed frequently in the hot path of many applications.

@seanmonstar
Copy link
Member

I found that generics quickly explode: everything that builds a Delay would have to be generic.

Fair point, the generic could be removed, and we just keep the Option<Arc<Now>>, but without it being in TLS.

I would definitely be curious to know more about the performance impact of the TLS, especially since time tends to be accessed frequently in the hot path of many applications.

That was exactly my concern as well.

For TLS implementations, there are fast ones that, if the compiler is building for a specific platform, can optimize well enough so that accesses are just an offset. If we're building more generically, this can inhibit the optimizations the compiler can do. In Rust specifically, here's the safeness checks it performs on every access of the slot.

In hyper, there is a TLS cached formatted date header (since checking the clock and formatter a date is slow). I've experimented with moving it out of TLS and putting it directly in the type that writes the headers, and saw ~1-2% improvement in requests per second.

@carllerche
Copy link
Member Author

Clock has no generics, so it could just take a Clock.

@tailhook
Copy link
Contributor

tailhook commented Jun 1, 2018

I think this thread goes in a wrong direction. While I understand that DnsLru mentioned above indeed can accept Clock as an argument it's quite a niche use case.

The main use case is if I'm writing a protocol implementation, i.e. a HTTP client with a keep-alive timeout, I want to put clock here. Passing a clock by hand isn't going to work here. Because it needs to be at every layer. E.g. in some place I will need Delay::new(...).select(other_future) or something similar. This is to say that passing clock is as infectious as tokio_core::reactor::Handle was.

So either it need to be in futures::task::Context (which is already passed everywhere) or in a thread local.


Another point is if we have evidence that calls to Instant::now are expensive then we can have the following approach:

struct Clock(Arc<NowWithCache>);
struct NowWithCache {
    time_source: Box<Now>,
    cached_value: Atomic<SystemTime>,
}

The idea is that we update cache_value at every tokio loop iteration. And return only cached value on Clock::now(), which does not even involve a virtual memory call.

Then there might be explicit Clock::update() for the cases like tight loops or where precise timings are required (and this is still mockable and predictable).

The precision of main loop iteration time is fine for all network-related time handling and for most other stuff too. But if we need the same clock for the non-IO thread there are approaches for that too. For example, cache rdtsc value along with timestamp and update timestamp if difference is large.

@carllerche
Copy link
Member Author

@tailhook

Passing a clock by hand isn't going to work here. Because it needs to be at every layer.

I agree.

So, given that we are still on futures 0.1, you propose to stick with the clock::now() approach, which looks at the thread-local time source to return an Instant?

Also, your NowWithCache idea is super interesting! We will definitely want to continue exploring that.

@tailhook
Copy link
Contributor

tailhook commented Jun 1, 2018

So, given that we are still on futures 0.1, you propose to stick with the clock::now() approach, which looks at the thread-local time source to return an Instant?

Yes. And Delay and Interval using it internally (this was original intention, right?)

Also, I'm not very familiar with current tokio codebase, but it might make sense to put the actual clock into a loop Handle (perhaps keeping under clock::now facade anyway). So that when checking, it only resolves a single thread-local instead of two vars. But it may break abstractions, be overoptimization, or not worth it for other reasons.

@carllerche
Copy link
Member Author

Yes. And Delay and Interval using it internally (this was original intention, right?)

Right, these types would use clock::now() for time as well.

Also, I'm not very familiar with current tokio codebase, but it might make sense to put the actual clock into a loop Handle (perhaps keeping under clock::now facade anyway). So that when checking, it only resolves a single thread-local instead of two vars. But it may break abstractions, be overoptimization, or not worth it for other reasons.

I'm not really following this. Using clock::now() as proposed should only result in a single TLS lookup.

Since each component is being designed to work standalone, each has an associated TLS, but a single fn call should only ever result in at most one TLS lookup.

If it starts becoming an issue in practice, I'm happy to revisit the design w/ more data.

@carllerche
Copy link
Member Author

I pushed a commit that integrates clock with the runtime.

@carllerche
Copy link
Member Author

Alright all, I think that this PR is good to go. I would appreciate any final thoughts.

Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used this change in tower-balance. I think we can definitely add some better testing utilities to improve ux, but it's usable as it is, so we shouldn't block on that!

@carllerche carllerche mentioned this pull request Jun 5, 2018
4 tasks
@carllerche carllerche merged commit db620b4 into master Jun 6, 2018
@carllerche carllerche deleted the timer-abstract-now branch June 6, 2018 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants