time: lazy init TimerShared in TimerEntry #6512
Conversation
See tokio-rs#6504. Below are relevant benchmark results of this PR on an M1 Mac:

```
single_thread_timeout   time:   [21.869 ns 21.987 ns 22.135 ns]
                        change: [-3.4429% -2.0709% -0.8759%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

multi_thread_timeout-8  time:   [4.4835 ns 4.6138 ns 4.7614 ns]
                        change: [-4.3554% +0.1643% +4.5114%] (p = 0.95 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe
```

Below are relevant benchmark results of the current version on an M1 Mac:

```
single_thread_timeout   time:   [40.227 ns 40.416 ns 40.691 ns]
                        change: [+81.321% +82.817% +84.121%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

multi_thread_timeout-8  time:   [183.16 ns 186.02 ns 188.21 ns]
                        change: [+3765.0% +3880.4% +3987.4%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low severe
  6 (6.00%) low mild
```
Force-pushed from 7f0c3f3 to bde0742.
Below are relevant benchmark results of this PR on Linux AMD64:
Below are relevant benchmark results of the current version on Linux AMD64:
Should we lazily initialize the `TimerEntry` as well?

Lazily initializing the `TimerEntry` …

I think so. After some time, I find there are many custom …
tokio/src/time/sleep.rs
Outdated
```diff
 pub struct Sleep {
     inner: Inner,

+    deadline: Instant,
+    handle: scheduler::Handle,
     // The link between the `Sleep` instance and the timer that drives it.
     #[pin]
-    entry: TimerEntry,
+    entry: Option<TimerEntry>,
 }
```
How does this change the size of `Sleep`? Could `TimerEntry` be changed to reduce the change?
I think this should be possible. Your idea looks better, I'll give it a try.
I put `deadline` and `handle` in `Sleep` here so that the `TimerEntry` is not initialized, but I have to put them somewhere to save them temporarily. Saving `handle` here ensures that `timeout` will still panic if there is no runtime handle in TLS, or if we did not call `enable_time` or `enable_all`.
The size of `Sleep` is an issue worth considering. Is there any way to solve this problem?
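On the size question: whether wrapping a field in `Option` grows a struct depends on whether the payload has a niche the compiler can reuse for the discriminant. A quick illustration with stand-in types (`Big` below is a hypothetical placeholder, not tokio's `TimerEntry`):

```rust
use std::mem::size_of;

fn main() {
    // A type with a niche (a non-null pointer): `Option` is free.
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());

    // A type with no niche: `Option` adds a tag plus alignment padding.
    assert_eq!(size_of::<u64>(), 8);
    assert_eq!(size_of::<Option<u64>>(), 16);

    // For a large payload the relative growth is small (64 -> 72 bytes).
    struct Big([u64; 8]); // hypothetical stand-in for a timer entry
    assert_eq!(size_of::<Big>(), 64);
    assert_eq!(size_of::<Option<Big>>(), 72);
    println!("ok");
}
```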
What happens if you just don't call `clear_entry` in this method when it hasn't yet been registered? Based on @conradludgate's comment here, it sounds like this triggers a loom failure, but that's most likely due to some path where the timer is dropped concurrently with firing, or something like that. If we've never been registered with the driver, then not calling `clear_entry` should be okay.
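The "skip deregistration when never registered" idea can be sketched with a flag; this is a simplified model with hypothetical names and a counter standing in for the lock-taking driver call, not tokio's actual loom-checked drop path:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Counts how often the (expensive, lock-taking) deregistration runs.
static CLEAR_CALLS: AtomicUsize = AtomicUsize::new(0);

struct TimerEntry {
    registered: AtomicBool,
}

impl TimerEntry {
    fn new() -> Self {
        TimerEntry { registered: AtomicBool::new(false) }
    }

    // Called when the entry is first linked into the driver's wheel.
    fn register(&self) {
        self.registered.store(true, Ordering::Release);
    }

    // Stand-in for taking the driver lock and unlinking the entry.
    fn clear_entry(&self) {
        CLEAR_CALLS.fetch_add(1, Ordering::Relaxed);
    }
}

impl Drop for TimerEntry {
    fn drop(&mut self) {
        // Only touch the driver (and its global lock) if we were ever
        // registered; an entry that never linked in has nothing to unlink.
        if self.registered.load(Ordering::Acquire) {
            self.clear_entry();
        }
    }
}

fn main() {
    drop(TimerEntry::new()); // never registered: no driver access
    let e = TimerEntry::new();
    e.register();
    drop(e); // registered: must deregister
    assert_eq!(CLEAR_CALLS.load(Ordering::Relaxed), 1);
}
```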
In my test, it improves performance by reducing lock contention when the `timeout` always registers with the driver. On the other hand, if the `timeout` never registers with the driver, it does not bring a significant performance improvement.
That said, maybe we can use `inner: StdUnsafeCell<Option<TimerShared>>` for this purpose; the size of `Sleep` should barely change. Avoiding `clear_entry` for unregistered `timeout`s is not necessary. In this case, we should avoid constructing the intrusive linked-list item.
Thank you for doing that benchmark. It sounds like this particular change is still a good idea, but we should reconsider the other changes mentioned on the original issue. That said, I wonder what we would get on a benchmark where the timer always gets registered with the driver.
Forgive me. I saw it wrong :)
Of course, if all timers are registered with the driver and the concurrency is very high, then lock contention will become the most important concern. The flame graph showed how much time the global lock operations take when running the following workload:

```rust
use std::future::pending;
use std::time::Duration;
use tokio::time::timeout;

fn main() {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .worker_threads(8)
        .build()
        .unwrap();
    let _r = runtime.block_on(async {
        let mut handles = Vec::with_capacity(1024);
        for _ in 0..1024 {
            handles.push(tokio::spawn(async move {
                loop {
                    for i in 1..10 {
                        for _ in 0..100 {
                            let h = timeout(Duration::from_millis(i), never_ready_job2(i));
                            let _r = h.await;
                        }
                    }
                }
            }));
        }
        for handle in handles {
            handle.await.unwrap();
        }
    });
}

pub fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 1,
        1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

// A job that is never ready, so every `timeout` must register with the driver.
async fn never_ready_job2(n: u64) -> u64 {
    let res = fibonacci(n + 5); // do some logic
    pending::<()>().await;
    res
}
```

A global lock under many threads usually does not perform well. So, going back to the issue itself, both methods may need to be applied to reduce the performance overhead of `timeout`.
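One common way to attack that kind of global-lock contention is to shard the lock so that concurrent registrations usually hit different mutexes. This is a sketch of the general technique with invented names, not what tokio's driver does in this PR:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// Each timer hashes to one of N independent shards, so concurrent
// registrations rarely contend on the same mutex.
struct ShardedTimers<const N: usize> {
    shards: [Mutex<Vec<u64>>; N], // Vec<u64> stands in for a wheel level
}

impl<const N: usize> ShardedTimers<N> {
    fn new() -> Self {
        ShardedTimers { shards: std::array::from_fn(|_| Mutex::new(Vec::new())) }
    }

    fn shard_for(&self, timer_id: u64) -> usize {
        let mut h = DefaultHasher::new();
        timer_id.hash(&mut h);
        (h.finish() as usize) % N
    }

    fn register(&self, timer_id: u64) {
        self.shards[self.shard_for(timer_id)].lock().unwrap().push(timer_id);
    }

    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}

fn main() {
    let timers = ShardedTimers::<8>::new();
    for id in 0..100u64 {
        timers.register(id);
    }
    assert_eq!(timers.len(), 100);
}
```

The trade-off is that any operation needing a global view (e.g. finding the next expiring timer across all shards) must now visit every shard.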
By lazily initializing the `TimerShared`, we get the results below.

The results of this PR:

The results of master:

To achieve all of this, the price is that the byte size of `Sleep` grows.
Apparently I didn't actually submit this comment. It's been "pending" for 5 days. Sorry about that.
tokio/src/runtime/time/entry.rs
Outdated
```diff
 fn inner(&self) -> &TimerShared {
-    unsafe { &*self.inner.get() }
+    unsafe { &mut *self.inner.get() }.get_or_insert(TimerShared::new())
 }
```
Can you change this to only create a mutable reference if it is `None`?
I have used an additional `is_done` check to achieve this goal.
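The pattern the thread settles on, only forming a mutable reference on the uninitialized path, might look roughly like this (a stand-in `TimerShared` and a simplified check instead of the actual `is_done` flag; the real code must also justify the `unsafe` against concurrent access, which this toy does not):

```rust
use std::cell::UnsafeCell;

struct TimerShared {
    state: u64, // stand-in field; the real type holds the driver linkage
}

struct TimerEntry {
    inner: UnsafeCell<Option<TimerShared>>,
}

impl TimerEntry {
    fn new() -> Self {
        TimerEntry { inner: UnsafeCell::new(None) }
    }

    fn inner(&self) -> &TimerShared {
        let ptr = self.inner.get();
        // Check through the raw pointer first: the common (already
        // initialized) path never creates an `&mut` that could alias
        // shared references handed out earlier.
        if unsafe { (*ptr).is_none() } {
            // SAFETY (sketch): sound only because nothing else can be
            // reading the cell before its first initialization.
            unsafe { *ptr = Some(TimerShared { state: 0 }) };
        }
        unsafe { (*ptr).as_ref().unwrap() }
    }
}

fn main() {
    let entry = TimerEntry::new();
    let a = entry.inner(); // first call initializes lazily
    assert_eq!(a.state, 0);
    let b = entry.inner(); // second call reuses the initialized value
    assert_eq!(b.state, 0);
}
```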
Looks good to me.
This commit is part of the effort to reduce `timeout` performance overhead.
See #6504
Below are relevant benchmark results of this PR on an M1 Mac:
Below are relevant benchmark results of the current version on an M1 Mac: