add new benchmark for FutureGroup #179

Conversation
Thanks so much for submitting this! Despite not doing the best in these benchmarks yet, I think they're valuable because they allow us to learn where we have an opportunity to improve.

In principle I'm keen to land these; I mainly just have some concerns about the relatively high number of outliers at the higher counts. If we can resolve that, I think we should be good!

Thank you!
`benches/bench.rs` (Outdated)
```rust
    while !senders.is_empty() {
        let completion_limit = max_ready_num_per_poll.min(senders.len());
        let num_completing = rng.gen_range(0..=completion_limit);

        assert_eq!(senders.drain(..num_completing).count(), num_completing);

        let recv_start = Instant::now();
        assert_eq!(
            (&mut group).take(num_completing).count().await,
            num_completing
        );
        total_runtime += recv_start.elapsed();
    }
}

total_runtime
```
It took me a second to read how this works; it would be really helpful if you could annotate how the latency is being counted here.
From what I can tell, what we're doing is taking N rounds to complete all futures in the group. We count how long it took to complete each round, and add that to a total duration. The way we mark futures as complete is by inserting one-shot channels into the futures group.
Did I get that right?
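The accounting being described can be illustrated with a synchronous stand-in (all names here are made up for illustration; a queue stands in for the oneshot channels, and no async runtime is involved): each round completes a batch of pending items, and only the time spent on the receive side is added to the total.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Synchronous sketch of the benchmarked loop: per round, "complete" up to
// `batch` pending futures, then accumulate only the receive-side time.
fn measured_rounds(mut pending: VecDeque<u32>, batch: usize) -> Duration {
    let mut total_runtime = Duration::ZERO;
    while !pending.is_empty() {
        // stand-in for dropping `num_completing` oneshot senders
        let num_completing = batch.min(pending.len());
        let completed: Vec<u32> = pending.drain(..num_completing).collect();

        // only the consumption of the now-ready items is timed,
        // mirroring `recv_start` in the diff above
        let recv_start = Instant::now();
        assert_eq!(completed.len(), num_completing);
        total_runtime += recv_start.elapsed();
    }
    total_runtime
}

fn main() {
    // 100 items completed in batches of 10 -> 10 rounds
    let total = measured_rounds((0..100).collect(), 10);
    println!("accumulated receive latency: {total:?}");
}
```

The point of the structure is that setup work (creating channels, inserting into the group) is excluded; only the window between readiness and consumption contributes to `total_runtime`.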
We initialize the `FutureGroup` with `size` futures. Our futures are the `Receiver` ends of oneshots. When the `Sender` ends are dropped, the corresponding `Receiver` futures become ready.

I think that last part may be what you're missing? When we drain the `senders` vec at line 155 we are waking up `Receiver`s in the `FutureGroup`. Insertion into the `FutureGroup` happens once up front and is not used for signaling.
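That drop-to-wake mechanism can be sketched with a minimal hand-rolled oneshot (a toy, not the channel type the bench actually uses; all names are invented): the `Receiver` future stays `Pending` until its `Sender` is dropped, at which point the stored waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct Shared {
    closed: bool,
    waker: Option<Waker>,
}

pub struct Sender(Arc<Mutex<Shared>>);
pub struct Receiver(Arc<Mutex<Shared>>);

impl Drop for Sender {
    fn drop(&mut self) {
        let mut s = self.0.lock().unwrap();
        s.closed = true;
        // wake the receiver that registered interest, if any
        if let Some(w) = s.waker.take() {
            w.wake();
        }
    }
}

impl Future for Receiver {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.0.lock().unwrap();
        if s.closed {
            Poll::Ready(())
        } else {
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

pub fn oneshot() -> (Sender, Receiver) {
    let shared = Arc::new(Mutex::new(Shared { closed: false, waker: None }));
    (Sender(shared.clone()), Receiver(shared))
}

// A do-nothing waker so we can poll by hand without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let (tx, rx) = oneshot();
    let mut rx = Box::pin(rx);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);

    // pending while the sender is alive...
    assert!(rx.as_mut().poll(&mut cx).is_pending());
    drop(tx); // ...ready once it is dropped
    assert!(rx.as_mut().poll(&mut cx).is_ready());
    println!("receiver became ready after sender drop");
}
```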
will get some more comments in there.
@yoshuawuyts let me know if the new docs make this more clear
@soooch By the way, I'd also be interested if you have any theories as to what might be causing this difference in latency? I know for example that in #137 we started dropping futures before we return their values, as a matter of correctness. I don't know whether that's related.

@soooch Also another question: it seems right now the test is hard-coded to taking 10 rounds. Have you noticed any differences if you increase or decrease the number of rounds?
my primary theory is that for each poll, all futures are iterated over regardless of whether they have been woken or not. iirc `FuturesUnordered` does some more complicated stuff there so it only iterates over woken futures. i think i saw an old PR on this repo that was trying to do something similar.

i did try out a few other numbers. `FutureGroup` was always significantly slower than `FuturesUnordered`, but I think the difference may have been less for higher ratios. (i thought about parameterizing the benches over both size and "wake_limit", but the benches probably already take too long to run).
Hmmm, thank you for your answer - though that shouldn't be the case. We're being quite careful to only iterate over futures which are currently ready to be woken. We do iterate over all active keys, but that just walks a … (snippet: `futures-concurrency/src/future/future_group.rs`, lines 363 to 364 in dec5ce0)
I'm quite surprised about the differences measured here; especially since on the existing benchmark we also measure time elapsed and the differences are effectively negligible. I'm hoping that by better understanding the way this benchmark is structured we can drill down further into why it performs differently for both APIs.
The … Take a look at the …
I think I've confirmed this theory to myself. This change brings … This should still be more algorithmically complex than …

This performance could definitely be achieved without depending on bitvec. This is more of a proof of concept.
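The bitvec-free approach alluded to above can be sketched with a plain integer bitmask (a toy sketch under the assumption of at most 64 entries; the type and method names are invented): wakers set a bit for their slot, and the group visits only the set bits instead of scanning every slot on each poll.

```rust
// Toy wake-set: track which entries have been woken since the last poll.
struct WakeSet {
    woken: u64,
}

impl WakeSet {
    fn new() -> Self {
        Self { woken: 0 }
    }

    // Called from an entry's waker: mark slot `i` as ready to be polled.
    fn wake(&mut self, i: usize) {
        debug_assert!(i < 64);
        self.woken |= 1 << i;
    }

    // Drain the set, yielding only the indices that were woken,
    // in ascending order.
    fn drain(&mut self) -> impl Iterator<Item = usize> {
        let mut bits = std::mem::take(&mut self.woken);
        std::iter::from_fn(move || {
            if bits == 0 {
                return None;
            }
            let i = bits.trailing_zeros() as usize;
            bits &= bits - 1; // clear the lowest set bit
            Some(i)
        })
    }
}

fn main() {
    let mut set = WakeSet::new();
    for i in [3, 0, 10] {
        set.wake(i);
    }
    let woken: Vec<usize> = set.drain().collect();
    // only the three woken slots are visited, not all 64
    assert_eq!(woken, vec![0, 3, 10]);
    println!("woken slots: {woken:?}");
}
```

The cost per poll becomes proportional to the number of woken entries rather than the number of entries in the group, which matches the theory about why iterating all futures would hurt at larger sizes.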
Force-pushed from ec9e058 to 501689c
reimplement `FutureGroup::from_iter` in terms of `extend`
Removes restriction that `F: Default`
Force-pushed from 43d6812 to fa7c029
this benchmark aims to measure latency between futures becoming ready and the stream producing them.
Force-pushed from fa7c029 to d960f75
So it appears that on my machine, I have a significant number of outliers for nearly every bench included in this crate. This new benchmark is about on par with the rest. If you have a more steady bench machine, would you be able to check that this bench is reasonably stable there?
@yoshuawuyts this may have fallen through the cracks. I believe I've addressed your initial concerns.
Oops - it indeed did. Thanks again!
This benchmark aims to measure latency between futures becoming ready and the stream producing them.

The bench currently shows `FutureGroup` trailing significantly behind `FuturesUnordered` for larger test sizes.