Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement "perfect" waking for impl Merge for Vec (1/2) #50

Merged
merged 7 commits into from
Nov 10, 2022

Conversation

eholk
Copy link
Collaborator

@eholk eholk commented Nov 9, 2022

This changes the implementation of merge for Vec to keep a vector of readiness information for each substream. This means we can avoid polling streams that we know have not been woken up yet.

This also required significantly re-working the benchmark so wakers actually work right. I've tried to keep the new version true to the spirit of the old version.

Performance is currently a mixed bag, but I think there's room to do better. For small vectors, this change makes things much worse. Performance seems to get better as the vector size increases though.

I suspect in real life vectors will mostly be small, so this change is probably not what we want in the current iteration. I suspect we can improve things quite a bit though, such as by:

  1. Fall back on the old behavior for small vectors (we'd need some tuning to know where the right crossover point is).
  2. Use a bitvec to store readiness information. This gives us slightly less memory usage, although I doubt that's actually the limiting factor. The bitvec would probably give us a few more percentage points at the high end.
  3. Use a Deque of indexes for ready tasks instead of a bitvector. This way we don't have to search the whole readiness vector. Again, this will probably help more on the high end and may even hurt on the low end.
  4. Reuse wakers. Right now we allocator a new waker each time we poll a stream. This means lots of allocation and deallocation. Instead, we could keep a Vec<Arc<StreamWaker>> on the Merge struct. This will probably help across the board, but especially on smaller vecs.

Summary of performance changes:

merge 10                time:   [1.1286 µs 1.1541 µs 1.1867 µs]                      
                        change: [+80.107% +82.563% +86.171%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

merge 100               time:   [23.429 µs 23.445 µs 23.463 µs]                       
                        change: [-6.7512% -6.6315% -6.5143%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking merge 1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
merge 1000              time:   [1.2730 ms 1.2739 ms 1.2749 ms]                        
                        change: [-45.764% -45.715% -45.667%] (p = 0.00 < 0.05)
                        Performance has improved.

This keeps a vector of futures that are ready to be polled and only
polls those.

Also adds more testing to make sure the waker logic is correct.
@eholk eholk changed the title wip: more efficient waking for vec stream merge More efficient polling in merging Vecs Nov 9, 2022
@eholk eholk marked this pull request as ready for review November 9, 2022 22:44
@yoshuawuyts yoshuawuyts mentioned this pull request Nov 10, 2022
@yoshuawuyts
Copy link
Owner

yoshuawuyts commented Nov 10, 2022

Merged #51 into this branch. see the patch for details, but Tldr: I've managed to reuse the inline wakers, bringing the perf regression down to ~16%, which seems acceptable. On the upside: we appear to be scaling with O(1) rather than O(log(N)), making wake times far more predictable - and we now also guard against poor futures performance, meaning in certain cases this may actually significantly speed up code.

We should optimize this later, but for now this seems like a really good start and I'd like to merge it!

@yoshuawuyts yoshuawuyts merged commit 84860b4 into main Nov 10, 2022
@delete-merged-branch delete-merged-branch bot deleted the fast-vec-wake branch November 10, 2022 12:13
}

impl Wake for StreamWaker {
fn wake(self: std::sync::Arc<Self>) {
if !self.readiness.lock().unwrap().set_ready(self.id) {
self.parent.wake_by_ref()
let parent = self.parent_waker.as_ref().expect("No parent waker was set");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably okay to silently do nothing if the parent waker is not set.

@yoshuawuyts yoshuawuyts changed the title More efficient polling in merging Vecs Implement "perfect" waking for impl Merge for Vec (1/2) Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants