
Further step_or_park implementations [WIP] #268

Merged
merged 8 commits on May 5, 2019

Conversation

frankmcsherry
Member

This PR is a work-in-progress addition of await_events methods for communicators, which enable the step_or_park functionality. The first commit adds preliminary support for the intra-process allocator (not the serializing one). It has not been tested beyond the standard test suite (which has a few multi-worker tests, but not many).

@frankmcsherry
Member Author

Heads up to @comnik @ryzhyk @benesch

The current version seems to work with examples/barrier.rs using multiple workers, calling step_or_park rather than step.

The downside is that it slows this benchmark down, from about 6s with two workers to about 9s. Something like this is to be expected: threads may now park just before more work arrives for them, and then need to be unparked again; it certainly wasn't going to get faster.

So latency has the potential to increase, significantly so in tight loops, which may mean that users need to be smart (e.g. only park after so many microseconds of no work). We may also need to communicate more back to the user (I don't think they can currently tell whether there was no work).

If anyone wants to take this out for a spin and see how it looks, that would be great. There are still certain operators that will keep a worker live and spinning (e.g. replay, and sources that poll without smarter timeouts), but folks could plausibly get timely to chill out when there are no new inputs and outputs have caught up.
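
If anyone does try it: a minimal sketch of what driving a dataflow with a bounded park might look like, using the `Worker::step_or_park(Option<Duration>)` entry point. The dataflow and the 100µs figure are arbitrary placeholders, not code from this PR:

```rust
use std::time::Duration;
use timely::dataflow::operators::{Input, Probe};
use timely::dataflow::InputHandle;

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let mut input = InputHandle::new();

        // A trivial dataflow: feed input through a probe so we can tell
        // when the workers have caught up with each round.
        let probe = worker.dataflow::<u64, _, _>(|scope| {
            scope.input_from(&mut input).probe()
        });

        for round in 0..10u64 {
            if worker.index() == 0 {
                input.send(round);
            }
            input.advance_to(round + 1);
            // Instead of spinning with step(), let the worker park for
            // bounded stretches while it waits for the round to complete.
            while probe.less_than(input.time()) {
                worker.step_or_park(Some(Duration::from_micros(100)));
            }
        }
    }).unwrap();
}
```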

@frankmcsherry
Member Author

Oops. Except some tests fail now that I actually started calling the code correctly. Await further instructions!

@frankmcsherry
Member Author

I believe this is now in a plausibly functional state. I'll look into adding the other implementations soon; it was surprisingly easy for the MPSC case (modulo bugs we haven't found yet)!

@comnik
Member

comnik commented Apr 29, 2019

So David (I can't seem to tag him here) reports that the original single-threaded step_or_park implementation is working for us. He did run into the missing implementation for the generic allocator, but that shouldn't be a problem on this branch. We haven't tried this one yet, but the basic usage model seems to fit our needs.

Thanks a lot for this!

@frankmcsherry
Member Author

Yes, that missing implementation meant that for a while default timely wasn't using any of the new code at all. That got added, and things should actually be parking now. :)

@comnik
Member

comnik commented Apr 29, 2019

Ok, we swapped Declarative around to make use of this. We settled on this: https://github.com/comnik/declarative-dataflow/blob/7c344c5496d7b7a1040c3552a661312e1e2f7e96/server/src/main.rs#L712, where we perform a fixed number of non-parking steps, then advance traces, and only after that step_or_park until the next scheduled activation.

The reasoning was that calling step_or_park() for every step in the loop seemed a bit wasteful, and potentially dangerous in terms of blocking network inputs further up the loop. And advancing traces before the final step_or_park gives us a chance to do some useful work (merging) before going to sleep.

Unfortunately the dependency mess foiled my attempt at running this with this PR included; I'll try again later.

(edit: but David did confirm, with execute_directly, that things are parking correctly)
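
For readers who don't want to follow the link, the shape of that loop is roughly as below. The step count, the elided sections, and the 10ms bound are placeholders standing in for Declarative's own bookkeeping, not code from that repository:

```rust
use std::time::Duration;
use timely::communication::Allocate;
use timely::worker::Worker;

// Rough shape of the server loop described above.
fn serve<A: Allocate>(worker: &mut Worker<A>) {
    loop {
        // 1. Accept and enqueue client/network inputs (elided).

        // 2. A fixed number of non-parking steps to make progress.
        for _ in 0..32 {
            worker.step();
        }

        // 3. Advance and compact traces, so useful merging work happens
        //    before the worker goes to sleep (elided).

        // 4. Park at most until the next scheduled activation; a fixed
        //    bound stands in for that deadline here.
        worker.step_or_park(Some(Duration::from_millis(10)));
    }
}
```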

@frankmcsherry
Member Author

frankmcsherry commented Apr 29, 2019

[...] and potentially dangerous in terms of blocking network inputs further up the loop.

My guess is that the right idiom here will probably be to have any thread calling step_or_park(None) first hand out a copy of its Thread to anyone who wants to point IO at it. If you have a connection-handling thread, it should probably be able to wake up the worker thread just after enqueueing something for it to read.

If the worker thread wants to regularly perform network IO itself, then it should probably call either step() or step_or_park(Some(small)).

Any pushback on the appropriateness of the threading model is welcome, of course, but everyone wants to be the one to own blocking and unblocking of threads, so it is hard... :)
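
A minimal, timely-free sketch of that handshake, in case it helps: the worker hands out its std `Thread` handle, and the connection-handling thread unparks it right after enqueueing something for it to read. In an actual worker loop the `thread::park()` call below would be `worker.step_or_park(None)`; everything else is made up for illustration:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let queue = Arc::new(Mutex::new(VecDeque::<String>::new()));

    // "Worker" thread: drain the queue, then park when idle
    // (a timely worker would call worker.step_or_park(None) here).
    let worker_queue = Arc::clone(&queue);
    let worker = thread::spawn(move || loop {
        while let Some(item) = worker_queue.lock().unwrap().pop_front() {
            if item == "done" {
                return;
            }
            println!("worker got: {}", item);
        }
        thread::park();
    });

    // Copy of the worker's Thread handle, handed to whoever points IO at it.
    let handle = worker.thread().clone();

    // Connection-handling side: enqueue, then wake the worker.
    queue.lock().unwrap().push_back("new input".to_string());
    handle.unpark();
    queue.lock().unwrap().push_back("done".to_string());
    handle.unpark();

    worker.join().unwrap();
}
```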

@comnik
Member

comnik commented Apr 29, 2019

No, I think the trade-off is good that way :) Our only risk now, I think, is a bit of delay in accepting client inputs (which are not latency-critical anyway), and we can always move to a dedicated networking thread.

@comnik
Member

comnik commented Apr 30, 2019

Tried again on this branch. With a single process, everything seems to work fine, with significantly reduced CPU usage. However, in multi-process configurations, weird things happen.

  1. With n=2, w=1, the Timely sequencer suddenly routes inputs incorrectly, such that they arrive back at the worker they came from.

  2. With n>2, w=1, all processes panic (once they are all connected), claiming "Failed to send MergeQueue".

(edit: apart from that, CPU usage is also significantly reduced in cluster configurations, so a partial victory!)

@frankmcsherry
Member Author

I've clearly screwed something up then. :) I'll take a peek on Thursday at the earliest, I'm afraid!

@frankmcsherry
Member Author

Peek taken, and at least one bug fixed. Small examples seem to work at the moment, in multi-process mode.

@comnik
Member

comnik commented May 4, 2019

Things seem to work correctly now :) Tested all configurations.
