Skip to content

Implement a distributed pipeline executor #3119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
May 6, 2023
Merged

Conversation

dominiklohmann
Copy link
Member

This PR changes vast exec to support executing parts of a pipeline remotely, and introduces the concept of an operator location, and the remote and local operator modifiers.

The changes in this PR should have little user-visible effect, except for the new operator modifiers, but will allow future operators to run remotely and as such unblocks the development of many planned operators like export and import.

@dominiklohmann dominiklohmann added the feature New functionality label May 4, 2023
@dominiklohmann dominiklohmann requested a review from jachris May 4, 2023 13:37
@dominiklohmann dominiklohmann force-pushed the topic/actor-executor branch 2 times, most recently from 038e154 to 9ed6b5c Compare May 4, 2023 13:54
@dominiklohmann dominiklohmann force-pushed the topic/actor-executor branch from 9ed6b5c to f67f483 Compare May 4, 2023 13:55
@dominiklohmann dominiklohmann enabled auto-merge May 4, 2023 14:05
The function is no longer necessary in the current design, so we're
removing it for now.
After a lot of experimenting and researching throughout the CAF code
base, it turned out that the only reliable way to shut down a stream
spanning actors in multiple processes is to send a sentinel value within
the stream itself.

Now, you may wonder, why don't the other approaches work? Here's why:

1. Make the stream own the execution nodes: This works wonderfully up to
   the point where an execution node is in a different process, and its
   reference count never goes to zero when the part of the stream hosted
   by that execution node goes down.

2. Close and flush the stream explicitly: This causes a race condition
   in the stream drivers finalize handler, which makes it impossible to
   send out further events from a stream stage to a stream sink after
   the stream has been closed.

3. Shut down the hosting execution node actor from within the stream:
   This causes another race condition where it the stream finalize
   handler may be called after the actor state was destroyed, causing
   address sanitizer to go boom.

So yeah, in-stream sentinel values it is. Works wonderfully.
@dominiklohmann dominiklohmann merged commit 349f462 into main May 6, 2023
@dominiklohmann dominiklohmann deleted the topic/actor-executor branch May 6, 2023 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants