Skip to content

Fix sporadic stalling of pipelines #3264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 44 commits into from
Jun 30, 2023
Merged

Fix sporadic stalling of pipelines #3264

merged 44 commits into from
Jun 30, 2023

Conversation

dominiklohmann
Copy link
Member

@dominiklohmann dominiklohmann commented Jun 26, 2023

This is the third iteration of pipeline execution. We've started with a simple chaining of generators in a single thread just for testing, and moved on to an actor-based model that utilizes CAF's push-based streaming feature for first usage. That was convenient for getting things going, but we quickly realized some disadvantages that we now fix by switching to a message-based approach that is very close to the Volcano model:

  1. Transformation or sink operators whose input is exhausted or source operator no longer sporadically stall. All execution nodes are now guaranteed to yield to the actor scheduler whenever the operator's generator advances.
  2. Back pressure now regulates the number events or bytes rather than number of variable-size batches of events or bytes transferred between operators.
  3. The initialization order of operators is now well-defined: The execution nodes are spawned left-to-right, and the pipeline operators are guaranteed to be initialized right-to-left. Specifically, an operator must yield once before its preceding operator is initialized.
  4. Operators can now await requests if they require a request to be responded to before continuing the pipeline. This works in a scheduler-friendly way.
  5. In the long-term, we want to make use async generators for operators. Plugging that in would now be a relatively easy change to make.

Tasks

Preview Give feedback

@dominiklohmann dominiklohmann added enhancement bug Incorrect behavior labels Jun 26, 2023
This is the third iteration of pipeline execution. We've started with a
simple chaining of generators in a single thread, moved on to an
actor-based model that utilizes a push-based approach, and have since
come to realize a few downsides where we had simply made wrong
assumptions.

This change changes execution nodes to utilize a communication approach
closer to the pull-based Volcano model instead of CAF's push-based
streaming feature. This brings a few advantages:
1. Source operators are no longer polled in a busy loop. Tthis also
   applies to transformation or sink operators whose input is exhausted.
   This sometimes led to a stalling pipeline.
2. Back pressure now regulates the number events or bytes rather than
   number of variable-size batches of events or bytes transferred
   between operators.
3. The initialization order of operators is now well-defined: The
   execution nodes are spawned left-to-right, and the pipeline operators
   are guaranteed to be initialized right-to-left. Specifically, an
   operator must yield once before its preceding operator is
   initialized.
4. Operators can now await requests if they require a request to be
   responded to before continuing the pipeline. This works in a
   scheduler-friendly way.
@dominiklohmann dominiklohmann requested a review from jachris June 29, 2023 15:33
@dominiklohmann dominiklohmann marked this pull request as ready for review June 29, 2023 15:33
Copy link
Contributor

@jachris jachris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We looked over this together. There a few tiny issues that we want to address before merging it, but besides that, this is ready.

@jachris jachris enabled auto-merge June 30, 2023 19:16
@jachris jachris merged commit 9b48c9b into main Jun 30, 2023
@jachris jachris deleted the topic/exec-node-v3 branch June 30, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants