Chain primitives in Arroyo #135

Open · wants to merge 26 commits into main

fpacifici
Collaborator

Based on #132.

PR #132 introduced parallelism for transformation operations via
Arroyo multiprocessing in Python.

That model has a limitation: it prevents us from running sequential
transformation steps in the same process. Adding multiple sequential
Map operations and configuring them for multiprocessing creates
multiple process pools and runs each step in parallel in a separate
Arroyo strategy.

This PR expands the concept of a segment, allowing multiple consecutive
transformation steps to be chained together and executed in the same
process inside a multiprocess pool.
This should reduce the overhead and the memory footprint introduced by
the previous PR.

  • Introduces TransformerChains as an abstraction to chain steps together.
  • Wires it up in the Rust Arroyo adapter.
  • Adds documentation for pipeline configuration that explains the
    distribution model.
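
To illustrate the idea, here is a minimal sketch of chaining consecutive steps into one callable so that a single process pool applies all of them to each message. It is not the code in this PR: the `TransformerChain` class below is a hypothetical stand-in for the TransformerChains abstraction, the steps `decode`, `normalize` and `word_count` are invented for the example, and plain `multiprocessing.Pool` is used in place of the Arroyo strategy.

```python
from dataclasses import dataclass
from multiprocessing import Pool
from typing import Any, Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class TransformerChain:
    """Hypothetical chain: applies its steps in order within one call."""

    steps: Sequence[Callable[[Any], Any]]

    def __call__(self, message: Any) -> Any:
        # Every step runs in the same process, one after the other,
        # so intermediate values never cross a process boundary.
        for step in self.steps:
            message = step(message)
        return message


# Illustrative steps (assumptions, not part of the PR). They are
# module-level functions so the chain can be pickled and sent to workers.
def decode(raw: bytes) -> str:
    return raw.decode("utf-8")


def normalize(text: str) -> str:
    return text.strip().lower()


def word_count(text: str) -> int:
    return len(text.split())


CHAIN = TransformerChain(steps=(decode, normalize, word_count))


def run_chained(messages: Iterable[bytes], processes: int = 2) -> List[int]:
    # One pool for the whole chain, instead of one pool per Map step.
    with Pool(processes=processes) as pool:
        return pool.map(CHAIN, list(messages))
```

Calling `run_chained([b"Hello Arroyo", b"one two three"])` from a script guarded by `if __name__ == "__main__":` would run all three steps for each message inside a single worker. With one pool per step, each intermediate value would instead have to be serialized and handed to the next pool.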

@fpacifici fpacifici changed the base branch from main to fpacifici/actually_build_parallelism June 14, 2025 00:38
@fpacifici fpacifici changed the base branch from fpacifici/actually_build_parallelism to main June 16, 2025 23:32
@fpacifici fpacifici force-pushed the fpacifici/refactor_segmentation branch from b636e6f to d8de06c Compare June 19, 2025 22:42

3 participants