[SPARK-51575][PYTHON] Combine Python Data Source pushdown & plan read workers #50340

Open. wengh wants to merge 2 commits into master.

@wengh wengh commented Mar 20, 2025

Follow-up to #49961

What changes were proposed in this pull request?

As pointed out in #49961 (comment), by the time filter pushdown runs we already have enough information to plan the read partitions as well. This PR therefore changes the filter pushdown worker to also return the partitions, reducing the number of round trips between Python and Scala.

Changes:

  • Extract the part of plan_data_source_read.py that sends the partitions and the read function to the JVM.
  • Reuse the extracted logic in data_source_pushdown_filters.py so that the filter pushdown worker also sends the partitions and the read function.
  • Update the Scala code accordingly.
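
To illustrate the idea behind these changes, here is a minimal standalone sketch of a single call that both pushes filters and plans partitions. It is not the code in this PR: `Filter`, `ToyReader`, `PlanResult`, and `pushdown_and_plan` are hypothetical names invented for this example, and the real workers exchange serialized data with the JVM rather than returning Python objects.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple

# NOTE: every name in this block is a hypothetical stand-in, not a PySpark internal.

Row = Tuple[int, int]  # (id, bucket)


@dataclass(frozen=True)
class Filter:
    """Toy equality filter: column == value."""
    column: str
    value: int


class ToyReader:
    """Toy reader over (id, bucket) rows with one partition per bucket."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pushed: List[Filter] = []

    def push_filters(self, filters: List[Filter]) -> List[Filter]:
        # Accept filters on "bucket"; hand the rest back for the engine to apply.
        self.pushed = [f for f in filters if f.column == "bucket"]
        return [f for f in filters if f.column != "bucket"]

    def partitions(self) -> List[int]:
        # Keep only the buckets that satisfy every pushed filter.
        buckets = sorted({bucket for _, bucket in self.rows})
        return [b for b in buckets if all(f.value == b for f in self.pushed)]

    def read(self, partition: int) -> Iterator[Row]:
        return iter([row for row in self.rows if row[1] == partition])


@dataclass
class PlanResult:
    remaining_filters: List[Filter]
    partitions: List[int]
    read_func: Callable[[int], Iterator[Row]]


def pushdown_and_plan(reader: ToyReader, filters: List[Filter]) -> PlanResult:
    """One call: push filters, then plan partitions with the same reader."""
    remaining = reader.push_filters(filters)
    return PlanResult(remaining, reader.partitions(), reader.read)


rows = [(1, 0), (2, 0), (3, 1), (4, 2)]
result = pushdown_and_plan(ToyReader(rows), [Filter("bucket", 1), Filter("id", 3)])
```

Because the reader already holds the pushed filters when `partitions()` is called, the caller gets the remaining filters, the pruned partitions, and the read function from one call instead of starting a second planning step.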

Why are the changes needed?

To improve Python Data Source performance when the filter pushdown configuration is enabled.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests in test_python_datasource.py

Was this patch authored or co-authored using generative AI tooling?

No

@wengh wengh changed the title [WIP][SPARK-51575][PYTHON] Combine Python Data Source pushdown & plan read workers [SPARK-51575][PYTHON] Combine Python Data Source pushdown & plan read workers Mar 21, 2025
@wengh wengh marked this pull request as ready for review March 21, 2025 00:59
wengh commented Mar 21, 2025

@cloud-fan @beliefer @allisonwang-db Please take a look at this follow-up to the Python Data Source filter pushdown PR #49961.
