
Spark 3.4: Backport UPDATE/MERGE logic for row lineage #13344



Open: wants to merge 2 commits into main


@geruh (Contributor) commented Jun 18, 2025

This PR backports row lineage support for row-level operations (UPDATE and MERGE), added for Spark 3.5 in #12736, to Spark 3.4. The port is mostly 1:1 from 3.5. However, Spark 3.4 uses custom Iceberg rules that are not exactly the same as the changes that were merged into Spark itself.
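For context, the row lineage rules that UPDATE and MERGE have to respect can be sketched as follows. This is a minimal, hypothetical illustration based on the Iceberg v3 spec's row lineage fields (`_row_id` and `_last_updated_sequence_number`). It is not the code from this PR or from #12736. `RowLineageWriteSketch`, `LineageRow` and its methods are made-up names, and the sketch assumes a `null` value means "inherit this field at read time".

```java
/**
 * Hypothetical sketch of the Iceberg v3 row lineage rules for rows written
 * by UPDATE and MERGE. Not Iceberg's API: all names here are illustrative.
 */
public class RowLineageWriteSketch {

    /** Lineage fields of one row; null means "inherit at read time". */
    public record LineageRow(Long rowId, Long lastUpdatedSeq, String data) {}

    /** Unmodified row copied into a rewritten data file (e.g. copy-on-write):
     *  both lineage values are carried over as-is. */
    public static LineageRow carryOver(LineageRow existing) {
        return new LineageRow(existing.rowId(), existing.lastUpdatedSeq(), existing.data());
    }

    /** Row modified by UPDATE or MERGE ... WHEN MATCHED THEN UPDATE:
     *  keep _row_id, and null out _last_updated_sequence_number so the
     *  new data file's sequence number is inherited. */
    public static LineageRow update(LineageRow existing, String newData) {
        return new LineageRow(existing.rowId(), null, newData);
    }

    /** New row from MERGE ... WHEN NOT MATCHED THEN INSERT: both fields null,
     *  so both are assigned by inheritance when the file is read. */
    public static LineageRow insert(String data) {
        return new LineageRow(null, null, data);
    }

    public static void main(String[] args) {
        LineageRow original = new LineageRow(7L, 3L, "a");
        System.out.println(carryOver(original));
        System.out.println(update(original, "b"));
        System.out.println(insert("c"));
    }
}
```

Keeping `_row_id` while clearing `_last_updated_sequence_number` lets an updated row keep its identity and still pick up the sequence number of the commit that modified it.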

Key Differences from 3.5

  • Spark 3.4 uses custom Iceberg rules (e.g. MergeIntoIcebergTable, UpdateIcebergTable), so the backport is adapted to use them instead.
  • Spark 3.4 doesn't support WHEN NOT MATCHED BY SOURCE, so that clause is ignored in the backport.
  • SparkParquetReader in 3.4 did not yet handle row lineage columns.
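The reader point above concerns materializing the lineage columns when a data file stores `null` for them. A minimal sketch of that inheritance, assuming the Iceberg v3 spec semantics: a null `_row_id` becomes the file's `first_row_id` plus the row's position, and a null `_last_updated_sequence_number` becomes the data file's sequence number. `RowLineageReadSketch` and its methods are hypothetical and are not `SparkParquetReader`'s actual API.

```java
/**
 * Hypothetical sketch of how a reader resolves row lineage values by
 * inheritance. Not Iceberg's API: all names here are illustrative.
 */
public class RowLineageReadSketch {

    /** _row_id: use the stored value, else inherit first_row_id + position. */
    public static Long resolveRowId(Long storedRowId, Long firstRowId, long pos) {
        if (storedRowId != null) {
            return storedRowId;
        }
        // Assumption: if the file has no first_row_id, no ID can be inherited.
        return firstRowId == null ? null : firstRowId + pos;
    }

    /** _last_updated_sequence_number: stored value, else the file's sequence number. */
    public static Long resolveLastUpdatedSeq(Long storedSeq, Long fileSeq) {
        return storedSeq != null ? storedSeq : fileSeq;
    }

    public static void main(String[] args) {
        // Example file: first_row_id = 100, data file sequence number = 5.
        Long[] storedIds = {null, 42L, null};
        Long[] storedSeqs = {null, 2L, null};
        for (int pos = 0; pos < storedIds.length; pos++) {
            System.out.println(resolveRowId(storedIds[pos], 100L, pos) + " "
                    + resolveLastUpdatedSeq(storedSeqs[pos], 5L));
        }
    }
}
```

Rows 0 and 2 inherit both values from the file, while row 1 (for example, a row carried over by an earlier rewrite) keeps the values it stored explicitly.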

Everything else is a straight port from 3.5, apart from the logical plan name changes.

cc: @amogh-jahagirdar

github-actions bot added the spark label on Jun 18, 2025