Support copy-on-write mode for Iceberg write #17272

Open
jackye1995 opened this issue Apr 27, 2023 · 3 comments
Labels
iceberg Iceberg connector

Comments

@jackye1995
Member

Today, Iceberg writes only support merge-on-read mode. Copy-on-write mode is a frequent request from users who want a better file layout without having to run compactions frequently.

Technically, this should be fairly easy to achieve. A copy-on-write (CoW) implementation already exists in the Delta Lake connector:

https://github.com/trinodb/trino/blob/master/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
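To make the difference between the two strategies concrete, here is a minimal, hypothetical model of a DELETE against a single data file. It is not Trino or Iceberg code: a "data file" is just a list of rows, a "position delete file" is just a list of row positions, and every function name is illustrative. Copy-on-write rewrites the surviving rows into a new file; merge-on-read leaves the file untouched and records deleted positions that every reader must apply at scan time.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (id, payload)


def copy_on_write(data_file: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Rewrite the file, keeping only the rows that the delete predicate does not match."""
    return [row for row in data_file if not predicate(row)]


def merge_on_read(data_file: List[Row], predicate: Callable[[Row], bool]) -> List[int]:
    """Leave the file untouched; return the positions of deleted rows (a 'delete file')."""
    return [pos for pos, row in enumerate(data_file) if predicate(row)]


def read_with_deletes(data_file: List[Row], deleted_positions: List[int]) -> List[Row]:
    """What a reader has to do at scan time under merge-on-read."""
    deleted = set(deleted_positions)
    return [row for pos, row in enumerate(data_file) if pos not in deleted]


def is_even(row: Row) -> bool:
    """Example delete predicate: DELETE ... WHERE id % 2 = 0."""
    return row[0] % 2 == 0


if __name__ == "__main__":
    data = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

    rewritten = copy_on_write(data, is_even)      # replaces the old data file
    delete_file = merge_on_read(data, is_even)    # old data file is kept as-is
    print(rewritten)                              # [(1, 'a'), (3, 'c')]
    print(delete_file)                            # [1, 3]
    print(read_with_deletes(data, delete_file))   # [(1, 'a'), (3, 'c')]
```

Both paths produce the same logical rows. The trade-off is where the cost lands: copy-on-write pays at write time by rewriting the file, while merge-on-read pays at read time until a compaction folds the delete files back into the data.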

@jackye1995 jackye1995 added the iceberg Iceberg connector label Apr 27, 2023
@findepi
Member

findepi commented May 4, 2023

Yes, I think we should support copy-on-write mode for UPDATE and DELETE for the case where we update or delete most of a file. Creating big delta files isn't helpful for anyone.
If we support this strategy, we should automatically choose between copy-on-write and merge-on-read within each IcebergMergeSink independently.
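One way to read this suggestion is as a per-file heuristic: each merge sink looks at what fraction of a data file's rows it is changing and picks copy-on-write when that fraction is large. The sketch below is hypothetical: `choose_mode`, `WriteMode`, and the 0.5 threshold are illustrative assumptions, not anything Trino implements or this thread specified.

```python
from enum import Enum


class WriteMode(Enum):
    COPY_ON_WRITE = "copy-on-write"
    MERGE_ON_READ = "merge-on-read"


def choose_mode(total_rows: int, changed_rows: int, threshold: float = 0.5) -> WriteMode:
    """Pick a strategy for one data file.

    `threshold` is a hypothetical tuning knob: if more than this fraction of the
    file's rows is updated or deleted, rewriting the file (copy-on-write) avoids
    producing a delete file almost as large as the data file itself.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= changed_rows <= total_rows:
        raise ValueError("changed_rows must be between 0 and total_rows")
    fraction = changed_rows / total_rows
    return WriteMode.COPY_ON_WRITE if fraction > threshold else WriteMode.MERGE_ON_READ


if __name__ == "__main__":
    print(choose_mode(1_000_000, 900_000).value)  # copy-on-write
    print(choose_mode(1_000_000, 10).value)       # merge-on-read
```

Deciding per file rather than per statement means a single MERGE could rewrite heavily modified files while writing only small deletes for lightly touched ones.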

@rdblue
Contributor

rdblue commented May 4, 2023

@findepi, there's a table setting for choosing between copy-on-write and merge-on-read: write.(operation).mode, where operation can be merge, update, or delete, and the mode can be copy-on-write or merge-on-read.
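A minimal sketch of how a writer might resolve that setting from a table's properties. The property keys follow the write.(operation).mode pattern described above (write.merge.mode, write.update.mode, write.delete.mode); the function name and the caller-supplied fallback are hypothetical, and the sketch makes no claim about Iceberg's own default.

```python
from typing import Mapping

OPERATIONS = frozenset({"merge", "update", "delete"})
MODES = frozenset({"copy-on-write", "merge-on-read"})


def resolve_write_mode(properties: Mapping[str, str], operation: str, default: str) -> str:
    """Return the configured mode for `operation`, falling back to `default` if unset."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if default not in MODES:
        raise ValueError(f"unknown default mode: {default!r}")
    # Keys follow the write.<operation>.mode pattern, e.g. write.delete.mode
    mode = properties.get(f"write.{operation}.mode", default)
    if mode not in MODES:
        raise ValueError(f"invalid value for write.{operation}.mode: {mode!r}")
    return mode


if __name__ == "__main__":
    props = {"write.delete.mode": "copy-on-write", "write.update.mode": "merge-on-read"}
    for op in ("delete", "update", "merge"):
        print(op, resolve_write_mode(props, op, default="merge-on-read"))
    # delete copy-on-write
    # update merge-on-read
    # merge merge-on-read
```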

@C-h-e-r-r-y

@rdblue For now, these settings cannot be set during table creation :(

Development

No branches or pull requests

4 participants