ENH: Implement to_iceberg #61507
    """
    Write a DataFrame to an Apache Iceberg table.

    .. versionadded:: 3.0.0
Could you add an experimental tag to this API as well, like we did with read_iceberg?
Absolutely, I forgot about that. Added it now. I also expanded the Iceberg user guide docs with to_iceberg, which I had also forgotten. Thanks for the review and the feedback!
    *,
    catalog_properties: dict[str, Any] | None = None,
    location: str | None = None,
    snapshot_properties: dict[str, str] | None = None,
Any thoughts on adding append to match to_parquet? Something like
append: bool = False
Then this could default to table.overwrite instead of append. I think it might be confusing if this doesn't match the other to_* functions.
How does PyIceberg support it?
With the table.overwrite method
Of course, I didn't think about it. I'll add it, thanks for the feedback.
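For illustration only, the suggestion above could be sketched roughly as follows. This is not the code from the PR: `write_to_table` and `FakeTable` are hypothetical names, and `FakeTable` merely stands in for a PyIceberg table so the dispatch can be shown without a catalog. The only assumption about PyIceberg is that a table exposes `append` and `overwrite` methods that accept a PyArrow table and `snapshot_properties`.

```python
from __future__ import annotations

from typing import Any


class FakeTable:
    """Hypothetical stand-in for a PyIceberg table; records which write ran."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def append(self, df: Any, snapshot_properties: dict[str, str] | None = None) -> None:
        self.calls.append("append")

    def overwrite(self, df: Any, snapshot_properties: dict[str, str] | None = None) -> None:
        self.calls.append("overwrite")


def write_to_table(
    table: Any,
    arrow_table: Any,
    *,
    append: bool = False,
    snapshot_properties: dict[str, str] | None = None,
) -> None:
    """Hypothetical helper: overwrite by default, append only on request."""
    props = snapshot_properties or {}
    if append:
        # Keep existing rows and add the new ones.
        table.append(arrow_table, snapshot_properties=props)
    else:
        # Replace the current table contents with the new data.
        table.overwrite(arrow_table, snapshot_properties=props)


table = FakeTable()
write_to_table(table, object())               # default path: overwrite
write_to_table(table, object(), append=True)  # explicit append
```

Defaulting to overwrite means a repeated call replaces the data rather than silently duplicating rows, which is the behaviour the comment above argues users would expect.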
    identifier=table_identifier,
    schema=arrow_table.schema,
    location=location,
    # we could add `partition_spec`, `sort_order` and `properties` in the
I definitely think these would be great to have, but I don't really have any ideas on how to do it without just using PyIceberg objects.
Adding them later is easy if we think of a good signature. That's why I didn't worry too much about adding them.
I was thinking that for the parameters that receive PyIceberg objects, one option is to use a generic
Added the doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.