-
Notifications
You must be signed in to change notification settings - Fork 1
Add additional methods to the Table function #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional methods to the Table function #42
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds several new functionalities to the Table interface, including upsert, delete, rows_affected, schema conversion, and a new to_polars method. The changes enhance schema interoperability between PyIceberg and PyArrow and expand the data manipulation operations.
- Introduces conversion functions between PyArrow and PyIceberg types and expressions.
- Adds methods for upsert, delete, and rows affected tracking.
- Implements schema conversion and a replacement for the deprecated read functionality.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/tower/utils/pyarrow.py | Adds conversion methods and supports expression conversion. |
| src/tower/_tables.py | Implements new Table methods: to_polars, upsert, delete, and rows_affected. |
Comments suppressed due to low confidence (3)
src/tower/utils/pyarrow.py:152
- The function 'convert_expression' is not defined; it appears you intended to use 'convert_pyarrow_expression' for recursive conversion.
return And(convert_expression(expr.args[0]), convert_expression(expr.args[1]))
src/tower/_tables.py:127
- When 'filters' is a list, use 'convert_pyarrow_expressions' instead of 'convert_pyarrow_expression' to correctly convert all expressions.
next_filters = convert_pyarrow_expression(filters)
src/tower/_tables.py:55
- The 'to_polars' method is declared to return a LazyFrame but collects the data into a DataFrame; remove '.collect()' to align the implementation with the declared return type.
return pl.scan_iceberg(self._table).collect()
datancoffee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Two suggestions: don't call collect() in lazy read, and use as_arrow() to get the arrow schema
src/tower/_tables.py
Outdated
| def schema(self) -> pa.Schema: | ||
| # We take an Iceberg Schema and we need to convert it into a PyArrow Schema | ||
| iceberg_schema = self._table.schema() | ||
| return convert_iceberg_schema(iceberg_schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can get the Arrow schema by calling as_arrow()
See https://py.iceberg.apache.org/reference/pyiceberg/schema/#pyiceberg.schema.Schema.as_arrow
Unless you want to implement some special mapping, of course
You can reduce this to:
iceberg_schema = self._table.schema()
return iceberg_schema.as_arrow()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks. Converted this to use PyIceberg's methodology instead of our own.
src/tower/utils/pyarrow.py
Outdated
| raise ValueError(f"Unsupported Arrow type: {arrow_type}") | ||
|
|
||
|
|
||
| def iceberg_to_arrow_type(iceberg_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can get the Arrow schema by calling as_arrow()
See https://py.iceberg.apache.org/reference/pyiceberg/schema/#pyiceberg.schema.Schema.as_arrow
Unless you want to implement some special mapping, of course
You can reduce this to:
iceberg_schema.as_arrow()
src/tower/utils/pyarrow.py
Outdated
| return IcebergSchema(*fields) | ||
|
|
||
|
|
||
| def convert_iceberg_schema(iceberg_schema: IcebergSchema) -> pa.Schema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might not need all of this
you can get the Arrow schema by calling as_arrow()
See https://py.iceberg.apache.org/reference/pyiceberg/schema/#pyiceberg.schema.Schema.as_arrow
Unless you want to implement some special mapping, of course
You can reduce this to:
iceberg_schema.as_arrow()
src/tower/_tables.py
Outdated
| Converts the table to a Polars LazyFrame. This is useful when you | ||
| understand Polars and you want to do something more complicated. | ||
| """ | ||
| return pl.scan_iceberg(self._table).collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be without the .collect() call?
return pl.scan_iceberg(self._table)
We want to return a LazyFrame, ya?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds several new methods to the Table interface and enhances utility modules to support upsert, delete, schema retrieval, and conversion operations between PyArrow and PyIceberg, along with comprehensive tests.
- Introduces new Table methods such as upsert, delete, rows_affected, schema, and to_polars.
- Enhances the utilities to convert PyArrow expressions into PyIceberg expressions.
- Updates the TableReference and related re-export modules for improved catalog handling and lazy data frame support.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/tower/test_tables.py | Adds tests for reading, writing, upserting, deleting, and schema retrieval on tables. |
| src/tower/utils/pyarrow.py | Introduces functions for converting PyArrow fields, schemas, and expressions to PyIceberg equivalents. |
| src/tower/pyiceberg.py | Provides a dynamic dispatch wrapper for pyiceberg, re-exporting its attributes. |
| src/tower/pyarrow.py | Re-exports pyarrow functionality. |
| src/tower/polars.py | Re-exports polars functionalities for creating LazyFrames. |
| src/tower/_tables.py | Extends the Table interface with new methods and stats, and updates catalog handling in TableReference. |
| left_expr = None # You'd need to extract this | ||
| right_expr = None # You'd need to extract this | ||
| return And( | ||
| convert_pyarrow_expression(left_expr), | ||
| convert_pyarrow_expression(right_expr) | ||
| ) | ||
| elif "or" in expr_str.lower() and isinstance(expr, pc.Expression): | ||
| # Similar simplification | ||
| left_expr = None # You'd need to extract this | ||
| right_expr = None # You'd need to extract this | ||
| return Or( | ||
| convert_pyarrow_expression(left_expr), | ||
| convert_pyarrow_expression(right_expr) | ||
| ) | ||
| elif "not" in expr_str.lower() and isinstance(expr, pc.Expression): | ||
| # Similar simplification | ||
| inner_expr = None # You'd need to extract this |
Copilot
AI
May 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logical operation branch in 'convert_pyarrow_expression' assigns None to sub-expressions which may lead to runtime errors if logical expressions are passed. Consider either raising a NotImplementedError for these cases or implementing proper extraction of the sub-expressions.
| left_expr = None # You'd need to extract this | |
| right_expr = None # You'd need to extract this | |
| return And( | |
| convert_pyarrow_expression(left_expr), | |
| convert_pyarrow_expression(right_expr) | |
| ) | |
| elif "or" in expr_str.lower() and isinstance(expr, pc.Expression): | |
| # Similar simplification | |
| left_expr = None # You'd need to extract this | |
| right_expr = None # You'd need to extract this | |
| return Or( | |
| convert_pyarrow_expression(left_expr), | |
| convert_pyarrow_expression(right_expr) | |
| ) | |
| elif "not" in expr_str.lower() and isinstance(expr, pc.Expression): | |
| # Similar simplification | |
| inner_expr = None # You'd need to extract this | |
| try: | |
| left_expr, right_expr = expr.arguments # Extract sub-expressions | |
| except (AttributeError, ValueError): | |
| raise ValueError(f"Failed to extract sub-expressions from 'and' operation: {expr}") | |
| return And( | |
| convert_pyarrow_expression(left_expr), | |
| convert_pyarrow_expression(right_expr) | |
| ) | |
| elif "or" in expr_str.lower() and isinstance(expr, pc.Expression): | |
| # Similar simplification | |
| try: | |
| left_expr, right_expr = expr.arguments # Extract sub-expressions | |
| except (AttributeError, ValueError): | |
| raise ValueError(f"Failed to extract sub-expressions from 'or' operation: {expr}") | |
| return Or( | |
| convert_pyarrow_expression(left_expr), | |
| convert_pyarrow_expression(right_expr) | |
| ) | |
| elif "not" in expr_str.lower() and isinstance(expr, pc.Expression): | |
| # Similar simplification | |
| try: | |
| (inner_expr,) = expr.arguments # Extract the single sub-expression | |
| except (AttributeError, ValueError): | |
| raise ValueError(f"Failed to extract sub-expression from 'not' operation: {expr}") |
Updated commentary Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR introduces a collection of new functions to the Table interface to support common functionality.
upsertallows users to upsert records based on the specifiedjoin_cols(which are columns be names)deleteallows users to delete records. You can supply a set of predicates (filters) that arepyarrow.compute.Expressionfor matching what records to deleterows_affectedwill return information about the rows affected in your operationsschemawill return the PyArrow schemato_polarsreturns apolars.LazyFramethat can be used for creating data frames. This is meant to be a replacement for thereadfunction that will be deprecated.TODO
Tableand friends