Add Polars materializer #2229
Walkthrough
ZenML has expanded its integration capabilities by adding support for Polars, a dataframe library. This update introduces new materializers to handle Polars-specific data types, enabling the reading and writing of Polars data frames within the ZenML environment. The changes reflect the addition of Polars as a recognized integration, as well as the necessary classes and methods to facilitate its use.
@coderabbitai review
…izer.py Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Review Status
Actionable comments generated: 0
Configuration used: .coderabbit.yaml
Files selected for processing (6)
- docs/book/user-guide/advanced-guide/data-management/handle-custom-data-types.md (1 hunks)
- src/zenml/integrations/__init__.py (1 hunks)
- src/zenml/integrations/constants.py (1 hunks)
- src/zenml/integrations/polars/__init__.py (1 hunks)
- src/zenml/integrations/polars/materializers/__init__.py (1 hunks)
- src/zenml/integrations/polars/materializers/dataframe_materializer.py (1 hunks)
Files skipped from review due to trivial changes (1)
- src/zenml/integrations/constants.py
Additional comments: 5
src/zenml/integrations/polars/materializers/__init__.py (1)
- 16-18: The import statement correctly uses `# noqa` to bypass linting errors for unused imports, which is standard in `__init__.py` files.
src/zenml/integrations/polars/__init__.py (1)
- 20-35: The `PolarsIntegration` class is defined correctly with the necessary attributes and an `activate` method that imports materializers to ensure they are registered when the integration is activated.
src/zenml/integrations/__init__.py (1)
- 49-49: The addition of `PolarsIntegration` to the integrations `__init__.py` file is correct and necessary for the integration to be recognized and used within ZenML.
src/zenml/integrations/polars/materializers/dataframe_materializer.py (1)
- 30-119: The `PolarsMaterializer` class is correctly implemented with appropriate methods for loading and saving Polars dataframes and series, including the conversion to and from pyarrow tables, and proper cleanup of temporary directories.
docs/book/user-guide/advanced-guide/data-management/handle-custom-data-types.md (1)
- 25-25: The documentation has been correctly updated to include the `PolarsMaterializer` in the table of integration-specific materializers, reflecting the new feature added to ZenML.
This all looks good to me, pending mypy fixes. The only thing I'd add now would be a test:
https://github.com/zenml-io/zenml/blob/develop/tests/unit/materializers/test_pandas_materializer.py
This gives you an idea of what I reckon would work, though the test you'd write would be located in `tests/integration/integrations`. (https://github.com/zenml-io/zenml/blob/develop/scripts/install-zenml-dev.sh will then install it by default as it's not in the list of ignored installations.)
Thanks! A test was added in a54e5b4. Also, in c71ef81, ignores were added for the "missing library stubs or py.typed marker" error to ensure that tests pass (as recommended by mypy docs).
As a bit of extra information, since Polars dataframes do not support indices as far as I know (supported by a quick Google search), I did not add the
Edit: just realized (it's getting time for weekend) that one of the ignores was added at the wrong line. This was corrected in 6455082.
Nice! This looks good to go from my side! I think this'll make its way into the release first thing next week, pending @bcdurak's review and tests passing. Thanks as always for the contribution!
@christianversloot thanks for this contribution! It'll be part of the 0.54.0 release coming soon.
Great! Thanks for the quick reviews.
Describe changes
This PR adds a basic materializer for Polars (https://pola.rs/), a performant dataframe library inspired by Pandas.
It accepts `pl.DataFrame` and `pl.Series` objects (like the Pandas materializer) and writes them to Parquet files via Apache Arrow conversion using `pyarrow`'s `ParquetWriter`, as proposed by the Polars docs.
For materializing a `pl.Series` object (which Polars converts into a `pyarrow.array` instead of a `pyarrow.table`), a conversion into a `pyarrow.table` is done first, to allow for writing to Parquet. Without this conversion, the `ParquetWriter` fails because the table schema is missing. To allow reconversion into a `pl.Series` on read, a ZenML flag signaling whether the artifact is a `pl.Series` is added into the table schema before writing.
I tested the integration locally:
- `zenml integration install polars` works and installs `polars` and `pyarrow`.
- `pl.DataFrame` and `pl.Series` objects work inside a step.
Pre-requisites
Please ensure you have done the following:
- `develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing a branch to develop.