Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/1616: Polars data container validation #1623

Merged
merged 2 commits into from
May 7, 2024

Conversation

cstabnick
Copy link
Contributor

Adds support for polars data container validation in custom dtypes
Adds test demonstrating positive and negative case

Signed-off-by: Connor Stabnick <cstabnick@gmail.com>
Copy link
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! See inline comments.

You can also set up pre-commit to make sure these pass:
https://pandera--1618.org.readthedocs.build/en/1618/CONTRIBUTING.html#set-up-pre-commit

def check(
self,
pandera_dtype: DataType,
data_container: Optional[polars_engine.PolarsDataContainer] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type needs to be PolarsData no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, was following how I saw other dtypes take the parameter in.

data_container: Optional[PolarsDataContainer] = None,
) -> Union[bool, Iterable[bool]]:

It looks like when it needs to be treated as PolarsData, it may be wrapped in the class. Think I'm going to handle mypy error similar to
def coerce(self, data_container: PolarsDataContainer) -> pl.LazyFrame:
"""Coerce data container to the data type."""
if isinstance(data_container, pl.LazyFrame):
data_container = PolarsData(data_container)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided not to take this approach due to it just being for a specific test, updated to PolarsData

data_container: Optional[polars_engine.PolarsDataContainer] = None,
) -> Union[bool, Iterable[bool]]:
if key := data_container.key:
ldf = data_container.lazyframe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy error here should be fixed if data_container type is fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, precommit passes now

pandera_dtype: DataType,
data_container: Optional[polars_engine.PolarsDataContainer] = None,
) -> Union[bool, Iterable[bool]]:
if key := data_container.key:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy error:

tests/polars/test_polars_components.py:206: error: Missing return statement  [return]

due to lack of return statement in the outer if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, pre commit now passes mypy and pylint

pandera_dtype: DataType,
data_container: Optional[polars_engine.PolarsDataContainer] = None,
) -> Union[bool, Iterable[bool]]:
if key := data_container.key:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key is really only used once in the function body: I'd recommend just using data_container.key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link

codecov bot commented May 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 83.09%. Comparing base (4df61da) to head (959a45c).
Report is 84 commits behind head on main.

Files Patch % Lines
pandera/backends/polars/components.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1623       +/-   ##
===========================================
- Coverage   94.29%   83.09%   -11.20%     
===========================================
  Files          91      116       +25     
  Lines        7024     8538     +1514     
===========================================
+ Hits         6623     7095      +472     
- Misses        401     1443     +1042     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Connor Stabnick <cstabnick@gmail.com>
@cstabnick cstabnick requested a review from cosmicBboy May 7, 2024 11:51
Copy link
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! codecov error looks like a false positive.

thanks for the contribution, and congrats on your first PR to pandera @cstabnick !

@cosmicBboy cosmicBboy merged commit 30e5f0f into unionai-oss:main May 7, 2024
66 of 68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants