fix: properly coerce dtypes for columns with regex=True #1602

Merged: 5 commits, May 6, 2024

Conversation

fkatesslinden
Contributor

Fixes #1182. I ran into this bug myself, then found it was previously reported.

See the included tests for a minimal example of the bug: test_config_coerce() passes on main; test_config_coerce_with_regex() fails on main, but passes with this fix.

The change I've submitted here is the minimal change necessary to fix the bug. With this fix, some code is duplicated between the regex and non-regex blocks of the _coerce_dtype_helper() function. I considered separating it into helper functions like _should_coerce() or _override_and_try_coercion(), but there are several ways one could split it up, so I figured reviewers can decide which of those would be preferred.

Also, I wasn't sure which file the tests should go in -- let me know if they should be moved.

fkatesslinden force-pushed the bugfix/coerce-regex-columns branch 2 times, most recently from 59d0f45 to 46096bf (April 28, 2024 21:16)
Collaborator

cosmicBboy left a comment

Thanks for the fix! Looks good to me; see the comment below.

@@ -53,6 +56,35 @@ def test_column_coerce() -> None:
    assert Engine.dtype(validated.a.dtype) == Engine.dtype(int)


def test_config_coerce() -> None:
Collaborator

One recommendation I have is to convert these into DataFrameSchemas instead of using DataFrameModel, because the latter is essentially converted into the former when validation happens, i.e. DataFrameSchema is the "true" representation of a schema.

They can live in test_schemas.py if you decide to change this. I'm also happy to accept these tests using DataFrameModel, in which case they ought to be in test_model.py.

codecov bot commented May 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.11%. Comparing base (4df61da) to head (2e247b3).
Report is 81 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1602       +/-   ##
===========================================
- Coverage   94.29%   83.11%   -11.18%     
===========================================
  Files          91      116       +25     
  Lines        7024     8536     +1512     
===========================================
+ Hits         6623     7095      +472     
- Misses        401     1441     +1040     


Signed-off-by: Tess Linden <tess.linden@gmail.com>
Collaborator

cosmicBboy left a comment

thanks @tesslinden 🚀 and congrats on your first PR to pandera 🎉

cosmicBboy merged commit 4724036 into unionai-oss:main on May 6, 2024
67 of 68 checks passed
fkatesslinden
Contributor Author

> thanks @tesslinden 🚀 and congrats on your first PR to pandera 🎉

Awesome! Thank you!

Development

Successfully merging this pull request may close these issues.

Config coerce does not override default coerce parameter when using regex
3 participants