Feature/922 add other ways to report unique errors as an argument #914
Codecov Report
Base: 96.55% // Head: 96.56% // Increases project coverage by
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #914      +/-   ##
==========================================
+ Coverage   96.55%   96.56%   +0.01%
==========================================
  Files          43       43
  Lines        4179     4198      +19
==========================================
+ Hits         4035     4054      +19
  Misses        144      144
thanks @ng-henry ! this will be the last PR before the Please take a look at #913, which is a WIP PR for overhauling pandera internals to be more extensible. I think the other features you'd like to support (like partitioning coercible and uncoercible cells) will be easier to reason about and implement with the new internals #906, although I think there's still a bit of design work to figure out how to implement it.
oh, actually one last thing to do here is to update the docs and docstrings in the relevant areas:
Docstrings
The __init__ args for the following: DataFrameSchema, Column, Index, MultiIndex, Field
Docs
@ng-henry after considering this PR a little more, I think conflating the I think it would make more sense to either:
We can merge this as part of the What do you think?
Yep I think that's a good idea. BTW we are not setting keep = False when using
I'll just continue work on the report_duplicates argument on this branch, and make a new branch to fix the duplicate issue when using the Column constructor.
@cosmicBboy codecov says there's zero coverage on
That's not a problem.
Can we stick to this naming? I think
@cosmicBboy implemented but codecov is lower. Do you want to add more tests?
    elif unique == "all":
        keep_argument = False
    else:
        raise ValueError(
hey @ng-henry do you mind adding a test case for raising this ValueError?
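For readers following the review, here is a minimal sketch of what such a test could look like. It is not the code from this PR: `resolve_keep_argument` is a hypothetical helper, and the "first" and "last" branches are assumptions, since only the "all" branch and the `ValueError` fallback appear in the diff above.

```python
from typing import Union


def resolve_keep_argument(unique: str) -> Union[str, bool]:
    """Map a `unique` setting to a pandas-style `keep` value (hypothetical helper)."""
    keep_argument: Union[str, bool]
    if unique == "first":
        # Assumption: "first" keeps the first occurrence and reports the rest.
        keep_argument = "first"
    elif unique == "last":
        # Assumption: "last" keeps the last occurrence and reports the rest.
        keep_argument = "last"
    elif unique == "all":
        # Shown in the diff: False means no occurrence is kept, so all are reported.
        keep_argument = False
    else:
        raise ValueError(f"invalid unique setting: {unique!r}")
    return keep_argument


def test_invalid_unique_setting_raises() -> None:
    """An unrecognized setting should raise ValueError and name the bad value."""
    try:
        resolve_keep_argument("every")
    except ValueError as exc:
        assert "every" in str(exc)
    else:
        raise AssertionError("expected ValueError for an invalid setting")
```

In a pytest suite the `try`/`except` would more commonly be written with `pytest.raises(ValueError)`; the plain form is used here only to keep the sketch dependency-free.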
175abb7 to d115973
awesome, thanks @ng-henry !
* add unique "keep" setting
* add test
* add tests + fix pre-commit errors
* move UniqueSettings to dtypes.py and fix errors
* fix mypy type error
* fix mypy type error
* a
* add keep settings argument
* new changes
* update documentation
* update
* redo CI
* add ValueError test for invalid unique settings
This implements feature request #902 by allowing the unique parameter to accept more values, such as "first", "last", and "all". These control how duplicate values are reported (all duplicates? all except the first occurrence? all except the last occurrence?).
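To make the three settings concrete, here is a small pure-Python sketch of which positions each one would report. It illustrates the semantics described above and is not pandera's implementation; `duplicate_positions` is a hypothetical name, and the behavior is assumed to mirror the `keep` argument of pandas `duplicated`.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def duplicate_positions(values: Sequence[Hashable], setting: str) -> List[int]:
    """Return the positions reported as duplicates under a given setting.

    "first": report every repeat except the first occurrence
    "last":  report every repeat except the last occurrence
    "all":   report every occurrence of a repeated value
    """
    if setting == "all":
        counts = Counter(values)
        return [i for i, v in enumerate(values) if counts[v] > 1]
    if setting == "first":
        order = range(len(values))
    elif setting == "last":
        # Walk backwards so the last occurrence is the one that is kept.
        order = range(len(values) - 1, -1, -1)
    else:
        raise ValueError(f"invalid setting: {setting!r}")

    seen = set()
    flagged = []
    for i in order:
        if values[i] in seen:
            flagged.append(i)
        seen.add(values[i])
    return sorted(flagged)


values = ["a", "b", "a", "c", "a"]
print(duplicate_positions(values, "first"))  # [2, 4]
print(duplicate_positions(values, "last"))   # [0, 2]
print(duplicate_positions(values, "all"))    # [0, 2, 4]
```

The value "b" and "c" never appear in the output because they occur only once; only the repeated "a" is affected by the choice of setting.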