-
-
Notifications
You must be signed in to change notification settings - Fork 335
Bugfix/882 coercing twice #901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/882 coercing twice #901
Conversation
acbdb8f
to
7c92917
Compare
@@ -948,6 +948,7 @@ def test_frictionless_schema_parses_correctly(frictionless_schema): | |||
{"check": "not_nullable", "failure_case": "NaN"}, | |||
{"check": "isin({1.0, 2.0, 3.0})", "failure_case": 1.1}, | |||
{"check": "isin({1.0, 2.0, 3.0})", "failure_case": 3.8}, | |||
{"check": "dtype('float64')", "failure_case": "object"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this change for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are not running coerce in schema_components anymore, we run validate_column
in here.
This means wrong values will be passed to validate_column
. This error gets caught here because the dtype is wrong.
It just adds another error to the error lists, which is why we have that additional error.
Agreed that this will improve UX and efficiency! The main concern here is that it introduces additional complexity in the validation routine. Among other things, it would need to:
I think this will be a lot easier to reason about and implement with the overhauled core API: The main idea is that the For example, here's the If you can create another issue for the problem of applying checks to an uncoercible column, we can try fleshing out a solution there, working off of the |
Codecov Report
@@ Coverage Diff @@
## dev #901 +/- ##
==========================================
- Coverage 97.41% 97.38% -0.03%
==========================================
Files 43 43
Lines 4171 4172 +1
==========================================
Hits 4063 4063
- Misses 108 109 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@cosmicBboy Not sure why some of these tests are failing here. I'm not too familiar with Dask and why it prints "2 graph layers" instead of "4 tasks" even though the dataframe is completely the same.
|
6091e3e
to
d2b39d7
Compare
df6b1e2
to
2151741
Compare
* fix datetime strategy * comment out coercing twice part * add float64 error to test * fix formatting * implement fixes * Update strategies.py * fix datetime strategy * comment out coercing twice part * fix formatting * implement fixes * Update strategies.py * ignore mypy Co-authored-by: Niels Bantilan <niels.bantilan@gmail.com>
This fixes the coercing twice bug described in #882 + shows all errors even if coercion fails.
@cosmicBboy About your message here:
I think that even if some values cannot be coerced to the correct type, we can still do downstream checks. For example, unique and nullable checks can still be applied.
This PR starts to implement some of these features and also fixes the twice coercion bug. It applies downstream checks even if coercion failed. A better approach (TODO) is applying downstream checks only for cells that pass coercion. Thus, it's guaranteed that downstream checks receive the correct type.
If you give the go-ahead, then I can start implementing the TODO.