Fix check_input decorator when df passed in kwargs #257
When we call a function/method wrapped in @check_input like so:

@check_input(in_schema)
def decorated_function(df):
    return df

decorated_function(df)     # passes
decorated_function(df=df)  # fails

then the decorator fails as it doesn't expect the keyword argument. It's counterintuitive for the final user of this API. This patch assumes that the first keyword argument is the dataframe to validate.
There's one thing that bothers me. What should the behavior be when the user does:
Should the code fail when there are no args and two or more kwargs are present? Any alternatives to that?
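To make the failure mode concrete, here is a minimal sketch using only the standard library. It is not pandera's implementation: `check_first_positional` and `require_list` are hypothetical stand-ins for `@check_input` and `schema.validate`, and the exact exception pandera raised may differ from the `IndexError` this toy produces.

```python
import functools


def check_first_positional(validate):
    """Toy stand-in for a decorator that only ever looks at args[0]."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Assumes the object to validate is always passed positionally.
            validated = validate(args[0])
            return fn(validated, *args[1:], **kwargs)
        return wrapper
    return decorator


def require_list(obj):
    # Hypothetical validator standing in for schema.validate.
    if not isinstance(obj, list):
        raise TypeError("expected a list")
    return obj


@check_first_positional(require_list)
def decorated_function(df):
    return df


positional_result = decorated_function([1, 2])  # positional call works

failure = None
try:
    decorated_function(df=[1, 2])  # args is empty, so args[0] fails
except IndexError as exc:
    failure = exc
```

The keyword call never reaches the validator, because the wrapper indexes into an empty `args` tuple first.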
Codecov Report
@@ Coverage Diff @@
## master #257 +/- ##
==========================================
+ Coverage 96.55% 96.58% +0.02%
==========================================
Files 15 15
Lines 1308 1317 +9
==========================================
+ Hits 1263 1272 +9
Misses 45 45
Hi @vshulyak, this is indeed unexpected behavior, thanks for your contribution! I'm also wondering about
I think maybe a solution to both this and the original problem is to reuse this private function, which gets a list of the argument names of a function in the order specified in the function def:

@wrapt.decorator
def _wrapper(...):
...
elif obj_getter is None:
try:
if len(args) == 0:
                # get the first key in the same order specified in the
# function argument.
args_names = _get_fn_argnames(fn)
kwargs[args_names[0]] = schema.validate(
                    kwargs[args_names[0]], *validate_args
)
else:
args[0] = schema.validate(args[0], *validate_args)
except errors.SchemaError as e:
msg = (
"error in check_input decorator of function '%s': %s" %
(fn.__name__, e)
)
raise errors.SchemaError(
schema, args[0], msg,
failure_cases=e.failure_cases,
check=e.check,
check_index=e.check_index,
        )

This should address the case of
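The idea in the snippet above (look up the first parameter name from the function definition when no positional arguments are given) can be sketched without pandera or wrapt. This is an illustration under assumptions: `check_first_param` is a hypothetical decorator, `inspect.signature` stands in for the private `_get_fn_argnames` helper, and `validate` stands in for `schema.validate`.

```python
import functools
import inspect


def check_first_param(validate):
    """Validate the first parameter of fn, whether passed by position or keyword."""
    def decorator(fn):
        # Parameter names in the order they appear in the function definition.
        arg_names = list(inspect.signature(fn).parameters)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if args:
                # args is a tuple, so build a new one instead of assigning args[0].
                args = (validate(args[0]),) + args[1:]
            else:
                # No positional arguments: use the definition order, not call order.
                first = arg_names[0]
                kwargs[first] = validate(kwargs[first])
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def to_upper(value):
    # Hypothetical validator that visibly transforms its input.
    return value.upper()


@check_first_param(to_upper)
def greet(name, suffix="!"):
    return name + suffix


by_position = greet("a")
by_keyword = greet(name="a")
reordered = greet(suffix="?", name="a")
```

In this sketch a call that omits the first parameter entirely raises `KeyError` from the `kwargs[first]` lookup; a real implementation would likely want a clearer message.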
Hey @cosmicBboy, thanks for pointing me in the right direction. Using However, I have two doubts which I tried to address in the current version (pls check the changes!):
In case you have an idea of how to handle the aforementioned issues in a cleaner way, please let me know! For instance,
P.S. Apologies for closing/opening the issue several times, I had to trigger Travis to restart the CI process.
nice @vshulyak looks like your changes look cleaner (to keep the conditionals as flat as possible).
One thought would be to abstract away the error-raising bit:
def handle_schema_error(arg_value, schema_error):
    msg = (
        "error in check_input decorator of function '%s': %s" %
        (fn.__name__, schema_error)
    )
    raise errors.SchemaError(
        schema, arg_value, msg,
        failure_cases=schema_error.failure_cases,
        check=schema_error.check,
        check_index=schema_error.check_index,
    )

# call it in the try except blocks
elif obj_getter is None and args:
    try:
        ...
    except errors.SchemaError as e:
        handle_schema_error(args[0], e)
elif obj_getter is None and kwargs:
    try:
        ...
    except errors.SchemaError as e:
        handle_schema_error(kwargs[args_names[0]], e)
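For readers who want to run the pattern, here is a self-contained sketch of the same "re-raise with context" helper. Everything in it is hypothetical: `ValidationError` stands in for `errors.SchemaError`, and `validate_positive` for `schema.validate`. Unlike the snippet above, the function name is passed in explicitly rather than read from an enclosing scope.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for errors.SchemaError; keeps the offending data."""

    def __init__(self, data, message):
        super().__init__(message)
        self.data = data


def handle_validation_error(fn_name, arg_value, error):
    # Wrap the original error with the decorated function's name and
    # attach the argument value (not the argument name) that failed.
    msg = "error in check_input decorator of function '%s': %s" % (fn_name, error)
    raise ValidationError(arg_value, msg) from error


def validate_positive(value):
    # Hypothetical validator standing in for schema.validate.
    if value <= 0:
        raise ValidationError(value, "value must be positive")
    return value


def run(fn_name, value):
    try:
        return validate_positive(value)
    except ValidationError as e:
        handle_validation_error(fn_name, value, e)


caught = None
try:
    run("decorated_function", -1)
except ValidationError as e:
    caught = e
```

Using `raise ... from error` keeps the original exception available as `__cause__`, which helps when debugging nested validation failures.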
pandera/decorators.py
Outdated
        (fn.__name__, e)
    )
    raise errors.SchemaError(
        schema, args_names[0], msg,
this should be the arg value, so kwargs[args_names[0]]
good catch, thank you
tests/test_decorators.py
Outdated
df = test_func1(df, "foo")
assert isinstance(df, pd.DataFrame)

df, x = test_func2("foo", df)
# call function with a dataframe passed as a keyword argument
df = test_func2(dataframe=df)
should we also test for this case?
@check_input(in_schema)
def decorated_function(df, foo):
    return df

decorated_function(foo="bar", df=df)
added, thanks for the hint!
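The reordered-keyword case is where "first keyword argument" by call order and "first parameter" by definition order diverge. The sketch below shows the difference using the standard library's `inspect.Signature.bind`; the helper names are hypothetical and this is not how pandera resolves the argument.

```python
import inspect


def first_kwarg_by_call_order(**kwargs):
    # kwargs preserves the order in which the caller wrote the keywords.
    return next(iter(kwargs))


def first_param_value(fn, *args, **kwargs):
    # Bind the call to the signature, then look up the first declared parameter.
    sig = inspect.signature(fn)
    bound = sig.bind(*args, **kwargs)
    first_name = next(iter(sig.parameters))
    return bound.arguments[first_name]


def decorated_function(df, foo):
    return df


data = {"col": [1, 2]}  # placeholder object standing in for a dataframe

call_order_name = first_kwarg_by_call_order(foo="bar", df=data)
resolved = first_param_value(decorated_function, foo="bar", df=data)
```

Relying on call order would pick `foo` here, while resolving against the signature picks the object bound to `df` regardless of how the caller ordered the keywords.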
@cosmicBboy thanks for your suggestion. I added a new function So I am 50/50 for adding a new reusable function to handle SchemaError exception. I guess you know better since you started the project. I'm leaving this choice up to you:
Please check the code and see which version feels more 'panderic' to you :)
excellent @vshulyak this looks good to me! I agree the _handle_schema_error function may not be the right level of abstraction (indeed it's only used twice), but that can always be refactored later if it turns out to be the case.
I think this is good to go, thanks for your contribution 🎉 Will merge this in a bit
Great! Excited to contribute to this amazing package. Thank you @cosmicBboy for doing all the hard work to maintain it!