Fix check_input decorator when df passed in kwargs #257

vshulyak · 2020-08-06T18:50:55Z

When we call a function/method wrapped in @check_input like so:

@check_input(in_schema)
def decorated_function(df):
    return df

decorated_function(df)  # passes
decorated_function(df=df)  # fails

then the decorator fails as it doesn't expect the keyword argument. It's
counterintuitive for the final user of this API.

This patch assumes that the first keyword argument is the dataframe to
validate.

When we call a function/method wrapped in @check_input like so: @check_input(in_schema) def decorated_function(df): return df decorated_function(df) # passes decorated_function(df=df) # fails then the decorator fails as it doesn't expect the keyword argument. It's counterintuitive for the final user of this API. This patch assumes that the first keyword argument is the dataframe to validate.

vshulyak · 2020-08-06T19:18:45Z

There's one thing which bothers me. What should the behavior be like when the user would do:

@check_input(in_schema)
def decorated_function(df, foo):
    return df

decorated_function(foo="bar", df=df)

Should the code fail when there are no args and two or more kwargs are present? Any alternatives to that?

codecov-commenter · 2020-08-06T20:25:18Z

Codecov Report

Merging #257 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #257      +/-   ##
==========================================
+ Coverage   96.55%   96.58%   +0.02%     
==========================================
  Files          15       15              
  Lines        1308     1317       +9     
==========================================
+ Hits         1263     1272       +9     
  Misses         45       45

Impacted Files	Coverage Δ
pandera/decorators.py	`92.53% <100.00%> (+1.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e5b555...0812c57. Read the comment docs.

cosmicBboy · 2020-08-07T02:41:09Z

Hi @vshulyak, this is indeed unexpected behavior, thanks for your contribution!

I'm also wondering about

There's one thing which bothers me. What should the behavior be like when the user would do:.
@check_input(in_schema)
def decorated_function(df, foo):
    return df
decorated_function(foo="bar", df=df)

I think maybe a solution to both this and the original problem is to reuse this private function, which gets a list of the argument names of a function in the order specified in the function def:

    @wrapt.decorator
    def _wrapper(...):
        ...
        elif obj_getter is None:
            try:
                if len(args) == 0:
                    # get the first key inthe same order specified in the
                    # function argument.
                    args_names = _get_fn_argnames(fn)
                    kwargs[args_names[0]] = schema.validate(
                        args[args_names[0]], *validate_args
                    )
                else:
                    args[0] = schema.validate(args[0], *validate_args)
            except errors.SchemaError as e:
                msg = (
                    "error in check_input decorator of function '%s': %s" %
                    (fn.__name__, e)
                )
                raise errors.SchemaError(
                    schema, args[0], msg,
                    failure_cases=e.failure_cases,
                    check=e.check,
                    check_index=e.check_index,
                )

This should address the case of decorated_function(foo="bar", df=df) and the original issue decorated_function(df=df). Let me know what you think!

vshulyak · 2020-08-07T09:50:13Z

Hey @cosmicBboy, thanks for pointing me in the right direction. Using _get_fn_argnames is indeed cleaner.

However, I have two doubts which I tried to address in the current version (pls check the changes!):

Lots of action in the try/except block, which makes the code less readable.
The need to pass the dataframe from either args[0] or kwargs[args_names[0]] to raise errors.SchemaError(..., args[0], ...). If we want to keep the original outer if/elif/else clause, then we'd have to create a mess passing the dataframe to the except block.

In case you have an idea of how to handle the aforementioned issues in a cleaner way, please let me know! For instance, raise errors.SchemaError is raised in two similar spots right now, but it's lesser of two evils imo. What do you think?

P.S. Apologies for closing/opening the issue several times, I had to trigger Travis to restart the ci process.

cosmicBboy

nice @vshulyak looks like your changes look cleaner (to keep the conditionals as flat as possible).

One thought would be to abstract away the error-raising bit:

    def handle_schema_error(arg_value, schema_error):
        msg = (
            "error in check_input decorator of function '%s': %s" %
            (fn.__name__, e)
        )
        raise errors.SchemaError(
            schema, args_names[0], msg,
            failure_cases=e.failure_cases,
            check=e.check,
            check_index=e.check_index,
        )

    # call it in the try except blocks
    elif obj_getter is None and args
        try:
            ...
        except SchemaError as e:
            handle_schema_error(args[0], e)

    elif obj_getter is None and kwargs:
        try:
            ...
        except SchemaError as e:
            handle_schema_error(kwargs[arg_names[0]], e)

)

cosmicBboy · 2020-08-07T13:16:09Z

pandera/decorators.py

+                    (fn.__name__, e)
+                )
+                raise errors.SchemaError(
+                    schema, args_names[0], msg,


this should be the arg value, so kwargs[args_names[0]]

good catch, thank you

cosmicBboy · 2020-08-07T13:26:09Z

tests/test_decorators.py

    df = test_func1(df, "foo")
    assert isinstance(df, pd.DataFrame)

-    df, x = test_func2("foo", df)
+    # call function with a dataframe passed as a keyword argument
+    df = test_func2(dataframe=df)


should we also test for this case?

@check_input(in_schema) def decorated_function(df, foo): return df decorated_function(foo="bar", df=df)

added, thanks for the hint!

vshulyak · 2020-08-07T19:41:00Z

@cosmicBboy thanks for your suggestion. I added a new function _handle_schema_error and feels a bit overengineered. I tried to mimic the pandera code style and add it to the same place as _get_fn_argnames . These functions are really similar in the way they are used, so I thought it makes sense. Putting it inside the decorator felt weird.

So I am 50/50 for adding a new reusable function to handle SchemaError exception. I guess you know better since you started the project. I'm leaving this choice up to you:

Commit ec67d57 has tests added as per your code review + bug fix.
Next commit 0812c57 introduces _handle_schema_error function.

Please check the code and see which version feels more 'panderic' to you :)

cosmicBboy

excellent @vshulyak this looks good to me! I agree the _handle_schema_error function may not be the right level of abstraction (indeed it's only used twice) but that can always be re-factored later if it turns out to be the case.

I think this is good to go, thanks for your contribution 🎉 Will merge this in bit

vshulyak · 2020-08-08T09:52:51Z

Great! Excited to contribute to this amazing package. Thank you @cosmicBboy for doing all the hard work to maintain it!

Fix check_input to accept only one kwd arg

e2d3988

vshulyak added 2 commits August 7, 2020 09:46

Parse check_input args in the order they defined

2b84735

Fix len(seq) pylint warning

5f1c992

vshulyak closed this Aug 7, 2020

vshulyak reopened this Aug 7, 2020

Change SchemaError handling when using kwargs for df

97c11da

vshulyak closed this Aug 7, 2020

vshulyak reopened this Aug 7, 2020

cosmicBboy reviewed Aug 7, 2020

View reviewed changes

vshulyak added 3 commits August 7, 2020 19:21

Fix wrong argument for SchemaError

ea81c74

Test check_input decorator argument order

ec67d57

Add reusable schema error handler

0812c57

cosmicBboy approved these changes Aug 7, 2020

View reviewed changes

cosmicBboy merged commit 02d385e into unionai-oss:master Aug 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix check_input decorator when df passed in kwargs #257

Fix check_input decorator when df passed in kwargs #257

vshulyak commented Aug 6, 2020

vshulyak commented Aug 6, 2020

codecov-commenter commented Aug 6, 2020 •

edited

Loading

cosmicBboy commented Aug 7, 2020 •

edited

Loading

vshulyak commented Aug 7, 2020

cosmicBboy left a comment

cosmicBboy Aug 7, 2020

vshulyak Aug 7, 2020

cosmicBboy Aug 7, 2020

vshulyak Aug 7, 2020

vshulyak commented Aug 7, 2020

cosmicBboy left a comment

vshulyak commented Aug 8, 2020

Fix check_input decorator when df passed in kwargs #257

Fix check_input decorator when df passed in kwargs #257

Conversation

vshulyak commented Aug 6, 2020

vshulyak commented Aug 6, 2020

codecov-commenter commented Aug 6, 2020 • edited Loading

Codecov Report

cosmicBboy commented Aug 7, 2020 • edited Loading

vshulyak commented Aug 7, 2020

cosmicBboy left a comment

Choose a reason for hiding this comment

cosmicBboy Aug 7, 2020

Choose a reason for hiding this comment

vshulyak Aug 7, 2020

Choose a reason for hiding this comment

cosmicBboy Aug 7, 2020

Choose a reason for hiding this comment

vshulyak Aug 7, 2020

Choose a reason for hiding this comment

vshulyak commented Aug 7, 2020

cosmicBboy left a comment

Choose a reason for hiding this comment

vshulyak commented Aug 8, 2020

codecov-commenter commented Aug 6, 2020 •

edited

Loading

cosmicBboy commented Aug 7, 2020 •

edited

Loading