Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix check_input decorator when df passed in kwargs #257

Merged
merged 8 commits into from
Aug 8, 2020

Conversation

vshulyak
Copy link
Contributor

@vshulyak vshulyak commented Aug 6, 2020

When we call a function/method wrapped in @check_input like so:

@check_input(in_schema)
def decorated_function(df):
    return df

decorated_function(df)  # passes
decorated_function(df=df)  # fails

then the decorator fails as it doesn't expect the keyword argument. It's
counterintuitive for the final user of this API.

This patch assumes that the first keyword argument is the dataframe to
validate.

When we call a function/method wrapped in @check_input like so:

    @check_input(in_schema)
    def decorated_function(df):
        return df

    decorated_function(df)  # passes
    decorated_function(df=df)  # fails

then the decorator fails as it doesn't expect the keyword argument. It's
counterintuitive for the final user of this API.

This patch assumes that the first keyword argument is the dataframe to
validate.
@vshulyak
Copy link
Contributor Author

vshulyak commented Aug 6, 2020

There's one thing which bothers me. What should the behavior be like when the user would do:

@check_input(in_schema)
def decorated_function(df, foo):
    return df

decorated_function(foo="bar", df=df)

Should the code fail when there are no args and two or more kwargs are present? Any alternatives to that?

@codecov-commenter
Copy link

codecov-commenter commented Aug 6, 2020

Codecov Report

Merging #257 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #257      +/-   ##
==========================================
+ Coverage   96.55%   96.58%   +0.02%     
==========================================
  Files          15       15              
  Lines        1308     1317       +9     
==========================================
+ Hits         1263     1272       +9     
  Misses         45       45              
Impacted Files Coverage Δ
pandera/decorators.py 92.53% <100.00%> (+1.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e5b555...0812c57. Read the comment docs.

@cosmicBboy
Copy link
Collaborator

cosmicBboy commented Aug 7, 2020

Hi @vshulyak, this is indeed unexpected behavior, thanks for your contribution!

I'm also wondering about

There's one thing which bothers me. What should the behavior be like when the user would do:.

@check_input(in_schema)
def decorated_function(df, foo):
    return df

decorated_function(foo="bar", df=df)

I think maybe a solution to both this and the original problem is to reuse this private function, which gets a list of the argument names of a function in the order specified in the function def:

    @wrapt.decorator
    def _wrapper(...):
        ...
        elif obj_getter is None:
            try:
                if len(args) == 0:
                    # get the first key inthe same order specified in the
                    # function argument.
                    args_names = _get_fn_argnames(fn)
                    kwargs[args_names[0]] = schema.validate(
                        args[args_names[0]], *validate_args
                    )
                else:
                    args[0] = schema.validate(args[0], *validate_args)
            except errors.SchemaError as e:
                msg = (
                    "error in check_input decorator of function '%s': %s" %
                    (fn.__name__, e)
                )
                raise errors.SchemaError(
                    schema, args[0], msg,
                    failure_cases=e.failure_cases,
                    check=e.check,
                    check_index=e.check_index,
                )

This should address the case of decorated_function(foo="bar", df=df) and the original issue decorated_function(df=df). Let me know what you think!

@vshulyak vshulyak closed this Aug 7, 2020
@vshulyak vshulyak reopened this Aug 7, 2020
@vshulyak vshulyak closed this Aug 7, 2020
@vshulyak vshulyak reopened this Aug 7, 2020
@vshulyak
Copy link
Contributor Author

vshulyak commented Aug 7, 2020

Hey @cosmicBboy, thanks for pointing me in the right direction. Using _get_fn_argnames is indeed cleaner.

However, I have two doubts which I tried to address in the current version (pls check the changes!):

  1. Lots of action in the try/except block, which makes the code less readable.
  2. The need to pass the dataframe from either args[0] or kwargs[args_names[0]] to raise errors.SchemaError(..., args[0], ...). If we want to keep the original outer if/elif/else clause, then we'd have to create a mess passing the dataframe to the except block.

In case you have an idea of how to handle the aforementioned issues in a cleaner way, please let me know! For instance, raise errors.SchemaError is raised in two similar spots right now, but it's lesser of two evils imo. What do you think?

P.S. Apologies for closing/opening the issue several times, I had to trigger Travis to restart the ci process.

Copy link
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice @vshulyak looks like your changes look cleaner (to keep the conditionals as flat as possible).

One thought would be to abstract away the error-raising bit:

    def handle_schema_error(arg_value, schema_error):
        msg = (
            "error in check_input decorator of function '%s': %s" %
            (fn.__name__, e)
        )
        raise errors.SchemaError(
            schema, args_names[0], msg,
            failure_cases=e.failure_cases,
            check=e.check,
            check_index=e.check_index,
        )

    # call it in the try except blocks
    elif obj_getter is None and args
        try:
            ...
        except SchemaError as e:
            handle_schema_error(args[0], e)

    elif obj_getter is None and kwargs:
        try:
            ...
        except SchemaError as e:
            handle_schema_error(kwargs[arg_names[0]], e)

)

(fn.__name__, e)
)
raise errors.SchemaError(
schema, args_names[0], msg,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be the arg value, so kwargs[args_names[0]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thank you

df = test_func1(df, "foo")
assert isinstance(df, pd.DataFrame)

df, x = test_func2("foo", df)
# call function with a dataframe passed as a keyword argument
df = test_func2(dataframe=df)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also test for this case?

@check_input(in_schema)
def decorated_function(df, foo):
    return df

decorated_function(foo="bar", df=df)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, thanks for the hint!

@vshulyak
Copy link
Contributor Author

vshulyak commented Aug 7, 2020

@cosmicBboy thanks for your suggestion. I added a new function _handle_schema_error and feels a bit overengineered. I tried to mimic the pandera code style and add it to the same place as _get_fn_argnames . These functions are really similar in the way they are used, so I thought it makes sense. Putting it inside the decorator felt weird.

So I am 50/50 for adding a new reusable function to handle SchemaError exception. I guess you know better since you started the project. I'm leaving this choice up to you:

  • Commit ec67d57 has tests added as per your code review + bug fix.
  • Next commit 0812c57 introduces _handle_schema_error function.

Please check the code and see which version feels more 'panderic' to you :)

Copy link
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent @vshulyak this looks good to me! I agree the _handle_schema_error function may not be the right level of abstraction (indeed it's only used twice) but that can always be re-factored later if it turns out to be the case.

I think this is good to go, thanks for your contribution 🎉 Will merge this in bit

@vshulyak
Copy link
Contributor Author

vshulyak commented Aug 8, 2020

Great! Excited to contribute to this amazing package. Thank you @cosmicBboy for doing all the hard work to maintain it!

@cosmicBboy cosmicBboy merged commit 02d385e into unionai-oss:master Aug 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants