Custom checks lost after to_yaml #929

Open
3 tasks done
hebrd opened this issue Aug 30, 2022 · 2 comments
Labels
bug Something isn't working

Comments


hebrd commented Aug 30, 2022

Describe the bug
Custom checks defined on a schema are lost when the schema is serialized with to_yaml().

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample, a copy-pastable example

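# Reproduction: the custom check passed via checks=[...] is lost in the to_yaml() output
import pandera as pa


def low_lt_high(df):
    # Custom dataframe-level check: each row's low must not exceed its high
    return df['low'] <= df['high']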

schema = pa.DataFrameSchema(
    columns={"close": pa.Column(float, checks=[pa.Check.gt(0.0), ])},
    checks=[pa.Check(low_lt_high)]
)
print(schema.to_yaml())

Expected behavior

Custom check rules should be kept in the YAML output so that the schema can be loaded again with its checks intact.
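
For context on what "kept so they can be loaded again" involves: a serializer can write down a check's name and parameters, but it can only rebuild the check on load if that name maps back to known code. The sketch below illustrates the idea independently of any library, using JSON from the standard library in place of YAML. CHECK_REGISTRY, register_check, dump_checks, and load_checks are hypothetical names for illustration only and are not part of pandera.

```python
import json

# Hypothetical registry mapping a check's name to a factory that builds it.
CHECK_REGISTRY = {}


def register_check(factory):
    """Record a check factory under its function name for later lookup."""
    CHECK_REGISTRY[factory.__name__] = factory
    return factory


@register_check
def greater_than(min_value):
    # Build a predicate over a list of numbers.
    return lambda values: all(v > min_value for v in values)


def dump_checks(specs):
    """Serialize checks as name/parameter pairs; unregistered names are dropped."""
    kept = [spec for spec in specs if spec["name"] in CHECK_REGISTRY]
    return json.dumps(kept)


def load_checks(text):
    """Rebuild callable checks from their serialized names and parameters."""
    return [
        CHECK_REGISTRY[spec["name"]](**spec["params"])
        for spec in json.loads(text)
    ]


specs = [
    {"name": "greater_than", "params": {"min_value": 0.0}},
    {"name": "low_lt_high", "params": {}},  # never registered, so it is dropped
]
text = dump_checks(specs)
checks = load_checks(text)
```

In this toy version the unregistered check disappears from the serialized form, which mirrors the symptom reported here. A round trip only works for checks that can be found again by name.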

Desktop (please complete the following information):

  • OS: macOS
  • Version 0.12.0

hebrd added the bug label on Aug 30, 2022
@kylejcaron

Are there any plans for this? I'm also running into this problem. I have modular schemas that inherit from each other, and I want to give third-party users an easy way to get the YAML for a schema and see everything in one place. It would be a big convenience and would help adoption.

Here's a simpler example without inheritance:

import pandera as pa
import pandera.extensions as extensions


@extensions.register_check_method(statistics=["cls"])
def non_null_values_in_extra_columns(df, cls):
    """Check that every column not specified in the schema contains no null values."""
    # Get the columns defined in the schema
    defined_columns = cls.to_schema().columns.keys()

    # Find columns in the DataFrame that are not defined in the schema
    extra_columns = [col for col in df.columns if col not in defined_columns]

    # Check that all values in these extra columns are not null
    return df[extra_columns].notnull().all().all()



class TestSchema(pa.DataFrameModel):

    @pa.dataframe_check
    def check_non_null_values_in_extra_columns(cls, df):
        res = pa.Check.non_null_values_in_extra_columns(cls)(df)
        return res.check_passed

print(TestSchema.to_yaml())

The expected behavior is for the registered check method non_null_values_in_extra_columns to show up in the YAML output. If I include the check in the Config, it does show up in the YAML output, but there is no way to reference cls that way. With inheritance, I would also have to restate all of the inherited Config settings, or they would be overwritten (as far as I can tell, there's no way to append to a Config apart from the metadata).
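
The overwrite concern can be shown with plain Python classes. This sketch uses no library and makes no claim about how pandera itself resolves Config across a model hierarchy. The class names and the attributes strict, coerce, and name are illustrative only.

```python
class Base:
    class Config:
        strict = True
        coerce = True


class Overwrites(Base):
    # A fresh nested class shadows Base.Config, so strict and coerce are gone.
    class Config:
        name = "child"


class Extends(Base):
    # Subclassing the parent's Config keeps its settings and adds one more.
    class Config(Base.Config):
        name = "child"


overwritten_has_strict = hasattr(Overwrites.Config, "strict")
extended = (Extends.Config.strict, Extends.Config.coerce, Extends.Config.name)
```

In plain Python, the second form avoids restating the parent's settings. Whether that pattern helps with the YAML output in this issue is a separate question.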

@kylejcaron

@cosmicBboy I hope you don't mind me tagging you. Do you have any feedback or thoughts on this issue?
