hypothesis.errors.Unsatisfiable on Schema.to_schema().example() when SchemaModel has more than 38 fields #838

Closed
2 of 3 tasks
g-simmons opened this issue Apr 21, 2022 · 3 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@g-simmons

Describe the bug
If a SchemaModel contains more than 38 fields, SchemaModel.to_schema().example() throws an error:

hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of example_generating_inner_function
  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

  • Running the code below as-is raises the error.
  • It runs successfully if field39 and field40 are commented out.
import pandera as pa
from pandera.typing import Series

class MyBaseSchema(pa.SchemaModel):
    field1: Series[str] = pa.Field()
    field2: Series[str] = pa.Field()
    field3: Series[str] = pa.Field()
    field4: Series[str] = pa.Field()
    field5: Series[str] = pa.Field()
    field6: Series[str] = pa.Field()
    field7: Series[str] = pa.Field()
    field8: Series[str] = pa.Field()
    field9: Series[str] = pa.Field()
    field10: Series[str] = pa.Field()
    field11: Series[str] = pa.Field()
    field12: Series[str] = pa.Field()
    field13: Series[str] = pa.Field()
    field14: Series[str] = pa.Field()
    field15: Series[str] = pa.Field()
    field16: Series[str] = pa.Field()
    field17: Series[str] = pa.Field()
    field18: Series[str] = pa.Field()
    field19: Series[str] = pa.Field()
    field20: Series[str] = pa.Field()
    field21: Series[str] = pa.Field()
    field22: Series[str] = pa.Field()
    field23: Series[str] = pa.Field()
    field24: Series[str] = pa.Field()
    field25: Series[str] = pa.Field()
    field26: Series[str] = pa.Field()
    field27: Series[str] = pa.Field()
    field28: Series[str] = pa.Field()
    field29: Series[str] = pa.Field()
    field30: Series[str] = pa.Field()
    field31: Series[str] = pa.Field()
    field32: Series[str] = pa.Field()
    field33: Series[str] = pa.Field()
    field34: Series[str] = pa.Field()
    field35: Series[str] = pa.Field()
    field36: Series[str] = pa.Field()
    field37: Series[str] = pa.Field()
    field38: Series[str] = pa.Field()
    field39: Series[str] = pa.Field()
    field40: Series[str] = pa.Field()

if __name__ == "__main__":
    dataframe = MyBaseSchema.to_schema().example(1)
    print(dataframe)

Expected behavior

Don't throw an error; generate an example for the SchemaModel.

Desktop (please complete the following information):

  • OS: OSX
  • Python version: 3.9.12
  • pandera version: 0.10.1
  • hypothesis version: 6.44.0
@g-simmons g-simmons added the bug Something isn't working label Apr 21, 2022
@cosmicBboy
Collaborator

Hi @g-simmons, this looks like a performance issue in the dataframe strategy generation function. I suspect it has something to do with this:
https://github.com/pandera-dev/pandera/blob/master/pandera/strategies.py#L1104-L1112

        for col_name, col_dtype in col_dtypes.items():
            if col_dtype in {"object", "str"} or col_dtype.startswith(
                "string"
            ):
                # pylint: disable=cell-var-from-loop,undefined-loop-variable
                strategy = strategy.map(
                    lambda df: df.assign(**{col_name: df[col_name].map(str)})
                )

It would be better to collect the string columns first and then convert them all in a single strategy.map call:

col_names = []
for col_name, col_dtype in col_dtypes.items():
    if col_dtype in {"object", "str"} or col_dtype.startswith(
        "string"
    ):
        col_names.append(col_name)

strategy = strategy.map(
    lambda df: df.assign(**{col_name: df[col_name].map(str) for col_name in col_names})
)

I don't have the bandwidth to tackle this right now, but please feel free to make a PR for this! (also adding the "help wanted" tag)
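
The collect-then-convert idea suggested above can be tried out with plain pandas, without hypothesis in the loop. The following is only a minimal sketch under stated assumptions: the helper name `stringify_columns` and the sample data are hypothetical and are not part of pandera, and this is not necessarily the change that was eventually merged.

```python
import pandas as pd


def stringify_columns(df: pd.DataFrame, col_dtypes: dict) -> pd.DataFrame:
    """Cast every string-like column to str in one assign call.

    Hypothetical helper for illustration; not part of pandera.
    """
    # Collect the string-like column names first, using the same dtype
    # test as the snippet quoted in the comment above.
    col_names = [
        name
        for name, dtype in col_dtypes.items()
        if dtype in {"object", "str"} or dtype.startswith("string")
    ]
    # A single assign call covers all collected columns, so a strategy
    # would need one .map(...) layer instead of one layer per column.
    return df.assign(**{name: df[name].map(str) for name in col_names})


# Hypothetical input: two string-like columns and one integer column.
col_dtypes = {"field1": "str", "field2": "string[python]", "count": "int64"}
df = pd.DataFrame({"field1": [1, 2], "field2": [3.5, 4.0], "count": [10, 20]})
result = stringify_columns(df, col_dtypes)
```

With a hypothesis dataframe strategy, the helper would presumably be applied once, along the lines of `strategy.map(lambda df: stringify_columns(df, col_dtypes))`, rather than once per column inside the loop.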

@cosmicBboy cosmicBboy added the help wanted Extra attention is needed label Apr 22, 2022
@g-simmons
Author

@cosmicBboy Great, thanks for the input! I also probably don't have bandwidth to work on it now but will come back later if I do. Thanks!

@cosmicBboy
Collaborator

This was fixed by #989.
