hypothesis.errors.Unsatisfiable on Schema.to_schema().example() when SchemaModel has more than 38 fields #838

g-simmons · 2022-04-21T21:09:02Z

Describe the bug
If a SchemaModel contains more than 38 fields, SchemaModel.to_schema().example() throws an error:

hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of example_generating_inner_function

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample, a copy-pastable example

Running the code below as-is throws the error.
It runs successfully if field39 and field40 are commented out.

import pandera as pa
from pandera.typing import Series

class MyBaseSchema(pa.SchemaModel):
    field1: Series[str] = pa.Field()
    field2: Series[str] = pa.Field()
    field3: Series[str] = pa.Field()
    field4: Series[str] = pa.Field()
    field5: Series[str] = pa.Field()
    field6: Series[str] = pa.Field()
    field7: Series[str] = pa.Field()
    field8: Series[str] = pa.Field()
    field9: Series[str] = pa.Field()
    field10: Series[str] = pa.Field()
    field11: Series[str] = pa.Field()
    field12: Series[str] = pa.Field()
    field13: Series[str] = pa.Field()
    field14: Series[str] = pa.Field()
    field15: Series[str] = pa.Field()
    field16: Series[str] = pa.Field()
    field17: Series[str] = pa.Field()
    field18: Series[str] = pa.Field()
    field19: Series[str] = pa.Field()
    field20: Series[str] = pa.Field()
    field21: Series[str] = pa.Field()
    field22: Series[str] = pa.Field()
    field23: Series[str] = pa.Field()
    field24: Series[str] = pa.Field()
    field25: Series[str] = pa.Field()
    field26: Series[str] = pa.Field()
    field27: Series[str] = pa.Field()
    field28: Series[str] = pa.Field()
    field29: Series[str] = pa.Field()
    field30: Series[str] = pa.Field()
    field31: Series[str] = pa.Field()
    field32: Series[str] = pa.Field()
    field33: Series[str] = pa.Field()
    field34: Series[str] = pa.Field()
    field35: Series[str] = pa.Field()
    field36: Series[str] = pa.Field()
    field37: Series[str] = pa.Field()
    field38: Series[str] = pa.Field()
    field39: Series[str] = pa.Field()
    field40: Series[str] = pa.Field()

if __name__ == "__main__":
    dataframe = MyBaseSchema.to_schema().example(1)
    print(dataframe)

Expected behavior

No error is thrown, and an example is generated for the SchemaModel.

Desktop (please complete the following information):

OS: OSX
Python version: 3.9.12
pandera version: 0.10.1
hypothesis version: 6.44.0

cosmicBboy · 2022-04-22T01:49:38Z

hi @g-simmons, this looks like a performance issue on the dataframe strategy generation function. I suspect it has something to do with this:
https://github.com/pandera-dev/pandera/blob/master/pandera/strategies.py#L1104-L1112

        for col_name, col_dtype in col_dtypes.items():
            if col_dtype in {"object", "str"} or col_dtype.startswith(
                "string"
            ):
                # pylint: disable=cell-var-from-loop,undefined-loop-variable
                strategy = strategy.map(
                    lambda df: df.assign(**{col_name: df[col_name].map(str)})
                )

It would be better to collect the string columns first and then convert all of them in a single strategy.map call:

col_names = []
for col_name, col_dtype in col_dtypes.items():
    if col_dtype in {"object", "str"} or col_dtype.startswith(
        "string"
    ):
        col_names.append(col_name)

strategy = strategy.map(
    lambda df: df.assign(**{col_name: df[col_name].map(str) for col_name in col_names})
)

I don't have the bandwidth to tackle this right now, but please feel free to make a PR for this! (also adding the "help wanted" tag)
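
To make the difference between the two snippets concrete, here is a minimal sketch in plain Python. ToyStrategy, Frame, string_columns and stringify are hypothetical stand-ins invented for illustration; they are not part of pandera or hypothesis. The sketch only counts how many .map() wrappers each approach stacks up and checks that both produce the same data. Whether that stacking is what actually triggers Unsatisfiable is the suspicion stated above, not something this sketch demonstrates.

```python
from typing import Callable, Dict, List

# Stand-in for a DataFrame: column name -> list of values (assumption for illustration).
Frame = Dict[str, List[object]]


class ToyStrategy:
    """Hypothetical stand-in for a hypothesis strategy; it only tracks .map() depth."""

    def __init__(self, draw: Callable[[], Frame], depth: int = 0) -> None:
        self._draw = draw
        self.depth = depth  # number of chained .map() wrappers

    def map(self, fn: Callable[[Frame], Frame]) -> "ToyStrategy":
        # Each call wraps the previous draw function in one more layer.
        return ToyStrategy(lambda: fn(self._draw()), self.depth + 1)

    def example(self) -> Frame:
        return self._draw()


def string_columns(col_dtypes: Dict[str, str]) -> List[str]:
    """Collect the names of string-like columns, using the same dtype test as above."""
    return [
        name
        for name, dtype in col_dtypes.items()
        if dtype in {"object", "str"} or dtype.startswith("string")
    ]


def stringify(frame: Frame, names: List[str]) -> Frame:
    """Return a copy of frame with the named columns converted to str."""
    return {
        name: [str(v) for v in values] if name in names else list(values)
        for name, values in frame.items()
    }


# 40 string columns plus one non-string column, mirroring the size of the report.
col_dtypes = {f"field{i}": "str" for i in range(1, 41)}
col_dtypes["count"] = "int64"
base = ToyStrategy(lambda: {name: [1] for name in col_dtypes})

# Per-column chaining: one .map() wrapper per string column.
# name=name binds the current column so each wrapper targets its own column.
chained = base
for name in string_columns(col_dtypes):
    chained = chained.map(lambda frame, name=name: stringify(frame, [name]))

# Collected: a single .map() wrapper converts every string column at once.
names = string_columns(col_dtypes)
collected = base.map(lambda frame: stringify(frame, names))

print(chained.depth, collected.depth)
```

Running it prints 40 1: the per-column loop stacks one wrapper per string column, while the collected version adds a single wrapper regardless of schema width, and both yield the same converted data.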

g-simmons · 2022-04-22T19:26:07Z

@cosmicBboy Great, thanks for the input! I also probably don't have bandwidth to work on it now but will come back later if I do. Thanks!

cosmicBboy · 2022-11-03T18:00:03Z

this was fixed by #989

g-simmons added the "bug" (Something isn't working) label on Apr 21, 2022

cosmicBboy added the "help wanted" (Extra attention is needed) label on Apr 22, 2022

cosmicBboy closed this as completed Nov 3, 2022
