Inferred schema fails to generate example #988
hey @mattharrison, is there a particular reason you're generating more and more examples
It'd also be worth documenting the recommendation that generating more than 50 rows of data is a lot for pandera/hypothesis to handle. The purpose of this synthetic data is unit testing, which typically won't involve large datasets.
Good catch. I changed the
I also changed
to
And it failed to generate an example:
I also tried inferring from a sample (100 rows) of the data and got a different error:
fixes #988 A change in #658 introduced a step to handle str/object dtypes due to issues handling np.str_. Will need to look into whether that's still necessary in another PR, but this one compresses a bunch of `strategy.map` calls to a single one. This addresses an issue where the strategy would be way too long for schemas with many str/object columns. Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
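To illustrate why collapsing many `map` calls into one keeps the strategy small, here is a minimal sketch. It does not use the hypothesis API: `ToyStrategy`, `cast_column`, and `cast_all` are hypothetical stand-ins, and the "rows" are plain dicts rather than DataFrames. The only point is that one wrapper layer per column grows with the column count, while a single mapped function does not.

```python
class ToyStrategy:
    """Hypothetical stand-in for a data-generating strategy (not hypothesis)."""

    def __init__(self, draw, depth=0):
        self._draw = draw
        self.depth = depth  # number of map layers wrapped around the base

    def map(self, fn):
        # Each call wraps the previous strategy in one more layer.
        return ToyStrategy(lambda: fn(self._draw()), self.depth + 1)

    def example(self):
        return self._draw()


def cast_column(name):
    """Return a function that converts a single column of a row to str."""
    def cast(row):
        return {**row, name: str(row[name])}
    return cast


def cast_all(row):
    """Convert every column of a row to str in one pass."""
    return {key: str(value) for key, value in row.items()}


columns = [f"col{i}" for i in range(40)]
base = ToyStrategy(lambda: {c: i for i, c in enumerate(columns)})

# One map call per column: the wrapper chain grows with the schema width.
chained = base
for column in columns:
    chained = chained.map(cast_column(column))

# A single map call that handles all columns at once.
single = base.map(cast_all)

print(chained.depth, single.depth)
```

Both variants produce the same converted row; only the number of nested layers differs.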
okay, I identified the proximal issue here: #989
Generating an example on the entire schema works up to 16-ish examples (it craps out at 32):
import pandas as pd
import pandera as pa
import time
from datetime import timedelta
from hypothesis import settings
url = 'https://github.com/mattharrison/datasets/blob/master/data/ames-housing-dataset.zip?raw=true'
ames = pd.read_csv(url, compression='zip')
s = pa.infer_schema(ames)
for i in [0, 1, 2, 4, 8, 16, 32]:
    start = time.time()
    s.example(i)
    print(f'{i} examples took {time.time()-start} seconds')
Output:
Cutting a bugfix release
Ok, trying this again with another dataset and running into issues. This fails. Note I'm not even creating all of the columns (though I would like to), just two floating point columns. Code:
Should I open another bug?
Describe the bug
I'm inferring the schema from a CSV with 83 columns. When I try to generate an example it fails.
Unsatisfiable: Unable to satisfy assumptions of hypothesis example_generating_inner_function.
Code Sample, a copy-pastable example
Expected behavior
I would expect this to generate an example. I made a simple script to measure timing when adding columns (of int, str, and float) and it works with 80 columns:
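That script isn't shown here. A rough sketch of the same kind of measurement might look like the following. Everything in it is an assumption for illustration: `generate_rows` is a hypothetical stand-in that builds rows with the standard library rather than calling pandera, and `time_by_column_count` is a hypothetical helper. To measure pandera itself, `generate_rows` would be replaced with a function that builds a schema of `n` columns and calls its `example` method.

```python
import random
import string
import time

COLUMN_KINDS = ("int", "str", "float")


def fake_value(kind, rng):
    """Produce one value of the requested kind."""
    if kind == "int":
        return rng.randint(0, 100)
    if kind == "str":
        return "".join(rng.choices(string.ascii_lowercase, k=5))
    return rng.random()


def generate_rows(n_columns, n_rows=5, seed=0):
    """Stand-in generator: columns cycle through int, str, and float."""
    rng = random.Random(seed)
    kinds = [COLUMN_KINDS[i % 3] for i in range(n_columns)]
    return [
        {f"col{i}": fake_value(kind, rng) for i, kind in enumerate(kinds)}
        for _ in range(n_rows)
    ]


def time_by_column_count(generate, column_counts):
    """Time one call of generate(n) for each column count n."""
    timings = {}
    for n in column_counts:
        start = time.perf_counter()
        generate(n)
        timings[n] = time.perf_counter() - start
    return timings


timings = time_by_column_count(generate_rows, [10, 20, 40, 80])
for n, seconds in timings.items():
    print(f"{n} columns took {seconds:.4f} seconds")
```

Timing each column count separately makes it easier to see whether generation time grows steadily with width or breaks down at a particular size.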