make df strategy less complex (#989)
fixes #988

A change in #658 introduced a step to handle str/object dtypes because
of issues handling np.str_. Whether that step is still necessary needs
to be investigated in a separate PR; this one only compresses the
per-column `strategy.map` calls into a single call.

This addresses an issue where the strategy gained one extra `map` call
per str/object column, making it far too long for schemas with many
such columns.

Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
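
The idea of folding several `map` calls into one can be sketched without Hypothesis or pandas. The `ToyStrategy` class below is a hypothetical stand-in (not part of pandera or Hypothesis) whose `map` just wraps a callable and counts the wrappers; rows are plain dicts rather than DataFrames. Unlike the removed code, the "before" loop binds `name` as a default argument so that each wrapper converts its own column.

```python
from typing import Any, Callable, List


class ToyStrategy:
    """Hypothetical stand-in for a strategy object with a chainable map()."""

    def __init__(self, make: Callable[[], Any], depth: int = 0) -> None:
        self._make = make
        self.depth = depth  # number of map() wrappers applied so far

    def map(self, fn: Callable[[Any], Any]) -> "ToyStrategy":
        # Each call adds one more wrapper around the underlying generator.
        return ToyStrategy(lambda: fn(self._make()), self.depth + 1)

    def example(self) -> Any:
        return self._make()


def base() -> ToyStrategy:
    # Illustrative data only: a single "row" with two columns to stringify.
    return ToyStrategy(lambda: {"a": 1, "b": 2, "c": 3.5})


string_columns: List[str] = ["a", "b"]

# Before: one map() wrapper per string column.
chained = base()
for name in string_columns:
    chained = chained.map(lambda row, name=name: {**row, name: str(row[name])})

# After: a single map() wrapper that converts every string column at once.
single = base().map(
    lambda row: {**row, **{name: str(row[name]) for name in string_columns}}
)
```

Both variants produce the same row, but `chained.depth` grows with the number of string columns while `single.depth` stays at one, which is the property this commit is after.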
cosmicBboy committed Oct 28, 2022
1 parent 38881b6 commit 6c79dc9
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions pandera/strategies.py
@@ -1103,14 +1103,23 @@ def _dataframe_strategy(draw):
         )
 
         # this is a hack to convert np.str_ data values into native python str.
+        string_columns = []
         for col_name, col_dtype in col_dtypes.items():
             if col_dtype in {"object", "str"} or col_dtype.startswith(
                 "string"
             ):
-                # pylint: disable=cell-var-from-loop,undefined-loop-variable
-                strategy = strategy.map(
-                    lambda df: df.assign(**{col_name: df[col_name].map(str)})
+                string_columns.append(col_name)
+
+        if string_columns:
+            # pylint: disable=cell-var-from-loop,undefined-loop-variable
+            strategy = strategy.map(
+                lambda df: df.assign(
+                    **{
+                        col_name: df[col_name].map(str)
+                        for col_name in string_columns
+                    }
                 )
+            )
 
         strategy = strategy.map(
             lambda df: df if df.empty else df.astype(col_dtypes)
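
The `cell-var-from-loop` suppression in the diff refers to closures that read a loop variable. As a general Python note, and not a claim from the commit message about how pandera behaved before this change, a lambda defined in a loop looks up the loop variable when it is called, not when it is defined. The sketch below uses made-up column names to show the difference between late lookup and binding the value as a default argument.

```python
columns = ["a", "b", "c"]  # illustrative names, not taken from pandera

# Lambdas created in a loop look up `col` when called, not when defined,
# so after the loop ends they all see the final value.
late = []
for col in columns:
    late.append(lambda: col)

# Binding the current value as a default argument freezes it per lambda.
bound = []
for col in columns:
    bound.append(lambda col=col: col)

late_seen = [f() for f in late]
bound_seen = [f() for f in bound]
```

Here `late_seen` is `["c", "c", "c"]` while `bound_seen` is `["a", "b", "c"]`. A comprehension inside a single lambda, as in the added code, iterates over the collected column list each time it is called, so it does not depend on a variable left over from an outer loop.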