improve memory efficiency of validation process #360
Merged
Fixes #317, #356
This PR improves the overall memory efficiency of `pandera` by minimizing the number of views/copies created during the validation routine. This lowers pandera's memory overhead so that it can handle larger dataframes.

Another way it does this is by changing how the `Check` class handles nulls: instead of dropping rows/elements that are `na` (thus creating copies), it ignores null entries in the resulting `check_output`.

We also limit the number of failure cases captured and reported by `pandera` to the `check.n_failure_cases` value.
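
To illustrate the two ideas described above, here is a minimal sketch in plain pandas. It is not pandera's actual implementation: `run_check` and its arguments are hypothetical names chosen for this example, and only the `n_failure_cases` name is borrowed from the description. The check runs on the full series, null positions are then treated as passing through a boolean mask rather than a `dropna()` call, and the reported failure cases are truncated to a limit.

```python
import pandas as pd


def run_check(series, check_fn, n_failure_cases=None):
    """Hypothetical helper for illustration; not pandera's internals."""
    # Run the check on the full series. No dropna() call, so no
    # null-filtered copy of the input data is created.
    check_output = check_fn(series)

    # Ignore nulls in the output: any position where the input is null
    # counts as passing, whatever the check returned there.
    passed = check_output | series.isna()

    # Collect only the failing values, capped at n_failure_cases if set.
    failure_cases = series[~passed]
    if n_failure_cases is not None:
        failure_cases = failure_cases.head(n_failure_cases)

    return bool(passed.all()), failure_cases


data = pd.Series([1.0, None, -2.0, 3.0, -4.0, -5.0])
ok, failures = run_check(data, lambda s: s > 0, n_failure_cases=2)
print(ok)
print(failures)
```

In this sketch the null at index 1 does not count as a failure, and only the first two of the three failing values are kept. The boolean masks are still allocated, but they are typically much smaller than a filtered copy of the data itself.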