core and backend pandera API internals rewrite #913
Merged
This was referenced Aug 12, 2022
cosmicBboy force-pushed the core-schema branch from e6e42e8 to 0473280 (August 24, 2022 18:41)
Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com>
cosmicBboy changed the title from "[wip] core and backend pandera API" to "core and backend pandera API internals rewrite" (Jan 23, 2023)
This was referenced Jan 27, 2023
fixes #381
Fundamentally, pandera is about defining types for statistical data containers (e.g. pandas DataFrames, xarray Datasets, SQL tables) that serve to:

What
This PR introduces two new subpackages to pandera:
- core: defines schema specifications for particular families of data containers, e.g. "pandas-like dataframes". This module is responsible for defining the properties held by these data containers.
- backends: defines the underlying implementation of the validation logic for a particular schema specification. This module is responsible for actually verifying those properties for a specific type of data container (e.g. pandas, modin, dask, or pyspark.pandas DataFrames).

Why?
The purpose of this PR is to:
xarray.Datasets, numpy arrays, tensor objects, etc., all with a focus on:
This change will not affect the user-facing API of pandera and will not introduce any breaking changes.
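As a rough illustration of the split between the two subpackages, the sketch below separates a declarative specification (ColumnSpec and TableSchema, playing the role of core) from the code that checks that specification against one concrete container type, a list of row dicts (RowsBackend, playing the role of backends). These names are hypothetical and are not pandera's actual classes; the point is only that the specification holds properties while the backend does the verifying.

```python
"""Hypothetical sketch of a core/backend split; none of these names are pandera's API."""
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# "core": a declarative description of the properties a table-like container should have.
@dataclass
class ColumnSpec:
    dtype: type
    checks: List[Callable[[Any], bool]] = field(default_factory=list)


@dataclass
class TableSchema:
    columns: Dict[str, ColumnSpec]


# "backend": the logic that verifies a TableSchema against one concrete container type,
# here a list of row dictionaries.
class RowsBackend:
    def validate(self, schema: TableSchema, rows: List[Dict[str, Any]]) -> List[str]:
        errors = []
        for i, row in enumerate(rows):
            for name, spec in schema.columns.items():
                if name not in row:
                    errors.append(f"row {i}: missing column {name!r}")
                    continue
                value = row[name]
                if not isinstance(value, spec.dtype):
                    errors.append(f"row {i}: {name!r} is not {spec.dtype.__name__}")
                elif not all(check(value) for check in spec.checks):
                    errors.append(f"row {i}: {name!r} failed a check")
        return errors


schema = TableSchema(columns={
    "age": ColumnSpec(int, checks=[lambda v: v >= 0]),
    "name": ColumnSpec(str),
})
rows = [{"age": 30, "name": "a"}, {"age": -1, "name": "b"}, {"name": "c"}]
errors = RowsBackend().validate(schema, rows)
print(errors)  # ["row 1: 'age' failed a check", "row 2: missing column 'age'"]
```

Because TableSchema carries no validation code of its own, another backend for a different container type could reuse the same specification unchanged.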

Design Implications
For a given core schema specification, there may be multiple backends that can apply to it. For example, I can define a DataFrameSchema and, depending on the type of dataframe that I supply to schema.validate, pandera will delegate to a particular backend.

Phases
This PR will be the first phase in a multi-phase approach to improving extensibility:
- backends for the dataframe object: clean up the dataframe validation code by having separate backends for modin, dask, pyspark.pandas, and geopandas (the motivation here is to ensure the backend abstraction makes sense).
- pandera-contrib ecosystem: this exists to host other pandera-compliant projects (e.g. https://github.com/carbonplan/xarray-schema) so that the broader community can use pandera's core and backend abstractions to build their own schema types.
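The delegation described under Design Implications, where a single schema specification hands validation off to a backend chosen by the type of container passed to validate, can be sketched with a small type-keyed registry. This is a hypothetical illustration, not pandera's dispatch mechanism: TableSchema, register_backend, and get_backend are made-up names, and the builtin list (row-oriented) and dict (column-oriented) types stand in for different dataframe libraries.

```python
"""Hypothetical sketch: one schema specification, several backends, dispatch on container type.
None of these names are pandera's API."""
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class TableSchema:
    # "core": column name -> expected Python type
    columns: Dict[str, type]

    def validate(self, data: Any) -> List[str]:
        # Delegate to whichever backend is registered for this container's type.
        backend = get_backend(type(data))
        return backend(self, data)


_BACKENDS: Dict[type, Callable[[TableSchema, Any], List[str]]] = {}


def register_backend(container_type: type):
    """Decorator that registers a validation function for one container type."""
    def decorator(fn):
        _BACKENDS[container_type] = fn
        return fn
    return decorator


def get_backend(container_type: type):
    # Walk the MRO so subclasses of a registered container type reuse its backend.
    for klass in container_type.__mro__:
        if klass in _BACKENDS:
            return _BACKENDS[klass]
    raise TypeError(f"no backend registered for {container_type.__name__}")


@register_backend(list)
def validate_rows(schema: TableSchema, rows: list) -> List[str]:
    # Row-oriented container: a list of dicts.
    errors = []
    for i, row in enumerate(rows):
        for name, dtype in schema.columns.items():
            if not isinstance(row.get(name), dtype):
                errors.append(f"row {i}: {name!r} is not {dtype.__name__}")
    return errors


@register_backend(dict)
def validate_columns(schema: TableSchema, table: dict) -> List[str]:
    # Column-oriented container: a dict of lists.
    errors = []
    for name, dtype in schema.columns.items():
        values = table.get(name)
        if values is None:
            errors.append(f"missing column {name!r}")
            continue
        bad = [i for i, v in enumerate(values) if not isinstance(v, dtype)]
        if bad:
            errors.append(f"column {name!r}: rows {bad} are not {dtype.__name__}")
    return errors


schema = TableSchema(columns={"age": int, "name": str})
row_errors = schema.validate([{"age": 30, "name": "a"}, {"age": "x", "name": "b"}])
col_errors = schema.validate({"age": [30, "x"]})
print(row_errors)  # ["row 1: 'age' is not int"]
print(col_errors)  # ["column 'age': rows [1] are not int", "missing column 'name'"]
```

In this sketch, supporting a new container type means registering one more function while leaving the TableSchema specification untouched, which is the kind of extension point a pandera-contrib package would rely on; pandera's real backend lookup may work differently.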