feature/infer-schema: add basic schema inference #202

Merged
cosmicBboy merged 11 commits into master from feature/infer-schema
Apr 19, 2020

Conversation

@cosmicBboy (Collaborator) commented Apr 19, 2020

fixes #163, #93

This PR adds schema inference. The infer_schema function is
now available in the top-level pandera namespace and can be used to
create DataFrameSchema and SeriesSchema objects from DataFrame and
Series inputs, respectively.

The schema_inference module implements logic for inference of dataframe/series
statistics (see infer_*_statistics functions).
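To make the idea of statistics inference concrete, here is a minimal, library-free sketch. It is not the code in the schema_inference module: the helper names `infer_column_statistics` and `infer_table_statistics` are hypothetical, and a plain dict of lists stands in for a DataFrame. It only illustrates the kind of per-column summary (dtype, nullability, value range) that a schema could later be built from.

```python
from typing import Any, Dict, List, Optional


def infer_column_statistics(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Hypothetical helper: summarize one column as dtype, nullability, range."""
    present = [v for v in values if v is not None]
    kinds = {type(v).__name__ for v in present}
    # Mixed-type or empty columns fall back to a generic "object" label.
    dtype = kinds.pop() if len(kinds) == 1 else "object"
    stats: Dict[str, Any] = {
        "dtype": dtype,
        "nullable": len(present) < len(values),
    }
    # Only numeric columns get a value range in this sketch.
    if dtype in ("int", "float"):
        stats["min"] = min(present)
        stats["max"] = max(present)
    return stats


def infer_table_statistics(table: Dict[str, List[Optional[Any]]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: apply the column summary to every column."""
    return {name: infer_column_statistics(col) for name, col in table.items()}
```

Calling `infer_table_statistics({"a": [1, 2, 3], "b": ["x", None, "z"]})` gives one summary dict per column, which is the sort of intermediate result an inferred schema can be assembled from.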

Some additional changes in this PR:

  • Add methods to the PandasDtype enum class in service of schema inference.
  • Implement the DataFrameSchema.update_column and SeriesSchema.set_checks
    methods, which produce modified copies with updated properties.
  • Add the Column.properties attribute.
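The "modified copies" behaviour mentioned for update_column can be sketched as follows. This is an illustration of the copy-on-update pattern only, not pandera's implementation: `ColumnSketch` and `SchemaSketch` are hypothetical stand-ins, and their fields are assumptions chosen for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class ColumnSketch:
    """Hypothetical stand-in for a schema column."""
    dtype: str
    nullable: bool = False
    checks: Tuple[str, ...] = ()

    @property
    def properties(self) -> Dict[str, object]:
        # Expose the column's settings as a plain dict.
        return {"dtype": self.dtype, "nullable": self.nullable, "checks": self.checks}


@dataclass(frozen=True)
class SchemaSketch:
    """Hypothetical stand-in for a dataframe schema."""
    columns: Dict[str, ColumnSketch] = field(default_factory=dict)

    def update_column(self, name: str, **changes) -> "SchemaSketch":
        if name not in self.columns:
            raise ValueError(f"column '{name}' not in schema")
        # Copy the mapping so the original schema is left untouched.
        new_columns = dict(self.columns)
        new_columns[name] = replace(self.columns[name], **changes)
        return SchemaSketch(columns=new_columns)
```

With this shape, `schema.update_column("a", nullable=True)` returns a new schema while `schema` itself keeps its original column settings, so an inferred schema can be refined step by step without mutation.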

- dataframe and series schemas can be inferred by passing the
  corresponding pandas object to `infer_schema`.
- extend the `PandasDtype` enum class with `from_pandas_api_type`
  and `from_str_alias` methods, and implement its `__eq__` and
  `__hash__` methods.
- implement an `_inferred_schema_guard` function that wraps
  dataframe and series schema methods. Validating a dataframe/series
  with an inferred schema that hasn't been modified should raise a
  `UserWarning`.
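
The guard behaviour described in the last item can be sketched roughly like this. It is not the `_inferred_schema_guard` from this PR: the decorator `warn_if_inferred`, the class `InferredSchemaSketch`, and the `is_inferred` flag are all hypothetical names used to show the pattern of warning on validation until the schema has been modified.

```python
import functools
import warnings


def warn_if_inferred(method):
    """Hypothetical decorator: warn when an unmodified inferred schema is used."""
    @functools.wraps(method)
    def wrapper(schema, *args, **kwargs):
        if getattr(schema, "is_inferred", False):
            warnings.warn(
                "This schema was inferred and has not been modified; "
                "review it before relying on it.",
                UserWarning,
            )
        return method(schema, *args, **kwargs)
    return wrapper


class InferredSchemaSketch:
    """Hypothetical schema that checks every value against one Python type."""

    def __init__(self, expected_type, is_inferred=True):
        self.expected_type = expected_type
        self.is_inferred = is_inferred

    def with_type(self, expected_type):
        # Returning a modified copy clears the "inferred" flag.
        return InferredSchemaSketch(expected_type, is_inferred=False)

    @warn_if_inferred
    def validate(self, values):
        return all(isinstance(v, self.expected_type) for v in values)
```

In this sketch, `InferredSchemaSketch(int).validate([1, 2])` emits a `UserWarning`, while validating with the copy returned by `with_type` does not, since the user has explicitly touched the schema.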

@codecov-io commented Apr 19, 2020

Codecov Report

Merging #202 into master will increase coverage by 0.35%.
The diff coverage is 98.27%.

@@            Coverage Diff             @@
##           master     #202      +/-   ##
==========================================
+ Coverage   96.19%   96.55%   +0.35%     
==========================================
  Files           9       10       +1     
  Lines         710      870     +160     
==========================================
+ Hits          683      840     +157     
- Misses         27       30       +3     
| Impacted Files | Coverage Δ |
|---|---|
| pandera/dtypes.py | 92.98% <93.33%> (-0.77%) ⬇️ |
| pandera/schemas.py | 97.01% <98.55%> (+0.33%) ⬆️ |
| pandera/checks.py | 98.44% <100.00%> (+0.01%) ⬆️ |
| pandera/schema_components.py | 98.26% <100.00%> (+0.04%) ⬆️ |
| pandera/schema_inference.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 438da34...31cf731.

@cosmicBboy cosmicBboy merged commit 46dbe14 into master Apr 19, 2020
@cosmicBboy cosmicBboy deleted the feature/infer-schema branch April 19, 2020 13:32
Successfully merging this pull request may close these issues.

Simple infer dataframe schema