Add column order validation #352
Codecov Report
@@            Coverage Diff             @@
##              dev     #352      +/-   ##
==========================================
+ Coverage   99.02%   99.03%   +0.01%
==========================================
  Files          21       21
  Lines        2351     2383      +32
==========================================
+ Hits         2328     2360      +32
  Misses         23       23
I initially forgot to look into MultiIndex order. Now I've tested adding a
import pandas as pd
df = pd.DataFrame(
index=pd.MultiIndex.from_arrays([[1], [2], [3]], names=["b", "a", "a"]),
)
print(df.index.to_frame())
#>        b  a
#> b a a
#> 1 2 3  1  3
df = pd.DataFrame(
index=pd.MultiIndex.from_arrays([[1], [2], [3], [4]], names=["a", "b", "c", "a"]),
)
print(df.index.to_frame())
#>          a  b  c
#> a b c a
#> 1 2 3 4  4  2  3
Created on 2020-12-11 by the reprexpy package
@cosmicBboy That bug should be addressed before merging this PR (make one test fail). I can take a look over the weekend. Pinging @ktroutman since he seemed interested in that discussion :)
I just touched this part of the codebase so I took a crack at the multiindex -> df conversion, preserving duplicate index names... tried to emulate the behavior of pandas as much as possible
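For readers following the thread: the collapsed columns in the reprex output are what you would get if level names were used as dictionary keys, since a repeated key keeps only its last value. The sketch below is pure Python and does not call pandas; `levels_as_dict` and `levels_as_pairs` are hypothetical helpers made up for illustration, not functions from pandas or pandera, and it is only an assumption that the real conversion behaves this way.

```python
from typing import Dict, List, Sequence, Tuple


def levels_as_dict(names: Sequence[str], levels: Sequence[Sequence[int]]) -> Dict[str, list]:
    """Key each level by its name; a repeated name overwrites the earlier level."""
    return {name: list(values) for name, values in zip(names, levels)}


def levels_as_pairs(names: Sequence[str], levels: Sequence[Sequence[int]]) -> List[Tuple[str, list]]:
    """Keep (name, values) pairs in positional order, so duplicate names survive."""
    return [(name, list(values)) for name, values in zip(names, levels)]


# Same names and values as the first reprex in this thread.
names = ["b", "a", "a"]
levels = [[1], [2], [3]]

collapsed = levels_as_dict(names, levels)   # only two keys remain: "b" and "a"
preserved = levels_as_pairs(names, levels)  # all three levels remain, in order
```

Tracking levels by position rather than by name is one way to keep duplicate index names through the conversion; whether the merged fix takes this route is not stated in the thread.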
Awesome @cosmicBboy 🔥 I fixed a bug with optional columns and added documentation.
failing test doesn't have to do with the PR, the issues should be fixed by #351, merging this now
* implement hypothesis strategies for generating synthetic data (#314)
* schemas can generate valid samples
* implemented basic generation
* implement register check strategy
* implementations for built-in checks and register check strat
* implement column, series, dataframe strategies
* implement more tests
* implement index/multiindex strategies, at built-in str tests
* simplify string strategy tests
* fix chained continuous tests
* implement nullable strategies
* address pylint issues
* update environment, setup.py
* add docstrings to new PandasDtype methods
* null mask is the last strategy in index_strategy
* address mypy and black errors
* fix legacy pandas issue with nullable ints
* skip complex256 tests with windows os
* use SUPPORTED_DTYPES to control tested dtypes for os
* fix multiindex strategy equality test
* bugfix: test index/multiindex strategy type check
* add back linux/osx tests
* fix str strat tests, move BaseStrat error
* improve test coverage
* fix series schema pdtype test
* add more test coverage
* feature/dataframe-checks (#334)
* add support for dataframe check strategies
* add support for coerce dtype on dataframe-level pandas_dtype
* fix issue with type coercion in multiindex
* add packaging to requirements-dev.txt
* update travis ci spec with pandera-core env file
* increase deadline of in_range strategy test
* fix test_in_range_strategy
* fix bugs in dataframe check and index tests
* update conftest.py: reduce max examples to 100
* fix dataframe strategy
* fix type error on windows
* improve coverage
* bugfix/lazy-error-dtype-coercion (#339)
* bugfix: dtype coercion should be cause by lazy validation, update String documentation
* improve test coverage
* rebase onto dev
* feature/str-dtype: introduce PandasDtype.STRING for pandas-native str type (#340)
* add PandasDtype.STRING, remove PandasDtype.Str
  - changing the semantics of PandasDtype.String to map onto the pandas-native 'string' type would have caused backwards compatibility issues
  - revert changes from 2e2c5d7: remove PandasDtype.Str, and add a PandasDtype.STRING type for the pandas-native string type
* fix docs
* fix tests for legacy pandas
* fix: DataFrameSchema.dtype property
* add github action file for ci tests (#349)
  this diff replaces travis-ci with github actions for unit test CI
* bugfix/345: support duplicate columns (#346)
* support validating dataframes w/ duplicate columns
* add test for multiindex duplicate names
* increase coverage
* feature/fallback-strategy (#351)
* check fallback strategy, register custom checks
* fix pylint errors
* add test for usage in Field
* fix pylint error: hypothesis
* 100% coverage extensions module
* increase deadline on test strategies
* suppress health check on slow running tests
* add ci test run on push/PR on dev and master branches (#353)
* Add column order validation (#352)
* add DataFrameSchema.ordered
* fix ordered documentation
* add support for MultiIndex.ordered
* preserve duplicate indexes when converting multiindex to df
* increase code coverage
* fix ordered with optional columns
* add documentation of ordered
Co-authored-by: cosmicBboy <niels.bantilan@gmail.com>
* update ci-badge in README and docs (#355)
* update ci badge
* add dev and prod build badge
* revert
* docs: strategies, extensions; bugfix: strategy support object dtype (#354)
* add documentation for strategies, check register, extensions
* finish up documentation
* fix pylint import error
* mypy ignore type
* update docs
* update readme, add schema model to strategy docs
Co-authored-by: Jean-Francois Zinque <jzinque@gmail.com>
This PR adds an `ordered` argument to `DataFrameSchema` and to `model.BaseConfig` for the model API. It closes #342.
A side effect is that 2 schema errors will be raised if there is a permutation:
Created on 2020-12-11 by the reprexpy package
Note: The PR targets the dev branch.
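To illustrate why a single swap can surface as two errors, here is a minimal pure-Python sketch of a position-wise order check. It is not the code from this PR: `out_of_order_columns` is a hypothetical helper, and it assumes column names are unique and that schema columns missing from the dataframe (for example optional ones) are simply skipped.

```python
from typing import List, Sequence


def out_of_order_columns(
    schema_columns: Sequence[str], df_columns: Sequence[str]
) -> List[str]:
    """Return dataframe columns whose position differs from the schema order."""
    present = set(df_columns)
    known = set(schema_columns)
    # Schema order, restricted to columns that actually appear in the dataframe.
    expected = [col for col in schema_columns if col in present]
    # Dataframe order, restricted to columns that the schema knows about.
    observed = [col for col in df_columns if col in known]
    # Compare position by position; every mismatch is reported separately.
    return [obs for exp, obs in zip(expected, observed) if exp != obs]


# Swapping "b" and "c" misplaces two positions, so two columns are reported.
swapped = out_of_order_columns(["a", "b", "c"], ["a", "c", "b"])

# A missing schema column does not by itself make the rest out of order.
missing_optional = out_of_order_columns(["a", "b", "c"], ["a", "c"])
```

Because the comparison is positional, any permutation displaces at least two columns, which matches the "2 schema errors" side effect described above.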