Add Basic Dask Support #665
Codecov Report
@@               Coverage Diff                @@
##           release/0.8.0    #665     +/-   ##
=================================================
+ Coverage          98.35%   98.40%   +0.04%
=================================================
  Files                 31       32       +1
  Lines               3580     3625      +45
=================================================
+ Hits                3521     3567      +46
+ Misses                59       58       -1
hey @bphillips-exos so the lack of coverage is because we need to add dask to https://github.com/pandera-dev/pandera/blob/release/0.8.0/.github/workflows/ci-tests.yml#L148-L153
Thanks @cosmicBboy, the diff is fully covered now. It's unclear to me why the project-wide coverage has decreased though, I'll take another look at that. Is there anything else you'd like me to address?
hi @bphillips-exos no worries, as long as the patch has full coverage it's fine.
great work @bphillips-exos ! 🚀
* first pass of basic Dask support
* cleanup docstrings
* cleanup after rebase
* improve coverage
* update CI for new extra
* cover branches for dask not installed
* more coverage improvements
* further coverage improvements
Following up on the discussion in #647 and related to #119, this PR adds basic dask support as optional extra functionality. When a pandera schema validates a dask dataframe, it maps the validation operation to each of the partition pandas dataframes.
With dask installed: