
Add Basic Dask Support #665

Merged
merged 8 commits into from Oct 28, 2021

@ghost commented Oct 24, 2021

Following up on the discussion in #647 and related to #119, this PR adds basic Dask support as an optional extra. When a pandera schema validates a Dask DataFrame, it maps the validation operation over each partition, each of which is a pandas DataFrame.
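The partition-wise mapping described above can be modeled without any dependencies. This is a hypothetical sketch, not pandera's implementation: a partition is assumed to be a list of row dicts, a schema is assumed to map column names to Python types, and `validate_partition` and `map_partitions` are illustrative helpers (the latter only borrows its name from Dask's `DataFrame.map_partitions`). The checks are deferred until the result is consumed, loosely mirroring how Dask builds a task graph and only runs it on compute.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-ins (assumptions, not pandera or Dask types):
Row = Dict[str, object]   # one row: column name -> value
Partition = List[Row]     # one partition: a list of rows
Schema = Dict[str, type]  # column name -> expected Python type


class SchemaError(Exception):
    """Raised when a partition does not conform to the schema."""


def validate_partition(partition: Partition, schema: Schema) -> Partition:
    """Check every row of one partition; return the partition unchanged if valid."""
    for i, row in enumerate(partition):  # i is the row index within this partition
        for column, expected in schema.items():
            if column not in row:
                raise SchemaError(f"row {i}: missing column {column!r}")
            if not isinstance(row[column], expected):
                raise SchemaError(
                    f"row {i}: column {column!r} expected {expected.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return partition


def map_partitions(partitions: List[Partition], schema: Schema) -> Iterator[Partition]:
    """Lazily apply validate_partition to each partition as the result is consumed."""
    return (validate_partition(p, schema) for p in partitions)


schema: Schema = {"col": int}
good = [[{"col": 1}, {"col": 2}], [{"col": 3}]]
bad = [[{"col": 1}], [{"col": "x"}]]

print(sum(len(p) for p in map_partitions(good, schema)))  # 3

lazy_bad = map_partitions(bad, schema)  # no error yet: nothing has been checked
try:
    list(lazy_bad)  # consuming the result runs the per-partition checks
except SchemaError as err:
    print(err)  # row 0: column 'col' expected int, got str
```

Because each partition is validated independently, a failure in this toy reports a row index relative to its own partition rather than to the whole frame.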

With dask installed:

import pandera as pa
from pandera.typing import DaskDataFrame, Series


class MySchema(pa.SchemaModel):
    col: Series[int]


@pa.check_types
def f(df: DaskDataFrame[MySchema]):
    # pandera maps the MySchema checks over each partition of df
    ...
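How a `check_types`-style decorator can pick a schema out of an argument annotation such as `DaskDataFrame[MySchema]` may be easier to see in a stripped-down model. The sketch below is hypothetical and stdlib-only, not pandera's code: `check_partitions`, `PartitionedFrame`, and the tuple-based `MySchema` are illustrative, and the schema is attached with `typing.Annotated` (Python 3.9+) instead of a generic type. For brevity it validates eagerly, whereas Dask defers per-partition work until the graph is computed.

```python
import functools
import inspect
from typing import Annotated, Dict, List, get_args, get_origin, get_type_hints

# Hypothetical stand-in: a list of partitions, each a list of row dicts.
PartitionedFrame = List[List[Dict[str, object]]]


class SchemaError(Exception):
    """Raised when a partition does not conform to its schema."""


def check_partitions(func):
    """Validate each partition of any argument annotated Annotated[..., schema]."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if get_origin(hint) is not Annotated:
                continue  # no schema attached to this argument
            schema = get_args(hint)[1]  # first metadata item after the base type
            for p_idx, partition in enumerate(value):  # one check per partition
                for row in partition:
                    for column, expected in schema:
                        if not isinstance(row.get(column), expected):
                            raise SchemaError(
                                f"argument {name!r}, partition {p_idx}: "
                                f"column {column!r} is not {expected.__name__}"
                            )
        return func(*args, **kwargs)

    return wrapper


MySchema = (("col", int),)  # hypothetical schema: (column, type) pairs


@check_partitions
def f(df: Annotated[PartitionedFrame, MySchema]) -> int:
    return sum(len(p) for p in df)


print(f([[{"col": 1}, {"col": 2}], [{"col": 3}]]))  # 3
try:
    f([[{"col": 1}], [{"col": "x"}]])
except SchemaError as err:
    print(err)  # argument 'df', partition 1: column 'col' is not int
```

Arguments without an `Annotated` hint pass through unchecked, so only the annotated parameter pays the validation cost.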

codecov bot commented Oct 24, 2021

Codecov Report

Merging #665 (de89c9d) into release/0.8.0 (0236dae) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@                Coverage Diff                @@
##           release/0.8.0     #665      +/-   ##
=================================================
+ Coverage          98.35%   98.40%   +0.04%     
=================================================
  Files                 31       32       +1     
  Lines               3580     3625      +45     
=================================================
+ Hits                3521     3567      +46     
+ Misses                59       58       -1     
Impacted Files                 Coverage Δ
pandera/__init__.py            100.00% <100.00%> (ø)
pandera/check_utils.py         100.00% <100.00%> (ø)
pandera/dask_accessor.py       100.00% <100.00%> (ø)
pandera/schemas.py             99.52% <100.00%> (+0.01%) ⬆️
pandera/typing.py              93.39% <100.00%> (+0.76%) ⬆️
pandera/pandas_accessor.py     100.00% <0.00%> (+3.44%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0236dae...de89c9d.

@cosmicBboy (Collaborator) commented

hey @bphillips-exos, the lack of coverage is because we need to add dask to ci-tests.yml, like so:

https://github.com/pandera-dev/pandera/blob/release/0.8.0/.github/workflows/ci-tests.yml#L148-L153

@ghost (Author) commented Oct 27, 2021

Thanks @cosmicBboy, the diff is fully covered now. It's unclear to me why project-wide coverage has decreased, though; I'll take another look at that. Is there anything else you'd like me to address?

@cosmicBboy (Collaborator) commented

hi @bphillips-exos, no worries: as long as the patch has full coverage, it's fine.

@cosmicBboy (Collaborator) left a comment

great work @bphillips-exos! 🚀

@cosmicBboy merged commit 2046b36 into unionai-oss:release/0.8.0 Oct 28, 2021
@ghost deleted the release/0.8.0 branch October 28, 2021 12:38
cosmicBboy pushed a commit that referenced this pull request Nov 11, 2021
* first pass of basic Dask support

* cleanup docstrings

* cleanup after rebase

* improve coverage

* update CI for new extra

* cover branches for dask not installed

* more coverage improvements

* further coverage improvements