Add Basic Dask Support #665

ghost · 2021-10-24T14:01:10Z

Following up on the discussion in #647 and related to #119, this PR adds basic dask support as an optional extra. When a pandera schema validates a dask dataframe, it maps the validation operation over each of the dataframe's underlying pandas partitions.

With dask installed:

```python
import pandera as pa
from pandera.typing import DaskDataFrame, Series


class MySchema(pa.SchemaModel):
    col: Series[int]


@pa.check_types
def f(df: DaskDataFrame[MySchema]):
    # df is validated against MySchema, partition by partition
    ...
```
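In dask, `map_partitions` builds a lazy task graph, so mapping validation over partitions presumably means errors surface only when the result is computed, not when the schema or the decorated function is applied. The stdlib-only toy model below sketches that behaviour under this assumption; `LazyFrame`, its `map_partitions`/`compute` methods, and `check_int_column` are hypothetical stand-ins, not dask or pandera APIs.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Partition = List[Row]
PartitionFn = Callable[[Partition], Partition]


class LazyFrame:
    """Hypothetical stand-in for a dask DataFrame: a list of partitions
    plus a queue of per-partition functions that only run on compute()."""

    def __init__(self, partitions: List[Partition], ops: Optional[List[PartitionFn]] = None):
        self.partitions = partitions
        self.ops: List[PartitionFn] = list(ops or [])

    def map_partitions(self, func: PartitionFn) -> "LazyFrame":
        # Record the function; nothing is executed yet.
        return LazyFrame(self.partitions, self.ops + [func])

    def compute(self) -> List[Row]:
        # Apply every recorded function to every partition, then concatenate.
        out: List[Row] = []
        for part in self.partitions:
            for op in self.ops:
                part = op(part)
            out.extend(part)
        return out


def check_int_column(part: Partition) -> Partition:
    """Hypothetical per-partition check mirroring `col: Series[int]`."""
    for row in part:
        if not isinstance(row.get("col"), int):
            raise TypeError(f"column 'col' expected int, got {row.get('col')!r}")
    return part


good = LazyFrame([[{"col": 1}, {"col": 2}], [{"col": 3}]])
bad = LazyFrame([[{"col": 1}], [{"col": "x"}]])

validated_good = good.map_partitions(check_int_column)
validated_bad = bad.map_partitions(check_int_column)  # no error yet: still lazy

print(validated_good.compute())
try:
    validated_bad.compute()
except TypeError as exc:
    print("caught at compute time:", exc)
```

Constructing `validated_bad` raises nothing; the `TypeError` appears only inside `compute()`, once the second partition is checked.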

codecov · 2021-10-24T16:11:38Z

Codecov Report

Merging #665 (de89c9d) into release/0.8.0 (0236dae) will increase coverage by 0.04%.
The diff coverage is 100.00%.

```diff
@@                Coverage Diff                @@
##           release/0.8.0     #665      +/-   ##
=================================================
+ Coverage          98.35%   98.40%   +0.04%     
=================================================
  Files                 31       32       +1     
  Lines               3580     3625      +45     
=================================================
+ Hits                3521     3567      +46     
+ Misses                59       58       -1
```

| Impacted Files | Coverage Δ |
| --- | --- |
| pandera/__init__.py | `100.00% <100.00%> (ø)` |
| pandera/check_utils.py | `100.00% <100.00%> (ø)` |
| pandera/dask_accessor.py | `100.00% <100.00%> (ø)` |
| pandera/schemas.py | `99.52% <100.00%> (+0.01%)` ⬆️ |
| pandera/typing.py | `93.39% <100.00%> (+0.76%)` ⬆️ |
| pandera/pandas_accessor.py | `100.00% <0.00%> (+3.44%)` ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0236dae...de89c9d.
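The figures in the diff are consistent with line coverage = hits / lines, since hits + misses equals lines on both sides (no partial lines are shown). The PR adds 45 lines and 46 hits, so on net every new line is covered and one previously missed line is now hit, which is why misses drop from 59 to 58. A quick check of that arithmetic:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, assuming no partially covered lines."""
    return 100.0 * hits / lines


base = {"lines": 3580, "hits": 3521}  # release/0.8.0 @ 0236dae
head = {"lines": 3625, "hits": 3567}  # PR #665 @ de89c9d

for name, stats in (("base", base), ("head", head)):
    misses = stats["lines"] - stats["hits"]
    pct = coverage_pct(stats["hits"], stats["lines"])
    print(f"{name}: {pct:.2f}% covered, {misses} misses")

# Unrounded change in percentage points between base and head.
delta = coverage_pct(head["hits"], head["lines"]) - coverage_pct(base["hits"], base["lines"])
print(f"delta: {delta:.3f} percentage points")
```

This prints 98.35% with 59 misses for the base, 98.40% with 58 misses for the head, and an unrounded delta of about 0.048 points, which the report displays as +0.04%.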

cosmicBboy · 2021-10-26T23:46:19Z

hey @bphillips-exos, the missing coverage is because we need to add dask to `ci-tests.yml`, like so:

https://github.com/pandera-dev/pandera/blob/release/0.8.0/.github/workflows/ci-tests.yml#L148-L153
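The missing coverage follows from how optional extras are usually tested: dask-specific tests are skipped when dask is not importable, so a CI job that never installs it never executes those code paths. A minimal sketch of such gating with the standard library's `unittest`; the class and test names are hypothetical, and pandera's own suite may gate its dask tests differently.

```python
import importlib.util
import unittest

# True when the optional dask package is importable in this environment.
HAS_DASK = importlib.util.find_spec("dask") is not None


@unittest.skipUnless(HAS_DASK, "dask is not installed")
class TestDaskExtra(unittest.TestCase):
    """Hypothetical dask-only tests: skipped, and therefore uncovered,
    unless the CI job installs dask."""

    def test_dask_is_importable(self):
        import dask

        self.assertTrue(hasattr(dask, "__version__"))
```

Run with `python -m unittest`, the test is reported as skipped in an environment without dask, which is what a CI job lacking the extra would see.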

ghost · 2021-10-27T12:58:44Z

Thanks @cosmicBboy, the diff is fully covered now. It's unclear to me why project-wide coverage has decreased, though; I'll take another look at that. Is there anything else you'd like me to address?

cosmicBboy · 2021-10-27T14:27:24Z

hi @bphillips-exos, no worries: as long as the patch has full coverage, it's fine.

cosmicBboy

great work @bphillips-exos! 🚀

* first pass of basic Dask support
* cleanup docstrings
* cleanup after rebase
* improve coverage
* update CI for new extra
* cover branches for dask not installed
* more coverage improvements
* further coverage improvements

Brian Phillips added 3 commits October 24, 2021 09:13

* first pass of basic Dask support (6aaa023)
* cleanup docstrings (798eff1)
* cleanup after rebase (37f32df)

ghost mentioned this pull request on Oct 24, 2021 in Add Basic Dask Support #662 (closed)

* improve coverage (4e4d1b5)

Brian Phillips added 3 commits October 26, 2021 19:57

* update CI for new extra (334cdd9)
* cover branches for dask not installed (2513373)
* more coverage improvements (587bd84)
* further coverage improvements (de89c9d)

cosmicBboy approved these changes Oct 28, 2021


cosmicBboy merged commit 2046b36 into unionai-oss:release/0.8.0 Oct 28, 2021

ghost deleted the release/0.8.0 branch October 28, 2021 12:38
