use engine flox for ordered groups #266
Status: Merged (34 commits, merged Oct 15, 2023)

@mathause (Contributor) commented Sep 29, 2023

Sets the default to engine=None and selects engine="flox" for ordered groups. Some comments:

  • this is certainly a very rough WIP
  • I think there could be other logic to determine the optimal engine
  • I only determine the engine in groupby_reduce, but it is already accessed in xarray_reduce; this is not optimal (see below)
  • the testing approach is brute force at the moment: I think we need to test that the correct engine is chosen (not just run all the tests)
  • so the engine detection should be done in a separate function and then we can test this function
  • I tested it, and this would solve the flox performance regression for cftime resampling (pydata/xarray#7730), at least for ordered groupings
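A minimal sketch of what a separate, testable engine-detection function could look like, assuming `by` holds integer group codes along the reduced axis. The names `is_ordered`, `choose_engine`, and `reduceat_sum` are hypothetical and not part of flox's API. `reduceat_sum` illustrates why ordered groups suit a reduceat-based strategy like the one the "flox" engine is documented as using: when every group is already one contiguous run, a single `np.add.reduceat` call over the run start offsets reduces all groups at once, with no sort step.

```python
import numpy as np


def is_ordered(codes) -> bool:
    """True if integer group codes never decrease, i.e. each group is one contiguous run."""
    codes = np.asarray(codes).ravel()
    return bool(np.all(codes[1:] >= codes[:-1]))


def choose_engine(by) -> str:
    """Hypothetical helper: resolve engine=None from the group codes alone."""
    return "flox" if is_ordered(by) else "numpy"


def reduceat_sum(values, codes):
    """Per-group sums for ordered codes using one np.add.reduceat call.

    Only groups that actually occur are returned, in order of appearance.
    """
    values = np.asarray(values)
    codes = np.asarray(codes)
    # A new run starts at index 0 and wherever the code changes.
    starts = np.flatnonzero(np.r_[True, codes[1:] != codes[:-1]])
    return np.add.reduceat(values, starts)


# Group codes that are already ordered along the reduced axis (e.g. time).
by = np.array([0, 0, 1, 1, 1, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(choose_engine(by))          # flox
print(reduceat_sum(vals, by))     # group sums 3.0, 12.0, 6.0
print(choose_engine([1, 0, 1]))   # numpy
```

Keeping the decision in a pure function of `by` means a test can assert the chosen engine directly instead of running the whole test suite once per engine.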

func == "count" and engine != "flox"

@dcherian (Collaborator) left a comment

Thanks for working on this!

so the engine detection should be done in a separate function and then we can test this function

Yes!

I only determine the engine in groupby_reduce, but it is already accessed in xarray_reduce; this is not optimal (see below)

Yes, a helper function will fix this. xarray_reduce can just call that.
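A sketch of the suggested refactor, using hypothetical stand-ins for `groupby_reduce` and `xarray_reduce`: `engine=None` is resolved in one helper, and both entry points call it, so `xarray_reduce` does not need its own copy of the logic. The detection rule (ordered groups give "flox", anything else gives "numpy") is the one proposed in this PR; all names here are illustrative.

```python
import numpy as np


def choose_engine(by) -> str:
    """Hypothetical helper: pick an engine when the caller passed engine=None."""
    codes = np.asarray(by).ravel()
    # Ordered groups (codes never decrease) -> "flox"; anything else -> "numpy".
    return "flox" if bool(np.all(codes[1:] >= codes[:-1])) else "numpy"


def resolve_engine(engine, by) -> str:
    # An explicit user choice always wins; only None triggers auto-detection.
    return choose_engine(by) if engine is None else engine


def groupby_reduce_stub(array, by, engine=None) -> str:
    # Stand-in for groupby_reduce; returns the resolved engine for illustration.
    return resolve_engine(engine, by)


def xarray_reduce_stub(obj, by, engine=None) -> str:
    # Stand-in for xarray_reduce: it needs the engine before delegating,
    # so it calls the same helper instead of duplicating the logic.
    engine = resolve_engine(engine, by)
    return groupby_reduce_stub(obj, by, engine=engine)


print(xarray_reduce_stub(None, [0, 0, 1, 2]))                # flox
print(xarray_reduce_stub(None, [2, 0, 1]))                   # numpy
print(xarray_reduce_stub(None, [2, 0, 1], engine="numba"))   # numba
```

Because `groupby_reduce_stub` receives an already-resolved engine from `xarray_reduce_stub`, the detection runs once per call chain and both paths are guaranteed to agree.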

flox/core.py (outdated)
@@ -1755,7 +1755,7 @@ def groupby_reduce(
dtype: np.typing.DTypeLike = None,
min_count: int | None = None,
method: T_Method = "map-reduce",
-engine: T_Engine = "numpy",
+engine: T_Engine = None,

@dcherian (Collaborator):

Needs a docstring update.
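One possible wording for the updated `engine` entry, in numpydoc style, shown on a hypothetical stub (`groupby_reduce_sketch` is not a real flox function). Only the new `None` behavior proposed in this PR is described; the existing descriptions of the individual engines would stay as they are.

```python
def groupby_reduce_sketch(array, *by, engine=None, **kwargs):
    """Hypothetical stand-in showing only the updated ``engine`` entry.

    Parameters
    ----------
    engine : str or None, optional
        Algorithm used to compute the groupby reduction on numpy arrays and
        on each dask chunk. If None (the new default), an engine is chosen
        automatically: ``"flox"`` when the groups are ordered (each group is
        a contiguous run along the reduced axis), otherwise ``"numpy"``.
        Pass a string such as ``"numpy"`` or ``"flox"`` to force an engine.
    """
    # Body omitted: this stub exists only to carry the docstring.
    raise NotImplementedError
```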

flox/core.py (outdated, resolved)
flox/core.py (outdated, resolved)
flox/xarray.py (outdated)
@@ -72,7 +72,7 @@ def xarray_reduce(
fill_value=None,
dtype: np.typing.DTypeLike = None,
method: str = "map-reduce",
-engine: str = "numpy",
+engine: str = None,

@dcherian (Collaborator):

Again needs docstring update.

tests/conftest.py (outdated, resolved)
@mathause (Contributor, Author) commented Oct 6, 2023

Thanks for the feedback. I have to let this rest until I find some time again, so feel free to take this over if anyone is interested.

@dcherian dcherian marked this pull request as ready for review October 14, 2023 04:36
@dcherian dcherian merged commit fecd9a6 into xarray-contrib:main Oct 15, 2023
18 of 19 checks passed
@mathause mathause deleted the engine_none branch October 15, 2023 04:41
@mathause (Contributor, Author) commented:

I hardly qualify to be on here 😅 thanks for finishing up - this will be a huge win!

dcherian added a commit that referenced this pull request Nov 3, 2023
* main: (24 commits)
  Add `packaging` as dependency
  use engine flox for ordered groups (#266)
  Update pyproject.toml: py3.12
  Bump numpy to >=1.22 (#278)
  Cleanups (#276)
  benchmarks updates (#273)
  repo-review comments (#270)
  Significantly faster cohorts detection. (#272)
  Add engine="numbagg" (#72)
  Support quantile, median, mode with method="blockwise". (#269)
  Add multidimensional binning demo (#203)
  [pre-commit.ci] pre-commit autoupdate (#268)
  Drop python 3.8, test python 3.11 (#209)
  tests: move xfail out of functions (#265)
  Bump actions/checkout from 3 to 4 (#267)
  convert datetime: micro-optimizations (#261)
  compatibility with `numpy>=2.0` (#257)
  replace the deprecated `provision-with-micromamba` with `setup-micromamba` (#258)
  Fix some typing errors in asv_bench and tests (#253)
  [pre-commit.ci] pre-commit autoupdate (#250)
  ...
Labels: none yet
Projects: none yet
Participants: 2