Fix proximity dask fallback mutating the caller's input chunks #2864
Merged
_process_dask reassigned raster.data = raster.data.rechunk(...) on the unbounded GREAT_CIRCLE and no-scipy fallback paths, which rebound .data on the caller's input DataArray. Rechunk into a local variable instead so the input is left untouched. Adds regression tests across the dask backends asserting the input .data keeps its identity and chunking after proximity/allocation/direction on both unbounded fallback paths.
brendancol (Contributor, Author) commented on Jun 2, 2026
brendancol left a comment
PR Review: Fix proximity dask fallback mutating the caller's input chunks
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
- test_proximity.py: on a host without CUDA, the dask+cupy parametrization runs the same code as dask+numpy, because _backend_raster only swaps the buffer to cupy when CUDA is present. So the dask+cupy cases give real coverage only on a GPU runner. That matches the rest of the file and needs no change; just flagging that the dask+cupy assertions are duplicate work off-GPU.
What looks good
- The fix is minimal and correct. data = raster.data followed by data = data.rechunk(...) leaves the caller's DataArray untouched, since dask's .rechunk returns a new array rather than mutating in place. The map_overlap call now reads from the local data.
- The dask+cupy unbounded path was already safe (it rebinds raster to a .copy() before reaching _process_dask); this fix covers the dask+numpy unbounded path and the converted dask+cupy case without touching that.
- Tests assert both object identity (raster.data is data_before) and chunking equality, and cover all three public entry points (proximity, allocation, direction) on both unbounded fallback paths. The tests fail on the pre-fix code and pass after.
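The difference between the pre-fix and post-fix patterns can be sketched without dask or xarray. This is a minimal illustration, not the project's code: FakeChunked, process_buggy, and process_fixed are hypothetical stand-ins, and the only property assumed is the one the review relies on, namely that rechunk returns a new object instead of mutating in place.

```python
from types import SimpleNamespace


class FakeChunked:
    """Hypothetical stand-in for a chunked array (not a dask API)."""

    def __init__(self, chunks):
        self.chunks = chunks

    def rechunk(self, chunks):
        # Like dask's rechunk, return a new object; self is left unchanged.
        return FakeChunked(chunks)


def process_buggy(raster):
    # Pre-fix pattern: assigning to raster.data rebinds the attribute
    # on the caller's object, so the caller sees the new chunking.
    raster.data = raster.data.rechunk((4, 4))
    return raster.data.chunks


def process_fixed(raster):
    # Post-fix pattern: rechunk into a local name; raster.data is untouched.
    data = raster.data
    data = data.rechunk((4, 4))
    return data.chunks


original = FakeChunked((2, 2))

# The stand-in raster only needs a .data attribute for this sketch.
buggy_input = SimpleNamespace(data=original)
process_buggy(buggy_input)
buggy_rebound = buggy_input.data is not original

fixed_input = SimpleNamespace(data=original)
process_fixed(fixed_input)
fixed_untouched = fixed_input.data is original
```

Both functions compute on the rechunked array; only the buggy one leaves the caller holding a different .data object afterward.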
Checklist
- Algorithm unchanged; only the chunk-handling side effect was removed.
- NaN handling unchanged.
- Dask chunk boundaries handled correctly; rechunk semantics preserved.
- No premature materialization or extra copies introduced.
- No README or docstring change needed (pure bug fix, no API change).
Closes #2847
_process_dask() did raster.data = raster.data.rechunk(...) on the unbounded GREAT_CIRCLE and no-scipy fallback paths, which rebound .data on the caller's input DataArray. Code that reused the same DataArray afterward saw changed chunking.
Changes
- Rechunk into a local variable in _process_dask instead of reassigning raster.data, so the input is left untouched.
- Add regression tests across the dask backends asserting the input .data keeps its identity and chunking after proximity/allocation/direction on both unbounded fallback paths.
Backend coverage
The bug and fix are dask-only. Tests run on dask+numpy and dask+cupy. numpy and cupy paths never reassigned .data and are unaffected.
Test plan
- test_proximity.py suite passes (291 passed)
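The identity-and-chunking check that the regression tests rely on can be written as a small reusable helper. The sketch below is an assumption about the shape of such a check, not a copy of the tests in test_proximity.py: assert_input_untouched, FakeChunked, well_behaved, and mutating are hypothetical names, and the stand-in objects replace dask and xarray so the example runs on its own.

```python
from types import SimpleNamespace


def assert_input_untouched(func, raster):
    """Run func(raster), then check raster.data kept its identity and chunks."""
    data_before = raster.data
    chunks_before = raster.data.chunks
    func(raster)
    assert raster.data is data_before, "input .data was rebound"
    assert raster.data.chunks == chunks_before, "input chunking changed"


class FakeChunked:
    """Hypothetical stand-in for a chunked array (not a dask API)."""

    def __init__(self, chunks):
        self.chunks = chunks

    def rechunk(self, chunks):
        # Returns a new object, mirroring the behavior the fix depends on.
        return FakeChunked(chunks)


def well_behaved(raster):
    # Rechunks a local result only; the input is left alone.
    return raster.data.rechunk((4, 4))


def mutating(raster):
    # Reproduces the pre-fix side effect on the caller's object.
    raster.data = raster.data.rechunk((4, 4))


raster = SimpleNamespace(data=FakeChunked((2, 2)))
assert_input_untouched(well_behaved, raster)  # passes silently

try:
    assert_input_untouched(mutating, SimpleNamespace(data=FakeChunked((2, 2))))
    caught = False
except AssertionError:
    caught = True  # the helper flags the rebound .data
```

In a real suite, a helper like this would presumably be called once per entry point (proximity, allocation, direction) and per fallback path, for example through pytest parametrization.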