Skip to content

Fix proximity dask fallback mutating the caller's input chunks#2864

Merged
brendancol merged 1 commit into
mainfrom
issue-2847
Jun 3, 2026
Merged

Fix proximity dask fallback mutating the caller's input chunks#2864
brendancol merged 1 commit into
mainfrom
issue-2847

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2847

_process_dask() did raster.data = raster.data.rechunk(...) on the unbounded GREAT_CIRCLE and no-scipy fallback paths, which rebound .data on the caller's input DataArray. Code that reused the same DataArray afterward saw changed chunking.

Changes

  • Rechunk into a local variable in _process_dask instead of reassigning raster.data, so the input is left untouched.
  • Add regression tests asserting the input .data keeps its identity and chunking after proximity/allocation/direction on both unbounded fallback paths.

Backend coverage

The bug and fix are dask-only. Tests run on dask+numpy and dask+cupy. numpy and cupy paths never reassigned .data and are unaffected.

Test plan

  • New mutation regression tests fail on the old code and pass on the fix
  • Full test_proximity.py suite passes (291 passed)

_process_dask reassigned raster.data = raster.data.rechunk(...) on the
unbounded GREAT_CIRCLE and no-scipy fallback paths, which rebound .data on
the caller's input DataArray. Rechunk into a local variable instead so the
input is left untouched.

Adds regression tests across the dask backends asserting the input .data
keeps its identity and chunking after proximity/allocation/direction on
both unbounded fallback paths.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 2, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix proximity dask fallback mutating the caller's input chunks

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • test_proximity.py: on a host without CUDA, the dask+cupy parametrization runs the same code as dask+numpy, because _backend_raster only swaps the buffer to cupy when CUDA is present. So the dask+cupy cases give real coverage only on a GPU runner. That matches the rest of the file and needs no change; just flagging that the dask+cupy assertions are duplicate work off-GPU.

What looks good

  • The fix is minimal and correct. data = raster.data followed by data = data.rechunk(...) leaves the caller's DataArray untouched, since dask .rechunk returns a new array rather than mutating in place. The map_overlap call now reads from the local data.
  • The dask+cupy unbounded path was already safe (it rebinds raster to a .copy() before reaching _process_dask); this fix covers the dask+numpy unbounded path and the converted dask+cupy case without touching that.
  • Tests assert both object identity (raster.data is data_before) and chunking equality, and cover all three public entry points (proximity, allocation, direction) on both unbounded fallback paths. The tests fail on the pre-fix code and pass after.

Checklist

  • Algorithm unchanged; only the chunk-handling side effect was removed.
  • NaN handling unchanged.
  • Dask chunk boundaries handled correctly; rechunk semantics preserved.
  • No premature materialization or extra copies introduced.
  • No README or docstring change needed (pure bug fix, no API change).

@brendancol brendancol merged commit f3dc693 into main Jun 3, 2026
8 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proximity: dask unbounded fallback mutates the caller's input chunks

1 participant