Describe the bug
Bounded Dask proximity crashes when the computed halo depth is larger than an axis. _halo_depth() turns max_distance into a pixel search radius using the densest coordinate spacing. On a skinny raster that radius can exceed the raster height or width. It then goes straight into da.map_overlap(depth=(pad_y, pad_x)), which rejects any depth larger than the array along that axis and raises ValueError: The overlapping depth ... is larger than your array ....
Both Dask paths hit this: the dask+numpy path (_process_dask) and the dask+cupy path (_process_dask_cupy).
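Below is a minimal sketch of the failure condition. It assumes _halo_depth divides max_distance by the smallest coordinate step on each axis and rounds up. halo_depth_sketch is a hypothetical stand-in, not the xrspatial implementation. With the reproduction's geometry it produces a y depth of 10 on a 3-row axis, which is the case map_overlap rejects.

```python
import math


def halo_depth_sketch(max_distance, xs, ys):
    """Hypothetical stand-in for _halo_depth (assumption, not xrspatial code):
    turn a distance into a per-axis pixel radius using the densest
    (smallest) coordinate spacing along each axis, rounded up."""
    def min_spacing(coords):
        return min(abs(b - a) for a, b in zip(coords, coords[1:]))

    pad_y = math.ceil(max_distance / min_spacing(ys))
    pad_x = math.ceil(max_distance / min_spacing(xs))
    return pad_y, pad_x


# Same geometry as the reproduction: 3 rows, 100 columns, unit spacing.
xs = [float(i) for i in range(100)]
ys = [0.0, 1.0, 2.0]
pad_y, pad_x = halo_depth_sketch(10, xs, ys)
print(pad_y, pad_x, len(ys), len(xs))  # 10 10 3 100

# pad_y > len(ys): this is the depth map_overlap refuses.
```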
Reproduce
```python
import numpy as np, xarray as xr, dask.array as da
from xrspatial.proximity import proximity

arr = np.zeros((3, 100), dtype=np.float64)
arr[1, 50] = 1.0
xs = np.linspace(0, 99, 100)
ys = np.linspace(0, 2, 3)
draster = xr.DataArray(da.from_array(arr, chunks=(3, 100)),
                       coords={'y': ys, 'x': xs}, dims=['y', 'x'])
proximity(draster, x='x', y='y', max_distance=10).data.compute()
# ValueError: The overlapping depth 10 is larger than your array 3.
```
Here the y axis is length 3 but the halo comes out at 10 pixels.
Expected behavior
A valid raster with a finite max_distance should not crash. Finite max_distance is what we point users at for scalable Dask, so it needs to work on skinny rasters and on great-circle rasters where the pixel pitch changes with latitude. Clamp the overlap depth per axis to a value map_overlap accepts (no larger than the chunk size; for a single chunk, at most axis_length - 1) and keep proximity, allocation, and direction results correct.
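A cap of axis_length - 1 should not change results: with that depth, every chunk's halo reaches both ends of the axis, so no target within max_distance is cut off. The sketch below checks this with plain integer arithmetic. clamp_depth and halo_covers_axis are hypothetical helpers, not part of xrspatial or Dask.

```python
def clamp_depth(pad, axis_len):
    """Hypothetical helper: cap a halo depth at axis_len - 1 (never below 0)."""
    return min(pad, max(axis_len - 1, 0))


def halo_covers_axis(chunks, depth):
    """Hypothetical check: does every chunk, extended by `depth` cells on
    both sides, span the whole axis [0, sum(chunks))?"""
    axis_len = sum(chunks)
    start = 0
    for size in chunks:
        stop = start + size
        # The extended window is [start - depth, stop + depth).
        if start - depth > 0 or stop + depth < axis_len:
            return False
        start = stop
    return True


# Reproduction geometry: a pad of 10 on axes of length 3 and 100.
depth = (clamp_depth(10, 3), clamp_depth(10, 100))
print(depth)                            # (2, 10)

# A depth of axis_len - 1 reaches every row, whatever the chunk layout.
print(halo_covers_axis((3,), 2))        # True
print(halo_covers_axis((1, 1, 1), 2))   # True
# A smaller cap can leave rows outside some chunk's halo.
print(halo_covers_axis((1, 1, 1), 1))   # False
```

The last case illustrates why the cap should come from the axis length rather than from an arbitrary smaller bound: clamping below axis_length - 1 can silently drop candidate targets once the axis is split into several chunks.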
Impact
High. Finite max_distance is the recommended path for large Dask rasters, and right now it fails on legitimate input.
Affected code
- xrspatial/proximity.py:302 (_halo_depth returns the unclamped pixel radius)
- xrspatial/proximity.py:1391 (_process_dask passes the depth to map_overlap)
- xrspatial/proximity.py:589 (_process_dask_cupy passes the depth to map_overlap)