Fix dask aggregate boundary contamination and clean up cumulative bookkeeping #1469

@brendancol

Describe the bug

Two related problems in xrspatial/resample.py's dask aggregate paths (_run_dask_numpy and _run_dask_cupy):

  1. Boundary contamination. The aggregate dask path calls dask.array.overlap.overlap with boundary='nearest'. At the global edge of the input array, the overlap pad is filled with duplicated edge cells. Output pixels whose aggregate window straddles that edge then sample those duplicates, which biases min/max/median. Mean is less affected because the duplicates are real values, but they're still triple-counted near corners.

  2. Wasted/inconsistent cumulative bookkeeping. The aggregate path computes global_in_h, cum_in_y, cum_in_x, out_y, out_x, cum_out_y, cum_out_x once before _ensure_min_chunksize may rechunk for the depth requirement, then conditionally recomputes them when the rechunk changed the layout. Whenever a rechunk happens, the first computation is wasted, and the conditional recompute uses data.chunks[0] != tuple(cum_in_y[1:] - cum_in_y[:-1]) as a roundabout chunk-equality check.
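
A minimal NumPy-only sketch of problem 1, not taken from resample.py. It assumes the eager path clips each aggregate window to real cells, uses np.pad(mode="edge") as a stand-in for the pad that boundary='nearest' produces, and uses a NaN-filled pad to show the alternative proposed under Fix. The array values, depth, and window bounds are made up for illustration.

```python
import numpy as np

# One row of input, starting at the global left edge of the array.
row = np.array([9.0, 1.0, 2.0, 3.0])
depth = 2  # hypothetical overlap depth

# Stand-in for boundary='nearest': the pad repeats the edge cell.
edge_padded = np.pad(row, depth, mode="edge")
# Stand-in for boundary=np.nan: the pad is filled with NaN.
nan_padded = np.pad(row, depth, mode="constant", constant_values=np.nan)

# A window that straddles the edge: padded indices [0, 4),
# which covers two pad cells plus the real cells 9.0 and 1.0.
lo, hi = 0, 4
eager_window = row[:2]            # assumed eager behaviour: clipped to real cells
edge_window = edge_padded[lo:hi]  # [9, 9, 9, 1]
nan_window = nan_padded[lo:hi]    # [nan, nan, 9, 1]

eager_median = float(np.median(eager_window))
edge_median = float(np.median(edge_window))    # pulled toward the duplicated 9.0
nan_median = float(np.nanmedian(nan_window))   # NaN pad cells are skipped

eager_mean = float(np.mean(eager_window))
edge_mean = float(np.mean(edge_window))        # edge cell counted three times
nan_mean = float(np.nanmean(nan_window))

print(eager_median, edge_median, nan_median)
print(eager_mean, edge_mean, nan_mean)
```

In this toy case the edge-duplicated window reports a median of 9.0 and a mean of 7.0, while both the clipped window and the NaN-padded window report 5.0.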

Expected behavior

Aggregate dask results should match eager numpy bit-identically for min/max/median (same kernel, no boundary padding bias). The bookkeeping should compute once.

Fix

  • Use boundary=np.nan on the aggregate overlap. The aggregate kernels already skip NaN via if not np.isnan(v) and return NaN for empty windows, so padded NaN cells are ignored naturally.
  • Compute min_size from the scale-driven minimum and the depth-driven max(2*depth_y+1, 2*depth_x+1) up front, call _ensure_min_chunksize once, then build the cumulative arrays once.
  • Leave the interp dask path on boundary='nearest' so it stays consistent with scipy's mode='nearest' semantics that the eager numpy interp path uses.
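
The single-pass bookkeeping in the second bullet could look roughly like the sketch below. It is NumPy-only and hypothetical: required_min_size, merge_small_chunks, and cumulative_offsets are illustrative names, not functions in xrspatial, and merge_small_chunks only approximates whatever _ensure_min_chunksize actually does. Combining the scale-driven and depth-driven minimums with max() is also an assumption.

```python
import numpy as np


def required_min_size(scale_min, depth_y, depth_x):
    """Smallest chunk size satisfying both constraints (assumed: take the max)."""
    return max(scale_min, 2 * depth_y + 1, 2 * depth_x + 1)


def merge_small_chunks(chunks, min_size):
    """Hypothetical rechunk: greedily merge neighbours until each is >= min_size."""
    merged, acc = [], 0
    for c in chunks:
        acc += c
        if acc >= min_size:
            merged.append(acc)
            acc = 0
    if acc:  # fold any undersized remainder into the last chunk
        if merged:
            merged[-1] += acc
        else:
            merged.append(acc)
    return tuple(merged)


def cumulative_offsets(chunks):
    """Return [0, c0, c0 + c1, ...] for a tuple of chunk sizes."""
    return np.concatenate(([0], np.cumsum(chunks)))


# Made-up layout along y: five chunks covering 14 rows.
chunks_y = (3, 3, 3, 3, 2)

# 1) Decide the minimum chunk size up front.
min_size = required_min_size(scale_min=2, depth_y=1, depth_x=2)
# 2) Rechunk once.
new_chunks_y = merge_small_chunks(chunks_y, min_size)
# 3) Build the cumulative array once, from the final layout.
cum_in_y = cumulative_offsets(new_chunks_y)

# Differencing the offsets only recovers the chunk tuple,
# so a direct tuple comparison says the same thing more plainly.
roundabout_equal = tuple(np.diff(cum_in_y)) == new_chunks_y
layout_changed = new_chunks_y != chunks_y

print(min_size, new_chunks_y, cum_in_y.tolist(), layout_changed)
```

Because the cumulative arrays are derived only after the final chunk layout is known, no chunk-equality check or conditional recompute is needed.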

Labels: QA/QC, bug
