Inline dask aggregate kernel in resample (#1463) #1468

Merged
brendancol merged 1 commit into main from issue-1463
May 4, 2026

Conversation

@brendancol
Contributor

Closes #1463.

Summary

  • Replaces the per-output-pixel func(sub, 1, 1)[0, 0] loop in _agg_block_np with a single @ngjit call per chunk.
  • Adds five new block kernels (_agg_block_mean_nb, _min_nb, _max_nb, _median_nb, _mode_nb) that take the global input/output geometry plus the chunk's in_y0 / in_x0 / go_y0 / go_x0 offsets, derive per-pixel windows from int(go * global_in_h / global_out_h) - in_y0, and write into a pre-allocated (target_h, target_w) buffer.
  • The eager _run_numpy aggregate path is untouched. _agg_block_cupy round-trips through _agg_block_np, so it picks up the speedup unchanged.
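
The window arithmetic in the second bullet can be sketched in plain Python. This is an illustration only, not the merged kernel: the names `window_bounds` and `agg_block_mean` are hypothetical, the real kernels are `@ngjit`-compiled and operate on arrays, and only the start formula `int(go * global_in_h / global_out_h) - in_y0` comes from the description above. The stop formula (the next output pixel's start) and the clamping to the block edges are assumptions, and NaN handling is omitted.

```python
def window_bounds(go, global_in, global_out, in_0):
    """Chunk-local [start, stop) input window for global output index `go`.

    The start follows the formula in the PR description; the stop (the
    start of the next output pixel's window) is an assumption.
    """
    start = int(go * global_in / global_out) - in_0
    stop = int((go + 1) * global_in / global_out) - in_0
    if stop <= start:  # assumed guard: never return an empty window
        stop = start + 1
    return start, stop


def agg_block_mean(block, in_y0, in_x0, go_y0, go_x0, target_h, target_w,
                   global_in_h, global_in_w, global_out_h, global_out_w):
    """Mean-aggregate one chunk into a pre-allocated target_h x target_w buffer."""
    out = [[float("nan")] * target_w for _ in range(target_h)]
    n_rows, n_cols = len(block), len(block[0])
    for oy in range(target_h):
        y0, y1 = window_bounds(go_y0 + oy, global_in_h, global_out_h, in_y0)
        for ox in range(target_w):
            x0, x1 = window_bounds(go_x0 + ox, global_in_w, global_out_w, in_x0)
            total, count = 0.0, 0
            # Clamp to the block so a window never reads outside the chunk.
            for y in range(max(y0, 0), min(y1, n_rows)):
                for x in range(max(x0, 0), min(x1, n_cols)):
                    total += block[y][x]
                    count += 1
            if count:
                out[oy][ox] = total / count
    return out


# 6x6 raster with values 0..35, downsampled to 2x2 (3x3 input windows).
raster = [[r * 6 + c for c in range(6)] for r in range(6)]

# Whole raster handled as one block with zero offsets.
full = agg_block_mean(raster, 0, 0, 0, 0, 2, 2, 6, 6, 2, 2)

# Bottom half only: input rows 3..5, which produce output row 1.
chunk = agg_block_mean(raster[3:], 3, 0, 1, 0, 1, 2, 6, 6, 2, 2)
```

Because the windows are derived from the global geometry and then shifted by the chunk offsets, `chunk` matches the corresponding row of `full`, which is the property the parity test below checks against `_run_numpy`.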

Test plan

  • New parity test: 60x60 raster, 20x20 chunks, scale 1/3, all 5 aggregate methods. Output chunks straddle input chunk boundaries via _add_overlap, so the chunk-offset math is exercised. Results are compared element-wise (assert_array_equal) against _run_numpy.
  • Smoke test: 200x200 raster, 50x50 chunks, scale 0.25, average method. It confirms only that the path completes within a generous wall-clock budget.
  • Existing 62 resample tests still pass.
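
To see why the parity test's geometry exercises the chunk-offset math, the short sketch below lists which output indices along one axis have an input window that crosses an input chunk boundary. It is not part of the test suite: the helper name `straddling_outputs` is hypothetical, and the window stop is assumed to be the start of the next output pixel's window.

```python
def straddling_outputs(global_in, global_out, chunk):
    """Output indices whose input window crosses an interior chunk boundary."""
    boundaries = range(chunk, global_in, chunk)
    hits = []
    for go in range(global_out):
        start = int(go * global_in / global_out)
        stop = int((go + 1) * global_in / global_out)  # assumed window end
        if any(start < b < stop for b in boundaries):
            hits.append(go)
    return hits


# Parity-test geometry along one axis: 60 input pixels, 20 output pixels,
# input chunks of 20. Windows are 3 pixels wide and 20 is not a multiple of 3.
parity_hits = straddling_outputs(60, 20, 20)
print(parity_hits)  # [6, 13]
```

Output pixels 6 and 13 read input rows 18..20 and 39..41, which span the boundaries at 20 and 40, so a kernel that mishandles the offsets would disagree with `_run_numpy` at those positions.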

Replace the per-output-pixel `func(sub, 1, 1)[0, 0]` loop in
`_agg_block_np` with a single `@ngjit` call per chunk. Each method gets
a `_agg_block_*_nb` kernel that takes the global geometry and the
chunk's `in_y0 / in_x0 / go_y0 / go_x0` offsets and walks the entire
output region in one numba dispatch, writing into a pre-allocated
output buffer.

The eager `_run_numpy` aggregate path is unchanged. `_agg_block_cupy`
already round-trips to CPU through `_agg_block_np`, so it picks up
the speedup without code changes.

Adds parity tests against `_run_numpy` for all 5 aggregate methods on
a 60x60 raster with 20x20 chunks (output chunks straddle input chunk
boundaries via `_add_overlap`), plus a 200x200 smoke test guarding
against accidental quadratic regressions in the chunk loop.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 4, 2026
@brendancol brendancol merged commit 22172b3 into main May 4, 2026
11 checks passed
@brendancol brendancol deleted the issue-1463 branch May 5, 2026 03:45
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

Inline dask aggregate kernel to remove per-pixel numba dispatch