geotiff: ngjit 2x2 kernels for float mean/min/max/median overviews (#2413) by brendancol · Pull Request #2475 · xarray-contrib/xarray-spatial

brendancol · 2026-05-27T03:52:54Z

Closes #2413.

Summary

Adds xrspatial/geotiff/_overview_kernels.py with four type-specialized @ngjit 2x2 kernels (mean, min, max, median) that walk output pixels directly, fuse the sentinel comparison into the read, and bounds-check odd shapes so no NaN pad copy is needed.
Wires the kernels into the float branch of _block_reduce_2d via a _NGJIT_REDUCE_METHODS set. Integer dtypes plus mode / cubic / nearest keep their existing numpy paths.
All-NaN / all-sentinel blocks still return NaN so the caller's downstream sentinel rewrite is unchanged.

The integer-dtype native kernel is intentionally left as a follow-up. The existing numpy path computes the sentinel mask at native integer width before promoting to float64, which is required for 64-bit sentinels (INT64_MAX, UINT64_MAX) that round when cast to float64. A test pins that contract.

Backend coverage

CPU only. GPU (_block_reduce_2d_gpu) and dask paths aren't changed.

Measured

Best-of-5 on this machine, jit warmup excluded:

case	new	old
1024^2 f32 mean	0.58 ms	~11 ms
1024^2 f32 mean + sentinel	0.63 ms	~18 ms
4096^2 f32 mean + sentinel	10.86 ms	~331 ms
4096^2 f32 median + sentinel	24.60 ms	~338 ms

Matches the numbers in #2413.

Test plan

129 new parity tests in tests/test_overview_kernels_2413.py (kernel vs numpy reference across float32/float64, even/odd shapes, with/without sentinel, NaN edge cases, inf sentinel, NaN-as-nodata no-op, median-of-4 vs np.median)
All 328 existing cases in tests/write/test_overview.py still pass
912 tests across tests/write/ plus the new file pass together
Integer 64-bit sentinel contract pinned by test_integer_dtype_still_routes_through_numpy_path

…2413) Replace the generic numpy reshape + nan-aggregation pipeline in the float mean / min / max / median branch of `_block_reduce_2d` with type-specialized ngjit kernels that walk output pixels directly, fuse the sentinel comparison into the read, and bounds-check odd shapes instead of pre-padding. All-NaN / all-sentinel blocks still return NaN so the caller's downstream sentinel rewrite is unchanged. Integer dtypes and the `mode` / `cubic` / `nearest` methods keep their existing numpy paths because the integer sentinel mask has to be computed at native dtype width (64-bit sentinels round when cast to float64). Measured speedups on this machine match the issue's numbers: 1024^2 f32 mean 0.58 ms vs ~11 ms numpy; 4096^2 f32 mean+sentinel 10.86 ms vs ~331 ms numpy.

brendancol

PR Review: ngjit 2x2 kernels for float mean/min/max/median overviews

The change is tight: drop-in float fast path, integer/mode/cubic/nearest untouched, parity pinned by a numpy reference. Existing 328 cases in tests/write/test_overview.py still pass and 129 new parametrized cases match the reference. No correctness blockers.

Blockers

None.

Suggestions

Accumulator dtype drift for float32 inputs. In _overview_kernels.py:42 (total = 0.0) and :81 / :120 (best = 0.0), the locals start as Python floats so numba unifies them to float64. For float32 input, the mean is computed in float64 and downcast on store; min/max widen each compare to float64. Behavior is fine (and arguably more precise for mean than the prior np.nanmean of float32), but it does mean the "bit-identical" claim in #2413 will not hold across the board. Worth pinning the choice with an arr.dtype.type(0) initializer (or a typed signature) if you want the float32 fast path to actually run in float32. Otherwise add a one-liner in the module docstring noting that float32 inputs are accumulated in float64.
No benchmark in benchmarks/. The geotiff module has no benchmarks at all today, so this is not a regression. #2413 cited specific numbers, so a small benchmarks/benchmarks/geotiff_overview.py (1024^2 and 4096^2 f32, with/without sentinel, mean/median) would let asv catch a future numba- or compiler-driven regression. Could land as a follow-up.

Nits

_overview.py:347 — the dispatch does a dict lookup (_KERNELS[method]) per call. Fine in practice; an if/elif chain would shave ~1 us but adds branch noise. Leave as is unless profiling says otherwise.
_overview_kernels.py:189-201 — the median sort network is correct but dense. A short example trace in the comment (e.g. "(4,2,3,1) -> (2,4,3,1) -> (2,3,4,1) -> ... -> (1,2,3,4)") would help future readers verify the swap order at a glance.
_overview_kernels.py — the four kernels share a near-identical 4-cell fetch + NaN/sentinel filter. Factoring would hurt ngjit inlining, so the duplication is justified, but worth a one-line note in the module docstring saying the duplication is intentional so the next reader does not try to DRY it.

What looks good

The integer-routing contract is pinned by test_integer_dtype_still_routes_through_numpy_path using INT64_MAX. That is exactly the case a future refactor would silently break.
inf-as-sentinel and NaN-as-nodata edge cases are both covered, mirroring the existing _block_reduce_2d gates.
Module placement matches the _compression.py precedent the issue cited.
The bounds-check pattern (r0+1 < h, c0+1 < w) avoids the NaN-pad copy for odd shapes — that was one of the perf wins called out in the issue and the kernel structure makes it explicit.

Checklist

Algorithm matches reference (numpy nan-aware aggregation, pinned by parity tests)
All implemented backends consistent — CPU only by design, GPU/dask untouched
NaN handling correct (all-NaN -> NaN; partial -> in-bounds-only)
Edge cases covered (1x1, single row/col, all-sentinel, all-NaN, partial residual)
Dask chunk boundaries — N/A (CPU-only kernel, dask path unchanged)
No premature materialization or unnecessary copies (output buffer only)
Benchmark — not added; module has none today
README feature matrix — N/A (no new public function, no backend coverage change)
Docstrings present and accurate

Address review nits on #2475: - Module docstring now spells out that the mean accumulator is float64 by way of numba's Python-float unification (matches the prior np.nanmean accuracy on float32) and that min/max follow the same pattern with no value drift since both are exact reductions. - Note that per-method kernel duplication is intentional so the JIT keeps the inner loop body visible to inlining; do not refactor into a shared helper. - Annotate the median compare-swap ladder with a concrete n=4 trace so the swap order is verifiable at a glance.

brendancol

Follow-up review

Dispositions on the prior review:

Suggestions

Accumulator dtype drift: addressed via docstring (commit 7abf705). The mean accumulator is intentionally float64 (matches np.nanmean accuracy on float32 inputs) and the docstring now states that explicitly. Min/max use the same pattern but the result is identical since both reductions are exact — also called out.
No benchmark in benchmarks/: deferred. The geotiff module has no benchmarks at all today, and adding one for just overview reduction would set a one-off precedent. A broader benchmarks/benchmarks/geotiff_*.py cluster is a better follow-up issue and out of scope here.

Nits

_KERNELS[method] dict lookup: dismissed — micro-overhead, current shape is fine.
Median sort network needs a trace: addressed (commit 7abf705). Comment now includes the n=4 (4,2,3,1) → (1,2,3,4) trace.
Kernel duplication note: addressed (commit 7abf705). Module docstring now records that the per-method duplication is deliberate so ngjit can inline the loop body.

Re-check

All 129 new parity cases and the 328 existing tests/write/test_overview.py cases pass after the docstring edits.

github-actions Bot added the performance PR touches performance-sensitive code label May 27, 2026

brendancol commented May 27, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into issue-2413

5051f61

brendancol merged commit 619cdd8 into main May 27, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

geotiff: ngjit 2x2 kernels for float mean/min/max/median overviews (#2413)#2475

geotiff: ngjit 2x2 kernels for float mean/min/max/median overviews (#2413)#2475
brendancol merged 3 commits into
mainfrom
issue-2413

brendancol commented May 27, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 27, 2026

Summary

Backend coverage

Measured

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: ngjit 2x2 kernels for float mean/min/max/median overviews

Blockers

Suggestions

Nits

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review

Suggestions

Nits

Re-check

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant