geotiff: ngjit 2x2 kernels for float mean/min/max/median overviews (#2413)#2475
Merged
Conversation
…2413) Replace the generic numpy reshape + nan-aggregation pipeline in the float mean / min / max / median branch of `_block_reduce_2d` with type-specialized ngjit kernels that walk output pixels directly, fuse the sentinel comparison into the read, and bounds-check odd shapes instead of pre-padding. All-NaN / all-sentinel blocks still return NaN so the caller's downstream sentinel rewrite is unchanged. Integer dtypes and the `mode` / `cubic` / `nearest` methods keep their existing numpy paths because the integer sentinel mask has to be computed at native dtype width (64-bit sentinels round when cast to float64). Measured speedups on this machine match the issue's numbers: 1024^2 f32 mean 0.58 ms vs ~11 ms numpy; 4096^2 f32 mean+sentinel 10.86 ms vs ~331 ms numpy.
brendancol
commented
May 27, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: ngjit 2x2 kernels for float mean/min/max/median overviews
The change is tight: drop-in float fast path, integer/mode/cubic/nearest untouched, parity pinned by a numpy reference. Existing 328 cases in tests/write/test_overview.py still pass and 129 new parametrized cases match the reference. No correctness blockers.
Blockers
None.
Suggestions
- Accumulator dtype drift for float32 inputs. In
_overview_kernels.py:42(total = 0.0) and:81 / :120(best = 0.0), the locals start as Python floats so numba unifies them to float64. For float32 input, the mean is computed in float64 and downcast on store; min/max widen each compare to float64. Behavior is fine (and arguably more precise for mean than the priornp.nanmeanof float32), but it does mean the "bit-identical" claim in #2413 will not hold across the board. Worth pinning the choice with anarr.dtype.type(0)initializer (or a typed signature) if you want the float32 fast path to actually run in float32. Otherwise add a one-liner in the module docstring noting that float32 inputs are accumulated in float64. - No benchmark in
benchmarks/. The geotiff module has no benchmarks at all today, so this is not a regression. #2413 cited specific numbers, so a smallbenchmarks/benchmarks/geotiff_overview.py(1024^2 and 4096^2 f32, with/without sentinel, mean/median) would let asv catch a future numba- or compiler-driven regression. Could land as a follow-up.
Nits
_overview.py:347— the dispatch does a dict lookup (_KERNELS[method]) per call. Fine in practice; anif/elifchain would shave ~1 us but adds branch noise. Leave as is unless profiling says otherwise._overview_kernels.py:189-201— the median sort network is correct but dense. A short example trace in the comment (e.g. "(4,2,3,1) -> (2,4,3,1) -> (2,3,4,1) -> ... -> (1,2,3,4)") would help future readers verify the swap order at a glance._overview_kernels.py— the four kernels share a near-identical 4-cell fetch + NaN/sentinel filter. Factoring would hurt ngjit inlining, so the duplication is justified, but worth a one-line note in the module docstring saying the duplication is intentional so the next reader does not try to DRY it.
What looks good
- The integer-routing contract is pinned by
test_integer_dtype_still_routes_through_numpy_pathusingINT64_MAX. That is exactly the case a future refactor would silently break. inf-as-sentinel andNaN-as-nodata edge cases are both covered, mirroring the existing_block_reduce_2dgates.- Module placement matches the
_compression.pyprecedent the issue cited. - The bounds-check pattern (
r0+1 < h,c0+1 < w) avoids the NaN-pad copy for odd shapes — that was one of the perf wins called out in the issue and the kernel structure makes it explicit.
Checklist
- Algorithm matches reference (numpy nan-aware aggregation, pinned by parity tests)
- All implemented backends consistent — CPU only by design, GPU/dask untouched
- NaN handling correct (all-NaN -> NaN; partial -> in-bounds-only)
- Edge cases covered (1x1, single row/col, all-sentinel, all-NaN, partial residual)
- Dask chunk boundaries — N/A (CPU-only kernel, dask path unchanged)
- No premature materialization or unnecessary copies (output buffer only)
- Benchmark — not added; module has none today
- README feature matrix — N/A (no new public function, no backend coverage change)
- Docstrings present and accurate
Address review nits on #2475: - Module docstring now spells out that the mean accumulator is float64 by way of numba's Python-float unification (matches the prior np.nanmean accuracy on float32) and that min/max follow the same pattern with no value drift since both are exact reductions. - Note that per-method kernel duplication is intentional so the JIT keeps the inner loop body visible to inlining; do not refactor into a shared helper. - Annotate the median compare-swap ladder with a concrete n=4 trace so the swap order is verifiable at a glance.
brendancol
commented
May 27, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review
Dispositions on the prior review:
Suggestions
- Accumulator dtype drift: addressed via docstring (commit 7abf705). The mean accumulator is intentionally float64 (matches
np.nanmeanaccuracy on float32 inputs) and the docstring now states that explicitly. Min/max use the same pattern but the result is identical since both reductions are exact — also called out. - No benchmark in
benchmarks/: deferred. The geotiff module has no benchmarks at all today, and adding one for just overview reduction would set a one-off precedent. A broaderbenchmarks/benchmarks/geotiff_*.pycluster is a better follow-up issue and out of scope here.
Nits
_KERNELS[method]dict lookup: dismissed — micro-overhead, current shape is fine.- Median sort network needs a trace: addressed (commit 7abf705). Comment now includes the n=4 (4,2,3,1) → (1,2,3,4) trace.
- Kernel duplication note: addressed (commit 7abf705). Module docstring now records that the per-method duplication is deliberate so ngjit can inline the loop body.
Re-check
- All 129 new parity cases and the 328 existing
tests/write/test_overview.pycases pass after the docstring edits.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2413.
Summary
xrspatial/geotiff/_overview_kernels.pywith four type-specialized@ngjit2x2 kernels (mean, min, max, median) that walk output pixels directly, fuse the sentinel comparison into the read, and bounds-check odd shapes so no NaN pad copy is needed._block_reduce_2dvia a_NGJIT_REDUCE_METHODSset. Integer dtypes plusmode/cubic/nearestkeep their existing numpy paths.The integer-dtype native kernel is intentionally left as a follow-up. The existing numpy path computes the sentinel mask at native integer width before promoting to float64, which is required for 64-bit sentinels (
INT64_MAX,UINT64_MAX) that round when cast to float64. A test pins that contract.Backend coverage
CPU only. GPU (
_block_reduce_2d_gpu) and dask paths aren't changed.Measured
Best-of-5 on this machine, jit warmup excluded:
Matches the numbers in #2413.
Test plan
tests/test_overview_kernels_2413.py(kernel vs numpy reference across float32/float64, even/odd shapes, with/without sentinel, NaN edge cases, inf sentinel, NaN-as-nodata no-op, median-of-4 vsnp.median)tests/write/test_overview.pystill passtests/write/plus the new file pass togethertest_integer_dtype_still_routes_through_numpy_path