Vectorize mode-resampling for COG overview generation #1526
Merged
brendancol merged 1 commit into xarray-contrib:main on May 9, 2026
Replace the per-pixel double loop in `_block_reduce_2d(method='mode')`
with a vectorized sort-and-count over the (oh, ow, 4) block tensor.
On a 1024x1024 uint8 input the reference implementation took ~1037 ms;
the vectorized path runs in ~27 ms (about 39x faster).
Output is bit-exact identical to the prior implementation. Tie-break
semantics ("lowest value wins" on equal counts) are preserved because
sorting brings equal values adjacent and `np.argmax` returns the
leftmost (smallest) position when counts tie.
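The sort-and-count idea described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code merged in `_writer.py`: the function name `block_mode_2x2` is hypothetical, the pairwise-equality counting is one possible way to count sorted values, and cropping odd dimensions to the nearest even size is an assumption.

```python
import numpy as np


def block_mode_2x2(arr: np.ndarray) -> np.ndarray:
    """Mode of each 2x2 block; ties go to the smallest value (hypothetical helper)."""
    # Assumption: trailing odd rows/columns are dropped.
    h2, w2 = (arr.shape[0] // 2) * 2, (arr.shape[1] // 2) * 2
    oh, ow = h2 // 2, w2 // 2
    # Gather each 2x2 block into a length-4 vector: shape (oh, ow, 4).
    blocks = (
        arr[:h2, :w2]
        .reshape(oh, 2, ow, 2)
        .transpose(0, 2, 1, 3)
        .reshape(oh, ow, 4)
    )
    # Sorting puts equal values next to each other, smallest first.
    s = np.sort(blocks, axis=-1)
    # counts[..., i] = number of the 4 cells equal to s[..., i].
    counts = (s[..., :, None] == s[..., None, :]).sum(axis=-1)
    # argmax returns the first maximum, which is the smallest tied value.
    idx = np.argmax(counts, axis=-1)
    return np.take_along_axis(s, idx[..., None], axis=-1)[..., 0]


example = np.array([[1, 1, 2, 3],
                    [2, 2, 3, 2]], dtype=np.uint8)
result = block_mode_2x2(example)
```

In `example`, the left block holds two 1s and two 2s and the right block holds two 2s and two 3s, so both are ties and the smaller value is selected in each.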
Adds tests/test_mode_overview_perf.py with bit-exact comparison
against a copy of the old reference for randomized inputs across
uint8/uint16/int16/int32/uint32/int64 and odd dimensions, hand-crafted
tie-break cases, and a 100 ms sanity guard on a 1024^2 input.
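For orientation, a loop-based reference of the kind these tests compare against might look like the sketch below. It is an assumption about the shape of the old code, not a copy of it: the name `block_mode_2x2_reference` is hypothetical, and the only behavior relied on is that `np.unique` returns sorted unique values, so `np.argmax` over the counts resolves ties toward the smallest value.

```python
import numpy as np


def block_mode_2x2_reference(arr: np.ndarray) -> np.ndarray:
    """Per-block np.unique loop (hypothetical stand-in for the old reference)."""
    oh, ow = arr.shape[0] // 2, arr.shape[1] // 2
    out = np.empty((oh, ow), dtype=arr.dtype)
    for i in range(oh):
        for j in range(ow):
            block = arr[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            # np.unique returns values in ascending order with their counts.
            vals, counts = np.unique(block, return_counts=True)
            # First maximum of counts == smallest value among the tied modes.
            out[i, j] = vals[np.argmax(counts)]
    return out


# Hand-crafted tie-break case: both blocks are 2-vs-2 ties.
ties = np.array([[1, 2, 9, 9],
                 [2, 1, 3, 3]], dtype=np.int16)
expected = np.array([[1, 3]], dtype=np.int16)
got = block_mode_2x2_reference(ties)

# A bit-exact check compares both the values and the dtype.
bit_exact = np.array_equal(got, expected) and got.dtype == expected.dtype
```

Comparing a vectorized implementation against such a loop on randomized inputs, with `np.array_equal` plus a dtype check, is one way to express the bit-exact comparison described above.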
Pull request overview
This PR optimizes GeoTIFF/COG overview generation by replacing the previous per-block np.unique loop used for method='mode' downsampling with a vectorized NumPy approach, aiming to drastically reduce runtime while preserving the prior tie-break behavior.
Changes:
- Replaced `_block_reduce_2d(..., method='mode')` implementation with a vectorized sort-and-count approach over 2x2 blocks.
- Added correctness tests that compare output bit-for-bit against a reference implementation and cover key tie-break cases.
- Added a performance-oriented test intended to guard against regressions in the mode-resampling path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `xrspatial/geotiff/_writer.py` | Implements the new vectorized mode resampling logic for 2x2 block reduction. |
| `xrspatial/geotiff/tests/test_mode_overview_perf.py` | Adds reference-based correctness tests, tie-break tests, and a runtime budget check for mode resampling. |
Comment on lines +106 to +117

    def test_perf_under_100ms_on_1024sq_uint8():
        rng = np.random.default_rng(seed=0)
        arr = rng.integers(0, 16, size=(1024, 1024), dtype=np.uint8)
        # Warmup
        _block_reduce_2d(arr, 'mode')
        t0 = time.perf_counter()
        out = _block_reduce_2d(arr, 'mode')
        elapsed = time.perf_counter() - t0
        assert out.shape == (512, 512)
        assert elapsed < 0.1, (
            f"mode resampling took {elapsed*1000:.1f} ms (threshold 100 ms)"
        )

    rng = np.random.default_rng(seed=42)
    info = np.iinfo(dtype)
    # Use a small categorical-style range so ties happen often.
    lo = max(info.min, 0)

    h2 = (shape[0] // 2) * 2
    w2 = (shape[1] // 2) * 2
    if h2 == 0 or w2 == 0:
        return
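
The even-dimension arithmetic in the snippet above can be illustrated on a small odd-sized array. This is a sketch under stated assumptions: the helper name `to_blocks` is hypothetical, and whether `_block_reduce_2d` itself crops trailing rows and columns in exactly this way is assumed from the snippet rather than confirmed.

```python
import numpy as np


def to_blocks(arr: np.ndarray):
    """Crop to even dimensions and return an (oh, ow, 4) block tensor.

    Hypothetical helper; returns None when no full 2x2 block fits.
    """
    h2 = (arr.shape[0] // 2) * 2
    w2 = (arr.shape[1] // 2) * 2
    if h2 == 0 or w2 == 0:
        return None
    oh, ow = h2 // 2, w2 // 2
    # (oh, 2, ow, 2) -> (oh, ow, 2, 2) -> (oh, ow, 4): one row of 4 cells per block.
    return (
        arr[:h2, :w2]
        .reshape(oh, 2, ow, 2)
        .transpose(0, 2, 1, 3)
        .reshape(oh, ow, 4)
    )


arr = np.arange(15).reshape(3, 5)   # odd in both dimensions
blocks = to_blocks(arr)             # last row and last column are dropped
```

For the 3x5 input, only the top-left 2x4 region contributes, giving two blocks: cells `[0, 1, 5, 6]` and `[2, 3, 7, 8]`.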
Summary
- Replace the per-pixel `np.unique` double loop in `_block_reduce_2d(method='mode')` with a vectorized sort-and-count over the `(oh, ow, 4)` block tensor.
- Tie-break semantics are preserved because `np.argmax` returns the leftmost max-count index.

Implementation
After reshaping each 2x2 block to a row of 4 cells:
- … `oh * ow`).
- `np.argmax` picks the leftmost position with the highest count, which after sorting is the smallest tied value.

Test plan
- `pytest xrspatial/geotiff/tests/test_mode_overview_perf.py -x -q` (48 passed)
- `pytest xrspatial/geotiff/tests/test_cog.py xrspatial/geotiff/tests/test_sparse_cog.py -x -q` (30 passed)