polygonize: batch dask per-chunk compute instead of serial loop#2673
Merged
Conversation
The dask backend called dask.compute once per chunk in a nested loop, so dask only ever saw one block of work at a time and ran serially. Collect the per-chunk delayed tasks in row-major order and compute them in bounded batches (default 32 chunks per dask.compute call), which lets dask schedule each batch in parallel while capping peak memory. Output is unchanged: tasks are built and consumed in the same row-major (iy, ix) order the serial loop used, so polygons map back to the right block for the boundary-merge stage. Add tests asserting batched output is vertex-identical to the old per-chunk (batch_size=1) path across batch sizes, and that batched multi-chunk areas match numpy. Document the behavior in dask_laziness.rst.
…2664) main already moved the dask path from per-chunk to per-row batching (#2608). Per-row batching still lets peak memory grow with the number of column chunks. Generalize it to a fixed batch_size so the memory bound is independent of the chunk-grid shape, and update the #2608 regression test to pin the bounded-batch contract instead of the per-row one.
brendancol
commented
May 29, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: polygonize batch dask per-chunk compute
Blockers
None.
Suggestions
batch_sizedefaults to 32 as a module constant, and there is no way to set it through the publicpolygonize()API. Fine for this PR since the task scoped it as internal, but worth a line in the docstring saying that tuning means editing the constant. Not blocking.
Nits
xrspatial/polygonize.py:dask.compute(*tasks[start:start+batch_size])returns a tuple even for a one-element slice, so thefor interior, boundary in resultsloop is safe on the single-chunk path. No change needed, just confirming it.
What looks good
- Output identity holds up. Tasks are built and consumed in the same row-major
(iy, ix)order as the old serial loop, andbatch_size=1reproduces the old per-chunk behaviour.test_polygonize_dask_batch_size_invariantpins vertex-exact equality across batch sizes, which is the right invariant here since numpy and dask vertex ordering already differ. - The PR supersedes the per-row batching from #2608 and updates that regression test to the bounded-batch contract, rather than leaving a per-row assertion that is now false.
- The memory/parallelism tradeoff is documented in the docstring, a code comment, and
dask_laziness.rst. - Backend coverage checks out: only the dask path changed, and dask+numpy and dask+cupy both route through
_polygonize_dask.
Checklist
- Algorithm unchanged (performance-only): yes
- Backends consistent: yes, dask path only
- NaN handling: unchanged
- Edge cases tested: multi-chunk, masked, several batch sizes
- Dask chunk boundaries: ordering preserved for the merge stage
- No premature materialization: batching bounds peak memory
- Benchmark: none added, performance-only change to existing behaviour
- Docstrings: updated
brendancol
commented
May 29, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review
Applied the one suggestion from the previous pass: the _polygonize_dask docstring now states that batch_size is internal-only and that tuning it means editing the module-level _DASK_CHUNK_BATCH_SIZE constant.
The remaining nit needed no code change (it confirmed the single-chunk slice path is safe). No blockers. Tests pass: test_polygonize.py and test_polygonize_dask_row_batch_2608.py.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2664
Background
The dask backend for
polygonizeoriginally calleddask.computeonce per chunk in a nested loop, so it ran serially. #2608 already moved it to per-row batching (one compute per row of chunks). This PR generalizes that into a fixedbatch_sizeso the memory bound no longer scales with the chunk-grid shape.What
dask.delayed(_polygonize_chunk)tasks in row-major order, then compute them in bounded batches (onedask.compute(*tasks)call per batch)._polygonize_daskdocstring, a code comment, anddocs/source/reference/dask_laziness.rst.Why it is output-identical
Tasks are built and their results consumed in the same row-major
(iy, ix)order the serial loop used. Interior and boundary polygons accumulate in that order, so the downstream boundary-merge stage sees the same input.batch_size=1reproduces the old per-chunk behaviour exactly.Backend coverage
Touches the dask path only (dask+numpy and dask+cupy both route through
_polygonize_dask). numpy and cupy paths are unchanged.Test plan
test_polygonize_dask_batch_size_invariant: batched output is vertex-identical tobatch_size=1across batch sizes 1, 2, 3, 100 on a 9-chunk raster.test_polygonize_dask_batched_matches_numpy_areas: batched multi-chunk areas match numpy for batch sizes 1, 4, 100.test_compute_batches_chunks) to pin the bounded-batch contract: ceil(n_chunks / batch_size) compute calls, neither per-chunk nor per-raster.xrspatial/tests/test_polygonize*.pypass.