polygonize: batch dask per-chunk compute instead of serial loop by brendancol · Pull Request #2673 · xarray-contrib/xarray-spatial

brendancol · 2026-05-29T20:33:48Z

Background

The dask backend for polygonize originally called dask.compute once per chunk in a nested loop, so it ran serially. #2608 already moved it to per-row batching (one compute per row of chunks). This PR generalizes that into a fixed batch_size so the memory bound no longer scales with the chunk-grid shape.

What

Build the per-chunk dask.delayed(_polygonize_chunk) tasks in row-major order, then compute them in bounded batches (one dask.compute(*tasks) call per batch).
Default batch size is 32 chunks. Larger batches mean more parallelism and higher peak memory; smaller batches cap memory harder. Per-row batching bounded memory by one row of chunks, which grows with the number of column chunks; a fixed batch size bounds it independent of grid shape. The tradeoff is documented in the _polygonize_dask docstring, a code comment, and docs/source/reference/dask_laziness.rst.

Why it is output-identical

Tasks are built and their results consumed in the same row-major (iy, ix) order the serial loop used. Interior and boundary polygons accumulate in that order, so the downstream boundary-merge stage sees the same input. batch_size=1 reproduces the old per-chunk behaviour exactly.

Backend coverage

Touches the dask path only (dask+numpy and dask+cupy both route through _polygonize_dask). numpy and cupy paths are unchanged.

Test plan

test_polygonize_dask_batch_size_invariant: batched output is vertex-identical to batch_size=1 across batch sizes 1, 2, 3, 100 on a 9-chunk raster.
test_polygonize_dask_batched_matches_numpy_areas: batched multi-chunk areas match numpy for batch sizes 1, 4, 100.
Updated the polygonize dask backend serializes chunks via per-chunk dask.compute() #2608 regression test (test_compute_batches_chunks) to pin the bounded-batch contract: ceil(n_chunks / batch_size) compute calls, neither per-chunk nor per-raster.
Existing xrspatial/tests/test_polygonize*.py pass.

The dask backend called dask.compute once per chunk in a nested loop, so dask only ever saw one block of work at a time and ran serially. Collect the per-chunk delayed tasks in row-major order and compute them in bounded batches (default 32 chunks per dask.compute call), which lets dask schedule each batch in parallel while capping peak memory. Output is unchanged: tasks are built and consumed in the same row-major (iy, ix) order the serial loop used, so polygons map back to the right block for the boundary-merge stage. Add tests asserting batched output is vertex-identical to the old per-chunk (batch_size=1) path across batch sizes, and that batched multi-chunk areas match numpy. Document the behavior in dask_laziness.rst.

…2664) main already moved the dask path from per-chunk to per-row batching (#2608). Per-row batching still lets peak memory grow with the number of column chunks. Generalize it to a fixed batch_size so the memory bound is independent of the chunk-grid shape, and update the #2608 regression test to pin the bounded-batch contract instead of the per-row one.

brendancol

PR Review: polygonize batch dask per-chunk compute

Blockers

None.

Suggestions

batch_size defaults to 32 as a module constant, and there is no way to set it through the public polygonize() API. Fine for this PR since the task scoped it as internal, but worth a line in the docstring saying that tuning means editing the constant. Not blocking.

Nits

xrspatial/polygonize.py: dask.compute(*tasks[start:start+batch_size]) returns a tuple even for a one-element slice, so the for interior, boundary in results loop is safe on the single-chunk path. No change needed, just confirming it.

What looks good

Output identity holds up. Tasks are built and consumed in the same row-major (iy, ix) order as the old serial loop, and batch_size=1 reproduces the old per-chunk behaviour. test_polygonize_dask_batch_size_invariant pins vertex-exact equality across batch sizes, which is the right invariant here since numpy and dask vertex ordering already differ.
The PR supersedes the per-row batching from #2608 and updates that regression test to the bounded-batch contract, rather than leaving a per-row assertion that is now false.
The memory/parallelism tradeoff is documented in the docstring, a code comment, and dask_laziness.rst.
Backend coverage checks out: only the dask path changed, and dask+numpy and dask+cupy both route through _polygonize_dask.

Checklist

Algorithm unchanged (performance-only): yes
Backends consistent: yes, dask path only
NaN handling: unchanged
Edge cases tested: multi-chunk, masked, several batch sizes
Dask chunk boundaries: ordering preserved for the merge stage
No premature materialization: batching bounds peak memory
Benchmark: none added, performance-only change to existing behaviour
Docstrings: updated

brendancol

Follow-up review

Applied the one suggestion from the previous pass: the _polygonize_dask docstring now states that batch_size is internal-only and that tuning it means editing the module-level _DASK_CHUNK_BATCH_SIZE constant.

The remaining nit needed no code change (it confirmed the single-chunk slice path is safe). No blockers. Tests pass: test_polygonize.py and test_polygonize_dask_row_batch_2608.py.

github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026

brendancol added 2 commits May 29, 2026 13:35

brendancol force-pushed the issue-2664 branch from 3c3bc23 to c072ade Compare May 29, 2026 20:37

brendancol commented May 29, 2026

View reviewed changes

Address review: note batch_size is internal-only in docstring (#2664)

c576587

brendancol commented May 29, 2026

View reviewed changes

brendancol merged commit ee2ef03 into main May 29, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

polygonize: batch dask per-chunk compute instead of serial loop#2673

polygonize: batch dask per-chunk compute instead of serial loop#2673
brendancol merged 3 commits into
mainfrom
issue-2664

brendancol commented May 29, 2026 •

edited

Loading

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

What

Why it is output-identical

Backend coverage

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: polygonize batch dask per-chunk compute

Blockers

Suggestions

Nits

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

brendancol commented May 29, 2026 •

edited

Loading