Skip to content

polygonize: batch dask per-chunk compute instead of serial loop#2673

Merged
brendancol merged 3 commits into
mainfrom
issue-2664
May 29, 2026
Merged

polygonize: batch dask per-chunk compute instead of serial loop#2673
brendancol merged 3 commits into
mainfrom
issue-2664

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

@brendancol brendancol commented May 29, 2026

Closes #2664

Background

The dask backend for polygonize originally called dask.compute once per chunk in a nested loop, so it ran serially. #2608 already moved it to per-row batching (one compute per row of chunks). This PR generalizes that into a fixed batch_size so the memory bound no longer scales with the chunk-grid shape.

What

  • Build the per-chunk dask.delayed(_polygonize_chunk) tasks in row-major order, then compute them in bounded batches (one dask.compute(*tasks) call per batch).
  • Default batch size is 32 chunks. Larger batches mean more parallelism and higher peak memory; smaller batches cap memory harder. Per-row batching bounded memory by one row of chunks, which grows with the number of column chunks; a fixed batch size bounds it independent of grid shape. The tradeoff is documented in the _polygonize_dask docstring, a code comment, and docs/source/reference/dask_laziness.rst.

Why it is output-identical

Tasks are built and their results consumed in the same row-major (iy, ix) order the serial loop used. Interior and boundary polygons accumulate in that order, so the downstream boundary-merge stage sees the same input. batch_size=1 reproduces the old per-chunk behaviour exactly.

Backend coverage

Touches the dask path only (dask+numpy and dask+cupy both route through _polygonize_dask). numpy and cupy paths are unchanged.

Test plan

  • test_polygonize_dask_batch_size_invariant: batched output is vertex-identical to batch_size=1 across batch sizes 1, 2, 3, 100 on a 9-chunk raster.
  • test_polygonize_dask_batched_matches_numpy_areas: batched multi-chunk areas match numpy for batch sizes 1, 4, 100.
  • Updated the polygonize dask backend serializes chunks via per-chunk dask.compute() #2608 regression test (test_compute_batches_chunks) to pin the bounded-batch contract: ceil(n_chunks / batch_size) compute calls, neither per-chunk nor per-raster.
  • Existing xrspatial/tests/test_polygonize*.py pass.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026
The dask backend called dask.compute once per chunk in a nested loop, so
dask only ever saw one block of work at a time and ran serially. Collect
the per-chunk delayed tasks in row-major order and compute them in
bounded batches (default 32 chunks per dask.compute call), which lets
dask schedule each batch in parallel while capping peak memory.

Output is unchanged: tasks are built and consumed in the same row-major
(iy, ix) order the serial loop used, so polygons map back to the right
block for the boundary-merge stage.

Add tests asserting batched output is vertex-identical to the old
per-chunk (batch_size=1) path across batch sizes, and that batched
multi-chunk areas match numpy. Document the behavior in dask_laziness.rst.
…2664)

main already moved the dask path from per-chunk to per-row batching
(#2608). Per-row batching still lets peak memory grow with the number of
column chunks. Generalize it to a fixed batch_size so the memory bound is
independent of the chunk-grid shape, and update the #2608 regression test
to pin the bounded-batch contract instead of the per-row one.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: polygonize batch dask per-chunk compute

Blockers

None.

Suggestions

  • batch_size defaults to 32 as a module constant, and there is no way to set it through the public polygonize() API. Fine for this PR since the task scoped it as internal, but worth a line in the docstring saying that tuning means editing the constant. Not blocking.

Nits

  • xrspatial/polygonize.py: dask.compute(*tasks[start:start+batch_size]) returns a tuple even for a one-element slice, so the for interior, boundary in results loop is safe on the single-chunk path. No change needed, just confirming it.

What looks good

  • Output identity holds up. Tasks are built and consumed in the same row-major (iy, ix) order as the old serial loop, and batch_size=1 reproduces the old per-chunk behaviour. test_polygonize_dask_batch_size_invariant pins vertex-exact equality across batch sizes, which is the right invariant here since numpy and dask vertex ordering already differ.
  • The PR supersedes the per-row batching from #2608 and updates that regression test to the bounded-batch contract, rather than leaving a per-row assertion that is now false.
  • The memory/parallelism tradeoff is documented in the docstring, a code comment, and dask_laziness.rst.
  • Backend coverage checks out: only the dask path changed, and dask+numpy and dask+cupy both route through _polygonize_dask.

Checklist

  • Algorithm unchanged (performance-only): yes
  • Backends consistent: yes, dask path only
  • NaN handling: unchanged
  • Edge cases tested: multi-chunk, masked, several batch sizes
  • Dask chunk boundaries: ordering preserved for the merge stage
  • No premature materialization: batching bounds peak memory
  • Benchmark: none added, performance-only change to existing behaviour
  • Docstrings: updated

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Applied the one suggestion from the previous pass: the _polygonize_dask docstring now states that batch_size is internal-only and that tuning it means editing the module-level _DASK_CHUNK_BATCH_SIZE constant.

The remaining nit needed no code change (it confirmed the single-chunk slice path is safe). No blockers. Tests pass: test_polygonize.py and test_polygonize_dask_row_batch_2608.py.

@brendancol brendancol merged commit ee2ef03 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

polygonize: Dask backend runs serial, one dask.compute per chunk

1 participant