Skip to content

Fix reproject.merge same-CRS dask path materializing full source per chunk#1576

Merged
brendancol merged 2 commits intomainfrom
deep-sweep-performance-reproject-2026-05-10
May 11, 2026
Merged

Fix reproject.merge same-CRS dask path materializing full source per chunk#1576
brendancol merged 2 commits intomainfrom
deep-sweep-performance-reproject-2026-05-10

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

_merge_block_adapter called src_data.compute() on the full source
dask array for every output chunk on the same-CRS direct-placement
path. Measured: 256x256 source with 32x32 output chunks materialized
8.9M pixels for 131K source pixels (68x amplification). For an
8192x8192 source with 256x256 output chunks, the amplification would
push driver-side data flow into terabyte territory.

The fix adds _place_same_crs_lazy (xrspatial/reproject/init.py)
which mirrors _place_same_crs but slices the source window first and
calls .compute() only on that slice. Post-fix amplification is 1.00x.
Mirrors the slicing pattern already in _reproject_chunk_numpy and
_reproject_chunk_cupy.

Surfaced by the 2026-05-10 reproject performance sweep.

Closes #1571.

Test plan

  • New regression test test_merge_dask_same_crs_bounded_materialization traces da.Array.compute() calls and asserts total materialized pixels stay under 3x total source size.
  • Existing test_merge_dask_same_crs_matches_eager continues to pass (bit-equal placement preserved).
  • Full reproject test suite passes (194 tests).
  • Direct measurement: pre-fix 68.00x, post-fix 1.00x ratio of materialized pixels over total source size.

…chunk

`_merge_block_adapter` called `src_data.compute()` on the full source
dask array for every output chunk on the same-CRS direct-placement path.
For a 256x256 source split into 32x32 output chunks, this materialized
8.9M pixels for 131K source pixels (68x amplification). At realistic
8192x8192 source with 256x256 output chunks the amplification would push
driver-side data flow into terabyte territory.

The fix adds `_place_same_crs_lazy` that mirrors `_place_same_crs` but
slices the source window first and only calls `.compute()` on that
slice. Post-fix ratio is 1.00x (131K pixels materialized for 131K
source). Mirrors the slicing pattern already in
`_reproject_chunk_numpy` and `_reproject_chunk_cupy`.

Adds regression test `test_merge_dask_same_crs_bounded_materialization`
that traces `da.Array.compute()` calls and asserts total materialized
pixels stay under 3x total source size.

Surfaced by the 2026-05-10 reproject performance sweep.
Closes #1571.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 11:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a performance bug in xrspatial.reproject.merge() where the same-CRS dask path was materializing the full source array once per output chunk, causing extreme amplification of driver-side data flow.

Changes:

  • Add _place_same_crs_lazy() to slice the source window before materializing data for same-CRS placement.
  • Update _merge_block_adapter to use the lazy same-CRS placement path instead of computing the full source per chunk.
  • Add a regression test that traces da.Array.compute() and asserts bounded materialization.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
xrspatial/reproject/__init__.py Introduces a same-CRS placement helper that computes only the needed source window, and wires it into the dask merge adapter.
xrspatial/tests/test_reproject.py Adds a regression test to prevent reintroducing full-source materialization per output chunk on the same-CRS dask path.
.claude/sweep-performance-state.csv Updates internal performance sweep tracking notes to record issue #1571 and the fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/reproject/__init__.py
Comment thread xrspatial/reproject/__init__.py Outdated
Comment thread xrspatial/tests/test_reproject.py Outdated
Address PR #1576 review:
- Delete _place_same_crs_lazy: it was a near-line-for-line copy of
  _place_same_crs whose only divergence was an explicit window.compute().
  _place_same_crs already slices before np.asarray(), so passing a dask
  source straight through materializes only the slice's chunks (via
  np.asarray -> __array__ -> compute) -- exactly what the duplicate
  was trying to achieve. Removing it eliminates a drift risk and the
  misleading "numpy sources fall through unchanged" docstring claim.
- _merge_block_adapter now calls _place_same_crs directly, with a
  comment explaining why the previous full-source .compute() is gone.
- Test now uses pytest's monkeypatch fixture instead of raw
  da.Array.compute reassignment + try/finally, so a future early
  return or exception cannot leak the patch.
@brendancol brendancol merged commit 5ea4730 into main May 11, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject.merge: same-CRS dask path materializes full source per output chunk

2 participants