Fix balanced_allocation dask+cupy TypeError on source_ids #1569
Conversation
Audited the one new commit since pass 10:

- #1559 (769bcf7): centralise GeoTIFF attrs population across all read backends via a single `_populate_attrs_from_geo_info` helper. Pure metadata-handling refactor. The helper dedupes attrs population across the eager numpy, dask, and two GPU read paths so all four backends emit the same key set. No new allocations, no file I/O changes, no kernel changes, no dtype handling changes -- the helper only writes to an in-memory attrs dict from already-validated geo_info fields.

Re-verified all prior guards intact:

- `MAX_PIXELS_DEFAULT=1e9` at `_reader.py:46`
- `MAX_IFD_ENTRY_COUNT=100_000` at `_header.py:34`
- `MAX_IFDS=256` at `_header.py:45`
- `MAX_TILE_BYTES_DEFAULT=256 MiB` at `_reader.py:68`
- `_MAX_DASK_CHUNKS=50_000` at `__init__.py:1428`
- realpath canonicalisation at `_reader.py:161` and `_vrt.py:160`

Cat 1-6 all clean. No PRs opened.
Re-audit after #1559 (centralise attrs across all read backends). No new HIGH/MEDIUM findings. SAFE/IO-bound holds.

- New `_populate_attrs_from_geo_info` helper runs once per read, not per chunk -- no perf impact on hot paths.
- Probe: `read_geotiff_dask(2560x2560, chunks=256)` yields 400 tasks (4 tasks/chunk for 100 chunks), well under the 1M cap.
- Probe: `read_geotiff_gpu(1024x1024)` returns `cupy.ndarray` end-to-end with no host round-trip (226 ms incl. write+decode).
Audited the one geotiff commit added since pass 10 -- #1559 (PR 1548), which centralises attrs population across the four read backends. The change is a pure metadata refactor; no data-path arithmetic was touched. Windowed-origin math and cross-backend attrs/data parity verified.
`da.unique().compute()` on a dask+cupy array returns a cupy array, which propagates through `np.sort` and then into `source_ids[best_idx]`, making `alloc` a cupy array. The downstream `fric_weight[alloc == sid]` then attempts an implicit numpy conversion and raises `TypeError`. Convert the unique result to numpy via `.get()` before the mask/sort so `source_ids` is always numpy.
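A minimal sketch of the failure mode and fix, using a stand-in class for a cupy array so it runs without a GPU; the function and class names are illustrative, not the actual xrspatial source:

```python
import numpy as np

class FakeCupyArray:
    """Stand-in for cupy.ndarray: exposes .get() for device->host transfer."""
    def __init__(self, data):
        self._data = np.asarray(data)

    def get(self):
        return self._data

def extract_source_ids(uniq):
    # `uniq` stands in for da.unique(raster).compute(), which returns a
    # cupy array when the dask chunks are cupy-backed.
    if hasattr(uniq, "get"):  # cupy arrays expose .get(); numpy arrays do not
        uniq = uniq.get()     # host copy, so downstream indexing stays numpy
    uniq = np.asarray(uniq)
    uniq = uniq[~np.isnan(uniq)]  # mask step (e.g. drop NaN background)
    return np.sort(uniq)
```

With the conversion in place, `source_ids` is numpy regardless of backend, so `fric_weight[alloc == sid]` never sees a cupy operand.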
Pull request overview
Fixes a backend interop bug in balanced_allocation when running on dask+cupy by ensuring extracted source_ids are always NumPy, preventing downstream CuPy implicit NumPy conversion errors.
Changes:

- Convert `da.unique(...).compute()` results to NumPy in `_extract_sources` when the computed array is a CuPy array.
- Update `.claude` sweep state CSVs with new GeoTIFF audit pass entries (security/performance/accuracy tracking metadata).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `xrspatial/balanced_allocation.py` | Forces dask+cupy `da.unique().compute()` output to NumPy so `source_ids` stays NumPy and avoids CuPy implicit conversion errors. |
| `.claude/sweep-security-state.csv` | Records latest GeoTIFF security audit pass metadata. |
| `.claude/sweep-performance-state.csv` | Records latest GeoTIFF performance audit pass metadata. |
| `.claude/sweep-accuracy-state.csv` | Records latest GeoTIFF accuracy audit pass metadata. |
```python
if hasattr(uniq, 'get'):  # cupy
    uniq = uniq.get()
```
Per review feedback, replace the inline `hasattr(uniq, 'get')` check with the existing `_as_numpy()` helper that already centralizes cupy->numpy conversion. Keeps backend-detection logic in one place.
Addressed Copilot's review note: now using the existing `_as_numpy()` helper. Also folded in latest CI: 10/11 jobs green. The remaining failure on …
Summary
- `_extract_sources` returned a cupy array on dask+cupy rasters because `da.unique().compute()` yields cupy and `np.sort` did not convert. That propagated into `source_ids[best_idx]`, making `alloc` cupy, and `fric_weight[alloc == sid]` tripped cupy's implicit-numpy-conversion guard.
- Fix: convert the unique result to numpy via `.get()` before mask/sort so `source_ids` is always numpy.

Test plan
- `pytest xrspatial/tests/test_balanced_allocation.py` — 17/17 pass across numpy, cupy, dask+numpy, dask+cupy
- `test_two_sources_uniform_friction[dask+cupy]` now passes
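A numpy-only sketch of the regression named in the test plan; the real suite also parametrizes over cupy and the dask backends, and the helper name below is an assumption:

```python
import numpy as np

def mask_friction(alloc, fric_weight, sid):
    # The expression that raised TypeError when `alloc` was cupy-backed:
    # a cupy boolean mask cannot implicitly index a numpy array.
    return fric_weight[alloc == sid]

def test_two_sources_uniform_friction():
    # Two sources over uniform friction: each source claims two cells,
    # and the masked result must stay a host-side numpy array.
    alloc = np.array([1, 1, 2, 2])
    fric = np.ones(4)
    for sid in (1, 2):
        out = mask_friction(alloc, fric, sid)
        assert isinstance(out, np.ndarray)
        assert out.size == 2
```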