Fix balanced_allocation dask+cupy TypeError on source_ids #1569
Conversation
Audited the one new commit since pass 10:

- #1559 (769bcf7): centralise GeoTIFF attrs population across all read backends via a single `_populate_attrs_from_geo_info` helper. Pure metadata-handling refactor. The helper dedupes attrs population across the eager numpy, dask, and two GPU read paths so all four backends emit the same key set. No new allocations, no file I/O changes, no kernel changes, no dtype handling changes -- the helper only writes to an in-memory attrs dict from already-validated geo_info fields.

Re-verified all prior guards intact:

- `MAX_PIXELS_DEFAULT=1e9` at `_reader.py:46`
- `MAX_IFD_ENTRY_COUNT=100_000` at `_header.py:34`
- `MAX_IFDS=256` at `_header.py:45`
- `MAX_TILE_BYTES_DEFAULT=256 MiB` at `_reader.py:68`
- `_MAX_DASK_CHUNKS=50_000` at `__init__.py:1428`
- realpath canonicalisation at `_reader.py:161` and `_vrt.py:160`

Cat 1-6 all clean. No PRs opened.
Re-audit after #1559 (centralise attrs across all read backends). No new HIGH/MEDIUM findings. SAFE/IO-bound holds.

- New `_populate_attrs_from_geo_info` helper runs once per read, not per chunk -- no perf impact on hot paths.
- Probe: `read_geotiff_dask(2560x2560, chunks=256)` yields 400 tasks (4 tasks/chunk for 100 chunks), well under the 1M cap.
- Probe: `read_geotiff_gpu(1024x1024)` returns `cupy.ndarray` end-to-end with no host round-trip (226 ms incl. write+decode).
Audited the one geotiff commit added since pass 10 -- #1559 (PR 1548), which centralises attrs population across the four read backends. The change is a pure metadata refactor; no data-path arithmetic was touched. Windowed-origin math and cross-backend attrs/data parity verified.
`da.unique().compute()` on a dask+cupy array returns a cupy array, which propagates through `np.sort` and then into `source_ids[best_idx]`, making `alloc` a cupy array. The downstream `fric_weight[alloc == sid]` then attempts an implicit numpy conversion and raises `TypeError`. Convert the unique result to numpy via `.get()` before the mask/sort so `source_ids` is always numpy.
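A minimal sketch of the failure mode and fix, using a stand-in class for a cupy array so it runs without a GPU; the function and class names are illustrative, not the actual xrspatial source:

```python
import numpy as np

class FakeCupyArray:
    """Stand-in for cupy.ndarray: exposes .get() for device->host transfer."""
    def __init__(self, data):
        self._data = np.asarray(data)

    def get(self):
        return self._data

def extract_source_ids(uniq):
    # `uniq` stands in for da.unique(raster).compute(), which returns a
    # cupy array when the dask chunks are cupy-backed.
    if hasattr(uniq, "get"):  # cupy arrays expose .get(); numpy arrays do not
        uniq = uniq.get()     # host copy, so downstream indexing stays numpy
    uniq = np.asarray(uniq)
    uniq = uniq[~np.isnan(uniq)]  # mask step (e.g. drop NaN background)
    return np.sort(uniq)
```

With the conversion in place, `source_ids` is numpy regardless of backend, so `fric_weight[alloc == sid]` never sees a cupy operand.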
Pull request overview
Fixes a backend interop bug in balanced_allocation when running on dask+cupy by ensuring extracted source_ids are always NumPy, preventing downstream CuPy implicit NumPy conversion errors.
Changes:

- Convert `da.unique(...).compute()` results to NumPy in `_extract_sources` when the computed array is a CuPy array.
- Update `.claude` sweep state CSVs with new GeoTIFF audit pass entries (security/performance/accuracy tracking metadata).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `xrspatial/balanced_allocation.py` | Forces dask+cupy `da.unique().compute()` output to NumPy so `source_ids` stays NumPy and avoids CuPy implicit conversion errors. |
| `.claude/sweep-security-state.csv` | Records latest GeoTIFF security audit pass metadata. |
| `.claude/sweep-performance-state.csv` | Records latest GeoTIFF performance audit pass metadata. |
| `.claude/sweep-accuracy-state.csv` | Records latest GeoTIFF accuracy audit pass metadata. |
```python
if hasattr(uniq, 'get'):  # cupy
    uniq = uniq.get()
```
Per review feedback, replace the inline `hasattr(uniq, 'get')` check with the existing `_as_numpy()` helper that already centralizes cupy->numpy conversion. Keeps backend-detection logic in one place.
Addressed Copilot's review note: now using the existing `_as_numpy()` helper. Also folded in latest CI: 10/11 jobs green. The remaining failure on …
Summary
- `_extract_sources` returned a cupy array on dask+cupy rasters because `da.unique().compute()` yields cupy and `np.sort` did not convert. That propagated into `source_ids[best_idx]`, making `alloc` cupy, and `fric_weight[alloc == sid]` tripped cupy's implicit-numpy-conversion guard.
- Fix: convert the unique result to numpy via `.get()` before mask/sort so `source_ids` is always numpy.

Test plan
- `pytest xrspatial/tests/test_balanced_allocation.py` — 17/17 pass across numpy, cupy, dask+numpy, dask+cupy
- `test_two_sources_uniform_friction[dask+cupy]` now passes
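A numpy-only sketch of the regression named in the test plan; the real suite also parametrizes over cupy and the dask backends, and the helper name below is an assumption:

```python
import numpy as np

def mask_friction(alloc, fric_weight, sid):
    # The expression that raised TypeError when `alloc` was cupy-backed:
    # a cupy boolean mask cannot implicitly index a numpy array.
    return fric_weight[alloc == sid]

def test_two_sources_uniform_friction():
    # Two sources over uniform friction: each source claims two cells,
    # and the masked result must stay a host-side numpy array.
    alloc = np.array([1, 1, 2, 2])
    fric = np.ones(4)
    for sid in (1, 2):
        out = mask_friction(alloc, fric, sid)
        assert isinstance(out, np.ndarray)
        assert out.size == 2
```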