Fix zonal dask memory guards and stats filtering #1112

Merged: brendancol merged 5 commits into master from issue-1110 on Mar 31, 2026

brendancol commented on Mar 31, 2026

Summary

  • Improve the _regions_dask memory guard: use shape * itemsize instead of .nbytes to avoid triggering graph inspection on large arrays, and raise an actionable error message that explains the limitation
  • Add the missing memory guard to _regions_dask_cupy, based on free GPU memory
  • Replace iterrows() zone filtering in _stats_dask_numpy with .isin() to avoid per-row materialization
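The guard idea from the first two bullets can be sketched as follows: estimate the array's size from metadata alone (shape times dtype itemsize), compare it against available memory, and raise before any full .compute() happens. This is a minimal sketch, not the PR's code. The function names, the `safety_factor` headroom parameter, and the error wording are all hypothetical, and the memory probes assume psutil (host RAM) and CuPy (GPU) are installed.

```python
import numpy as np


def estimate_nbytes(shape, dtype):
    """Size in bytes computed from metadata only (no data or graph access)."""
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)
    return n_elements * np.dtype(dtype).itemsize


def check_fits_in_memory(shape, dtype, available_bytes, safety_factor=0.5):
    """Raise MemoryError *before* materializing the full array.

    safety_factor is a hypothetical headroom knob: only this fraction of
    the available memory is budgeted for the input array itself.
    """
    required = estimate_nbytes(shape, dtype)
    budget = int(available_bytes * safety_factor)
    if required > budget:
        # Illustrative wording; the PR's actual message may differ.
        raise MemoryError(
            f"regions() on a dask array must load the full {required / 1e9:.2f} GB "
            f"array into memory because connected-component labeling is a global "
            f"operation, but the budget is {budget / 1e9:.2f} GB."
        )
    return required


def available_host_bytes():
    """Free host RAM; assumes psutil is installed."""
    import psutil
    return psutil.virtual_memory().available


def available_gpu_bytes():
    """Free memory on the current GPU; assumes CuPy is installed."""
    import cupy
    free, _total = cupy.cuda.runtime.memGetInfo()
    return free
```

In this sketch, a caller would run `check_fits_in_memory(arr.shape, arr.dtype, available_host_bytes())` immediately before `arr.compute()`, and use `available_gpu_bytes()` on the CuPy path. Because only `shape` and `dtype` are read, the check costs the same regardless of how large the array or its task graph is.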

Context

Found during performance sweep triage (#1110). da.unique().compute() was initially flagged as HIGH severity but was confirmed to be safe: it is a per-chunk reduction that only materializes the small array of unique values. The real OOM risk is _regions_dask, which intentionally calls .compute() on the full array because connected-component labeling is a global operation. The new guard catches this before that .compute() runs.
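Why a per-chunk unique is cheap can be shown with a conceptual sketch in plain NumPy. This is not dask's actual da.unique implementation. Each chunk is reduced to its own small set of unique values, and only those partial results are combined, so peak memory stays near one chunk plus the unique values. The helpers `chunked_unique` and `iter_row_blocks` are hypothetical stand-ins for dask's blocks and graph.

```python
import numpy as np


def iter_row_blocks(arr, block_rows):
    """Yield row blocks one at a time, standing in for dask chunks."""
    for start in range(0, arr.shape[0], block_rows):
        yield arr[start:start + block_rows]


def chunked_unique(chunks):
    """Unique values of a chunked array via a map/reduce over chunks."""
    # Map step: each chunk shrinks to its (usually small) unique values.
    partials = [np.unique(chunk) for chunk in chunks]
    # Reduce step: only the small partial arrays are concatenated.
    return np.unique(np.concatenate(partials))
```

By contrast, connected-component labeling cannot be split this way, because a region may span chunk boundaries. That is why _regions_dask needs the whole array in memory and therefore needs a guard.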

Test plan

  • test_regions_dask_memory_guard: verifies that MemoryError is raised before .compute() when available memory is insufficient
  • test_stats_dask_zone_filter: verifies that zone_ids filtering works with .isin()
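The filtering change covered by test_stats_dask_zone_filter can be illustrated with pandas. The old pattern builds a Series for every row through iterrows(), while the new pattern applies one vectorized boolean mask. This is a sketch, not the PR's code: the `zone` column name, the sample stats table, and both function names are assumptions made for illustration.

```python
import pandas as pd


def filter_zones_iterrows(stats_df, zone_ids):
    """Old pattern: materialize each row in Python and keep matching zones."""
    keep = [idx for idx, row in stats_df.iterrows() if row["zone"] in zone_ids]
    return stats_df.loc[keep]


def filter_zones_isin(stats_df, zone_ids):
    """New pattern: a single vectorized boolean mask over the zone column."""
    mask = stats_df["zone"].isin(zone_ids)
    return stats_df[mask]
```

Both functions return the same rows, but the .isin() version does a single columnar pass instead of constructing one Series per row. This is the property the test checks.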

Commits

  • Parallel subagent triage + ralph-loop workflow for auditing all xrspatial modules for performance bottlenecks, OOM risk under 30TB dask workloads, and backend-specific anti-patterns.
  • 7 tasks covering command scaffold, module scoring, parallel subagent dispatch, report merging, ralph-loop generation, and smoke tests.
github-actions bot added the performance label (PR touches performance-sensitive code) on Mar 31, 2026
brendancol merged commit 05798a8 into master on Mar 31, 2026
11 checks passed