Skip to content

Keep dask+cupy geodesic slope lat/lon lazy to avoid GPU OOM#2776

Merged
brendancol merged 2 commits into
mainfrom
issue-2762
Jun 1, 2026
Merged

Keep dask+cupy geodesic slope lat/lon lazy to avoid GPU OOM#2776
brendancol merged 2 commits into
mainfrom
issue-2762

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • _run_dask_cupy_geodesic in slope.py no longer builds the full lat/lon grids on the GPU during graph construction. It keeps lat/lon as dask-of-numpy on the broadcast views and converts each block to cupy lazily with map_blocks(_to_cupy_f64), the same way aspect.py already does it.
  • Before this, cupy.asarray(lat_2d) and cupy.asarray(lon_2d) densified both full (H, W) grids onto a single GPU before any compute ran, which could OOM on large rasters.

Backend coverage

Only the dask+cupy geodesic path changed. The numpy, cupy, and dask+numpy paths are untouched. Cross-backend parity stays covered by the existing test_numpy_equals_dask_cupy.

Test plan

  • New regression test test_latlon_not_materialized_on_gpu_at_graph_build checks that graph construction allocates well under one full lat/lon grid of GPU memory.
  • Confirmed the test fails on the old eager code (allocated 2x one grid) and passes after the fix.
  • Full test_geodesic_slope.py suite passes (22 passed) on a CUDA GPU.

Closes #2762

Port the lazy-block pattern from aspect.py into slope's dask+cupy
geodesic path. Building the graph no longer densifies the full
(H, W) lat/lon grids onto a single GPU via cupy.asarray; each block
is converted lazily with map_blocks(_to_cupy_f64) instead. Add a
regression test that asserts graph construction stays under one full
lat/lon grid of GPU memory.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 1, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Keep dask+cupy geodesic slope lat/lon lazy to avoid GPU OOM

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/tests/test_geodesic_slope.py:299 has a pre-existing F401 (import cupy unused) in test_numpy_equals_cupy, just above the touched class. Not introduced by this PR, but it is a one-line cleanup if you are already in the file.

What looks good

  • The fix is a direct port of the aspect.py pattern (aspect.py:286), down to the _to_cupy_f64 helper and the explanatory comment. Both modules now handle the dask+cupy geodesic path the same way.
  • lat_2d/lon_2d come from _extract_latlon_coords as np.broadcast_to views, so da.from_array(lat_2d) keeps a small zero-stride numpy view in the graph and map_blocks(_to_cupy_f64) defers the cupy conversion to compute time. No full grid lands on the GPU at graph-build.
  • The regression test measures real GPU pool usage and asserts it stays under one full lat/lon grid. I confirmed it fails on the old eager code (allocated 2x one grid, 67MB) and passes after the fix.
  • Only the dask+cupy path changed. numpy, cupy, and dask+numpy are untouched, and the existing test_numpy_equals_dask_cupy still guards cross-backend parity.

Checklist

  • Algorithm matches reference (port of aspect.py)
  • All implemented backends produce consistent results (parity test unchanged)
  • NaN handling is correct (unchanged by this PR)
  • Edge cases covered (new graph-build memory test; large-raster path)
  • Dask chunk boundaries handled correctly (map_overlap depth unchanged)
  • No premature materialization (the point of the change)
  • Benchmark exists or is not needed (internal bug fix, no new API)
  • README feature matrix updated (not applicable, no new function)
  • Docstrings present and accurate (no public API change)

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after nit fix)

The one nit from the first pass is resolved: the unused import cupy in test_numpy_equals_cupy is gone, and flake8 now runs clean on both changed files. The import cupy inside the new graph-build test is used (cupy.zeros, cupy.get_default_memory_pool), so it stays.

No new findings. No blockers, no suggestions, no remaining nits. The fix still matches the aspect.py pattern and the regression test still passes on a CUDA GPU.

@brendancol brendancol merged commit ecaad3e into main Jun 1, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dask+cupy geodesic slope eagerly materializes full lat/lon grids on GPU

1 participant