Keep dask+cupy geodesic slope lat/lon lazy to avoid GPU OOM #2776
Merged
Port the lazy-block pattern from aspect.py into slope's dask+cupy geodesic path. Building the graph no longer densifies the full (H, W) lat/lon grids onto a single GPU via cupy.asarray; each block is converted lazily with map_blocks(_to_cupy_f64) instead. Add a regression test that asserts graph construction stays under one full lat/lon grid of GPU memory.
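The lazy-block pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from slope.py or aspect.py: the converter here uses np.asarray as a CPU stand-in so the sketch runs without a GPU (on a CUDA machine the assumption is that cupy.asarray(block, dtype=cupy.float64) would take its place), and the shapes, chunk sizes, and call counter are made up for the demonstration.

```python
import numpy as np
import dask.array as da

calls = {"n": 0}  # hypothetical counter, only here to show when conversion runs


def _to_device_f64(block):
    # Stand-in for a per-block converter such as _to_cupy_f64.
    # np.asarray keeps this sketch runnable on CPU; cupy.asarray is the
    # assumed GPU equivalent.
    if block.size:  # ignore any zero-size probe call
        calls["n"] += 1
    return np.asarray(block, dtype=np.float64)


H, W = 512, 512  # illustrative shape
lat_2d = np.broadcast_to(np.linspace(-60.0, 60.0, H)[:, None], (H, W))

# Graph construction: wrap the host-side view and schedule a per-block
# conversion. Passing dtype and meta avoids running the converter to infer them.
lat_da = da.from_array(lat_2d, chunks=(256, 256)).map_blocks(
    _to_device_f64,
    dtype=np.float64,
    meta=np.array((), dtype=np.float64),
)
calls_after_build = calls["n"]

# Only compute triggers the per-block conversion.
result = lat_da.compute(scheduler="synchronous")
calls_after_compute = calls["n"]
```

In this sketch the converter is not called while the graph is built and runs once per block at compute time, which is the property the PR relies on to keep the full grid off the GPU during graph construction.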
brendancol
commented
Jun 1, 2026
Contributor
Author
brendancol
left a comment
PR Review: Keep dask+cupy geodesic slope lat/lon lazy to avoid GPU OOM
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
xrspatial/tests/test_geodesic_slope.py:299 has a pre-existing F401 (import cupy unused) in test_numpy_equals_cupy, just above the touched class. Not introduced by this PR, but it is a one-line cleanup if you are already in the file.
What looks good
- The fix is a direct port of the aspect.py pattern (aspect.py:286), down to the _to_cupy_f64 helper and the explanatory comment. Both modules now handle the dask+cupy geodesic path the same way.
- lat_2d/lon_2d come from _extract_latlon_coords as np.broadcast_to views, so da.from_array(lat_2d) keeps a small zero-stride numpy view in the graph and map_blocks(_to_cupy_f64) defers the cupy conversion to compute time. No full grid lands on the GPU at graph-build.
- The regression test measures real GPU pool usage and asserts it stays under one full lat/lon grid. I confirmed it fails on the old eager code (allocated 2x one grid, 67MB) and passes after the fix.
- Only the dask+cupy path changed. numpy, cupy, and dask+numpy are untouched, and the existing test_numpy_equals_dask_cupy still guards cross-backend parity.
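To make the zero-stride point and the 67MB figure concrete, here is a small numpy-only sketch. It does not reproduce _extract_latlon_coords; the 1-D coordinate arrays are invented, and the 2048 x 2048 shape is an assumption chosen only because two dense float64 grids of that size come to about 67 MB. The review does not state the raster shape used in the test.

```python
import numpy as np

H, W = 2048, 2048  # assumed shape, not taken from the PR

lat_1d = np.linspace(-60.0, 60.0, H)    # illustrative coordinates
lon_1d = np.linspace(-180.0, 180.0, W)

# Broadcast views: the repeated axis has stride 0, so no (H, W) buffer exists.
lat_2d = np.broadcast_to(lat_1d[:, None], (H, W))
lon_2d = np.broadcast_to(lon_1d[None, :], (H, W))

# Memory that actually backs the two views on the host.
backing_bytes = lat_1d.nbytes + lon_1d.nbytes

# Memory a dense copy would need, e.g. after an eager conversion of each view.
one_dense_grid = H * W * np.dtype(np.float64).itemsize
two_dense_grids = 2 * one_dense_grid

print("lat strides:", lat_2d.strides, "lon strides:", lon_2d.strides)
print("backing bytes:", backing_bytes)
print("two dense grids (bytes):", two_dense_grids)
```

Under these assumptions the views are backed by 32 KiB of coordinates, while densifying both grids would take 67,108,864 bytes, which is in line with the "2x one grid" allocation reported for the old eager code.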
Checklist
- Algorithm matches reference (port of aspect.py)
- All implemented backends produce consistent results (parity test unchanged)
- NaN handling is correct (unchanged by this PR)
- Edge cases covered (new graph-build memory test; large-raster path)
- Dask chunk boundaries handled correctly (map_overlap depth unchanged)
- No premature materialization (the point of the change)
- Benchmark exists or is not needed (internal bug fix, no new API)
- README feature matrix updated (not applicable, no new function)
- Docstrings present and accurate (no public API change)
brendancol
commented
Jun 1, 2026
Contributor
Author
brendancol
left a comment
Follow-up review (after nit fix)
The one nit from the first pass is resolved: the unused import cupy in test_numpy_equals_cupy is gone, and flake8 now runs clean on both changed files. The import cupy inside the new graph-build test is used (cupy.zeros, cupy.get_default_memory_pool), so it stays.
No new findings. No blockers, no suggestions, no remaining nits. The fix still matches the aspect.py pattern and the regression test still passes on a CUDA GPU.
Summary
- _run_dask_cupy_geodesic in slope.py no longer builds the full lat/lon grids on the GPU during graph construction. It keeps lat/lon as dask-of-numpy on the broadcast views and converts each block to cupy lazily with map_blocks(_to_cupy_f64), the same way aspect.py already does it.
- cupy.asarray(lat_2d) and cupy.asarray(lon_2d) densified both full (H, W) grids onto a single GPU before any compute ran, which could OOM on large rasters.
Backend coverage
Only the dask+cupy geodesic path changed. The numpy, cupy, and dask+numpy paths are untouched. Cross-backend parity stays covered by the existing test_numpy_equals_dask_cupy.
Test plan
- test_latlon_not_materialized_on_gpu_at_graph_build checks that graph construction allocates well under one full lat/lon grid of GPU memory.
- test_geodesic_slope.py suite passes (22 passed) on a CUDA GPU.
Closes #2762
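The shape of a graph-build memory check like the one in the test plan might look like the sketch below. It is not the actual test: the helper names, the FakePool class, and the sizes are hypothetical, and the pool is faked so the sketch runs without a GPU. On a CUDA machine the assumption is that used_bytes would be cupy.get_default_memory_pool().used_bytes and build_graph would construct the dask+cupy slope graph without computing it.

```python
def pool_growth(build_graph, used_bytes):
    """Return the built graph and how many pool bytes building it consumed."""
    before = used_bytes()
    graph = build_graph()  # keep a reference so nothing is freed before we measure
    return graph, used_bytes() - before


def assert_graph_build_is_lazy(build_graph, used_bytes, full_grid_bytes):
    """Fail if graph construction allocated at least one dense lat/lon grid."""
    graph, grown = pool_growth(build_graph, used_bytes)
    assert grown < full_grid_bytes, (
        f"graph build allocated {grown} bytes, "
        f"expected under {full_grid_bytes}"
    )
    return graph


class FakePool:
    """Hypothetical stand-in for a GPU memory pool, for illustration only."""

    def __init__(self):
        self._used = 0

    def allocate(self, nbytes):
        self._used += nbytes

    def used_bytes(self):
        return self._used


FULL_GRID = 2048 * 2048 * 8  # one dense float64 grid at an assumed shape

# Lazy build: nothing is allocated while the graph is constructed.
lazy_pool = FakePool()
assert_graph_build_is_lazy(lambda: "graph", lazy_pool.used_bytes, FULL_GRID)

# Eager build: both grids are allocated up front, so the check fails.
eager_pool = FakePool()


def eager_build():
    eager_pool.allocate(2 * FULL_GRID)
    return "graph"


try:
    assert_graph_build_is_lazy(eager_build, eager_pool.used_bytes, FULL_GRID)
    eager_detected = False
except AssertionError:
    eager_detected = True
```

Measuring the delta around graph construction, rather than absolute pool usage, keeps the check independent of whatever other tests left in the pool.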