Summary
`snap_pour_point_d8` allocates three full-resolution float64 buffers on the numpy and cupy backends with no memory guard. On a large grid this can exhaust host or device memory before the kernel completes.
Where
`xrspatial/hydro/snap_pour_point_d8.py` -- public dispatch in `snap_pour_point_d8(...)`.
CPU dispatch (`isinstance(fa_data, np.ndarray)` branch):

```python
fa = fa_data.astype(np.float64)             # H*W * 8 B
pp = np.asarray(pp_data, dtype=np.float64)  # H*W * 8 B
out = _snap_pour_point_cpu(fa, pp, ...)     # H*W * 8 B
```
CuPy dispatch (`_snap_pour_point_cupy`) does the same on device:

```python
fa = flow_accum_data.astype(cp.float64)
pp = pour_points_data.astype(cp.float64)
out = cp.full((H, W), cp.nan, dtype=cp.float64)
```
Bytes per pixel
| backend | working buffers | total |
| --- | --- | --- |
| numpy | fa(8) + pp(8) + out(8) | 24 B/px |
| cupy | fa(8) + pp(8) + out(8) | 24 B/px |
The dask and dask+cupy paths process windows around sparse pour points and do not allocate H*W working buffers, so they should skip the guard.
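For contrast, a toy standalone-numpy illustration of why the windowed paths stay cheap (not the actual dask implementation; `snap_window` and `search_radius` are hypothetical names): snapping only needs the cells within a search radius of each pour point, so the working set scales with the window size, not H*W.

```python
import numpy as np

def snap_window(flow_accum, row, col, search_radius):
    # Clip a small window around the pour point instead of touching the full grid.
    h, w = flow_accum.shape
    r0, r1 = max(0, row - search_radius), min(h, row + search_radius + 1)
    c0, c1 = max(0, col - search_radius), min(w, col + search_radius + 1)
    window = flow_accum[r0:r1, c0:c1]
    # Snap to the cell with the highest accumulation inside the window.
    dr, dc = np.unravel_index(np.argmax(window), window.shape)
    return r0 + dr, c0 + dc
```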
Worked example
A 30000 x 30000 float64 grid:
30000 * 30000 * 24 = 21.6 GB working memory
That fits on a workstation with 32 GB free, fails on most laptops, and can OOM-kill the process on shared hosts.
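The arithmetic behind that figure, for reference (decimal GB):

```python
height = width = 30_000
bytes_per_px = 24  # fa(8) + pp(8) + out(8), from the table above
working_set_gb = height * width * bytes_per_px / 1e9
print(working_set_gb)  # 21.6
```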
Proposed fix
Same pattern as #1318 / #1319 (flow_accumulation_d8) and the follow-on guards in #1350 / #1355 / #1357:
- Add `_check_memory(H, W)` and `_check_gpu_memory(H, W)` helpers in `snap_pour_point_d8.py`, with `_BYTES_PER_PIXEL = 24` and `_GPU_BYTES_PER_PIXEL = 24`.
- Threshold: raise `MemoryError` if required > 50% of `MemAvailable` (psutil fallback) for CPU, or 50% of `cupy.cuda.runtime.memGetInfo()[0]` for GPU.
- Wire the checks into the public `snap_pour_point_d8` dispatch before the eager numpy / cupy allocations. Dask and dask+cupy paths skip the guard.
- Error message points the user at the dask backend.