Summary
`flow_path_d8` on the eager numpy and cupy backends allocates working buffers proportional to `H * W` with no memory guard. Tracing the call graph from the public API:
- numpy path (`flow_path_d8`):
  - `fd = data.astype(np.float64)`: 8 B/pixel (working copy of `flow_dir`)
  - `sp = np.asarray(sp_data, dtype=np.float64)`: 8 B/pixel (`start_points` float64 copy)
  - `out = np.empty((H, W), dtype=np.float64)` in `_flow_path_cpu`: 8 B/pixel (output)
- Measured peak via `tracemalloc` on a 2000x2000 grid: ~21 bytes/pixel (a reproduction sketch follows this summary).
- cupy path (`_flow_path_cupy`): copies both inputs to the host via `.get()`, runs the CPU kernel, and copies the output back via `cp.asarray`. Host-side peak matches the CPU budget; device-side residency is the input plus the float64 output (~16 B/pixel on device). A reconstruction of this round-trip is sketched directly below.
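For concreteness, the round-trip in that last bullet looks roughly like this (a reconstruction from the call graph above, not the verbatim source; `_flow_path_cpu` is the CPU kernel named earlier):

```python
import cupy as cp


def _flow_path_cupy(data, sp_data):
    # Device -> host copies of both inputs.
    fd_host = data.get()
    sp_host = sp_data.get()
    # Run the eager CPU kernel on the host copies
    # (host-side peak therefore matches the CPU budget).
    out_host = _flow_path_cpu(fd_host, sp_host)
    # Host -> device copy of the float64 output (~8 B/pixel on device,
    # on top of the resident input).
    return cp.asarray(out_host)
```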
A 60000x60000 grid on the numpy backend would require ~75 GB of working memory (3.6e9 pixels at ~21 B/pixel) before failing, with no guardrails. The dask and dask+cupy backends stream chunks lazily and are unaffected.
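A minimal harness along these lines reproduces the per-pixel figure; the import path, the `xr.DataArray` wrapping, and the call signature of `flow_path_d8` are assumptions here, so adjust to the actual module:

```python
import numpy as np
import tracemalloc
import xarray as xr

from some_module import flow_path_d8  # hypothetical import path

H = W = 2000
flow_dir = xr.DataArray(
    np.random.choice([1, 2, 4, 8, 16, 32, 64, 128], size=(H, W)).astype(np.uint8),
    name="flow_dir",
)
start_points = np.array([[0, 0], [H - 1, W - 1]], dtype=np.float64)

tracemalloc.start()
flow_path_d8(flow_dir, start_points)  # assumed signature
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak: {peak / (H * W):.1f} B/pixel")  # ~21 B/pixel observed on this grid
```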
Proposed fix
Mirror the per-module guard pattern used in #1318, #1319, #1303, #1334, #1338, #1339, #1344, #1355, etc.: private `_check_memory(H, W)` and `_check_gpu_memory(H, W)` helpers that raise `MemoryError` when the projected working set exceeds 50% of available host or GPU memory, wired into the public `flow_path_d8` dispatch before the eager `astype` call. The dask and dask+cupy paths skip the guard.
Use 24 B/pixel as the CPU budget (rounded up from the measured 21) and 32 B/pixel as the GPU budget, consistent with the surrounding hydro guards.
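A minimal sketch of the pair, assuming `psutil` for the host query and CuPy's runtime API for the device query; the helper names follow the pattern above, and the error wording is illustrative:

```python
import psutil

# Per-pixel budgets, per the proposal above.
_CPU_BYTES_PER_PIXEL = 24   # rounded up from the measured ~21 B/pixel
_GPU_BYTES_PER_PIXEL = 32


def _available_memory_bytes():
    return psutil.virtual_memory().available


def _available_gpu_memory_bytes():
    import cupy as cp
    free, _total = cp.cuda.runtime.memGetInfo()
    return free


def _check_memory(H, W):
    required = H * W * _CPU_BYTES_PER_PIXEL
    if required > _available_memory_bytes() * 0.5:
        raise MemoryError(
            f"flow_path_d8 on a {H}x{W} grid needs ~{required / 2**30:.1f} GiB "
            f"of host working memory; use the dask backend for rasters this large."
        )


def _check_gpu_memory(H, W):
    required = H * W * _GPU_BYTES_PER_PIXEL
    if required > _available_gpu_memory_bytes() * 0.5:
        raise MemoryError(
            f"flow_path_d8 on a {H}x{W} grid needs ~{required / 2**30:.1f} GiB "
            f"of GPU memory; use the dask+cupy backend for rasters this large."
        )
```

In the dispatch, the numpy branch would call `_check_memory(*data.shape)` before the `astype`, the cupy branch `_check_gpu_memory(*data.shape)`, and the dask branches call neither.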
Tests
- numpy raises `MemoryError` when `_available_memory_bytes` is patched to a small value
- cupy raises `MemoryError` when `_available_gpu_memory_bytes` is patched to a small value
- dask backend bypasses the guard
- normal-size raster passes the guard
- error message names the grid dimensions and points the user toward dask
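A pytest-style sketch of the first two cases (the `hydro` module path and guard attribute names are assumptions; `monkeypatch` stands in for whatever patching style the suite already uses):

```python
import numpy as np
import pytest
import xarray as xr

from some_package import hydro  # hypothetical module path


def test_numpy_guard_raises(monkeypatch):
    # Shrink the reported host budget so any real grid trips the guard.
    monkeypatch.setattr(hydro, "_available_memory_bytes", lambda: 1024)
    flow_dir = xr.DataArray(np.zeros((100, 100), dtype=np.uint8))
    start_points = np.array([[0, 0]], dtype=np.float64)
    with pytest.raises(MemoryError, match="100x100"):
        hydro.flow_path_d8(flow_dir, start_points)


def test_cupy_guard_raises(monkeypatch):
    cupy = pytest.importorskip("cupy")
    monkeypatch.setattr(hydro, "_available_gpu_memory_bytes", lambda: 1024)
    flow_dir = xr.DataArray(cupy.zeros((100, 100), dtype=cupy.uint8))
    start_points = np.array([[0, 0]], dtype=np.float64)
    with pytest.raises(MemoryError, match="100x100"):
        hydro.flow_path_d8(flow_dir, start_points)
```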