flow_path_d8: no memory guard on H*W working arrays #1364

@brendancol

Description

Summary

flow_path_d8 on the eager numpy and cupy backends allocates working buffers proportional to H * W with no memory guard. Tracing the call graph from the public API:

  • numpy path (flow_path_d8):

    • fd = data.astype(np.float64): 8 B/pixel (working copy of flow_dir)
    • sp = np.asarray(sp_data, dtype=np.float64): 8 B/pixel (start_points float64 copy)
    • out = np.empty((H, W), dtype=np.float64) in _flow_path_cpu: 8 B/pixel (output)
    • Measured peak via tracemalloc on a 2000x2000 grid: ~21 bytes/pixel.
  • cupy path (_flow_path_cupy): copies both inputs to the host via .get(), runs the CPU kernel, and copies the output back via cp.asarray. Host-side peak matches the CPU budget; device-side residency is the input plus the float64 output (~16 B/pixel device).

A 60000x60000 grid on the numpy backend would require ~76 GB of working memory (3.6e9 pixels at ~21 B/pixel) before failing, with no guardrails. The dask and dask+cupy backends stream chunks lazily and are unaffected.
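The sizing claims above reduce to simple arithmetic. A minimal sketch (the helper name `estimate_cpu_working_set` is hypothetical, and 21 B/pixel is the tracemalloc-measured figure from this issue):

```python
def estimate_cpu_working_set(height, width, bytes_per_pixel=21):
    """Projected peak host memory for flow_path_d8 on an H x W grid,
    using the ~21 B/pixel peak measured via tracemalloc."""
    return height * width * bytes_per_pixel

# The 2000x2000 measurement point: 4e6 pixels -> ~84 MB peak.
print(estimate_cpu_working_set(2000, 2000) / 1e6)    # 84.0 (MB)

# The failure case: 3.6e9 pixels -> ~75.6 GB, far beyond typical host RAM.
print(estimate_cpu_working_set(60000, 60000) / 1e9)  # 75.6 (GB)
```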

Proposed fix

Mirror the per-module guard pattern used in #1318, #1319, #1303, #1334, #1338, #1339, #1344, #1355, etc.: private _check_memory(H, W) and _check_gpu_memory(H, W) helpers that raise MemoryError when the projected working set exceeds 50% of available host or GPU memory. Wire the check into the public flow_path_d8 dispatch before the eager astype call. The dask and dask+cupy paths skip the guard.

Use 24 B/pixel as the CPU budget (rounded up from the measured 21) and 32 B/pixel as the GPU budget, consistent with the surrounding hydro guards.

Tests

  1. numpy raises MemoryError when _available_memory_bytes is patched to a small value
  2. cupy raises MemoryError when _available_gpu_memory_bytes is patched to a small value
  3. dask backend bypasses the guard
  4. normal-size raster passes the guard
  5. error message names the grid dimensions and points the user toward dask

Metadata

Labels

bug (Something isn't working), high-priority, oom (Out-of-memory risk with large datasets)
