## Summary
`flow_path_mfd()` allocates several `(8, H, W)` and `(H, W)` float64 working arrays in the numpy and cupy backends without any pre-allocation memory check. Large rasters can exhaust host or device RAM and abort the process before any error message reaches the caller.
This continues the hydro memory-guard series (#1318, #1319, ..., #1351, #1355, #1357).
## Affected backends
- numpy (`flow_path_mfd` -> `_flow_path_mfd_cpu` via the public dispatch)
- cupy (`_flow_path_mfd_cupy` does `.get()` + `.astype(float64)` host-side)
The dask and dask+cupy paths process tiles and are bounded by chunk size, so they skip the guard.
## Working-memory accounting
CPU peak working set (B/px):

- `data.astype(np.float64)` copy of the `(8, H, W)` fractions: 64
- `np.asarray(sp_data, dtype=np.float64)` copy of start_points: 8
- `out` `(H, W)` float64 written by `_flow_path_mfd_cpu`: 8

Total: ~80 B/px.
GPU peak working set (B/px), `_flow_path_mfd_cupy`:

- Host-side `.get()` of the `(8, H, W)` fractions: 64
- Host-side `.astype(np.float64)` copy: up to 64 (skipped when the dtype already matches)
- Host-side `(H, W)` float64 output: 8
- Device-side `(H, W)` float64 output: 8

Conservative GPU budget: ~88 B/px.
## Worked example
A 30000 x 30000 fractions raster:

- numpy: 30000 * 30000 * 80 B/px = 72 GB peak working memory.
- cupy: 30000 * 30000 * 88 B/px = ~79 GB host residency during the host-side trace.
Both far exceed typical workstation RAM, and no `MemoryError` is raised before allocation.
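The arithmetic above is easy to check directly; a quick sketch (constant names are illustrative, values are the per-pixel budgets derived earlier):

```python
# Back-of-the-envelope check of the worked example; pure arithmetic,
# no dependency on the hydro module.
CPU_BYTES_PER_PIXEL = 80   # numpy backend budget
GPU_BYTES_PER_PIXEL = 88   # conservative cupy backend budget

H = W = 30_000
cpu_peak = H * W * CPU_BYTES_PER_PIXEL   # bytes
gpu_peak = H * W * GPU_BYTES_PER_PIXEL   # bytes
print(f"numpy: {cpu_peak / 1e9:.1f} GB, cupy: {gpu_peak / 1e9:.1f} GB")
# -> numpy: 72.0 GB, cupy: 79.2 GB
```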
## Proposed fix
Match the pattern from #1351 (`flow_length_mfd`) and #1337 (`stream_link_mfd`):
- Per-module `_BYTES_PER_PIXEL` and `_GPU_BYTES_PER_PIXEL` constants with a comment breaking down the count.
- `_available_memory_bytes()` reads `/proc/meminfo`, then falls back to `psutil`, then to a 2 GiB default.
- `_available_gpu_memory_bytes()` queries `cupy.cuda.runtime.memGetInfo`; returns 0 if unavailable.
- `_check_memory(H, W)` raises `MemoryError` if projected use exceeds 50% of available host RAM.
- `_check_gpu_memory(H, W)` mirrors the host check on the GPU; skips silently when CUDA is unavailable.
- Wire both checks into the public `flow_path_mfd` dispatch before the eager allocations. Dask and dask+cupy paths skip the guard.
## Tests
- numpy raster raises `MemoryError` when projected RAM exceeds the budget (mock `_available_memory_bytes`).
- numpy raster of normal size succeeds.
- dask path bypasses the guard even when memory is mocked low.
- error message mentions the grid dimensions and points to dask.
- cupy raster raises `MemoryError` when projected GPU RAM exceeds the budget (skipped without CUDA).
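The mocking pattern for the first case can be sketched self-contained with a stand-in guard; in the real suite the patch target would be the hydro module's `_available_memory_bytes` (exact module path not shown here, and the stand-in below is illustrative only):

```python
from unittest import mock
import types

# Stand-in for the module under test, mirroring the proposed guard.
guard = types.SimpleNamespace(
    _BYTES_PER_PIXEL=80,
    _available_memory_bytes=lambda: 32 * 1024 ** 3,  # pretend 32 GiB free
)

def _check_memory(h, w):
    if h * w * guard._BYTES_PER_PIXEL > guard._available_memory_bytes() // 2:
        raise MemoryError(f"{h} x {w} grid too large; consider dask")

_check_memory(1_000, 1_000)  # 80 MB projected vs a 16 GiB budget: passes

# Mock the availability probe low, as the first test case above describes.
with mock.patch.object(guard, "_available_memory_bytes", lambda: 1024 ** 2):
    try:
        _check_memory(1_000, 1_000)  # 80 MB projected vs a 0.5 MiB budget
        raised = False
    except MemoryError as e:
        raised = "1000 x 1000" in str(e)  # message names the grid dimensions
assert raised
```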