Skip to content

Reject cyclic MFD fraction grids instead of returning wrong results or hanging#2884

Merged
brendancol merged 2 commits into
mainfrom
issue-2866
Jun 3, 2026
Merged

Reject cyclic MFD fraction grids instead of returning wrong results or hanging#2884
brendancol merged 2 commits into
mainfrom
issue-2866

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2866

The MFD flow functions assume their fraction grid is a DAG but never check it. A hand-built or user-supplied grid with a cycle slipped through: flow_accumulation_mfd and flow_length_mfd returned wrong numbers (cells in the cycle kept their initial value), and flow_path_mfd looped forever.

Changes

  • flow_accumulation_mfd and flow_length_mfd: after the Kahn traversal, compare the processed-cell count against the valid-cell count. If a cycle left some cells with nonzero in-degree, raise a ValueError naming the problem. Applied to the eager CPU kernels and the dask tile kernels (downstream and upstream).
  • flow_path_mfd: the dominant-neighbor trace now caps at H*W steps. A path through a DAG visits each cell at most once, so exceeding that bound means a cycle. Applied to the CPU while True loop and the dask tracing loop. Both raise instead of spinning.

Backend coverage

The cycle guard runs on numpy, dask+numpy, and dask+cupy (the cupy and dask+cupy paths route through the same CPU/tile kernels). cupy follows the CPU kernel as well.

Test plan

  • New ValueError tests for a 2-cell cycle on numpy and dask, for all three functions and both flow_length directions
  • Existing 82 MFD tests still pass (90 total with the new ones)
  • Confirmed a valid flow_direction_mfd grid still computes on every backend

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Reject cyclic MFD fraction grids instead of returning wrong results or hanging

Blockers

None.

Suggestions

None.

Nits

  • The same ValueError message is repeated across six sites (the CPU and dask-tile kernels in flow_accumulation_mfd.py and flow_length_mfd.py, plus two in flow_path_mfd.py). Numba njit functions can't pull a raise message from a module-level constant, so the duplication is hard to avoid for the kernels. Not worth refactoring.

What looks good

  • The cycle check is the standard Kahn-completion test: after the topological traversal, compare the processed-cell count against the valid-cell count. Correct, with O(1) extra work.
  • head (processed) and tail (ordered) are equivalent here because the BFS drains its queue. Both stay below n_valid only when a cycle remains.
  • flow_path_mfd caps the trace at HW steps. A dominant-neighbor path through a DAG visits each cell at most once, so the bound is tight and won't false-positive. A full-length 2-cell valid path (HW=2) does not trip the cap.
  • No false positives on valid grids: pits and flats are in_degree-0 sources, and off-grid flow creates no in-grid edges. Confirmed on a random elevation-derived MFD grid across numpy and dask.
  • The guard sits in both the eager and dask-tile kernels, so numpy, dask+numpy, and dask+cupy all raise. cupy routes through the CPU kernel.
  • Tests cover numpy and dask for all three functions and both flow_length directions. 8 new tests; the existing 82 still pass.

Checklist

  • Algorithm matches reference: yes (Kahn completion check)
  • All implemented backends consistent: yes
  • NaN handling correct: unchanged; n_valid counts non-NaN cells
  • Edge cases covered: cycle on numpy and dask; 1x1 and full-length paths checked manually
  • Dask chunk boundaries: guard fires per tile; cross-tile cycles are out of scope for this PR
  • No premature materialization: just an integer counter
  • Benchmark: existing flow benchmarks apply; no new function added
  • README matrix: not applicable, no new function or backend change
  • Docstrings: unchanged, still accurate

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 2, 2026
@brendancol brendancol merged commit 715390e into main Jun 3, 2026
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MFD flow functions accept cyclic fraction grids: wrong results or infinite hang

1 participant