Fix hotspots() silent all-zeros on degenerate dask inputs#2860
Merged
Conversation
The numpy and cupy hotspots paths validate the global Gi* terms through _gistar_global_stats, which raises for fewer than 2 valid cells or a zero global std. The dask paths kept those terms as lazy 0-d arrays and skipped the check, so a constant, all-NaN, or single-valid-cell raster classified to a silent all-zeros result instead of raising. Add a lazy validation block (_gistar_validate_lazy) that re-applies the numpy-path check at compute time and returns the z-score unchanged. The dask graph stays lazy on call (issue #2772), and the error now fires at compute() like the numpy and cupy backends. Cover the three degenerate cases across numpy, cupy, dask+numpy, and dask+cupy.
brendancol
commented
Jun 2, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Fix hotspots() silent all-zeros on degenerate dask inputs
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
_gistar_validate_lazyruns once per chunk viamap_blocks, so the check repeats for every block of a multi-chunk raster. It is two scalar comparisons and only one block needs to raise, so this is harmless. Putting the guard in the block function (rather than a single up-front check) is what keeps the graph lazy, which is the point of #2772. (focal.py:1374)
What looks good
- The fix keeps the laziness contract from #2772: validation is folded into the dask graph via
map_blocksand fires atcompute(), not at graph-build time.test_hotspots_no_eager_computestill passes. _gistar_validate_lazyreuses_gistar_global_stats, so the dask error messages and exception types match numpy/cupy exactly (ValueError for n<2, ZeroDivisionError for std==0).meta=z_array._metakeeps the chunk element type, so the dask+cupy path stays on-device.._metais already used this way in kde.py and utils.py.- Tests cover all three degenerate cases (constant, all-NaN, single valid cell) across numpy, cupy, dask+numpy, and dask+cupy, with the dask tests asserting the error fires at compute().
Checklist
- Algorithm matches reference (no algorithm change; validation parity only)
- All implemented backends produce consistent results
- NaN handling is correct (all-NaN -> n=0 -> ValueError)
- Edge cases are covered by tests
- Dask chunk boundaries handled correctly (0-d scalars broadcast to every block)
- No premature materialization (global terms stay lazy)
- Benchmark not needed (bugfix, no perf change)
- README feature matrix not applicable (no new function, no backend change)
- Docstrings present and accurate
brendancol
commented
Jun 2, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (after docs commit)
Re-reviewed after the second commit. The only change since the first review is a docs fix.
Changes since last review
docs/source/reference/dask_laziness.rst: the hotspots row still said "Partially lazy / Computes global mean and std," which went stale when #2772 made the dask path lazy. Updated it to "Fully lazy" with a note that the global terms stay lazy and the degenerate-input check fires at compute. This matches the code andtest_hotspots_no_eager_compute.
Disposition of earlier findings
- No Blockers or Suggestions were raised.
- The single Nit (validation runs per block) is intentional: putting the guard inside the block function is what keeps the graph lazy per #2772. Dismissed, not a defect.
No new issues. Code is unchanged from the first review.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2843
What
hotspots()paths validate the global Gi* terms via_gistar_global_stats, raising for fewer than 2 valid cells (ValueError) or a zero global std (ZeroDivisionError). The dask paths kept those terms as lazy 0-d arrays and never ran the check, so constant, all-NaN, or single-valid-cell rasters classified to a silent all-zeros raster instead of raising._gistar_validate_lazy, amap_blocksstep that re-applies the numpy-path check at compute time and returns the z-score unchanged. The graph stays lazy on call (issue hotspots() computes global stats eagerly for Dask input instead of staying lazy #2772); the error now fires atcompute()to match numpy and cupy.Backends
numpy and cupy unchanged. dask+numpy and dask+cupy now raise the same errors as numpy on degenerate input.
Test plan
test_hotspots_degenerate_*_2843across numpy / cupy / dask+numpy / dask+cupy for constant, all-NaN, and single-valid-cell rasterstest_hotspots_no_eager_compute)