Compute the Getis-Ord Gi* statistic in emerging_hotspots() by brendancol · Pull Request #3040 · xarray-contrib/xarray-spatial

brendancol · 2026-06-08T10:58:47Z

emerging_hotspots() carried its own copy of the local-mean z-score bug that hotspots() had before #2803. It convolved each time step with a normalized kernel and z-scored the local mean against the global mean and std, which drops the neighborhood weight sum, the squared weight sum, the sample count n, and the variance adjustment term.

What changed:

Replace the per-step local-mean z-score with the real Gi* z-score, reusing the _gistar_zscore and _gistar_global_stats helpers from Implement the Getis-Ord Gi* statistic in hotspots() #2803. The global mean, population std, and valid-cell count n are taken over the whole space-time cube; the per-step weight sums come from convolving the validity mask with the kernel and squared kernel, so raster edges and interior NaN cells drop out of every neighborhood sum.
Rework the dask backends to run the per-step convolutions through the dispatching convolve_2d (matching the hotspots() dask path) instead of a manual map_overlap, so the dask edge handling matches numpy and cupy.
Keep empty-neighborhood cells as NaN so the Mann-Kendall step still detects pixels with no data across the whole series. The trend classification is unchanged.

Backends: numpy, cupy, dask+numpy, dask+cupy all use the same Gi* math through the existing ArrayTypeFunctionMapping dispatch.

Test plan:

Per-step z-score matches a Gi* reference for a binary kernel.
Same for a weighted (non-binary) kernel, where W_i != W2_i.
Output no longer matches the old local-mean formula for a weighted kernel.
Interior NaN excluded from the neighborhood; neighbors stay finite.
numpy/dask parity (interior matches, edge ring NaN on both).
pytest xrspatial/tests/test_emerging_hotspots.py xrspatial/tests/test_focal.py pass (327 tests).

emerging_hotspots() carried its own copy of the local-mean z-score bug that hotspots() had before #2803: it convolved each time step with a normalized kernel and z-scored the local mean against the global mean and std, dropping the neighborhood weight sum, the squared weight sum, the sample count n, and the variance adjustment term. Replace the per-step local-mean z-score with the real Gi* z-score, reusing the _gistar_zscore and _gistar_global_stats helpers from #2803. The global mean, population std, and valid-cell count n are taken over the whole space-time cube; the per-step weight sums come from convolving the validity mask with the kernel and squared kernel, so raster edges and interior NaN cells drop out of every neighborhood sum. The dask backends now run the per-step convolutions through the dispatching convolve_2d (matching the hotspots() dask path) instead of a manual map_overlap, so the dask edge handling lines up with numpy and cupy. Empty-neighborhood cells stay NaN so the Mann-Kendall step can still detect all-NaN pixels. The Mann-Kendall trend classification is unchanged.

brendancol

PR Review: Compute the Getis-Ord Gi* statistic in emerging_hotspots()

This swaps the per-step local-mean z-score for the real Gi* statistic, following the #2803 fix to hotspots(). I read the full emerging_hotspots.py and the focal.py Gi* helpers it reuses, ran the existing suite plus the new tests, and checked numpy/dask parity across all four boundary modes.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

benchmarks/benchmarks/focal.py has a time_hotspots benchmark but none for emerging_hotspots. Pre-existing gap, not introduced here, and not needed for a bugfix. Worth a follow-up only if the function's runtime starts to matter.
_gistar_step_zscore (emerging_hotspots.py:130) uses cupy.get_array_module guarded by cupy is not None, while _gistar_step_dask uses da.* directly. Same convolution terms, two idioms. Fine as is, but a one-line comment on why the single-array path needs the xp switch and the dask path does not would save the next reader a double-take.

What looks good

The core fix is correct. Global mean, population std, and valid-cell count n come from the whole space-time cube, and the per-step weight sums come from convolving the validity mask with the kernel and squared kernel. The per-step output matches a hand-rolled Gi* reference to 1e-5 for both a binary and a weighted kernel.
Reworking the dask backends to call convolve_2d instead of a manual map_overlap was the right move. It drops the edge-handling discrepancy the old map_overlap path had: numpy and dask now agree to 2.4e-7 on the interior with identical NaN positions for nan, nearest, reflect, and wrap.
Keeping empty-neighborhood cells as NaN in _gistar_step_finalize is well reasoned and tested. A plain 0 would have broken the all-NaN-pixel detection in the Mann-Kendall step.
Orphaned imports from the removed map_overlap path (_convolve_2d_numpy, _convolve_2d_cupy, _boundary_to_dask) were cleaned up, and the pre-existing unused not_implemented_func import was left alone, which is the right call for a surgical bugfix.
New tests cover the binary kernel, the weighted kernel where W != W2, a regression guard that the output no longer matches the old local-mean formula, and interior-NaN exclusion.

Checklist

Algorithm matches reference: yes, the Gi* formula from #2803
All implemented backends consistent: yes, numpy/dask verified, cupy paths reuse the same helpers
NaN handling correct: yes
Edge cases covered by tests: yes
Dask chunk boundaries handled correctly: yes, via convolve_2d dispatch
No premature materialization: yes, global stats computed once with a single da.compute
Benchmark exists or not needed: not needed for a bugfix
README feature matrix updated: not applicable, no new function
Docstrings present and accurate: yes, unchanged and still correct

brendancol

Follow-up review (after `86fc634`)

Re-reviewed after the follow-up commit. The only change is a 4-line comment in _gistar_step_zscore explaining why the single-array path picks numpy vs cupy via cupy.get_array_module while the dask path uses da.* directly. That clears the one nit worth fixing in-PR.

The other nit (no emerging_hotspots benchmark in benchmarks/benchmarks/focal.py) is deferred: it is a pre-existing gap, the benchmark file is not touched by this PR, and adding one would push past the scope of a bugfix.

No blockers, no suggestions, nothing else outstanding. All 49 tests in test_emerging_hotspots.py pass locally.

github-actions Bot added the performance PR touches performance-sensitive code label Jun 8, 2026

brendancol commented Jun 8, 2026

View reviewed changes

Address review nit: explain the xp switch vs the dask path (#2804)

86fc634

brendancol commented Jun 8, 2026

View reviewed changes

brendancol merged commit 42703b4 into main Jun 8, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compute the Getis-Ord Gi* statistic in emerging_hotspots()#3040

Compute the Getis-Ord Gi* statistic in emerging_hotspots()#3040
brendancol merged 2 commits into
mainfrom
issue-2804

brendancol commented Jun 8, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 8, 2026

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: Compute the Getis-Ord Gi* statistic in emerging_hotspots()

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review (after 86fc634)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Follow-up review (after `86fc634`)