Skip to content

Compute the Getis-Ord Gi* statistic in emerging_hotspots()#3040

Merged
brendancol merged 2 commits into
mainfrom
issue-2804
Jun 8, 2026
Merged

Compute the Getis-Ord Gi* statistic in emerging_hotspots()#3040
brendancol merged 2 commits into
mainfrom
issue-2804

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #2804

emerging_hotspots() carried its own copy of the local-mean z-score bug that hotspots() had before #2803. It convolved each time step with a normalized kernel and z-scored the local mean against the global mean and std, which drops the neighborhood weight sum, the squared weight sum, the sample count n, and the variance adjustment term.

What changed:

  • Replace the per-step local-mean z-score with the real Gi* z-score, reusing the _gistar_zscore and _gistar_global_stats helpers from Implement the Getis-Ord Gi* statistic in hotspots() #2803. The global mean, population std, and valid-cell count n are taken over the whole space-time cube; the per-step weight sums come from convolving the validity mask with the kernel and squared kernel, so raster edges and interior NaN cells drop out of every neighborhood sum.
  • Rework the dask backends to run the per-step convolutions through the dispatching convolve_2d (matching the hotspots() dask path) instead of a manual map_overlap, so the dask edge handling matches numpy and cupy.
  • Keep empty-neighborhood cells as NaN so the Mann-Kendall step still detects pixels with no data across the whole series. The trend classification is unchanged.

Backends: numpy, cupy, dask+numpy, dask+cupy all use the same Gi* math through the existing ArrayTypeFunctionMapping dispatch.

Test plan:

  • Per-step z-score matches a Gi* reference for a binary kernel.
  • Same for a weighted (non-binary) kernel, where W_i != W2_i.
  • Output no longer matches the old local-mean formula for a weighted kernel.
  • Interior NaN excluded from the neighborhood; neighbors stay finite.
  • numpy/dask parity (interior matches, edge ring NaN on both).
  • pytest xrspatial/tests/test_emerging_hotspots.py xrspatial/tests/test_focal.py pass (327 tests).

emerging_hotspots() carried its own copy of the local-mean z-score bug
that hotspots() had before #2803: it convolved each time step with a
normalized kernel and z-scored the local mean against the global mean and
std, dropping the neighborhood weight sum, the squared weight sum, the
sample count n, and the variance adjustment term.

Replace the per-step local-mean z-score with the real Gi* z-score,
reusing the _gistar_zscore and _gistar_global_stats helpers from #2803.
The global mean, population std, and valid-cell count n are taken over the
whole space-time cube; the per-step weight sums come from convolving the
validity mask with the kernel and squared kernel, so raster edges and
interior NaN cells drop out of every neighborhood sum.

The dask backends now run the per-step convolutions through the
dispatching convolve_2d (matching the hotspots() dask path) instead of a
manual map_overlap, so the dask edge handling lines up with numpy and
cupy. Empty-neighborhood cells stay NaN so the Mann-Kendall step can still
detect all-NaN pixels.

The Mann-Kendall trend classification is unchanged.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 8, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Compute the Getis-Ord Gi* statistic in emerging_hotspots()

This swaps the per-step local-mean z-score for the real Gi* statistic, following the #2803 fix to hotspots(). I read the full emerging_hotspots.py and the focal.py Gi* helpers it reuses, ran the existing suite plus the new tests, and checked numpy/dask parity across all four boundary modes.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • benchmarks/benchmarks/focal.py has a time_hotspots benchmark but none for emerging_hotspots. Pre-existing gap, not introduced here, and not needed for a bugfix. Worth a follow-up only if the function's runtime starts to matter.
  • _gistar_step_zscore (emerging_hotspots.py:130) uses cupy.get_array_module guarded by cupy is not None, while _gistar_step_dask uses da.* directly. Same convolution terms, two idioms. Fine as is, but a one-line comment on why the single-array path needs the xp switch and the dask path does not would save the next reader a double-take.

What looks good

  • The core fix is correct. Global mean, population std, and valid-cell count n come from the whole space-time cube, and the per-step weight sums come from convolving the validity mask with the kernel and squared kernel. The per-step output matches a hand-rolled Gi* reference to 1e-5 for both a binary and a weighted kernel.
  • Reworking the dask backends to call convolve_2d instead of a manual map_overlap was the right move. It drops the edge-handling discrepancy the old map_overlap path had: numpy and dask now agree to 2.4e-7 on the interior with identical NaN positions for nan, nearest, reflect, and wrap.
  • Keeping empty-neighborhood cells as NaN in _gistar_step_finalize is well reasoned and tested. A plain 0 would have broken the all-NaN-pixel detection in the Mann-Kendall step.
  • Orphaned imports from the removed map_overlap path (_convolve_2d_numpy, _convolve_2d_cupy, _boundary_to_dask) were cleaned up, and the pre-existing unused not_implemented_func import was left alone, which is the right call for a surgical bugfix.
  • New tests cover the binary kernel, the weighted kernel where W != W2, a regression guard that the output no longer matches the old local-mean formula, and interior-NaN exclusion.

Checklist

  • Algorithm matches reference: yes, the Gi* formula from #2803
  • All implemented backends consistent: yes, numpy/dask verified, cupy paths reuse the same helpers
  • NaN handling correct: yes
  • Edge cases covered by tests: yes
  • Dask chunk boundaries handled correctly: yes, via convolve_2d dispatch
  • No premature materialization: yes, global stats computed once with a single da.compute
  • Benchmark exists or not needed: not needed for a bugfix
  • README feature matrix updated: not applicable, no new function
  • Docstrings present and accurate: yes, unchanged and still correct

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 86fc634)

Re-reviewed after the follow-up commit. The only change is a 4-line comment in _gistar_step_zscore explaining why the single-array path picks numpy vs cupy via cupy.get_array_module while the dask path uses da.* directly. That clears the one nit worth fixing in-PR.

The other nit (no emerging_hotspots benchmark in benchmarks/benchmarks/focal.py) is deferred: it is a pre-existing gap, the benchmark file is not touched by this PR, and adding one would push past the scope of a bugfix.

No blockers, no suggestions, nothing else outstanding. All 49 tests in test_emerging_hotspots.py pass locally.

@brendancol brendancol merged commit 42703b4 into main Jun 8, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

emerging_hotspots() repeats the local-mean z-score bug instead of computing Gi*

1 participant