Skip to content

Fix COG overview poisoned by nodata sentinel (#1613)#1618

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-11-d
May 11, 2026
Merged

Fix COG overview poisoned by nodata sentinel (#1613)#1618
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-11-d

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Fixes COG overview generation poisoned by nodata sentinel #1613. to_geotiff(..., cog=True, nodata=<finite>) produced corrupted overview pyramids because the NaN-to-sentinel rewrite ran before _make_overview / make_overview_gpu. np.nanmean (and the GPU equivalents) then saw the sentinel as a real value and biased every reduced pixel toward it.
  • Threads nodata through _block_reduce_2d, _block_reduce_2d_gpu, _make_overview, and make_overview_gpu so each reducer masks the sentinel back to NaN before aggregating. The writer's overview loop rewrites any all-sentinel reductions (NaN from the reducer) back to the sentinel for the on-disk pyramid.
  • Both CPU and GPU writer paths fixed. Same fix shape so the two backends stay in lockstep.

Reproduction (before fix)

arr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [np.nan, np.nan, np.nan, np.nan],
    [10.0, 20.0, 30.0, 40.0],
    [10.0, 20.0, 30.0, 40.0],
], dtype=np.float32)
to_geotiff(xr.DataArray(arr, dims=['y','x']), 'out.tif',
           nodata=-9999.0, cog=True, tile_size=2,
           overview_levels=[1], overview_resampling='mean')
open_geotiff('out.tif', overview_level=1).data
# [[-4998.75 -4997.75]   # poisoned by sentinel
#  [   15.      35.  ]]

After fix: [[1.5, 3.5], [15.0, 35.0]] (matches np.nanmean on the original NaN-keyed data).

Test plan

  • New regression tests in test_cog_overview_nodata_1613.py (11 tests, 8 CPU + 3 GPU): mean / min / max / median ignore the sentinel, partial-NaN blocks reduce correctly, all-NaN blocks reduce to NaN then get rewritten to the sentinel, integer dtype passthrough unchanged, CPU and GPU produce identical pyramids.
  • Existing TestOverviewResampling suite still passes (12 tests).
  • All 235 nodata/overview/cog tests in the geotiff suite pass.

The NaN-to-sentinel rewrite in to_geotiff and write_geotiff_gpu ran
before _make_overview / make_overview_gpu, so np.nanmean and the GPU
counterparts saw the sentinel as a finite value and biased every
overview pixel. A raster with NaN pixels and nodata=-9999 produced
overview cells like -4998.75 where the correct nan-aware mean was 1.5.

Thread a nodata kwarg through the reducers so they mask the sentinel
back to NaN before aggregating. The writer's overview loop passes
nodata in, then rewrites any all-sentinel cells (which surface as NaN
from the reducer) back to the sentinel for the on-disk pyramid.

CPU and GPU paths both fixed. New regression tests cover mean / min /
max / median, partial-NaN blocks, all-NaN blocks, integer dtype
passthrough, and CPU-GPU agreement.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 20:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes COG overview generation when nodata=<finite> is used on float rasters containing NaNs by ensuring overview reducers ignore the nodata sentinel (CPU + GPU paths), and adds a focused regression test suite for issue #1613.

Changes:

  • Thread nodata through CPU overview helpers (_block_reduce_2d, _make_overview) and mask sentinel back to NaN during reduction.
  • Mirror the same nodata-aware masking behavior in GPU overview helpers (_block_reduce_2d_gpu, make_overview_gpu) and in the GPU COG overview loop.
  • Add regression tests covering CPU/GPU behavior and direct helper-level reduction semantics for issue #1613.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
xrspatial/geotiff/_writer.py Adds nodata-aware masking in CPU overview reducers and rewrites NaNs back to sentinel in the overview loop.
xrspatial/geotiff/_gpu_decode.py Adds nodata-aware masking in GPU overview reducers and threads nodata through make_overview_gpu.
xrspatial/geotiff/__init__.py Passes nodata into GPU overview generation and rewrites reducer-produced NaNs back to the sentinel per level.
xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py New regression tests validating correct overview values with nodata sentinels on CPU + GPU.
.claude/sweep-accuracy-state.csv Updates internal accuracy tracking notes to record the #1613 fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines +193 to +196
# and poisons the overview (issue #1613).
if (nodata is not None
and not np.isnan(nodata)
and np.isfinite(nodata)):
Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines +1155 to +1158
# all-sentinel comes back as NaN; ``_write_tiled`` / ``_write_stripped``
# serialise that NaN to disk, where the eager reader will mask
# it (and a future writer pass could rewrite to ``nodata`` for
# external readers -- out of scope for this fix).
Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines +215 to +217
# nanmean / nanmin / nanmax / nanmedian raise warnings on all-nan
# blocks; ``np.errstate`` would silence them but the resulting NaN is
# the desired output so we leave the warning visible.
Comment thread xrspatial/geotiff/_gpu_decode.py Outdated
Comment on lines +2926 to +2929
# honour it as missing-data (issue #1613).
if (nodata is not None
and not np.isnan(nodata)
and np.isfinite(nodata)):
- Drop redundant np.isfinite gate in _block_reduce_2d (CPU + GPU) so
  nodata=+/-inf is masked back to NaN like a finite sentinel, matching
  the upstream NaN->sentinel rewrite gate (`not np.isnan(nodata)` used
  at _writer.py:1171,1525,1620).
- Suppress RuntimeWarning from nanmean/nanmin/nanmax/nanmedian on
  all-NaN blocks locally; the all-NaN output is the desired signal
  that the overview loop rewrites to the sentinel, so the warning was
  noise on every nodata-border COG write.
- Fix the comment above the overview loop: NaN from an all-sentinel
  reduction is rewritten back to the sentinel before _write_tiled /
  _write_stripped runs, not serialised as NaN.
- Add regression tests covering nodata=inf (CPU + GPU) and the
  warning-suppression contract for all-NaN blocks.
@brendancol brendancol merged commit dd907b8 into main May 11, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

COG overview generation poisoned by nodata sentinel

2 participants