Skip to content

Fix write_geotiff_gpu NaN-to-sentinel substitution (#1599)#1600

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-11-b
May 11, 2026
Merged

Fix write_geotiff_gpu NaN-to-sentinel substitution (#1599)#1600
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-11-b

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #1599. write_geotiff_gpu (and to_geotiff(..., gpu=True))
emitted raw NaN bytes for missing pixels even when nodata=<finite>
was supplied, while the CPU writer substituted NaN with the sentinel
before encoding. xrspatial-only round-trips were unaffected because
the reader masks both NaN and the sentinel, but external readers that
mask only on the GDAL_NODATA tag (rasterio, GDAL, QGIS) treated NaN
pixels as valid data.

Reproducer from the issue:

CPU: valid=2975 / total=3000
GPU: valid=3000 / total=3000   <-- 25 NaN pixels not flagged as nodata

After the fix both rows read valid=2975 / total=3000.

Fix

Mirror the CPU writer's NaN-to-sentinel rewrite on the CuPy array
before compression. Gate on float dtype + finite nodata. Copy
defensively before mutating so a caller-owned CuPy buffer is not
modified, matching the CPU writer's arr.copy() at the equivalent
step.

Tests

test_gpu_writer_nan_sentinel_1599.py adds 7 regression tests:

  • Sentinel lands in file bytes where input held NaN
  • CPU and GPU writers produce byte-equivalent files
  • Caller's CuPy buffer is not mutated
  • No-NaN input round-trips bit-exact
  • NaN nodata sentinel skips substitution (CPU/GPU agree)
  • rasterio sees the same valid-pixel count on CPU and GPU files
  • 3D multiband path handles the substitution

Test plan

  • New regression tests pass (7/7)
  • Existing GPU writer tests still pass (50/50 across band-first, attrs, nodata, dask+cupy, writer, nodata aliases)
  • rasterio round-trip parity verified
  • CuPy caller buffer preservation verified

Discovered during the 2026-05-11 geotiff accuracy sweep (pass 13).

The GPU writer (write_geotiff_gpu / to_geotiff with gpu=True) emitted
raw NaN bytes for missing pixels even when nodata=<finite> was
supplied, while the CPU writer substituted NaN with the sentinel
before encoding. xrspatial-only round-trips were unaffected because
the reader masks both NaN and the sentinel, but external readers that
mask only on the GDAL_NODATA tag (rasterio, GDAL, QGIS) treated NaN
pixels as valid data. rasterio reported 100% valid pixels on a GPU
file with 25 NaN inputs vs the CPU file's 25-invalid count.

Mirror the CPU writer's NaN-to-sentinel rewrite on the CuPy array
before compression. Gate on float dtype and finite nodata. Copy
defensively before mutating so a caller-owned CuPy buffer is not
modified, matching the CPU writer's arr.copy() at the equivalent step.

Add test_gpu_writer_nan_sentinel_1599.py: 7 regression tests covering
sentinel substitution, CPU/GPU byte equivalence, caller buffer
preservation, no-NaN no-op, NaN sentinel skip, rasterio-visible mask
parity, and 3D multiband substitution.

Discovered during the 2026-05-11 geotiff accuracy sweep.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 18:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a GeoTIFF GPU-writer behavioral divergence where write_geotiff_gpu/to_geotiff(..., gpu=True) could write raw NaN float bytes even when a finite nodata sentinel was requested, causing external GDAL-based readers (e.g., rasterio/QGIS) to treat those pixels as valid data.

Changes:

  • Add a GPU-side NaN→sentinel substitution step (float dtype + finite nodata) before GPU compression to match the CPU writer’s behavior.
  • Add a regression test module covering substitution correctness, CPU/GPU parity, caller-buffer non-mutation, and an external-reader mask check.
  • Update the accuracy sweep tracking CSV entry for the GeoTIFF pass.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/__init__.py Adds NaN-to-sentinel rewrite logic in write_geotiff_gpu prior to compression.
xrspatial/geotiff/tests/test_gpu_writer_nan_sentinel_1599.py Introduces regression tests for GPU writer NaN/sentinel behavior and CPU/GPU parity.
.claude/sweep-accuracy-state.csv Records the pass-13 accuracy sweep status and notes for issue #1599.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/__init__.py Outdated
Comment on lines +2594 to +2609
# paths must produce byte-equivalent files for the same input. The
# rewrite is in-place on the GPU array; ``arr`` is either a fresh
# ``cupy.asarray`` copy of caller data (numpy/dask inputs) or the
# caller-owned CuPy array. In the latter case we copy once before
# mutating to keep parity with the CPU writer's defensive copy
# semantics around in-place sentinel writes on user-owned buffers.
if (nodata is not None
and np_dtype.kind == 'f'
and not np.isnan(float(nodata))):
nan_mask = cupy.isnan(arr)
if bool(nan_mask.any()):
# When ``arr`` is the caller's CuPy buffer (came in as
# ``data.data`` on a DataArray that holds a CuPy array), an
# in-place rewrite would mutate the user's array. Copy
# first; the CPU writer takes the same defensive copy via
# ``arr.copy()`` at the matching line.
Comment on lines +174 to +175
to_geotiff(da_cpu, p_cpu, crs=4326, nodata=-9999)
write_geotiff_gpu(da_gpu, p_gpu, crs=4326, nodata=-9999)
…r test

- Update the in-code comment around the GPU NaN-to-sentinel rewrite to
  reflect the actual unconditional-copy behavior. The previous comment
  implied a caller-owned/fresh-buffer split that the code did not
  enforce; spell out instead why we copy in every case rather than
  tracking provenance through the upstream branch tree.
- Pin compression='deflate' on both the CPU and GPU writers in the
  external-reader (rasterio) regression test. The default codec is
  ZSTD, and some rasterio/GDAL builds in the wild ship without ZSTD
  support, which would have failed the round-trip for environment
  reasons unrelated to the nodata mask under test.
@brendancol brendancol merged commit 950317e into main May 11, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

write_geotiff_gpu omits NaN-to-sentinel substitution, breaking nodata for external readers

2 participants