Skip to content

Apply nodata mask in read_geotiff_gpu (#1542)#1547

Merged
brendancol merged 3 commits intomainfrom
issue-1542
May 11, 2026
Merged

Apply nodata mask in read_geotiff_gpu (#1542)#1547
brendancol merged 3 commits intomainfrom
issue-1542

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Fixes #1542. read_geotiff_gpu (used when open_geotiff(path, gpu=True) is called) silently differed from the CPU and dask paths on rasters with a declared nodata sentinel: integer rasters kept the sentinel literal in a uint16 array, float rasters kept the sentinel rather than NaN, and attrs['nodata'] was never set. NaN-aware code that worked on the CPU and dask paths quietly produced wrong results on the GPU path.

The fix adds an _apply_nodata_mask_gpu helper that mirrors the CPU eager masking logic using cupy operations. Both the tiled main path and the stripped fallback inside read_geotiff_gpu now promote integer rasters to float64 with NaN at sentinel positions, replace finite sentinels in float arrays with NaN, and set attrs['nodata'] so the original value stays visible.

Two existing GPU tests had pinned the old behaviour and needed adjustment. test_sparse_tile_gpu_round_trip asserted dtype=uint16; it now expects float64 and NaN at sparse positions. test_*_sentinel_nodata in test_lerc_valid_mask_gpu compared read_geotiff_gpu against read_to_array (low-level reader); the helpers now restore the sentinel on the GPU side before the bit-for-bit comparison so the LERC mask preservation check still holds.

Test plan

  • New test_gpu_nodata_1542.py covers tiled + stripped paths, float32 and uint16 sentinels, signed int negative sentinel, NaN nodata, no-nodata-attr, and a four-backend agreement check across numpy, dask+numpy, cupy, and dask+cupy.
  • Updated LERC GPU and sparse COG GPU tests pass with the new behaviour.
  • Full geotiff test suite passes (943 passed, 6 skipped, 3 deselected unrelated matplotlib failures).
  • Wider xrspatial suite shows no new regressions from this change. One pre-existing balanced_allocation dask+cupy failure is unrelated to geotiff.

The GPU read backend silently differed from the CPU and dask paths
when the file declared a nodata sentinel. open_geotiff(path, gpu=True)
returned a DataArray whose attrs had no 'nodata' key and whose pixel
data still carried the raw sentinel value. Integer rasters were not
promoted to float64, and float rasters kept the sentinel rather than
NaN. NaN-aware code that worked on the CPU and dask paths quietly
produced wrong results on the GPU path.

Add an _apply_nodata_mask_gpu helper that mirrors the CPU masking
logic with cupy operations, and call it from both the tiled main
path and the stripped fallback inside read_geotiff_gpu. Also set
attrs['nodata'] from geo_info.nodata so callers can still see the
sentinel value.

Two existing tests had codified the old behaviour and needed
updates: test_sparse_tile_gpu_round_trip checked dtype=uint16, but
the GPU read now promotes to float64 to represent NaN, matching the
CPU path. test_*_sentinel_nodata in test_lerc_valid_mask_gpu
compared read_geotiff_gpu against read_to_array (low-level), so the
test helpers now restore the sentinel on the GPU side before the
bit-for-bit comparison so the LERC mask preservation check still
holds.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes backend parity for GeoTIFF GPU reads by making read_geotiff_gpu apply the same nodata masking and attrs['nodata'] propagation as the CPU eager and dask paths (fixes #1542), preventing silent incorrect results in NaN-aware downstream code when open_geotiff(..., gpu=True) is used.

Changes:

  • Added a CuPy-based _apply_nodata_mask_gpu helper and applied it in both the tiled GPU path and the stripped-file fallback inside read_geotiff_gpu.
  • Updated existing GPU tests that previously pinned the old “raw sentinel” behavior to now expect NaN-masked output and attrs['nodata'].
  • Added a new regression test module covering multiple nodata/dtype/layout cases and a 4-backend agreement check (numpy, dask+numpy, cupy, dask+cupy).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/__init__.py Adds _apply_nodata_mask_gpu and applies nodata masking + attrs['nodata'] propagation in read_geotiff_gpu for stripped and tiled paths.
xrspatial/geotiff/tests/test_sparse_cog.py Updates sparse COG GPU test expectations to match new nodata→NaN masking and dtype promotion behavior.
xrspatial/geotiff/tests/test_lerc_valid_mask_gpu.py Adjusts LERC mask preservation tests to account for GPU nodata masking by restoring sentinels for bit-exact comparisons.
xrspatial/geotiff/tests/test_gpu_nodata_1542.py Adds new regression coverage for nodata parity across GPU/CPU and dask/non-dask paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +89 to +92
The GPU reader (``read_geotiff_gpu``) applies the same nodata masking
that ``open_geotiff`` does (PR #1542), so its output uses NaN where
the sentinel was. Callers that want a bit-for-bit comparison should
pass ``raw_gpu=True`` to skip the high-level masking on the GPU side.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring rewritten to describe what the file actually does. The bit-exact-comparison path is _restore_sentinel on the caller side (line 103 below); the raw_gpu=True mention was aspirational from a draft. 54606a2.

Comment thread xrspatial/geotiff/tests/test_gpu_nodata_1542.py Outdated
Conflict in xrspatial/geotiff/__init__.py: both #1546 (orientation
handling on GPU read) and this PR (#1547 nodata mask) added a new block
between the multi-band shape check and the dtype cast. They are
independent, so kept both. Order matches the stripped early-return path
above: orientation first (so the geo_info transform reflects the post-
flip pixel layout), then nodata mask (so the value semantics agree with
the CPU read).
- Drop unused os/tempfile imports in test_gpu_nodata_1542 (tmp_path
  fixture covers the file paths).
- Rewrite the _read_cpu_gpu docstring: the raw_gpu=True hook does not
  exist; callers run the GPU result through _restore_sentinel for the
  bit-exact comparison.
@brendancol brendancol merged commit 118678a into main May 11, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read_geotiff_gpu skips nodata masking and drops attrs['nodata']

2 participants