
Fix OverflowError on uint TIFF with negative GDAL_NODATA sentinel (#1581) #1583

Merged
brendancol merged 2 commits into main from deep-sweep-accuracy-geotiff-2026-05-11
May 11, 2026


@brendancol
Contributor

Summary

  • Reading a uint TIFF whose GDAL_NODATA is a negative sentinel (e.g. uint16 + -9999, common on legacy GDAL files) raised OverflowError on every backend. The nodata-mask code did arr.dtype.type(int(nodata)) with no range check.
  • Add a _int_nodata_in_range helper and gate the four cast sites: numpy eager, cupy _apply_nodata_mask_gpu, dask _delayed_read_window, and _resolve_masked_fill / _sparse_fill_value in _reader.py.
  • Out-of-range sentinels become a no-op for value matching (the file dtype cannot represent them anyway). attrs['nodata'] still carries the original sentinel so write round-trips keep the GDAL_NODATA tag intact. This matches rasterio's behavior.

Fixes #1581.
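
The range check described in the summary can be sketched as follows. This is a minimal illustration, not the PR's code: `nodata_fits_dtype` and `nodata_mask` are hypothetical names standing in for `_int_nodata_in_range` and the mask logic, whose real signatures are not shown in this thread.

```python
import numpy as np


def nodata_fits_dtype(nodata, dtype):
    """Return True if the integer dtype can represent the nodata sentinel.

    Hypothetical stand-in for the PR's _int_nodata_in_range helper.
    """
    info = np.iinfo(np.dtype(dtype))  # raises ValueError for non-integer dtypes
    try:
        value = int(nodata)
    except (ValueError, OverflowError):  # NaN or +/-inf sentinel
        return False
    return info.min <= value <= info.max


def nodata_mask(arr, nodata):
    """Boolean mask of nodata pixels; all False when the sentinel cannot match."""
    if nodata is None or not nodata_fits_dtype(nodata, arr.dtype):
        # An out-of-range sentinel can never equal a stored value,
        # so skip the cast that would otherwise overflow.
        return np.zeros(arr.shape, dtype=bool)
    return arr == arr.dtype.type(int(nodata))


arr = np.array([[1, 2, 0], [4, 0, 6]], dtype=np.uint16)
legacy_mask = nodata_mask(arr, -9999)  # unrepresentable in uint16: no-op
zero_mask = nodata_mask(arr, 0)        # representable: normal value match
```

The sentinel itself is left untouched by the check, which is what lets the caller keep the original value in attrs['nodata'] for write round-trips.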

Test plan

  • New test_nodata_out_of_range_1581.py: helper, eager read, dask read, in-range non-regression, GPU helper (cupy-gated)
  • test_gpu_nodata_1542.py, test_nodata_no_extra_copy_1553.py, test_vrt_int_nodata_1564.py, test_lerc_valid_mask.py still pass
  • Reader / accessor / dtype-read suites still pass


Reading a uint TIFF whose GDAL_NODATA tag is a negative sentinel
(e.g. uint16 + -9999, common on legacy GDAL files) raised OverflowError
on every backend because the nodata-mask code did
arr.dtype.type(int(nodata)) with no range check.

Add _int_nodata_in_range helper in _reader.py and gate the four cast
sites (numpy eager open_geotiff, _apply_nodata_mask_gpu cupy path,
_delayed_read_window dask path, _resolve_masked_fill / _sparse_fill_value
in _reader.py). Out-of-range sentinels are treated as a no-op for value
matching since the file dtype cannot represent them; attrs['nodata']
still carries the original sentinel so write round-trips preserve the
GDAL_NODATA tag.

read_geotiff_dask no longer promotes file dtype to float64 when the
sentinel is unrepresentable, matching the materialized array.

Eight regression tests cover the helper, both eager and dask read
paths, the in-range non-regression case, and the GPU helper (cupy-gated).
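
The dtype decision for the dask path might look roughly like the sketch below. It assumes that an integer raster with a representable sentinel is promoted to float64 so masked pixels can hold NaN; the thread only states that promotion no longer happens for unrepresentable sentinels. `masked_output_dtype` is a hypothetical name, not a function from the PR.

```python
import numpy as np


def masked_output_dtype(file_dtype, nodata):
    """Pick the dtype a lazy read would advertise (hypothetical helper).

    Assumption: integer data is promoted to float64 only when the nodata
    sentinel can actually occur in the file, so NaN masking is possible.
    """
    file_dtype = np.dtype(file_dtype)
    if nodata is None or file_dtype.kind not in "iu":
        return file_dtype  # nothing to mask, or not an integer raster
    info = np.iinfo(file_dtype)
    try:
        representable = info.min <= int(nodata) <= info.max
    except (ValueError, OverflowError):  # NaN or +/-inf sentinel
        representable = False
    # Unrepresentable sentinel: masking is a no-op, so keep the file dtype
    # and stay consistent with the eagerly materialized array.
    return np.dtype("float64") if representable else file_dtype


kept = masked_output_dtype("uint16", -9999)
promoted = masked_output_dtype("uint16", 0)
```

Keeping the advertised dtype equal to the materialized one avoids a mismatch between the lazy array's metadata and the chunks it eventually computes.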
@github-actions (bot) added the "performance" label (PR touches performance-sensitive code) on May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 14:25
Contributor

Copilot AI left a comment


Pull request overview

Fixes GeoTIFF reads that crash with OverflowError when an unsigned integer raster declares a negative GDAL_NODATA sentinel (e.g. uint16 + -9999) by range-checking before performing dtype casts during nodata masking / fill resolution. This aligns behavior across eager NumPy, Dask-windowed reads, GPU nodata masking, and LERC/sparse fill handling.

Changes:

  • Add _int_nodata_in_range and use it to guard integer nodata sentinel casts in _reader.py fill-resolution helpers.
  • Gate nodata masking casts in open_geotiff, _apply_nodata_mask_gpu, and the dask _delayed_read_window path to avoid OverflowError for out-of-range sentinels.
  • Add a regression test module covering helper behavior and eager/dask/GPU no-crash behavior for out-of-range integer sentinels.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| xrspatial/geotiff/__init__.py | Adds range checks before integer nodata casts across eager, dask window reads, and GPU nodata masking to prevent OverflowError on out-of-range sentinels. |
| xrspatial/geotiff/_reader.py | Introduces _int_nodata_in_range and uses it to guard dtype casts when resolving LERC masked fill and sparse-tile fill values. |
| xrspatial/geotiff/tests/test_nodata_out_of_range_1581.py | New regression tests validating out-of-range nodata behavior across helper/eager/dask and GPU helper behavior. |
| .claude/sweep-accuracy-state.csv | Updates internal audit tracking entry for the GeoTIFF accuracy sweep. |


Comment on lines +128 to +132
cupy = pytest.importorskip('cupy')
from xrspatial.geotiff import _apply_nodata_mask_gpu

arr_gpu = cupy.array([[1, 2, 3], [4, 5, 6]], dtype=cupy.uint16)
out = _apply_nodata_mask_gpu(arr_gpu, -9999)
Contributor Author


@copilot apply changes based on this feedback

Contributor


Applied in 3eb2528. I updated the GPU regression test to use a cupy+CUDA availability gate (_gpu_available + @_gpu_only), so it now skips cleanly when CuPy imports but no CUDA device is available.
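
A gate of this kind could be written along the lines below. This is a sketch only: `gpu_available` is a hypothetical name, and the body of the PR's `_gpu_available` is not shown in this thread. It relies on `cupy.cuda.runtime.getDeviceCount()`, which raises when CuPy is installed but no usable CUDA runtime or device is present.

```python
def gpu_available():
    """Return True only if CuPy imports and at least one CUDA device exists.

    Hypothetical sketch of a cupy+CUDA availability gate.
    """
    try:
        import cupy  # may be missing entirely

        # Raises a CUDA runtime error when no driver or device is usable.
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Covers ImportError as well as CUDA runtime failures.
        return False


HAS_GPU = gpu_available()
```

A skip marker such as `pytest.mark.skipif(not HAS_GPU, reason="cupy with a CUDA device is required")` can then decorate the GPU test, whereas `pytest.importorskip('cupy')` alone only checks that the import succeeds.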

@brendancol brendancol merged commit 7090718 into main May 11, 2026
1 check was pending

Labels

performance PR touches performance-sensitive code


Development

Successfully merging this pull request may close these issues.

Reading uint TIFF with negative GDAL_NODATA sentinel raises OverflowError

3 participants