
to_geotiff(pack=True) silently wraps out-of-range values in the integer cast; Inf casts to garbage #3260

Description

@brendancol

Describe the bug

_pack (xrspatial/geotiff/_attrs.py) reverses the unpack transform with (data - add_offset) / scale_factor, rounds, and casts to the integer dtype recorded in attrs['mask_and_scale_dtype']. Nothing checks that the packed values actually fit that dtype, so two kinds of pixels get corrupted on disk with no error:

  • Finite values whose packed form falls outside the dtype range wrap around in the astype cast. Take an int16 source with SCALE=0.1 and one pixel edited to 4000.0 after the unpack read: it packs to 40000 and lands on disk as -25536. Reopening with unpack=True returns -2553.6.
  • +/-Inf pixels pass the NaN guard (it checks isnan only) and cast to whatever the platform produces. On Linux/x86 that is 0, and the only signal is a numpy "RuntimeWarning: invalid value encountered in cast" raised from deep inside xarray.
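
The arithmetic behind both bullets can be checked without the library. This is a minimal pure-Python sketch: wrap_to_int16 is a hypothetical helper that models a two's-complement wrap into the int16 range, not the actual numpy cast (whose result for out-of-range floats is platform-dependent).

```python
import math


def wrap_to_int16(value: int) -> int:
    """Hypothetical helper: model a two's-complement wrap into int16."""
    return (value + 2**15) % 2**16 - 2**15


scale, offset = 0.1, 0.0
edited = 4000.0

# Pack: reverse the unpack transform, then round.
packed = round((edited - offset) / scale)   # 40000, above the int16 max of 32767
on_disk = wrap_to_int16(packed)             # -25536
reread = on_disk * scale + offset           # about -2553.6 after unpack

print(packed, on_disk, reread)

# An isnan-only guard does not catch infinities; isfinite rejects both.
print(math.isnan(math.inf), math.isfinite(math.inf))   # False False
print(math.isnan(math.nan), math.isfinite(math.nan))   # True False
```

The second half shows why a guard built on isnan alone lets +/-Inf through to the cast.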

All four backends are affected. The dask path defers the cast into the write's compute, so there is no warning at all at call time.

The function already fails closed for the neighbouring case: NaN pixels with no sentinel raise ValueError, and the comment there says the astype "would silently wrap it into a valid-looking integer" (_attrs.py around line 1875). Out-of-range finite values and Inf hit the same astype and wrap the same way, but have no guard.

Repro:

import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

data = np.array([[100, 200], [300, 32000]], dtype=np.int16)
da = xr.DataArray(data, dims=('y','x'), coords={'y':[1.5,0.5],'x':[0.5,1.5]})
da.attrs['gdal_metadata'] = {'SCALE': '0.1', 'OFFSET': '0.0'}
to_geotiff(da, 'src_3259.tif')

mod = open_geotiff('src_3259.tif', unpack=True)
mod.values[0, 0] = 4000.0          # packs to 40000 > int16 max
to_geotiff(mod, 'packed_3259.tif', pack=True)

print(open_geotiff('packed_3259.tif').values)
# [[-25536    200]
#  [   300  32000]]

Expected behavior

Refuse to write pixels that cannot be represented in the packed dtype, the same way pack already refuses NaN with no sentinel: ValueError naming the packed dtype and its range. Eager input should raise at call time; dask input should raise from the write's single compute (same shape as the #3235 NaN guard). A raster writer silently changing pixel values is the worst failure mode it has.
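
One possible shape for such a guard is sketched below. It is illustrative only: check_packable, its signature, and the error message are assumptions, not the existing _pack code. It leaves NaN alone on the assumption that the existing NaN/sentinel guard handles those pixels.

```python
import numpy as np


def check_packable(data, scale, offset, packed_dtype):
    """Hypothetical guard: raise if any pixel cannot be stored in packed_dtype.

    NaN pixels are ignored here; they are assumed to be handled by the
    separate NaN/sentinel check.
    """
    dtype = np.dtype(packed_dtype)
    info = np.iinfo(dtype)
    values = np.asarray(data, dtype=np.float64)

    with np.errstate(invalid="ignore"):
        packed = np.round((values - offset) / scale)
        # Comparisons against NaN are False, so NaN never counts as bad.
        # +/-Inf compares outside any finite range, so it always does.
        bad = (packed < info.min) | (packed > info.max)

    if bad.any():
        raise ValueError(
            f"{int(bad.sum())} pixel(s) cannot be represented in packed dtype "
            f"{dtype.name} (valid range {info.min}..{info.max})"
        )


# In-range values pass silently.
check_packable(np.array([[10.0, 20.0], [30.0, 3200.0]]), 0.1, 0.0, "int16")

# The edited pixel from the repro is rejected instead of wrapping.
try:
    check_packable(np.array([4000.0]), 0.1, 0.0, "int16")
except ValueError as exc:
    print(exc)
```

On eager input a check like this could run at call time. For dask input it would have to be evaluated per block inside the write's compute, which this sketch does not attempt.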

Additional context

Found by /sweep-accuracy against the geotiff module. The unpack -> process -> pack workflow is the whole point of the feature, and processing routinely pushes values outside the source's packed range.


Labels: bug (Something isn't working), geotiff (GeoTIFF module), severity:high (Sweep finding: HIGH), sweep-accuracy (Found by /sweep-accuracy)
