Describe the bug
_pack (xrspatial/geotiff/_attrs.py) reverses the unpack transform with (data - add_offset) / scale_factor, rounds, and casts to the integer dtype recorded in attrs['mask_and_scale_dtype']. Nothing checks that the packed values actually fit that dtype, so two kinds of pixels get corrupted on disk with no error:
- Finite values whose packed form falls outside the dtype range wrap in the astype. An int16 source with SCALE=0.1, one pixel edited to 4000.0 after the unpack read, packs to 40000 and lands on disk as -25536. Reopening with unpack=True returns -2553.6.
- +/-Inf pixels pass the NaN guard (it checks isnan only) and cast to whatever the platform produces. On linux/x86 that is 0, with nothing but a numpy RuntimeWarning: invalid value encountered in cast from deep inside xarray.
All four backends are affected. The dask path defers the cast into the write's compute, so there is no warning at all at call time.
The function already fails closed for the neighbouring case: NaN pixels with no sentinel raise ValueError, and the comment there says the astype "would silently wrap it into a valid-looking integer" (_attrs.py around line 1875). Out-of-range finite values and Inf hit the same astype and wrap the same way, but have no guard.
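The arithmetic behind both cases can be traced with plain numpy. This is only a sketch, not the code in _pack: it routes the finite case through int64 so that the wrap is deterministic (the direct float-to-int16 cast is platform-dependent), and for Inf it only shows that an isnan-only check does not flag it.

```python
import numpy as np

scale, offset = 0.1, 0.0  # SCALE / OFFSET from the example above

# Finite overflow: 4000.0 packs to 40000, above the int16 maximum of 32767.
packed = np.round((np.array([4000.0]) - offset) / scale)

# Model the wrap via int64 -> int16, which keeps only the low 16 bits.
wrapped = packed.astype(np.int64).astype(np.int16)

# What a later unpack read would hand back for that pixel.
unpacked = wrapped * scale + offset
print(packed[0], wrapped[0], unpacked[0])

# An isnan-only guard does not see +/-Inf.
vals = np.array([np.inf, -np.inf, np.nan])
print(np.isnan(vals))     # only the NaN entry is flagged
print(np.isfinite(vals))  # none of the three entries is finite
```

A check based on isfinite plus a range comparison would cover both cases that the isnan check misses.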
Repro:
import numpy as np, xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff
data = np.array([[100, 200], [300, 32000]], dtype=np.int16)
da = xr.DataArray(data, dims=('y','x'), coords={'y':[1.5,0.5],'x':[0.5,1.5]})
da.attrs['gdal_metadata'] = {'SCALE': '0.1', 'OFFSET': '0.0'}
to_geotiff(da, 'src_3259.tif')
mod = open_geotiff('src_3259.tif', unpack=True)
mod.values[0, 0] = 4000.0 # packs to 40000 > int16 max
to_geotiff(mod, 'packed_3259.tif', pack=True)
print(open_geotiff('packed_3259.tif').values)
# [[-25536 200]
# [ 300 32000]]
Expected behavior
Refuse to write pixels that cannot be represented in the packed dtype, the same way pack already refuses NaN with no sentinel: ValueError naming the packed dtype and its range. Eager input should raise at call time; dask input should raise from the write's single compute (same shape as the #3235 NaN guard). A raster writer silently changing pixel values is the worst failure mode it has.
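One possible shape for such a guard, as a minimal eager-only sketch. The helper name check_packed_range and its signature are hypothetical and not part of xrspatial; NaN pixels are deliberately skipped here on the assumption that the existing sentinel guard keeps handling them. For dask input the same check would have to run inside the write's compute rather than at call time.

```python
import numpy as np


def check_packed_range(data, scale_factor, add_offset, packed_dtype):
    """Hypothetical helper: raise if any non-NaN pixel cannot be packed."""
    info = np.iinfo(packed_dtype)
    values = np.asarray(data, dtype=np.float64)
    packed = np.round((values - add_offset) / scale_factor)

    # NaN is left to the existing NaN/sentinel guard; substitute 0 so the
    # range comparison below ignores those pixels.
    candidate = np.where(np.isnan(packed), 0.0, packed)

    # +Inf compares greater than info.max and -Inf less than info.min,
    # so infinite pixels are caught by the same test as finite overflow.
    bad = (candidate < info.min) | (candidate > info.max)
    if bad.any():
        raise ValueError(
            f"{int(bad.sum())} pixel(s) do not fit packed dtype "
            f"{np.dtype(packed_dtype).name} (range {info.min}..{info.max})"
        )


# In-range data passes silently.
check_packed_range([[10.0, 20.0], [30.0, 3200.0]], 0.1, 0.0, np.int16)

# The edited pixel from the repro, plus an Inf pixel, is refused.
try:
    check_packed_range([[4000.0, np.inf]], 0.1, 0.0, np.int16)
except ValueError as exc:
    print(exc)
```

Checking the rounded float values before the astype means the error is raised from the values that would actually be written, and the message can carry the dtype and range the issue asks for.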
Additional context
Found by /sweep-accuracy against the geotiff module. The unpack -> process -> pack workflow is the whole point of the feature, and processing routinely pushes values outside the source's packed range.