Summary
xrspatial.geotiff.to_geotiff accepts predictor: bool, which maps only to TIFF predictor 2 (horizontal differencing). TIFF predictor 3 (floating-point predictor) is not reachable from the CPU write path even though all the building blocks already exist in the codebase.
For float32 / float64 rasters (elevation, climate, model output), predictor=3 typically produces noticeably better deflate/zstd ratios than predictor=2 because it byte-swizzles before differencing. Code in the wild that interoperates with rasterio/GDAL routinely uses predictor=3 for float TIFFs, and the current API forces that workflow back to rasterio.
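For readers unfamiliar with predictor 3, here is a minimal NumPy sketch of the "byte-swizzle, then difference" idea for one single-band row. It is not the codebase's fp_predictor_encode: the function names fp_predict_row and fp_unpredict_row are hypothetical, and the sketch assumes a differencing stride of 1 (one sample per pixel).

```python
import numpy as np


def fp_predict_row(row: np.ndarray) -> np.ndarray:
    """Predictor-3 style encode of one single-band float row (illustrative)."""
    big_endian = row.dtype.newbyteorder(">")
    # Raw bytes of every sample, most significant byte first.
    raw = np.frombuffer(row.astype(big_endian).tobytes(), dtype=np.uint8)
    per_sample = raw.reshape(row.size, row.dtype.itemsize)
    # Byte swizzle: all first bytes, then all second bytes, and so on.
    swizzled = np.ascontiguousarray(per_sample.T).ravel()
    # Byte-wise horizontal differencing; uint8 arithmetic wraps modulo 256.
    encoded = swizzled.copy()
    encoded[1:] = swizzled[1:] - swizzled[:-1]
    return encoded


def fp_unpredict_row(encoded: np.ndarray, dtype) -> np.ndarray:
    """Inverse of fp_predict_row for the same single-band assumption."""
    dtype = np.dtype(dtype)
    # Undo the differencing with a wrapping running sum.
    swizzled = np.cumsum(encoded, dtype=np.uint8)
    # Undo the swizzle, then reinterpret the bytes as big-endian floats.
    planes = swizzled.reshape(dtype.itemsize, -1)
    raw = np.ascontiguousarray(planes.T).tobytes()
    return np.frombuffer(raw, dtype=dtype.newbyteorder(">")).astype(dtype)


row = np.array([1.0, 1.5, 2.0, 2.5], dtype=np.float32)
encoded = fp_predict_row(row)
restored = fp_unpredict_row(encoded, np.float32)
```

Grouping the sign/exponent bytes of neighbouring samples together is what tends to give deflate/zstd long runs of small values on smooth float data. Predictor 2 differences whole samples instead.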
Current state
Public API at xrspatial/geotiff/__init__.py:402 exposes predictor: bool. The writer hardcodes:
# xrspatial/geotiff/_writer.py:597
pred_val = 2 if (predictor and compression != COMPRESSION_NONE) else 1
Same pattern at _writer.py:1169 for the GPU writer's tag emission.
What already exists
- CPU encoder: fp_predictor_encode in xrspatial/geotiff/_compression.py:499
- GPU encoder kernel: _fp_predictor_encode_kernel in xrspatial/geotiff/_gpu_decode.py:1688 (already wired into the GPU encode path at _gpu_decode.py:2260-2264)
- CPU decoder: fp_predictor_decode (used at _reader.py:271)
- GPU decoder: _fp_predictor_decode_kernel (used at _gpu_decode.py:1345, 1592)
So the read side and the GPU write side already round-trip predictor=3. The CPU writer just never calls fp_predictor_encode.
Proposed change
Widen the public arg to accept the predictor value directly:
predictor: bool | int = False
- False / 0 → no predictor (current default)
- True / 2 → horizontal differencing (current True behavior; preserved)
- 3 → floating-point predictor; valid only for float dtypes
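The mapping above could be normalized roughly as follows. This is a sketch, not the project's code: normalize_predictor is a hypothetical helper, and rejecting the integer 1 (and anything else not listed) is an assumption, since the proposal does not say how unlisted values should behave.

```python
def normalize_predictor(predictor) -> int:
    """Map the proposed `predictor` argument to a TIFF Predictor tag value.

    Returns 1 (no prediction), 2 (horizontal differencing) or
    3 (floating-point predictor).
    """
    # bool must be checked before int: isinstance(True, int) is True,
    # so True would otherwise be treated as the integer 1.
    if isinstance(predictor, bool):
        return 2 if predictor else 1
    if isinstance(predictor, int):
        if predictor == 0:
            return 1
        if predictor in (2, 3):
            return predictor
    # Assumption: anything not listed in the proposal is rejected.
    raise ValueError(
        f"predictor must be False, True, 0, 2 or 3; got {predictor!r}"
    )
```

Checking bool first keeps predictor=True on its current meaning (tag value 2), even though True == 1 in Python.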
Inside the writers:
- Branch on pred_val when calling the encoder (predictor_encode for 2, fp_predictor_encode for 3) at _writer.py:339, _writer.py:405, _writer.py:1040.
- Emit the chosen pred_val in the TIFF tag at _writer.py:597 and _writer.py:1169 rather than forcing 2.
- Validate: raise if predictor=3 is requested with an integer dtype.
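The branch and the dtype check could look roughly like the sketch below. The helper select_predictor_encoder is hypothetical, and it returns the encoder's name rather than calling it, because the real signatures of predictor_encode and fp_predictor_encode are not shown in this issue.

```python
from typing import Optional

import numpy as np


def select_predictor_encoder(pred_val: int, dtype) -> Optional[str]:
    """Pick the encoder for a TIFF Predictor tag value, validating the dtype.

    Returns the name of the encoder to call, or None when no predictor applies.
    """
    dtype = np.dtype(dtype)
    if pred_val == 1:
        return None
    if pred_val == 2:
        return "predictor_encode"
    if pred_val == 3:
        # The floating-point predictor is only defined for float samples.
        if dtype.kind != "f":
            raise ValueError(
                f"predictor=3 requires a floating-point dtype, got {dtype}"
            )
        return "fp_predictor_encode"
    raise ValueError(f"unsupported TIFF predictor value: {pred_val!r}")
```

Validating before any strip or tile is encoded means the error surfaces before a partial file is written.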
Acceptance
- New tests for CPU write + read round-trip of float32 and float64 data with predictor=3 under deflate and zstd.
- predictor=3 + integer dtype raises a clear error.
- Existing predictor=True tests stay green (semantics unchanged).
- File written with predictor=3 is byte-readable by GDAL/rasterio (compare against a rasterio reference).
Out of scope
JPEG/LZW interactions. Multi-band float predictor=3 remains in scope: it is already validated on the read side (#1247), but new write tests should cover it explicitly.