to_geotiff: support predictor=3 (floating-point predictor) on CPU write path #1313

@brendancol

Summary

xrspatial.geotiff.to_geotiff accepts predictor: bool, which maps only to TIFF predictor 2 (horizontal differencing). TIFF predictor 3 (floating-point predictor) is not reachable from the CPU write path even though all the building blocks already exist in the codebase.

For float32 / float64 rasters (elevation, climate, model output), predictor=3 typically compresses better under deflate/zstd than predictor=2. It first regroups the bytes of each row by significance (all most-significant bytes, then the next, and so on) and only then applies byte-wise differencing, so the slowly varying sign/exponent bytes end up adjacent. Code that interoperates with rasterio/GDAL routinely uses predictor=3 for float TIFFs, and the current API pushes that workflow back to rasterio.
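To make the byte regrouping concrete, here is a minimal NumPy sketch of the floating-point predictor for a single-band 2D array. It illustrates the technique only; it is not the body of xrspatial's fp_predictor_encode / fp_predictor_decode, and the names fp_predict_encode and fp_predict_decode are hypothetical.

```python
import numpy as np


def fp_predict_encode(rows: np.ndarray) -> np.ndarray:
    """Hypothetical helper: floating-point predictor for a 2D, single-band float array.

    Returns a uint8 array of shape (height, width * itemsize).
    """
    height, width = rows.shape
    itemsize = rows.dtype.itemsize
    # Store every sample as big-endian bytes, most significant byte first.
    big_endian = np.ascontiguousarray(rows.astype(rows.dtype.newbyteorder(">")))
    sample_bytes = big_endian.view(np.uint8).reshape(height, width, itemsize)
    # Swizzle: per row, gather byte 0 of every sample, then byte 1, and so on.
    planes = np.ascontiguousarray(sample_bytes.transpose(0, 2, 1)).reshape(height, -1)
    # Byte-wise horizontal differencing; uint8 arithmetic wraps modulo 256.
    encoded = planes.copy()
    encoded[:, 1:] = planes[:, 1:] - planes[:, :-1]
    return encoded


def fp_predict_decode(encoded: np.ndarray, dtype, width: int) -> np.ndarray:
    """Hypothetical helper: invert fp_predict_encode."""
    dtype = np.dtype(dtype)
    height = encoded.shape[0]
    # Undo the differencing with a running sum that wraps modulo 256.
    planes = np.cumsum(encoded, axis=1, dtype=np.uint8)
    # Undo the swizzle: (height, itemsize, width) -> (height, width, itemsize).
    sample_bytes = planes.reshape(height, dtype.itemsize, width).transpose(0, 2, 1)
    big_endian = np.ascontiguousarray(sample_bytes).view(dtype.newbyteorder(">"))
    return big_endian.reshape(height, width).astype(dtype)
```

Passing `encoded.tobytes()` and the raw `rows.tobytes()` through `zlib.compress` is a quick way to compare the two layouts on a given raster.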

Current state

Public API at xrspatial/geotiff/__init__.py:402 exposes predictor: bool. The writer hardcodes:

# xrspatial/geotiff/_writer.py:597
pred_val = 2 if (predictor and compression != COMPRESSION_NONE) else 1

Same pattern at _writer.py:1169 for the GPU writer's tag emission.

What already exists

  • CPU encoder: fp_predictor_encode in xrspatial/geotiff/_compression.py:499
  • GPU encoder kernel: _fp_predictor_encode_kernel in xrspatial/geotiff/_gpu_decode.py:1688 (already wired into the GPU encode path at _gpu_decode.py:2260-2264)
  • CPU decoder: fp_predictor_decode (used at _reader.py:271)
  • GPU decoder: _fp_predictor_decode_kernel (used at _gpu_decode.py:1345, 1592)

So the read side and the GPU write side already round-trip predictor=3. The CPU writer just never calls fp_predictor_encode.

Proposed change

Widen the public arg to accept the predictor value directly:

predictor: bool | int = False
  • False / 0 → no predictor (current default; written as TIFF Predictor tag value 1)
  • True / 2 → horizontal differencing (current True behavior; preserved)
  • 3 → floating-point predictor; valid only for float dtypes
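The mapping above could be normalized in one place before the writers see it. The sketch below assumes a hypothetical helper named normalize_predictor and a boolean `compressed` flag standing in for the `compression != COMPRESSION_NONE` check; rejecting unlisted values (including 1) with ValueError is also an assumption, since the proposal only lists False/0, True/2 and 3.

```python
def normalize_predictor(predictor, compressed=True):
    """Hypothetical helper: map the proposed bool | int argument to a
    TIFF Predictor tag value (1 = none, 2 = horizontal, 3 = floating point)."""
    # bool must be checked first because True == 1 and False == 0 in Python.
    if isinstance(predictor, bool):
        pred_val = 2 if predictor else 1
    elif isinstance(predictor, int) and predictor in (0, 2, 3):
        pred_val = 1 if predictor == 0 else predictor
    else:
        # Assumption: anything outside the listed values is rejected.
        raise ValueError(f"predictor must be a bool, 0, 2 or 3; got {predictor!r}")
    # Mirror the existing writer rule: no predictor without compression.
    return pred_val if compressed else 1
```

Checking `bool` before `int` keeps `predictor=True` on its current meaning (tag value 2) rather than letting it be read as the integer 1.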

Inside the writers:

  • Branch on pred_val when calling the encoder (predictor_encode for 2, fp_predictor_encode for 3) at _writer.py:339, _writer.py:405, _writer.py:1040.
  • Emit the chosen pred_val in the TIFF tag at _writer.py:597 and _writer.py:1169 rather than forcing 2.
  • Validate: raise if predictor=3 is requested with an integer dtype.
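One way the branch and the validation might fit together is sketched below. The helper name select_predictor_encoder is hypothetical, ValueError is an assumed error type, and the encoders are passed in as an opaque mapping because the signatures of predictor_encode and fp_predictor_encode are not shown in this issue.

```python
import numpy as np


def select_predictor_encoder(pred_val, dtype, encoders):
    """Hypothetical helper: pick the encoder for a TIFF Predictor tag value.

    `encoders` maps tag values to callables, for example
    {2: predictor_encode, 3: fp_predictor_encode}.
    Returns None when no predictor should be applied.
    """
    dtype = np.dtype(dtype)
    # The floating-point predictor is only defined for float samples.
    if pred_val == 3 and dtype.kind != "f":
        raise ValueError(f"predictor=3 requires a float dtype, got {dtype}")
    if pred_val == 1:
        return None
    return encoders[pred_val]
```

Validating before dispatch means an integer raster with predictor=3 fails before any tile is encoded, instead of producing a file whose tag and payload disagree.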

Acceptance

  • New tests for CPU write + read round-trip of float32 and float64 data with predictor=3 under deflate and zstd.
  • predictor=3 + integer dtype raises a clear error.
  • Existing predictor=True tests stay green (semantics unchanged).
  • A file written with predictor=3 is readable by GDAL/rasterio, and the decoded pixel values match a rasterio-written reference.

Out of scope

JPEG/LZW interactions are out of scope. Multi-band float predictor=3 is already validated on the read side (#1247), but the new write tests should cover it explicitly.
