Summary
The reader accepts a malformed IFD that claims Predictor=3 (Floating-Point Predictor, TIFF Technical Note 3) paired with an integer SampleFormat (1=uint or 2=int) without complaint. _apply_predictor in xrspatial/geotiff/_reader.py routes to fp_predictor_decode solely on the predictor tag, with no check that the data is actually float. The byte-swizzle unshuffle then runs on integer bytes, producing garbage pixel values that look like valid integers.
The writer side already rejects this combination: _resolve_predictor in _writer.py raises ValueError("predictor=3 (floating-point) requires float data, got dtype=...") when called with a non-float dtype. Files written by xrspatial therefore cannot hit this asymmetry, but external or adversarial files can.
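To illustrate why the output looks plausible yet is wrong, here is a minimal sketch of a Technical Note 3 style decode applied to integer bytes. It is a simplified single-row, single-band model written for this report, not xrspatial's fp_predictor_decode; the function name fp_predictor_decode_row and the sample values are assumptions made for illustration.

```python
import numpy as np


def fp_predictor_decode_row(row: np.ndarray, width: int, bytes_per_sample: int) -> np.ndarray:
    """Simplified floating-point predictor decode for one row of one band.

    Hypothetical helper for illustration only; returns little-endian sample bytes.
    """
    # Step 1: undo the byte-wise horizontal differencing (running sum modulo 256).
    acc = (np.cumsum(row, dtype=np.int64) & 0xFF).astype(np.uint8)
    # Step 2: the row is stored as byte planes, most significant plane first.
    planes = acc.reshape(bytes_per_sample, width)
    # Reverse the planes (least significant first) and interleave them per sample.
    return np.ascontiguousarray(planes[::-1].T).reshape(-1)


# Plain uint32 samples stored as raw bytes, i.e. no predictor was actually applied.
original = np.array([1, 2, 3, 4], dtype="<u4")
stored = original.view(np.uint8)

# A reader that trusts Predictor=3 runs the float decode on those bytes anyway.
decoded = fp_predictor_decode_row(stored, width=4, bytes_per_sample=4).view("<u4")

print("written:", original)
print("decoded:", decoded)
```

The decoded array has the expected dtype and shape, so nothing downstream can tell that the values are wrong.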
Reproduction
A synthetic IFD with BitsPerSample=32, SampleFormat=1 (UINT), Predictor=3, and uncompressed strip data is read by open_geotiff without raising, and the returned uint32 array does not match the values that were written. The bug is silent: no warning, no exception, and no signal to a downstream consumer that the decoded data is wrong.
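A file of this shape can be assembled with the standard library alone. The sketch below is one possible way to build it, assuming a little-endian, single-strip, single-band layout; build_bad_tiff and its parameters are hypothetical names, and the exact set of tags a given reader requires may differ.

```python
import struct

# TIFF field type codes
SHORT, LONG = 3, 4


def build_bad_tiff(width: int = 4, height: int = 1) -> bytes:
    """Return a minimal TIFF whose IFD pairs Predictor=3 with SampleFormat=1.

    Hypothetical helper for illustration; pixel values are 1..width*height as uint32.
    """
    pixel_data = struct.pack(f"<{width * height}I", *range(1, width * height + 1))
    entries = [
        (256, LONG, width),            # ImageWidth
        (257, LONG, height),           # ImageLength
        (258, SHORT, 32),              # BitsPerSample
        (259, SHORT, 1),               # Compression: none
        (262, SHORT, 1),               # PhotometricInterpretation: BlackIsZero
        (273, LONG, 0),                # StripOffsets (patched in the loop below)
        (277, SHORT, 1),               # SamplesPerPixel
        (278, LONG, height),           # RowsPerStrip
        (279, LONG, len(pixel_data)),  # StripByteCounts
        (317, SHORT, 3),               # Predictor: floating-point
        (339, SHORT, 1),               # SampleFormat: unsigned integer
    ]
    ifd_offset = 8                                 # IFD follows the 8-byte header
    ifd_size = 2 + 12 * len(entries) + 4           # count + entries + next-IFD offset
    data_offset = ifd_offset + ifd_size            # strip data follows the IFD
    header = struct.pack("<2sHI", b"II", 42, ifd_offset)
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, value in entries:
        if tag == 273:
            value = data_offset
        ifd += struct.pack("<HHII", tag, field_type, 1, value)
    ifd += struct.pack("<I", 0)                    # no further IFDs
    return header + ifd + pixel_data


tiff_bytes = build_bad_tiff()
```

Handing these bytes to open_geotiff (for example through a temporary file) should exercise the path described above. How open_geotiff accepts its input is not shown in this report, so that step is left out.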
Fix
Reject Predictor=3 when SampleFormat is not 3 (float) on read. Either:
1. Raise ValueError (mirror the writer message: "predictor=3 (floating-point) requires float data, got SampleFormat=..."), or
2. Warn and fall back to predictor=1 (no predictor).
Option (1) is cleaner and matches the writer's contract. The check belongs in _apply_predictor (or one step earlier where the IFD is first validated) so every backend path (eager numpy, dask, GPU) picks it up via the shared routine.
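The raising variant could look roughly like the sketch below. The function name, constants, and signature are hypothetical, since the actual signature of _apply_predictor is not shown here; only the error message follows the wording proposed above.

```python
# Tag values from the TIFF specification
PREDICTOR_FLOATING_POINT = 3
SAMPLE_FORMAT_IEEE_FLOAT = 3


def validate_predictor_sample_format(predictor: int, sample_format: int) -> None:
    """Reject the floating-point predictor on non-float samples.

    Hypothetical helper: the real check would live in _apply_predictor or in
    the IFD validation step, whichever the maintainers prefer.
    """
    if predictor == PREDICTOR_FLOATING_POINT and sample_format != SAMPLE_FORMAT_IEEE_FLOAT:
        raise ValueError(
            "predictor=3 (floating-point) requires float data, "
            f"got SampleFormat={sample_format}"
        )


# Valid combinations pass silently.
validate_predictor_sample_format(predictor=3, sample_format=3)
validate_predictor_sample_format(predictor=2, sample_format=1)

# The malformed combination from this report is rejected.
try:
    validate_predictor_sample_format(predictor=3, sample_format=1)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Keeping the check as a small pure function makes it easy to call from one shared place and to unit-test without building a file.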
Regression test
A unit test in xrspatial/geotiff/tests/ that builds a small in-memory TIFF with the bad combination and asserts the new exception.