
geotiff: add Hypothesis property tests for metadata round trips #2134

@brendancol


Summary

Add Hypothesis-driven property tests for the GeoTIFF write/read round trip.

Round-trip coverage today is a long list of incident-specific files (test_metadata_round_trip_1484.py, test_descending_coords_1716.py, test_no_georef_writer_round_trip_1949.py, test_int_coords_round_trip_hotfix_1962.py, test_round_trip_invariants.py) plus one fuzz file (test_fuzz_hypothesis_1661.py) that covers dtype/codec/predictor but skips the metadata axes. The metadata axes (coord dtype, axis direction, degenerate shape, CRS/transform presence, nodata encoding, band axis position) are where most of the recent round-trip bugs have come from.

Why property tests instead of more example tests

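  • Combinatorial blowup. Six axes with 3-5 values each give several thousand combinations (4 x 4 x 5 x 4 x 5 x 3 = 4,800 for the metadata axes listed under Property space), before pixel dtype and CRS are added. Not feasible by hand.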
  • The bugs behind the existing 200+ geotiff test files cluster around interactions between two axes (integer coords with degenerate shape, descending y with rotated transform, float nodata with int dtype). Single-axis example tests miss those.
  • Hypothesis shrinking gives a minimal failing case for free when a future writer change breaks one of the corners.
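The size of the space can be checked directly from the value lists under Property space. This is only a counting sketch: the names METADATA_AXES, EXTRA_AXES and count_combinations are illustrative, and the numbers are the raw cross product before any illegal combinations are filtered out.

```python
import math

# Value lists copied from the "Property space" table.
METADATA_AXES = {
    "coord_dtype": ["int32", "int64", "float32", "float64"],
    "axis_direction": ["asc_asc", "asc_desc", "desc_asc", "desc_desc"],
    "shape": [(1, 1), (1, 8), (8, 1), (4, 5), (16, 16)],
    "georef": ["crs_only", "transform_only", "both", "neither"],
    "nodata_mode": ["in_range", "out_of_range", "fractional", "nan", "none"],
    "band_layout": ["band_first", "band_last", "no_band"],
}
EXTRA_AXES = {
    "pixel_dtype": ["uint8", "int16", "int32", "float32", "float64"],
    "crs": [4326, 3857, 32633, 26910, None],
}


def count_combinations(axes):
    """Size of the full cross product of the given axes."""
    return math.prod(len(values) for values in axes.values())


metadata_only = count_combinations(METADATA_AXES)
everything = count_combinations({**METADATA_AXES, **EXTRA_AXES})
print(metadata_only, everything)  # 4800 120000
```

Even the metadata-only figure is far beyond what a hand-written example suite can enumerate, which is the case for sampling the space instead.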

Property space

One strategy per axis, then a composite strategy that draws one value from each.

| Axis | Strategy |
|---|---|
| coord dtype | sampled_from(['int32', 'int64', 'float32', 'float64']) |
| axis direction | sampled_from(['asc_asc', 'asc_desc', 'desc_asc', 'desc_desc']) |
| shape | sampled_from([(1, 1), (1, 8), (8, 1), (4, 5), (16, 16)]) |
| georef presence | sampled_from(['crs_only', 'transform_only', 'both', 'neither']) |
| nodata mode | sampled_from(['in_range', 'out_of_range', 'fractional', 'nan', 'none']) |
| band layout | sampled_from(['band_first', 'band_last', 'no_band']) |
| pixel dtype | sampled_from(['uint8', 'int16', 'int32', 'float32', 'float64']) |
| CRS | sampled_from([4326, 3857, 32633, 26910, None]); None is drawn only when georef presence carries no CRS ('transform_only' or 'neither') |

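Illegal combinations are filtered per draw, and combinations with documented special behaviour are handled explicitly rather than treated as failures. For example, nodata=nan with an int pixel dtype promotes to float on read; that is documented behaviour, not a failure.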
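A composite strategy along these lines could look like the sketch below. It is not the final implementation: the name round_trip_cases, the dict layout and the expect_float_on_read flag are hypothetical, and it keeps CRS consistent with georef presence through dependent draws rather than assume() or .filter(), which is one possible reading of "per-draw filtering".

```python
from hypothesis import given, settings, strategies as st

INT_PIXEL_DTYPES = {"uint8", "int16", "int32"}


@st.composite
def round_trip_cases(draw):
    """Draw one value per axis; CRS depends on the georef draw."""
    georef = draw(st.sampled_from(["crs_only", "transform_only", "both", "neither"]))
    # Only draw an EPSG code when the case is supposed to carry a CRS.
    has_crs = georef in ("crs_only", "both")
    crs = draw(st.sampled_from([4326, 3857, 32633, 26910])) if has_crs else None
    pixel_dtype = draw(
        st.sampled_from(["uint8", "int16", "int32", "float32", "float64"])
    )
    nodata_mode = draw(
        st.sampled_from(["in_range", "out_of_range", "fractional", "nan", "none"])
    )
    return {
        "coord_dtype": draw(st.sampled_from(["int32", "int64", "float32", "float64"])),
        "axis_direction": draw(
            st.sampled_from(["asc_asc", "asc_desc", "desc_asc", "desc_desc"])
        ),
        "shape": draw(st.sampled_from([(1, 1), (1, 8), (8, 1), (4, 5), (16, 16)])),
        "georef": georef,
        "crs": crs,
        "nodata_mode": nodata_mode,
        "band_layout": draw(st.sampled_from(["band_first", "band_last", "no_band"])),
        "pixel_dtype": pixel_dtype,
        # Hypothetical flag: NaN nodata on an int raster is expected to come
        # back as float, so the dtype check must not treat that as a failure.
        "expect_float_on_read": nodata_mode == "nan"
        and pixel_dtype in INT_PIXEL_DTYPES,
    }


@settings(max_examples=25, deadline=None, database=None)
@given(case=round_trip_cases())
def check_case_is_consistent(case):
    """Sanity check on the strategy itself, not on the GeoTIFF writer."""
    assert (case["crs"] is not None) == (case["georef"] in ("crs_only", "both"))
    if case["expect_float_on_read"]:
        assert case["pixel_dtype"] in INT_PIXEL_DTYPES


check_case_is_consistent()
```

Building the dependency into the draw means no examples are discarded, so the too_slow and filter_too_much health checks are less likely to trigger than with assume().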

Round-trip invariant

After two cycles:

da0  -> write(p1) -> read(p1) = da1
da1  -> write(p2) -> read(p2) = da2

assert da1 == da2 under semantic equality:

  • data: np.array_equal with NaN-aware compare
  • dtype: identical
  • dims: identical (including band axis position)
  • coords: per-axis np.allclose for float, np.array_equal for int; direction preserved
  • attrs['crs']: same int EPSG, or both absent
  • attrs['transform']: same GeoTransform tuple to 1e-9 relative tolerance, or both flagged no-georef via the marker from #2120 (geotiff: to_geotiff silently strips georef on int64 step-1 user coords)
  • attrs['nodata']: same value, or both absent; NaN compares equal to NaN
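The comparison rules above can be sketched as small helpers. These are illustrative only: the function names are hypothetical, None stands in for "absent" (the actual no-georef marker from #2120 is not modelled here), and the coords helper assumes both sides have the same length.

```python
import math

import numpy as np


def data_equal(a, b):
    """Identical dtype and shape, equal values, NaN equal to NaN."""
    a, b = np.asarray(a), np.asarray(b)
    if a.dtype != b.dtype or a.shape != b.shape:
        return False
    if np.issubdtype(a.dtype, np.floating):
        return bool(np.array_equal(a, b, equal_nan=True))
    return bool(np.array_equal(a, b))


def coords_equal(a, b):
    """allclose for float coords, exact for int; element order encodes direction."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:  # guard against allclose broadcasting
        return False
    if np.issubdtype(a.dtype, np.floating) or np.issubdtype(b.dtype, np.floating):
        return bool(np.allclose(a, b))
    return bool(np.array_equal(a, b))


def transform_equal(t1, t2, rel_tol=1e-9):
    """Both absent (None here), or element-wise equal to a relative tolerance."""
    if t1 is None or t2 is None:
        return t1 is None and t2 is None
    return len(t1) == len(t2) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=0.0) for x, y in zip(t1, t2)
    )


def nodata_equal(n1, n2):
    """Both absent, both NaN, or the same numeric value."""
    if n1 is None or n2 is None:
        return n1 is None and n2 is None
    if math.isnan(n1) and math.isnan(n2):
        return True
    return n1 == n2
```

Keeping each rule in its own helper means a failing property test reports which part of the contract broke, rather than a single opaque inequality.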

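The fixed-point check (da1 == da2) works because the writer is deterministic given the same input attrs. Comparing da1 with da2, rather than da0 with da1, lets the first cycle apply documented normalisations (such as the int-to-float promotion above) without failing the test. One further cycle after the first is enough to surface drift.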
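The idea can be shown with a toy codec. Everything here is a stand-in: toy_write and drifting_write are hypothetical functions that serialise a plain dict to JSON, not the real GeoTIFF writer, and the raster is a dict rather than a DataArray.

```python
import json


def toy_write(raster):
    """Well-behaved stand-in writer: drops attrs that are None, then serialises."""
    attrs = {k: v for k, v in raster["attrs"].items() if v is not None}
    return json.dumps(dict(raster, attrs=attrs), sort_keys=True)


def drifting_write(raster):
    """Buggy stand-in writer: shifts the origin by half a pixel on every write."""
    return json.dumps(dict(raster, origin=raster["origin"] + 0.5), sort_keys=True)


def two_cycles(raster, write, read=json.loads):
    """da0 -> write -> read = da1, then da1 -> write -> read = da2."""
    da1 = read(write(raster))
    da2 = read(write(da1))
    return da1, da2


da0 = {"data": [1, 2, 3], "origin": 10.0, "attrs": {"crs": 4326, "nodata": None}}

# The first cycle may normalise (da0 != da1); the second must be a no-op.
da1, da2 = two_cycles(da0, toy_write)
stable = (da0 != da1, da1 == da2)  # (True, True)

# A drifting writer never reaches a fixed point, so da1 != da2 exposes it.
d1, d2 = two_cycles(da0, drifting_write)
drifts = d1 != d2  # True
```

A da0 == da1 assertion would have rejected the well-behaved writer for dropping the empty nodata attr, while the fixed-point check accepts it and still catches the drifting one.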

File location

xrspatial/geotiff/tests/test_roundtrip_properties.py

Skip the module if hypothesis is not installed, same pattern as test_fuzz_hypothesis_1661.py.

Hypothesis profile

  • Local: settings(max_examples=200, deadline=None, suppress_health_check=[HealthCheck.too_slow])
  • CI profile registered as ci with max_examples=50 and derandomize=True for reproducibility
  • Seed printed on failure (Hypothesis default)
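The two profiles could be registered roughly as follows, typically from a conftest.py. The profile name local and the HYPOTHESIS_PROFILE environment variable are assumptions made for this sketch; the Hypothesis pytest plugin's --hypothesis-profile option is an alternative way to select a profile.

```python
import os

from hypothesis import HealthCheck, settings

# Local runs: more examples, no per-example deadline, tolerate slow generation.
settings.register_profile(
    "local",
    max_examples=200,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow],
)

# CI runs: fewer examples, and derandomize=True so every run draws the same cases.
settings.register_profile("ci", max_examples=50, derandomize=True)

# HYPOTHESIS_PROFILE is a hypothetical variable name; default to the local profile.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "local"))
```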

Backend scope

numpy and dask+numpy for the first pass. cupy and dask+cupy share the same writer/reader code but need a CUDA runner, so leave those for a follow-up once the numpy invariants are pinned.

Out of scope

  • Byte-for-byte file equality. The writer is allowed to reorder IFD tags, change strip layout, etc. The test_golden_corpus_*.py files cover byte stability where it matters.
  • Performance or timing assertions.
  • VRT, COG, overviews. Those have their own round-trip suites.
  • GPU code paths.

Tie-in

This file is meant to back-stop several adjacent contracts. Cross-reference them in the docstring:

Dev dependency

hypothesis is already imported by test_fuzz_hypothesis_1661.py via pytest.importorskip. It is not declared in pyproject.toml or setup.py. Adding it to a test extras group is a small follow-up but not blocking. The new file uses the same importorskip guard.

Acceptance
