geotiff: golden corpus oracle harness (#1930, phase 1.2) #1991
Adds xrspatial/geotiff/tests/golden_corpus/_oracle.py with compare_to_oracle(fixture_path, candidate_da, *, lossy=False). It opens the fixture with rasterio and asserts that pixels, transform, CRS, nodata, and dtype match the candidate. Float pixels use a NaN-aware array_equal; the CRS comparison falls back to EPSG-code equality when two WKTs describe the same coordinate system.

A _assert_canonical_attrs hook is in place, but it is a no-op today. The canonical-attrs contract is tracked in xarray-contrib#1984 and has not settled, so the oracle only checks the obvious subset (crs / transform / nodata / dtype) for now. The TODO in the hook points at xarray-contrib#1984.

lossy=True skips the bit-exact pixel comparison (needed for the JPEG cells in Phase 2) and checks only shape, dtype, transform, and CRS.

16 unit tests cover the success path and every per-property failure mode (mismatched dtype, transform, CRS, NaN-vs-zero nodata, NaN-vs-zero pixels, EPSG-equivalent WKTs, lossy mode, missing fixture). The tests write tiny TIFFs into tmp_path with rasterio and build candidate DataArrays by hand, so they do not depend on the sibling Phase 1 PR 1 manifest / generator.

No backends are wired yet (Phase 3). The manifest, generator, and fixture .tif files are not touched (Phase 1 PR 1 / Phase 2).
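The NaN-aware pixel check and the lossy shortcut described above can be illustrated with a small stdlib-only sketch. This is not the code in _oracle.py: the description says the oracle uses array_equal with NaN awareness, whereas pixels_match and its nested-list inputs are hypothetical stand-ins chosen so the example runs without numpy or rasterio.

```python
import math


def _is_nan(value):
    """True only for float NaN; ints and other values are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def pixels_match(ref_rows, cand_rows, *, lossy=False):
    """Hypothetical stand-in for the oracle's pixel check on nested lists.

    Exact mode: every cell must be equal, with NaN treated as equal to NaN.
    Lossy mode: only the shape has to agree (values may differ).
    """
    same_shape = len(ref_rows) == len(cand_rows) and all(
        len(r) == len(c) for r, c in zip(ref_rows, cand_rows)
    )
    if not same_shape:
        return False
    if lossy:
        return True
    for ref_row, cand_row in zip(ref_rows, cand_rows):
        for ref, cand in zip(ref_row, cand_row):
            # NaN == NaN is False in IEEE 754, so handle that pair explicitly.
            if _is_nan(ref) and _is_nan(cand):
                continue
            if ref != cand:
                return False
    return True
```

With this sketch, a NaN in the reference matched by a NaN in the candidate passes, while a NaN matched by 0.0 fails, which is the NaN-vs-zero failure mode the unit tests are described as covering.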
Self-review
Blockers
None.
Suggestions (worth addressing before this lands)
Nits
What I like about it
Checklist
…1991)

Two fixes and two cleanups from the self-review on PR xarray-contrib#1991:

1. _assert_transform no longer treats every identity-equal ref transform as "no georef". A real raster written at origin (0, 0) with 1.0 pixel size also matches Affine.identity(), so the old short-circuit silently skipped the transform check on those files. The new _ref_has_georef helper requires a CRS *or* a non-identity transform before the comparison runs.
2. Dropped the ref_transform is None branch. rasterio.open(...).transform returns Affine.identity() for bare files, not None, so the check was dead.
3. Renamed _assert_canonical_attrs parameters to _ref_attrs / _candidate_da so the stub reads as deliberately-unused without the _ = ... silencing block.
4. Added a regression test (test_identity_transform_with_crs_still_compared) that fails on the old behaviour and passes on the new one, plus test_no_georef_fixture_tolerates_missing_candidate_transform covering the legitimate xrspatial no-georef path (xarray-contrib#1710).

18 tests now, all green. flake8 clean.
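The difference between the old short-circuit and the new guard can be sketched as follows. This is a minimal illustration, not the helper from the PR: ref_has_georef, old_has_georef, and the six-coefficient tuple standing in for an Affine are all assumptions made so the example runs without rasterio.

```python
# Affine coefficients (a, b, c, d, e, f); this tuple plays the role of
# Affine.identity() in the sketch.
IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


def old_has_georef(transform):
    """Old behaviour as described: identity transform means 'no georef'."""
    return tuple(transform) != IDENTITY


def ref_has_georef(crs, transform):
    """New behaviour as described: a CRS or a non-identity transform
    is enough to make the transform comparison run."""
    return crs is not None or tuple(transform) != IDENTITY


# A raster at origin (0, 0) with 1.0 pixel size and a CRS: the old check
# skipped it, the new one compares it.
skipped_before = not old_has_georef(IDENTITY)
compared_now = ref_has_georef("EPSG:4326", IDENTITY)
```

A bare file with no CRS and an identity transform still counts as having no georeferencing, which matches the no-georef path the second new test is described as covering.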
Fixes pushed in 6cdbb71:
Also added …
* geotiff: golden corpus phase 2.8, CRS variants (#1930)

Phase 2 PR 8 of #1930. Three fixtures, one per CRS encoding the manifest allows:

* crs_epsg_3857: EPSG-coded Web Mercator. Straight EPSG path.
* crs_wkt_utm10n: WKT for EPSG:32610 with AUTHORITY blocks stripped, so the bytes on disk are not byte-identical to from_epsg(32610).to_wkt(). PROJ still resolves it to EPSG:32610, which is what the oracle's EPSG-code fallback (PR #1991) was built for.
* crs_citation_only: GeoKey citation, no AUTHORITY tag, no EPSG. libgeotiff mutates the WKT on round-trip (axis order, UNIT AUTHORITY), and neither side has an EPSG code, so structural CRS.__eq__ and the EPSG fallback both fail.

_crs_equal gets one extra branch: when both to_epsg() return None and structural equality fails, compare crs.to_dict() (PROJ form). That dict is stable across the round-trip.

Smoke tests pin each fixture, including a negative test that EPSG:4326 (same proj family, different ellipsoid) is still rejected by the PROJ-dict path. Fixtures are 8x8 uint8; all three .tif files are under 600 bytes.

* geotiff: address review on PR 2.8 CRS variants

Self-review surfaced two issues with the PROJ-dict fallback added in the first commit:

* ``CRS.to_dict()`` returns ``{}`` for LOCAL_CS-style WKTs that PROJ has no canonical form for. An unguarded ``ref.to_dict() == cand.to_dict()`` would treat any two such CRSes as equal, a silent false-positive in the oracle. Short-circuit on empty dicts.
* ``CRS.to_dict()`` drops the GEOGCS / PROJCS name, so two citation-only CRSes with the same shape but different names would compare equal. Documented as a known limit; the current corpus only has one citation fixture so it is theoretical. If it ever bites, switch to a name-aware comparison via ``to_dict(projjson=True)`` with an axis-order normaliser.

Also adds two tests:

* ``test_crs_wkt_utm10n_fixture_accepts_wkt_attr``: complements the EPSG-int test by exercising the ``attrs['crs_wkt']`` branch of ``_candidate_crs``. Both paths must reach the same verdict.
* ``test_crs_equal_rejects_empty_proj_dict``: regression pin for the empty-dict short-circuit. Uses two LOCAL_CS WKTs with different UNIT blocks so rasterio's own ``CRS.__eq__`` reports them as unequal and the test actually exercises the fallback rather than short-circuiting on structural equality.
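The fallback chain described in these two commits (structural equality, then EPSG codes, then a guarded PROJ-dict comparison) can be sketched like this. FakeCRS is a hypothetical stand-in that only mimics the to_epsg() / to_dict() / __eq__ surface mentioned in the text, and crs_equal is an illustration of the described order, not the _crs_equal function from the PR. All WKT strings and PROJ dicts in the usage lines are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class FakeCRS:
    """Hypothetical stand-in for a CRS object (not rasterio's CRS)."""
    wkt: str
    epsg: Optional[int] = None
    proj: dict = field(default_factory=dict)

    def __eq__(self, other):
        # Stand-in for structural equality: same WKT text.
        return isinstance(other, FakeCRS) and self.wkt == other.wkt

    def to_epsg(self):
        return self.epsg

    def to_dict(self):
        return dict(self.proj)


def crs_equal(ref, cand):
    """Sketch of the described chain; the real helper may differ in detail."""
    if ref == cand:
        return True
    ref_epsg, cand_epsg = ref.to_epsg(), cand.to_epsg()
    if ref_epsg is not None and cand_epsg is not None:
        return ref_epsg == cand_epsg
    if ref_epsg is None and cand_epsg is None:
        ref_proj, cand_proj = ref.to_dict(), cand.to_dict()
        # Guard: {} == {} would otherwise make any two unresolvable
        # CRSes compare equal.
        if not ref_proj or not cand_proj:
            return False
        return ref_proj == cand_proj
    return False


# Made-up examples of the three situations discussed above.
same_epsg = crs_equal(FakeCRS("wkt without AUTHORITY", 32610),
                      FakeCRS("wkt with AUTHORITY", 32610))
same_proj = crs_equal(FakeCRS("before round-trip", None, {"proj": "tmerc"}),
                      FakeCRS("after round-trip", None, {"proj": "tmerc"}))
empty_dicts = crs_equal(FakeCRS("LOCAL_CS a"), FakeCRS("LOCAL_CS b"))
```

The empty-dict guard is the part the second commit adds: without it, the last comparison would report two unrelated local coordinate systems as equal.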
Summary
Phase 1 PR 2 of #1930 (golden corpus for geotiff parity).
From the plan:
One public function: compare_to_oracle(fixture_path, candidate_da, *, lossy=False).
Reads the fixture with rasterio and asserts, in order:
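* … Affine vs xrspatial attrs['transform'], 1e-9 tol)
* … coordinate system)
* … array_equal(..., equal_nan=True) for floats)

lossy=True skips the pixel-value comparison in step 5 and checks only the pixel array's shape there; the dtype, transform, and CRS checks still run. That path is for the JPEG cells in Phase 2 PR 5.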
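The transform step mentions a 1e-9 tolerance. A minimal sketch of such a comparison follows, assuming the tolerance is absolute and applied per coefficient; the PR text does not say whether it is absolute or relative. transforms_close and the plain six-tuples are hypothetical and stand in for the Affine and attrs['transform'] values.

```python
def transforms_close(ref, cand, tol=1e-9):
    """Compare two affine transforms given as (a, b, c, d, e, f) tuples.

    Assumption: an absolute per-coefficient tolerance of 1e-9.
    """
    ref, cand = tuple(ref), tuple(cand)
    if not (len(ref) == len(cand) == 6):
        return False
    return all(abs(r - c) <= tol for r, c in zip(ref, cand))


# Made-up UTM-like transform: 30 m pixels, north-up.
reference = (30.0, 0.0, 500000.0, 0.0, -30.0, 4100000.0)
tiny_drift = (30.0 + 1e-12, 0.0, 500000.0, 0.0, -30.0, 4100000.0)
half_metre_off = (30.0, 0.0, 500000.0, 0.0, -30.0, 4100000.5)
```

Under this assumption, floating-point noise on the order of 1e-12 passes, while a half-metre shift in the origin fails.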
Canonical-attrs hook (#1984)
The plan asks the oracle to assert "canonical-attrs match the candidate".
The canonical-attrs contract is tracked in #1984 and is not settled, so
this PR:
* … via the helpers listed above,
* … _assert_canonical_attrs(ref_attrs, candidate_da) hook so a later PR can extend the contract without touching call sites,
* … TODO(#1984) in that function.

Out of scope

* … .tif files (Phase 2 PRs 3-9).
* … gdal_metadata, extra_tags, etc.) -- those wait on geotiff: define a public contract for DataArray attrs (canonical / alias / pass-through) #1984.
The oracle takes a raw filesystem path, not a manifest entry, so it
does not import anything from the sibling Phase 1 PR 1 work.
Test plan
* pytest xrspatial/geotiff/tests/golden_corpus/test_oracle.py -- 16 tests, all green
* … missing nodata / pixel / pixel-NaN-vs-zero / lossy-dtype / lossy-transform / lossy-shape / missing-file