Refactor GeoTIFF Phase 2: centralize transform/georef contract #2233

Merged: brendancol merged 2 commits into main from issue-2225-georef-resolver on May 21, 2026

@brendancol
Contributor

Closes #2225
Part of #2211

Summary

Centralises the transform/georef contract behind a single resolver so every backend (eager, GPU, dask, VRT, read and write) makes the same decision for matching inputs.

  • Adds GeorefResolution dataclass and resolve_georef() in _coords.py. Fields: transform, georef_status, dropped_reason, applied_no_georef_marker. The resolver owns coord-to-transform inference, the no-georef marker, degenerate-axis policy, and rotated-affine drop policy.
  • Routes _attrs._compute_georef_status through the resolver so the read-side bucket decision is computed in one place.
  • Replaces the inline _transform_from_attr -> _coords_to_transform -> _require_transform_for_georeferenced ladder in _writers/eager.py (two sites: to_geotiff and _write_vrt_tiled) and _writers/gpu.py with one resolve_georef(data) call.
  • The synthesised GeoInfo from _backends/vrt.py::_vrt_to_synthetic_geo_info already flowed through _compute_georef_status, which now routes through the resolver. Added a docstring note.

The public API is unchanged. georef_status values, attrs['transform'] precedence, and the fail-closed behaviour on degenerate coords (#1945) all match prior behaviour.
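
For readers who have not opened the diff, the result object described above might look roughly like the sketch below. Only the four field names, the frozen dataclass, and the Literal-typed status come from this PR; the field types, the defaults, the `Affine6` alias, and the `"full"` and `"crs_only"` bucket names are assumptions made for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Literal, Optional, Tuple

# Hypothetical alias: six affine coefficients (a, b, c, d, e, f).
Affine6 = Tuple[float, float, float, float, float, float]

# "coords", "transform_only", "rotated_dropped" and "none" are named in this PR.
# "full" and "crs_only" are assumed bucket names.
GeorefStatus = Literal[
    "full", "coords", "transform_only", "crs_only", "rotated_dropped", "none"
]


@dataclass(frozen=True)
class GeorefResolution:
    """Sketch of the resolver's result; types and defaults are assumptions."""
    transform: Optional[Affine6]
    georef_status: GeorefStatus
    dropped_reason: Optional[str] = None
    applied_no_georef_marker: bool = False


res = GeorefResolution(transform=None, georef_status="none",
                       dropped_reason="no_inputs")

# A frozen dataclass rejects attribute assignment after construction.
try:
    res.georef_status = "coords"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the result means a backend cannot adjust the decision after the resolver returns, which fits the goal of making every backend reach the same answer.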

Tests

New file xrspatial/geotiff/tests/test_georef_resolver_parity_2211.py (22 tests) covers the six fixture cases the issue calls out plus cross-site parity:

  • y/x, lat/lon, latitude/longitude dim names
  • row/col (no-georef placeholder + _NO_GEOREF_KEY marker)
  • transform-only via attrs['transform']
  • CRS-only via attrs['crs']
  • rotated-affine GeoInfo -> rotated_dropped bucket
  • writer-and-reader-agree assertions on full and no-georef cases
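
The rotated-affine case in the list above can be illustrated with a small standalone check. This is a sketch rather than the resolver's code: the `(a, b, c, d, e, f)` coefficient order and the helper names `is_rotated` and `drop_if_rotated` are assumptions. Only the `rotated_affine_dropped` reason string comes from this PR.

```python
from typing import Optional, Tuple

# Assumed coefficient order: x = a*col + b*row + c, y = d*col + e*row + f.
Affine6 = Tuple[float, float, float, float, float, float]


def is_rotated(transform: Affine6, tol: float = 0.0) -> bool:
    """Return True when either off-diagonal term (b or d) is non-zero."""
    _a, b, _c, d, _e, _f = transform
    return abs(b) > tol or abs(d) > tol


def drop_if_rotated(
    transform: Affine6,
) -> Tuple[Optional[Affine6], Optional[str]]:
    """Hypothetical drop policy: discard a rotated transform and say why."""
    if is_rotated(transform):
        return None, "rotated_affine_dropped"
    return transform, None


north_up = (30.0, 0.0, 500000.0, 0.0, -30.0, 4100000.0)
rotated = (21.2, 21.2, 500000.0, 21.2, -21.2, 4100000.0)

kept = drop_if_rotated(north_up)
dropped = drop_if_rotated(rotated)
```

Here `kept` carries the transform through unchanged, while `dropped` pairs `None` with the reason string, which is the kind of outcome the `rotated_dropped` fixture is meant to pin down.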

Verification

  • pytest xrspatial/geotiff/tests/ -> 4,729 passed, 45 skipped, 1 xfailed (pre-existing lz4 failure is unrelated and reproduces on main).
  • test_georef_status_2136.py and test_allow_rotated_* suites: all green.

Notes

  • xrspatial/geotiff/_writers/vrt.py is the public write_vrt mosaic creator. It does not handle DataArrays or build transforms, so the resolver does not apply there. The per-tile VRT write path lives in _writers/eager.py::_write_vrt_tiled and was updated.
  • Kept _compute_georef_status_from_parts (VRT inline shim) as-is so VRT branches do not have to build a fake GeoInfo to feed the resolver; its decision table mirrors the resolver's reader-path branch.

@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 21, 2026
Contributor Author

@brendancol brendancol left a comment

PR Review: Refactor GeoTIFF Phase 2: centralize transform/georef contract

Blockers (must fix before merge)

  • None. Behavior is preserved and the full geotiff suite (4,729 tests) passes.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_coords.py:723-728 -- The writer path bucketed any "transform present + no CRS" input as coords, which masked the distinction between transform_only (transform came from attrs['transform']) and coords (transform was derived from coord arrays). The resolver now tracks the source and picks the right bucket. Addressed in a follow-up commit on this PR.
  • xrspatial/geotiff/_coords.py:708-720 -- The writer path's CRS lookup ignored string-valued attrs['crs']. Some pipelines stash a WKT string there, and attrs_to_metadata already folds it into crs_wkt. The resolver was missing the same fold, so a CRS-only DataArray with a string crs attr would have been classified as none. Same follow-up commit fixed it.
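
Both suggestions come down to a small decision table on the writer side. The sketch below is one way to read it, not the code in `_coords.py`: `classify_writer_input` and `has_crs` are hypothetical names, the `"full"` and `"crs_only"` bucket names are assumptions, and the relative precedence of `attrs['crs']` and `attrs['crs_wkt']` is deliberately left out.

```python
from typing import Any, Mapping, Optional, Tuple

Affine6 = Tuple[float, float, float, float, float, float]


def has_crs(attrs: Mapping[str, Any]) -> bool:
    """Treat a non-empty string crs (e.g. WKT) the same as any other CRS value."""
    crs = attrs.get("crs")
    if isinstance(crs, str):
        if crs.strip():
            return True  # the string fold that was originally missing
    elif crs is not None:
        return True      # e.g. an integer EPSG code
    return bool(attrs.get("crs_wkt"))


def classify_writer_input(
    attrs: Mapping[str, Any],
    coord_transform: Optional[Affine6] = None,
) -> str:
    """Pick a bucket, tracking where the transform came from."""
    if attrs.get("transform") is not None:
        transform_source = "attr"    # attrs['transform'] takes precedence
    elif coord_transform is not None:
        transform_source = "coords"  # derived from coordinate arrays
    else:
        transform_source = None

    crs_present = has_crs(attrs)
    if transform_source is None:
        return "crs_only" if crs_present else "none"
    if crs_present:
        return "full"
    return "transform_only" if transform_source == "attr" else "coords"


T = (1.0, 0.0, 0.0, 0.0, -1.0, 10.0)
```

With this table, `classify_writer_input({"transform": T})` lands in `transform_only`, `classify_writer_input({}, T)` lands in `coords`, and a DataArray whose only georef hint is a WKT string in `attrs['crs']` no longer falls through to `none`.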

Nits (optional improvements)

  • xrspatial/geotiff/_coords.py:411-489 -- The bare-input branch (no DataArray, no GeoInfo) is convenient for tests but no production call site uses it today. Worth a docstring line noting that it is a fixture-only entry point.
  • xrspatial/geotiff/_attrs.py:885 -- The thin _compute_georef_status delegation could be inlined once all callers have moved to resolve_georef. Leaving the shim is fine for this PR; the _* name stays stable for the other in-flight PRs in #2211.

What looks good

  • GeorefResolution is a frozen dataclass with Literal-typed georef_status, so callers get type narrowing for free.
  • The dropped_reason strings (rotated_affine_dropped, no_georef_marker, no_transform_inferred, no_inputs) make failures easy to read without diving into the resolver source.
  • 22 new parametrized tests cover the six fixture cases the issue called out plus cross-site parity assertions.
  • The unused _transform_from_attr / _coords_to_transform imports in the two writer modules are gone, so resolve_georef is the only path.

Checklist

  • Decision table mirrors _compute_georef_status with the writer/reader split documented.
  • Backend parity: eager numpy, dask+numpy, GPU eager, GPU dask, and VRT (eager + chunked) all route through the resolver.
  • No change to nodata or masking paths.
  • Edge cases (1x1, 1xN, degenerate axis, _NO_GEOREF_KEY, rotated affine) all exercised.
  • No chunk-boundary code touched.
  • Writer paths still read attrs only; nothing materialises.
  • [n/a] Benchmarks: pure refactor.
  • [n/a] README feature matrix: no public API change.
  • Docstrings on GeorefResolution, resolve_georef, and the rerouted _compute_georef_status.
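
The edge cases in the checklist (1x1, 1xN, degenerate axis) all concern coord-to-transform inference. The sketch below shows one way such inference can fail closed. It is an illustration under assumptions: coordinates are taken to be evenly spaced pixel centres, `axis_origin_and_step` and `coords_to_transform` are hypothetical helpers rather than the functions in `_coords.py`, and returning `None` stands in for whatever the real code does when an axis is degenerate.

```python
from typing import Optional, Sequence, Tuple

Affine6 = Tuple[float, float, float, float, float, float]


def axis_origin_and_step(
    centers: Sequence[float], rel_tol: float = 1e-6
) -> Optional[Tuple[float, float]]:
    """Return (outer-edge origin, pixel step) for evenly spaced centres."""
    if len(centers) < 2:
        return None  # one sample: the pixel size cannot be inferred
    step = centers[1] - centers[0]
    if step == 0:
        return None  # repeated coordinate: degenerate axis
    for prev, cur in zip(centers, centers[1:]):
        if abs((cur - prev) - step) > rel_tol * abs(step):
            return None  # irregular spacing: no single affine fits
    # Shift from the first pixel centre to its outer edge.
    return centers[0] - step / 2.0, step


def coords_to_transform(
    x: Sequence[float], y: Sequence[float]
) -> Optional[Affine6]:
    """Build (a, b, c, d, e, f) with no rotation, or None if either axis fails."""
    xs = axis_origin_and_step(x)
    ys = axis_origin_and_step(y)
    if xs is None or ys is None:
        return None
    (x0, dx), (y0, dy) = xs, ys
    return (dx, 0.0, x0, 0.0, dy, y0)


regular = coords_to_transform([0.5, 1.5, 2.5], [9.5, 8.5])
one_by_n = coords_to_transform([0.5, 1.5], [3.0])
one_by_one = coords_to_transform([0.5], [3.0])
```

In this sketch a 1xN or 1x1 raster yields no transform instead of a guessed pixel size, which is the fail-closed reading of the degenerate-axis policy.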

Contributor Author

@brendancol brendancol left a comment

PR Review (follow-up): Refactor GeoTIFF Phase 2

Re-reviewed after commit 3f0eddc.

Blockers

  • None.

Suggestions

  • None outstanding. The two suggestions from the first review (writer-side bucket split and string-attrs['crs'] fold) are addressed.

Nits (left for follow-up, with reasons)

  • xrspatial/geotiff/_coords.py -- The bare-input branch (no DataArray, no GeoInfo) is still fixture-only. Not addressed here because adding a "fixture-only" disclaimer to a public-looking helper is a doc-only change that competes for diff with the other in-flight PRs in #2211. Worth a follow-up.
  • xrspatial/geotiff/_attrs.py -- The thin _compute_georef_status shim could be inlined once #2226 / #2227 / #2229 land. Keeping the name now avoids rebase friction with those siblings.

What still looks good

  • transform_source is tracked locally and only sets the bucket; no side-effects.
  • The string-crs -> crs_wkt fold matches the attrs_to_metadata precedence exactly, so a CRS-only DataArray with a string attrs['crs'] no longer mis-classifies as none.
  • Parity test was updated to assert the new transform_only bucket on a transform-only DataArray.
  • All 77 tests in the focused suite (parity, georef_status, allow_rotated) pass; the broader 4,729-test geotiff suite still passes too.

Disposition

Cleared for the next stage of #2211. No remaining actionable findings in this PR.

…2225)

PR-B of #2211. Adds GeorefResolution dataclass and resolve_georef()
to _coords.py. Routes _attrs._compute_georef_status through the
resolver and replaces the inline transform_from_attr -> coords_to_transform
ladder in the eager, GPU, and per-tile VRT writers with a single
resolver call. Adds parametrized parity tests across y/x, lat/lon,
row/col, transform-only, crs-only, and rotated-dropped fixtures.

- Distinguish ``transform_only`` (from attrs['transform']) from
  ``coords`` (derived from coord arrays). The writer path collapsed
  both into ``coords`` originally, which masked the documented
  bucket split.
- Fold string-valued ``attrs['crs']`` into ``crs_wkt`` so the
  resolver matches the ``attrs_to_metadata`` precedence. Without
  this a CRS-only DataArray with a string ``crs`` attr was
  classified as ``none``.
- Update the parity test to assert the new ``transform_only``
  bucket on a DataArray with only ``attrs['transform']`` set.
@brendancol force-pushed the issue-2225-georef-resolver branch from 3f0eddc to e7436d5 on May 21, 2026 04:17
@brendancol brendancol merged commit ab5480e into main May 21, 2026
5 checks passed
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

Refactor GeoTIFF Phase 2: centralize transform/georef contract (PR-B of #2211)

1 participant