Refactor GeoTIFF Phase 2: centralize transform/georef contract#2233
Merged
Conversation
brendancol
commented
May 21, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Refactor GeoTIFF Phase 2: centralize transform/georef contract
Blockers (must fix before merge)
- None. Behavior is preserved and the full geotiff suite (4,729 tests) passes.
Suggestions (should fix, not blocking)
xrspatial/geotiff/_coords.py:723-728-- The writer path bucketed any "transform present + no CRS" input ascoords, which masked the distinction betweentransform_only(transform came fromattrs['transform']) andcoords(transform was derived from coord arrays). The resolver now tracks the source and picks the right bucket. Addressed in a follow-up commit on this PR.xrspatial/geotiff/_coords.py:708-720-- The writer path's CRS lookup ignored string-valuedattrs['crs']. Some pipelines stash a WKT string there, andattrs_to_metadataalready folds it intocrs_wkt. The resolver was missing the same fold, so a CRS-only DataArray with a stringcrsattr would have been classified asnone. Same follow-up commit fixed it.
Nits (optional improvements)
xrspatial/geotiff/_coords.py:411-489-- The bare-input branch (no DataArray, no GeoInfo) is convenient for tests but no production call site uses it today. Worth a docstring line noting that it is a fixture-only entry point.xrspatial/geotiff/_attrs.py:885-- The thin_compute_georef_statusdelegation could be inlined once all callers have moved toresolve_georef. Leaving the shim is fine for this PR; the_*name stays stable for the other in-flight PRs in #2211.
What looks good
GeorefResolutionis a frozen dataclass withLiteral-typedgeoref_status, so callers get type narrowing for free.- The
dropped_reasonstrings (rotated_affine_dropped,no_georef_marker,no_transform_inferred,no_inputs) make failures easy to read without diving into the resolver source. - 22 new parametrized tests cover the six fixture cases the issue called out plus cross-site parity assertions.
- The unused
_transform_from_attr/_coords_to_transformimports in the two writer modules are gone, soresolve_georefis the only path.
Checklist
- Decision table mirrors
_compute_georef_statuswith the writer/reader split documented. - Backend parity: eager numpy, dask+numpy, GPU eager, GPU dask, and VRT (eager + chunked) all route through the resolver.
- No change to nodata or masking paths.
- Edge cases (1x1, 1xN, degenerate axis,
_NO_GEOREF_KEY, rotated affine) all exercised. - No chunk-boundary code touched.
- Writer paths still read attrs only; nothing materialises.
- [n/a] Benchmarks: pure refactor.
- [n/a] README feature matrix: no public API change.
- Docstrings on
GeorefResolution,resolve_georef, and the rerouted_compute_georef_status.
brendancol
commented
May 21, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): Refactor GeoTIFF Phase 2
Re-reviewed after commit 3f0eddc.
Blockers
- None.
Suggestions
- None outstanding. The two suggestions from the first review (writer-side bucket split and string-
attrs['crs']fold) are addressed.
Nits (left for follow-up, with reasons)
xrspatial/geotiff/_coords.py-- The bare-input branch (no DataArray, no GeoInfo) is still fixture-only. Not addressed here because adding a "fixture-only" disclaimer to a public-looking helper is a doc-only change that competes for diff with the other in-flight PRs in #2211. Worth a follow-up.xrspatial/geotiff/_attrs.py-- The thin_compute_georef_statusshim could be inlined once #2226 / #2227 / #2229 land. Keeping the name now avoids rebase friction with those siblings.
What still looks good
transform_sourceis tracked locally and only sets the bucket; no side-effects.- The string-
crs->crs_wktfold matches theattrs_to_metadataprecedence exactly, so a CRS-only DataArray with a stringattrs['crs']no longer mis-classifies asnone. - Parity test was updated to assert the new
transform_onlybucket on a transform-only DataArray. - All 77 tests in the focused suite (parity, georef_status, allow_rotated) pass; the broader 4,729-test geotiff suite still passes too.
Disposition
Cleared for the next stage of #2211. No remaining actionable findings in this PR.
…2225) PR-B of #2211. Adds GeorefResolution dataclass and resolve_georef() to _coords.py. Routes _attrs._compute_georef_status through the resolver and replaces the inline transform_from_attr -> coords_to_transform ladder in the eager, GPU, and per-tile VRT writers with a single resolver call. Adds parametrized parity tests across y/x, lat/lon, row/col, transform-only, crs-only, and rotated-dropped fixtures.
- Distinguish ``transform_only`` (from attrs['transform']) from ``coords`` (derived from coord arrays). The writer path collapsed both into ``coords`` originally, which masked the documented bucket split. - Fold string-valued ``attrs['crs']`` into ``crs_wkt`` so the resolver matches the ``attrs_to_metadata`` precedence. Without this a CRS-only DataArray with a string ``crs`` attr was classified as ``none``. - Update the parity test to assert the new ``transform_only`` bucket on a DataArray with only ``attrs['transform']`` set.
3f0eddc to
e7436d5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2225
Part of #2211
Summary
Centralises the transform/georef contract behind a single resolver so every backend (eager, GPU, dask, VRT, read and write) makes the same decision for matching inputs.
GeorefResolutiondataclass andresolve_georef()in_coords.py. Fields:transform,georef_status,dropped_reason,applied_no_georef_marker. The resolver owns coord-to-transform inference, the no-georef marker, degenerate-axis policy, and rotated-affine drop policy._attrs._compute_georef_statusthrough the resolver so the read-side bucket decision is computed in one place._transform_from_attr->_coords_to_transform->_require_transform_for_georeferencedladder in_writers/eager.py(two sites:to_geotiffand_write_vrt_tiled) and_writers/gpu.pywith oneresolve_georef(data)call.GeoInfofrom_backends/vrt.py::_vrt_to_synthetic_geo_infoalready flowed through_compute_georef_status, which now routes through the resolver. Added a docstring note.The public API is unchanged.
georef_statusvalues,attrs['transform']precedence, and the fail-closed behaviour on degenerate coords (#1945) all match prior behaviour.Tests
New file
xrspatial/geotiff/tests/test_georef_resolver_parity_2211.py(22 tests) covers the six fixture cases the issue calls out plus cross-site parity:y/x,lat/lon,latitude/longitudedim namesrow/col(no-georef placeholder +_NO_GEOREF_KEYmarker)attrs['transform']attrs['crs']GeoInfo->rotated_droppedbucketVerification
pytest xrspatial/geotiff/tests/-> 4,729 passed, 45 skipped, 1 xfailed (pre-existing lz4 failure is unrelated and reproduces on main).test_georef_status_2136.pyandtest_allow_rotated_*suites: all green.Notes
xrspatial/geotiff/_writers/vrt.pyis the publicwrite_vrtmosaic creator. It does not handle DataArrays or build transforms, so the resolver does not apply there. The per-tile VRT write path lives in_writers/eager.py::_write_vrt_tiledand was updated._compute_georef_status_from_parts(VRT inline shim) as-is so VRT branches do not have to build a fakeGeoInfoto feed the resolver; its decision table mirrors the resolver's reader-path branch.