Skip to content

geotiff: reject writes whose crs and crs_wkt attrs disagree (#1987 PR 1)#2021

Merged
brendancol merged 1 commit into
mainfrom
1987-pr1-conflicting-crs-write
May 18, 2026
Merged

geotiff: reject writes whose crs and crs_wkt attrs disagree (#1987 PR 1)#2021
brendancol merged 1 commit into
mainfrom
1987-pr1-conflicting-crs-write

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

First fail-closed slice for issue #1987. The framework PR (#2006) landed the typed error hierarchy and the validator-hook registry; this PR wires the first check.

Today's behaviour

to_geotiff / _write_vrt_tiled / write_geotiff_gpu consult data.attrs.get('crs') first and fall back to data.attrs.get('crs_wkt') only when the EPSG attr is missing. When the two attrs encode different CRSes, the writer silently emits the EPSG and drops the WKT — the on-disk CRS does not match what the DataArray advertised.

Change

  • _validation.py gains _check_write_conflicting_crs: when both attrs are populated and the caller did not pass an explicit crs= kwarg, the check canonicalises each attr via pyproj and raises ConflictingCRSError if they disagree. The error class was already exported from xrspatial.geotiff via the framework PR.
  • The check is registered at module load via register_write_metadata_check. Tests can scope around it with unregister_write_metadata_check.
  • The three write entry points (to_geotiff, _write_vrt_tiled, write_geotiff_gpu) call validate_write_metadata with a context dict carrying crs_kwarg, attrs_crs, and attrs_crs_wkt.

Soft preconditions kept lenient

  • No pyproj installed → no-op.
  • Either attr unparseable → no-op (sibling UnparseableCRSError PR will catch).
  • Caller passes crs= explicitly → short-circuit, since the kwarg overrides both attrs for this write.

Round-trip safety

The reader emits both attrs whenever the file has a CRS, so a typical open_geotiff → to_geotiff pair hits the check with both attrs set. They derive from the same on-disk CRS, so pyproj equality passes. Pinned by test_read_back_roundtrip_does_not_raise.

Tests

14 new tests in test_conflicting_crs_write_1987.py:

  • registration in the write-side hook registry
  • exception-class hierarchy (ValueError + GeoTIFFAmbiguousMetadataError)
  • disagreement raises
  • agreement passes (EPSG int + WKT, EPSG string + WKT)
  • only-one-attr-set cases pass
  • explicit crs= kwarg bypasses the check
  • crs=0 corner does not mask the existing kwarg validation
  • unparseable WKT defers to the (future) UnparseableCRSError
  • end-to-end read-back round-trip

Test plan

  • pytest xrspatial/geotiff/tests/test_conflicting_crs_write_1987.py -v — 14 passed.
  • pytest xrspatial/geotiff/tests/ -k "not gpu and not cuda" — 2993 passed, no regressions.

Refs #1987.

First fail-closed slice for issue #1987. The framework PR (#2006)
landed the error hierarchy and the validator-hook registry; this PR
wires the first check.

Behaviour today
---------------
``to_geotiff`` / ``_write_vrt_tiled`` / ``write_geotiff_gpu`` all
consult ``data.attrs.get('crs')`` first and fall back to
``data.attrs.get('crs_wkt')`` only when the EPSG attr is missing.
When the two attrs encode different CRSes the writer silently emits
the EPSG and drops the WKT, producing a TIFF whose on-disk CRS does
not match what the DataArray advertised.

Change
------
* ``_validation.py`` gains ``_check_write_conflicting_crs``: when
  both attrs are populated and neither is suppressed by an explicit
  ``crs=`` kwarg, the check canonicalises each via pyproj and raises
  ``ConflictingCRSError`` if they disagree. ``ConflictingCRSError``
  is exported from ``xrspatial.geotiff`` already (the framework PR
  added it to ``_errors.py``).
* The check is registered at module load via
  ``register_write_metadata_check``. Tests that need to scope around
  it can call ``unregister_write_metadata_check`` in a fixture.
* The three write entry points (``to_geotiff``,
  ``_write_vrt_tiled``, ``write_geotiff_gpu``) now call
  ``validate_write_metadata`` with a context dict carrying
  ``crs_kwarg``, ``attrs_crs``, and ``attrs_crs_wkt``.

Soft preconditions kept lenient
-------------------------------
* No pyproj installed -> no-op (the writer's own WKT-fallback path
  decides what happens; pyproj is otherwise listed in setup.cfg).
* Either attr unparseable -> no-op (sibling
  ``UnparseableCRSError`` PR will catch those).
* Caller passes ``crs=`` explicitly -> short-circuit, since the
  kwarg overrides both attrs for this write.

Round-trip safety
-----------------
The reader emits ``attrs['crs']`` and ``attrs['crs_wkt']`` whenever
the file has a CRS, so a typical ``open_geotiff -> to_geotiff`` pair
hits the check with both attrs set. They derive from the same on-disk
CRS, so pyproj equality passes and the write proceeds. The
end-to-end ``test_read_back_roundtrip_does_not_raise`` pins this.

Tests
-----
14 new tests in ``test_conflicting_crs_write_1987.py`` cover:
* registration in the write-side hook registry
* exception-class hierarchy (ValueError + GeoTIFFAmbiguousMetadataError)
* disagreement raises
* agreement passes (EPSG int + WKT, EPSG string + WKT)
* only-one-attr-set cases pass
* explicit ``crs=`` kwarg bypasses the check
* ``crs=0`` corner does not mask the existing kwarg validation
* unparseable WKT defers to the (future) ``UnparseableCRSError``
* end-to-end read-back round-trip

Full geotiff test suite passes (2993 tests, no regressions).
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026
@brendancol brendancol merged commit 8bc981d into main May 18, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant