geotiff: synthesize WKT for citation-only user-defined CRSes (#1930) #2054

Merged: brendancol merged 2 commits into main from 1930-crs-citation-fix on May 18, 2026

@brendancol
Contributor

Summary

  • Adds `_synthesize_user_defined_wkt` to `xrspatial/geotiff/_geotags.py` so the reader can build a canonical WKT for citation-only user-defined geographic CRSes from the ellipsoid and units GeoKeys, then stamp it on `attrs['crs_wkt']`. Closes the citation-only parity gap from #1930 (geotiff: first-class backend parity contract with a golden test corpus).
  • Drops the `crs_citation_only` entry from `_PARITY_GAPS` in all four phase-3 backend modules (eager numpy, dask numpy, GPU, dask+GPU) and moves the entry from "Real parity gaps" to "Resolved gaps" in each module's docstring.
  • Adds unit tests for the synthesizer (sphere, oblate, projected-skip, missing-ellipsoid-skip) and end-to-end oracle tests covering attrs stamping, PROJ-dict equality with the rasterio reference, and `compare_to_oracle` round-trip.

Test plan

  • `pytest xrspatial/geotiff/tests/test_golden_corpus_*_1930.py xrspatial/geotiff/tests/golden_corpus/` -> 253 passed, 8 skipped, 4 xfailed (remaining JPEG axis-order gap), 0 XPASS.
  • `pytest xrspatial/geotiff/tests/test_user_defined_crs_wkt_1632.py` -> all green.
  • `pytest xrspatial/geotiff/tests/golden_corpus/test_oracle.py` -> all green, including the two new citation tests.

Refs #1930.

github-actions bot added the `performance` label (PR touches performance-sensitive code) on May 18, 2026
brendancol added a commit that referenced this pull request May 18, 2026
After the oracle's axis-order normaliser landed, the eager / dask
JPEG xfails resolved cleanly. The GPU and dask+GPU paths still
xfailed because the GPU JPEG-YCbCr decoder is not implemented.
Address both:

* GPU module: introduce ``_GPU_CPU_FALLBACK`` to mark fixtures
  whose codec is genuinely not implemented on the GPU. The parity
  test routes those through ``on_gpu_failure='auto'`` instead of
  ``'strict'``, which exercises the documented CPU fallback
  contract. The fallback yields a CPU-decoded DataArray that the
  oracle compares cleanly against the rasterio reference. JPEG
  cell now PASSES.

* dask+GPU module: move JPEG from ``_DASK_GPU_SKIPS`` to
  ``_INTENTIONAL_SKIPS``. The chunked GPU path cannot fall back
  per chunk -- the decode error surfaces at ``.compute()`` time
  regardless of ``on_gpu_failure`` mode, and there is no
  foreseeable fix short of implementing nvCOMP JPEG-YCbCr
  decode. Plain skip is more honest than xfail(strict=True).
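
The routing described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the test modules' actual code: the fixture ids and the `plan_gpu_case` helper are hypothetical, and only the `_GPU_CPU_FALLBACK` / `_INTENTIONAL_SKIPS` names and the `on_gpu_failure` modes come from the commit message.

```python
from typing import Optional

# Hypothetical fixture ids, used only for illustration.
_GPU_CPU_FALLBACK = frozenset({"jpeg_ycbcr"})
_INTENTIONAL_SKIPS = frozenset({"jpeg_ycbcr"})


def plan_gpu_case(fixture_id: str, chunked: bool) -> Optional[str]:
    """Return the on_gpu_failure mode for a fixture, or None to skip it."""
    if chunked:
        # dask+GPU: the decode error surfaces at .compute() time whatever
        # the mode is, so unsupported codecs are skipped outright.
        return None if fixture_id in _INTENTIONAL_SKIPS else "strict"
    # Eager GPU: unsupported codecs exercise the documented CPU fallback.
    return "auto" if fixture_id in _GPU_CPU_FALLBACK else "strict"
```

Under these assumptions, the eager GPU JPEG cell runs with `'auto'`, every other cell stays on `'strict'`, and the chunked JPEG cell is skipped rather than xfailed.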

Result: GPU and dask+GPU modules carry no JPEG xfails. The only
remaining corpus xfails are the four ``crs_citation_only`` entries
across the four backend modules, owned by the parallel PR #2054.

Closes the citation-only CRS parity gap that every phase-3 backend
module flagged. The crs_citation_only fixture in the golden corpus
carries a user-defined geographic CRS (no EPSG, no WKT in the
citation) and the reader was decoding it only into the deprecated
geog_citation / datum_code / angular_units attrs. attrs['crs_wkt']
stayed None, so the oracle's _candidate_crs returned None on the
candidate side and compare_to_oracle failed.

This change adds _synthesize_user_defined_wkt to _geotags.py. When
the file declares a user-defined geographic CRS and exposes the
ellipsoid (semi_major plus semi_minor or inv_flattening) and the
angular-units GeoKeys, the helper feeds those parameters to
pyproj.CRS.from_dict and stamps the resulting WKT on
attrs['crs_wkt']. The synthesis fails closed (returns None) when
pyproj is missing, when there is no semi_major, or when the
ellipsoid shape is ambiguous, so the existing deprecated-attrs path
stays in place.
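
A minimal sketch of the fail-closed synthesis described above, not the helper that ships in `_geotags.py`. The function name `synthesize_geographic_wkt`, its signature, and the reading of "ambiguous" as "neither semi_minor nor inv_flattening is present" are assumptions for illustration; `pyproj.CRS.from_dict` and `to_wkt` are the real pyproj calls.

```python
from typing import Optional


def synthesize_geographic_wkt(
    semi_major: Optional[float],
    semi_minor: Optional[float] = None,
    inv_flattening: Optional[float] = None,
) -> Optional[str]:
    """Build WKT for a user-defined geographic CRS, or return None."""
    try:
        from pyproj import CRS
        from pyproj.exceptions import CRSError
    except ImportError:
        return None  # fail closed: pyproj is not installed
    if semi_major is None:
        return None  # fail closed: no semi-major axis
    proj_dict = {"proj": "longlat", "a": float(semi_major), "no_defs": True}
    if semi_minor is not None:
        proj_dict["b"] = float(semi_minor)
    elif inv_flattening is not None:
        proj_dict["rf"] = float(inv_flattening)
    else:
        return None  # fail closed: shape is ambiguous (assumed meaning)
    try:
        return CRS.from_dict(proj_dict).to_wkt()
    except CRSError:
        return None  # fail closed: PROJ rejected the parameters
```

A caller would stamp the result on `attrs['crs_wkt']` only when it is not None, which leaves the deprecated-attrs path untouched in every failure case.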

Projected user-defined CRSes (ModelTypeProjected with
ProjectedCSType == 32767) are not yet reconstructible from GeoKeys
alone -- they need the GeogPrime / Projection parameters that the
corpus does not exercise -- so the helper returns None for that
case.

Drops the crs_citation_only entry from _PARITY_GAPS in each of the
four phase-3 backend modules (eager numpy, dask numpy, GPU,
dask+GPU). xfail(strict=True) flips to a real failure when the test
starts passing, so the fix and the entry cleanups have to land
together. Each module's top-of-file docstring moves the citation
entry from "Real parity gaps" to "Resolved gaps", mirroring the
masked-nodata fix from PR #2046.

Adds unit tests for _synthesize_user_defined_wkt in
test_user_defined_crs_wkt_1632.py: the sphere case (the citation
fixture shape), the oblate ellipsoid case, the projected-CRS skip
path, and the missing-ellipsoid skip path. Adds end-to-end fixture
tests in test_oracle.py for attrs['crs_wkt'] stamping, PROJ-dict
equality with the rasterio reference, and compare_to_oracle
round-trip.

After the fix the four phase-3 backend modules report 115 passed,
8 skipped, 4 xfailed (the remaining JPEG axis-order gap). The full
golden_corpus / corpus suite reports 253 passed, 8 skipped, 4
xfailed with no XPASS.

* Adds ``GEOKEY_GEOG_SEMI_MINOR_AXIS = 2058`` next to the other
  GeoKey ID constants and uses it in ``extract_geo_info`` instead of
  the raw int literal. Matches the existing convention for the
  semi-major-axis and inv-flattening keys.

* Drops ``angular_units_code`` and ``geog_citation`` from
  ``_synthesize_user_defined_wkt``. The function never read them.
  PROJ's ``longlat`` always emits degrees, and the corpus has no
  radian-unit user-defined fixture; the docstring now calls that
  limit out so a future radian fixture forces a deliberate signature
  change rather than silently degrading the units.

* Expands the docstring's "not handled" section to call out
  ``MODEL_TYPE_GEOCENTRIC`` (3) and unknown / zero model_type. Adds
  a ``test_synthesize_user_defined_wkt_geocentric_returns_none``
  pin so a future change that promotes geocentric to a real
  proj_dict has to touch the test deliberately.
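
A rough sketch of how the named GeoKey constants and the model-type gate could fit together. The key ids and model-type codes follow the GeoTIFF specification; apart from `GEOKEY_GEOG_SEMI_MINOR_AXIS` and `MODEL_TYPE_GEOCENTRIC`, the constant names, the `ellipsoid_params` helper, and the flat `geokeys` dict are assumptions made for illustration.

```python
from typing import Dict, Optional

# GeoKey ids from the GeoTIFF spec. Names other than
# GEOKEY_GEOG_SEMI_MINOR_AXIS are illustrative, not the module's own.
GEOKEY_MODEL_TYPE = 1024
GEOKEY_GEOG_SEMI_MAJOR_AXIS = 2057
GEOKEY_GEOG_SEMI_MINOR_AXIS = 2058
GEOKEY_GEOG_INV_FLATTENING = 2059

MODEL_TYPE_PROJECTED = 1
MODEL_TYPE_GEOGRAPHIC = 2
MODEL_TYPE_GEOCENTRIC = 3


def ellipsoid_params(geokeys: Dict[int, float]) -> Optional[Dict[str, float]]:
    """Return ellipsoid parameters for a geographic CRS, else None."""
    # Projected, geocentric, and unknown / zero model types are not handled.
    if geokeys.get(GEOKEY_MODEL_TYPE, 0) != MODEL_TYPE_GEOGRAPHIC:
        return None
    if GEOKEY_GEOG_SEMI_MAJOR_AXIS not in geokeys:
        return None
    params = {"semi_major": float(geokeys[GEOKEY_GEOG_SEMI_MAJOR_AXIS])}
    if GEOKEY_GEOG_SEMI_MINOR_AXIS in geokeys:
        params["semi_minor"] = float(geokeys[GEOKEY_GEOG_SEMI_MINOR_AXIS])
    if GEOKEY_GEOG_INV_FLATTENING in geokeys:
        params["inv_flattening"] = float(geokeys[GEOKEY_GEOG_INV_FLATTENING])
    return params
```

Using named constants keeps the lookup readable, and the explicit model-type check mirrors the pinned geocentric test: promoting any other model type would require changing the gate on purpose.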

Full corpus suite still reports 287 passed, 8 skipped, 4 xfailed
(JPEG axis-order gap), 0 XPASS.
brendancol force-pushed the 1930-crs-citation-fix branch from 91d8f03 to 69a4ffe on May 18, 2026 20:27
brendancol merged commit 339df70 into main on May 18, 2026
4 of 5 checks passed
