Skip to content

Require explicit opt-in on experimental and internal-only GeoTIFF paths (PR 4 of epic #2340)#2356

Merged
brendancol merged 3 commits into
mainfrom
issue/2352-experimental-internal-optin
May 24, 2026
Merged

Require explicit opt-in on experimental and internal-only GeoTIFF paths (PR 4 of epic #2340)#2356
brendancol merged 3 commits into
mainfrom
issue/2352-experimental-internal-optin

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Scope notes

PR 4 of 6 under epic #2340. Closes #2352.

Out of scope, covered by sibling PRs:

Test plan

  • test_experimental_internal_optin_2352.py covers signature pins, validator unit tests (stable / experimental / internal-only / round-trip exemption), end-to-end read rejections + opt-in acceptance for LERC / LZ4 / JPEG2000 / JPEG-in-TIFF, and the dask graph-build gate.
  • Existing test_supported_features_tiers_2137.py still passes (writer-side flag pins).
  • Existing codec / rich-tag tests updated to thread the new opt-ins through; test_reader_kwarg_order_1935.py canonical order slots the new flags next to the other allow_* gates.
  • Full xrspatial/geotiff/tests/ suite passes locally (4517 passed; CPU + dask paths, GPU + network deselected).
  • Broader xrspatial/tests/ suite still passes locally (3889 passed).

…hs (#2352)

PR 4 of 6 under epic #2340 (GeoTIFF release contract). Adds the
matching read-side gates for the Tier 3 / Tier 4 codecs the writer
already gated in #2137 and #1845, and extends the
``allow_experimental_codecs`` opt-in onto the writer rich-tag attrs
(``gdal_metadata_xml`` / ``extra_tags``).

Read side:
- ``open_geotiff`` / ``read_geotiff_dask`` / ``read_geotiff_gpu`` /
  ``read_vrt`` now reject sources compressed with LERC, JPEG2000 /
  J2K, or LZ4 unless ``allow_experimental_codecs=True``.
- The same readers reject JPEG-in-TIFF unless
  ``allow_internal_only_jpeg=True``. The flags do not collapse.
- Gates fire at the public entry point (eager) or at graph build
  (dask), before any decode work, so the caller learns the opt-in
  name from the rejection rather than a deeper decode failure.

Write side:
- ``to_geotiff`` and ``write_geotiff_gpu`` reject writes that include
  ``attrs['gdal_metadata_xml']`` or ``attrs['extra_tags']`` unless
  ``allow_experimental_codecs=True``. Round-tripped attrs (carrying
  ``_xrspatial_geotiff_contract``) are exempt so the canonical
  attrs contract from #1984 still round-trips without a flag.

Tests:
- New ``test_experimental_internal_optin_2352.py`` pins the
  rejections, the exemption shape, and the public signatures.
- Existing codec / rich-tag tests pass the new opt-ins through; the
  ``test_reader_kwarg_order_1935.py`` canonical kwarg order is
  updated to slot the new flags next to the other ``allow_*``
  policy gates.

Out of scope (sibling PRs under epic #2340):
- Docstring rewrites for public read/write entry points (#2346).
- Docs release-contract page (#2347).
- ``SUPPORTED_FEATURES`` tier audit (#2348).
- Unsupported-combination errors (#2349).

Closes #2352.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 24, 2026
@brendancol
Copy link
Copy Markdown
Contributor Author

Self-review

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_attrs.py:328 -- The _COMPRESSION_TAG_TO_NAME mapping collapses tag 32946 (Adobe Deflate) onto 'deflate', which is correct today: both tags decode through the same codec and 'deflate' sits in the stable tier. Worth a one-line comment that this collapse is deliberate so a future Adobe-Deflate-specific tier entry can't drift through unnoticed.
  • xrspatial/geotiff/_attrs.py:438 -- The round-trip exemption checks '_xrspatial_geotiff_contract' in attrs. A caller could in theory forge this marker to bypass the gate. Acceptable trade-off given the alternative would break the geotiff: define a public contract for DataArray attrs (canonical / alias / pass-through) #1984 round-trip contract, but a comment that this is a soft gate by design would help a future reviewer.
  • xrspatial/geotiff/_backends/dask.py:323 -- The dask read-side gate uses getattr(geo_info, '_ifd_compression', None). If a synthesised geo_info has no _ifd_compression, the gate is silently skipped. The only callers that hit this branch are non-TIFF sources (which by definition have no TIFF compression tag), but a one-line comment would help a future reviewer verify the skip is intentional.

Nits

  • The PR touches 22 existing test files to thread the new opt-ins. Each edit is small and mechanical (add allow_experimental_codecs=True or allow_internal_only_jpeg=True to a read call). A short paragraph in the PR body listing the test categories (codec round-trip, rich-tag round-trip, golden-corpus, canonical kwarg-order) would help future archaeology.

What looks good

  • Validator helpers in _attrs.py are well-isolated, pure, and well-documented. Easy to unit-test in test_experimental_internal_optin_2352.py.
  • Rejection messages name the missing flag, the feature, the tier, and the parent epic. A caller hitting the gate can fix the call site without grepping source.
  • Dask path fires the gate at graph build, before any chunk task runs -- matches the contract the dask max_pixels guard already follows.
  • Round-trip exemption preserves the geotiff: define a public contract for DataArray attrs (canonical / alias / pass-through) #1984 canonical attrs contract while still gating fresh DataArrays. Threading via the contract marker is a clean signal.
  • GPU read path forwards the flag through every CPU-fallback site (5 call sites in _backends/gpu.py).
  • New test file (479 lines) covers signature pins, validator unit tests, end-to-end read rejections, opt-in acceptances, dask graph-build gate, round-trip exemption, JPEG vs experimental flag non-collapse, and the already-gated paths.
  • test_reader_kwarg_order_1935.py canonical order updated so the new flags slot next to the existing allow_* policy gates.

Checklist

  • Default behaviour rejects experimental + internal-only codec reads
  • Default behaviour rejects rich-tag writes on fresh DataArrays
  • Read-then-write round-trip stays no-flag (via contract marker)
  • All read backends share one validator call shape
  • Tests pin every gate, every exemption, every signature
  • All four public read entry points expose the new kwargs
  • Docstrings on the public surface name both flags

…soft gate, and dask graph-build skip (#2352)

Documents three design choices flagged during the self-review pass on
#2356:

* _COMPRESSION_TAG_TO_NAME collapses TIFF tag 32946 (Adobe
  Deflate) onto the stable codec.deflate entry on purpose. The
  comment makes the deliberate collapse vs accidental passthrough
  obvious to a future reviewer.

* The rich-tag gate's _xrspatial_geotiff_contract exemption is a
  soft gate by design: forging the marker by hand would bypass it,
  but the alternative (gating every read-then-write call) would
  break the canonical attrs round-trip from #1984.

* The dask read-side gate uses getattr(..., None) so a
  synthesised geo_info (non-TIFF source) skips the check rather than
  rejects. The comment documents the lockstep invariant
  (_ifd_compression stashed alongside _ifd_photometric /
  _ifd_samples_per_pixel on every TIFF source path) so the skip
  never silently bypasses a real TIFF read.
…al-internal-optin

# Conflicts:
#	xrspatial/geotiff/_backends/dask.py
#	xrspatial/geotiff/_backends/gpu.py
#	xrspatial/geotiff/_backends/vrt.py
#	xrspatial/geotiff/_reader.py
@brendancol brendancol merged commit e73bc12 into main May 24, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Require explicit opt-in on experimental and internal-only GeoTIFF paths (PR 4 of epic #2340)

1 participant