Skip to content

Refactor GeoTIFF Phase 4: complete metadata finalization centralization#2232

Merged
brendancol merged 2 commits into
mainfrom
issue-2227-attrs-finalization
May 21, 2026
Merged

Refactor GeoTIFF Phase 4: complete metadata finalization centralization#2232
brendancol merged 2 commits into
mainfrom
issue-2227-attrs-finalization

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2227
Part of #2211

Summary

Finishes the migration started by PRs #2200, #2205, #2207, #2209 from epic #2162. After this PR, every read backend assembles its attrs through one of two finalization entry points in xrspatial/geotiff/_attrs.py:

  • _finalize_eager_read for the eager numpy + GPU paths.
  • _finalize_lazy_read_attrs for the dask, dask+GPU, and VRT paths.

Changes

  • _attrs.py: add an optional pixels_present kwarg to _finalize_lazy_read_attrs. The default stays None so the dask backends keep the lazy contract from geotiff: split overloaded masked_nodata into separate nodata lifecycle signals #2135 (a strict per-chunk reduction would force .compute()).
  • _backends/vrt.py: the eager VRT path now threads its VRT-aware nodata_pixels_present scan through the helper's new kwarg instead of writing the attr after the call.
  • _backends/gpu.py: drop stale imports of _populate_attrs_from_geo_info, _set_nodata_attrs, and _validate_read_geo_info. The GPU paths route every attrs write through the two top-level helpers.
  • New test: test_attrs_finalization_parity_2211.py is a table-driven parity check across eager numpy, dask+numpy, GPU (when available), dask+GPU (when available), and VRT. Four fixture shapes (plain float, float with sentinel, integer with sentinel, plain uint8). Backend-specific keys (vrt_holes, nodata_pixels_present, TIFF tag pass-throughs) are carved out in a single named frozenset so a future migration that promotes one of those keys to all backends has one place to update.

Backend coverage

numpy, cupy, dask+numpy, dask+cupy, VRT. GPU branches are gated on cupy + CUDA availability so the test still runs on CPU-only hosts.

Test plan

  • pytest xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py passes (9/9)
  • pytest xrspatial/geotiff/tests/test_attrs_parity_1548.py test_attrs_contract_*.py test_finalization_helpers_2162.py test_eager_finalization_parity_2162.py test_lazy_finalization_parity_2162.py test_vrt_finalization_parity_2162.py all green (129 tests)
  • Full geotiff suite: 4721 passed, 1 unrelated pre-existing failure in test_lowlevel_write_pushdown_2138.py[lz4] (experimental codec gate, untouched by this PR)

Coordination

PR-B (#2225) is also touching _attrs.py (georef_status). PR-C (#2226) is touching _reader.py/backends for nodata. PR-F (#2229) is adding a parity test. I kept the _attrs.py edit here to one small added kwarg so a rebase against any of those branches stays cheap.

Not in scope

  • No public API change.
  • test_attrs_contract_*.py and test_attrs_parity_1548.py were left untouched and still pass.

Closes #2227 (part of #2211).

* Add optional pixels_present kwarg to _finalize_lazy_read_attrs so
  the eager VRT path can stamp nodata_pixels_present through the
  shared finalization helper instead of writing it after the call.
  The default keeps the dask lazy contract from issue #2135.
* Drop stale _populate_attrs_from_geo_info, _set_nodata_attrs, and
  _validate_read_geo_info imports from _backends/gpu.py. The GPU
  paths now route every attrs write through _finalize_eager_read
  or _finalize_lazy_read_attrs and never call those lower-level
  helpers directly.
* Add test_attrs_finalization_parity_2211.py, a table-driven attrs
  parity test comparing eager numpy, dask+numpy, GPU (when
  available), dask+GPU (when available), and VRT for four fixture
  shapes. Backend-specific keys are carved out in one named
  frozenset.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 21, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: GeoTIFF Phase 4 attrs finalization centralization

Blockers

None.

Suggestions

  • xrspatial/geotiff/tests/test_attrs_finalization_parity_2211.py:19 — docstring still says "MinIsWhite photometric" but the fixture was renamed to uint8_no_nodata. Either restore MinIsWhite coverage or update the description to match the actual fixture.
  • xrspatial/geotiff/_attrs.py:1356 — the new pixels_present kwarg has no direct unit test in test_finalization_helpers_2162.py. The lazy "pixels_present absent by default" assertions stay green, but a positive test (pass pixels_present=True/False, check the attr lands) would prevent a future regression where the kwarg silently stops being threaded through to _set_nodata_attrs. The parity test covers it transitively via VRT, which is OK but indirect.

Nits

  • test_attrs_finalization_parity_2211.py:122_attrs_close declares rtol as a parameter but only consults it inside the transform branch. Either drop the parameter or apply it to every numeric key for consistency. The docstring says "all other keys must match exactly" so dropping it is the cleaner option.
  • test_attrs_finalization_parity_2211.py:67-70 — the 3-D dims branch (['y', 'x', 'band']) in _coord_array is dead code since none of the fixtures pass 3-D arrays. Either drop the branch or add a multi-band fixture.
  • test_attrs_finalization_parity_2211.py:259_open_vrt calls open_geotiff(path) to seed the VRT XML. A silent drift in open_geotiff would propagate into the VRT helper and only surface as a VRT-vs-dask divergence rather than a clean failure. Reading the values from the source DataArray that _write_* returns (or accepting them as parameters) would isolate the VRT path from the eager path.

What looks good

  • The diff is exactly what #2227 called for: one new kwarg with a backward-compatible default, one inline write removed, three stale imports cleaned up.
  • The lazy contract from #2135 is preserved — pixels_present defaults to None and all existing 'nodata_pixels_present' not in out.attrs assertions in test_lazy_finalization_parity_2162.py still pass.
  • The carveout for backend-specific keys lives in one named frozenset with a self-test against the docstring, so a future maintainer adding a new key has a single place to update.
  • The full geotiff suite stays green (the one failure in test_lowlevel_write_pushdown_2138.py[lz4] is the experimental-codec gate and predates this PR).

Checklist

  • Refactor matches issue #2227 scope
  • No public API change; existing contract tests still pass
  • Backend keysets compared in both directions (value + key set)
  • Lazy/dask contract from #2135 preserved
  • Stale imports removed
  • Direct unit coverage for the new pixels_present kwarg (see Suggestion 2)

* Add four unit tests in test_finalization_helpers_2162.py covering
  the new pixels_present kwarg on _finalize_lazy_read_attrs: forwarded
  True / False values land on attrs, the None default still omits the
  attr (dask contract from #2135), and a stray forward is ignored
  when nodata is None.
* test_attrs_finalization_parity_2211.py: decouple the VRT helper
  from open_geotiff by reading dtype + nodata from a _FixtureMeta
  the writer returns, and the on-disk geometry from module-level
  constants shared with _coord_array. The VRT wrapper now generates
  the XML from those constants and shifts origin by half a pixel to
  match the TIFF AREA_OR_POINT=Area convention.
* Drop the unused rtol parameter from _attrs_close and move the
  tolerance into module-level constants.
* Drop the dead 3-D branch from _coord_array and assert 2-D input.
* Update module docstring to match the renamed uint8_no_nodata
  fixture (the previous MinIsWhite text was stale).
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (round 2): GeoTIFF Phase 4 attrs finalization centralization

Re-review after commit e30ab49.

Blockers

None.

Suggestions

None.

Nits

None blocking. The previous round's nits are all addressed.

Disposition of round-1 findings

  • Suggestion 1 (docstring drift): fixed. Module docstring now matches the uint8_no_nodata fixture name.
  • Suggestion 2 (missing direct unit test for pixels_present kwarg): fixed. Four new tests in test_finalization_helpers_2162.py cover the True / False / None / "nodata=None ignores stray forward" cases.
  • Nit 1 (_attrs_close rtol parameter): fixed. Tolerance is now _TRANSFORM_RTOL / _TRANSFORM_ATOL module constants and only applies to the transform key, matching the docstring.
  • Nit 2 (dead 3-D branch): fixed. _coord_array now asserts 2-D input.
  • Nit 3 (_open_vrt calling open_geotiff): fixed. The VRT helper takes a _FixtureMeta from the writer and reads geometry from module-level constants. The same constants seed _coord_array, so the VRT path and the TIFF path now share one source of truth without going through open_geotiff.

What looks good

  • The fixture-geometry constants are paired with an in-function assert (_FIX_HEIGHT / _FIX_WIDTH vs arr.shape), so a future fixture that uses a different shape fails loudly rather than silently producing a mis-sized VRT.
  • The half-pixel shift in _open_vrt (corner-based GDAL convention vs center-based xarray coords) is explicit and explained in a comment, so a future maintainer can tell the offset is intentional rather than a copy-paste bug.
  • The new pixels_present=True / False / None / nodata=None test matrix pins all four states the kwarg can land in, so a future regression where the kwarg stops being threaded through will fail loudly.

Checklist

  • Refactor matches issue #2227 scope
  • No public API change; existing contract tests still pass
  • Backend keysets compared in both directions (value + key set)
  • Lazy/dask contract from #2135 preserved
  • Direct unit coverage for the new pixels_present kwarg
  • VRT helper isolated from open_geotiff
  • Stale imports removed

Ready to merge once CI is green.

@brendancol brendancol merged commit fe7981b into main May 21, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor GeoTIFF Phase 4: complete metadata finalization centralization (PR-D of #2211)

1 participant