Skip to content

Add release-gate windowed-read coords + shifted-transform parity test (#2358)#2364

Open
brendancol wants to merge 3 commits into
mainfrom
issue-2358
Open

Add release-gate windowed-read coords + shifted-transform parity test (#2358)#2364
brendancol wants to merge 3 commits into
mainfrom
issue-2358

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

@brendancol brendancol commented May 24, 2026

Closes #2358. PR 2 of 5 for epic #2341.

Summary

  • New file xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py locks the windowed-read contract: shape equals the window's (height, width), coords['y'/'x'] are a bit-exact slice of the unwindowed coords, attrs['transform'] equals T_full * Affine.translation(col_off, row_off) by exact tuple equality, and the non-transform canonical release attrs match the unwindowed read.
  • Corpus is three files: int16 stripped with an integer nodata sentinel, float32 tiled with a NaN sentinel, and float32 stripped without a declared sentinel.
  • Both eager (open_geotiff(window=...)) and dask (read_geotiff_dask(window=..., chunks=32)) paths are exercised. masked_nodata is data-dependent (true iff sentinel-to-NaN actually ran), so it is asserted structurally (presence + bool type) instead of by value.
  • Adds the matching row under "Local GeoTIFF read and write" in docs/source/reference/release_gate_geotiff.rst.

Backend coverage

  • numpy (eager open_geotiff)
  • dask+numpy (read_geotiff_dask)

GPU paths are out of scope for this row and are covered by test_backend_pixel_parity_matrix_1813.py.

Test plan

  • pytest xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py -- 24 passed locally (3 files x 2 readers x 4 assertions).
  • CI green on the PR.

Notes

  • The transform check computes T_full * Affine.translation(col_off, row_off) by hand. It does not call rasterio.windows.transform, so the test pins the spec rather than whatever rasterio happens to compute.
  • Temp filenames include the issue tag and a per-test uuid so sibling worktrees and xdist workers do not collide on the same path.

…#2358)

Adds xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py
to lock the contract that a windowed read returns the correct shape,
coords that slice the unwindowed coords bit-exact, an attrs['transform']
equal to T_full * Affine.translation(col_off, row_off) with no float
drift, and unchanged canonical non-transform release attrs. Covers
eager and dask read paths across a representative corpus (int16 with
nodata, float32 tiled with NaN nodata, float32 stripped without
nodata).

Adds the matching row in docs/source/reference/release_gate_geotiff.rst.

Part of epic #2341, PR 2 of 5.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 24, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add release-gate windowed-read coords + shifted-transform parity test (#2358)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • _assert_canonical_attrs_unchanged uses isinstance(full_val, float) and np.isnan(full_val) to detect a NaN nodata sentinel (test file lines 325-329). The reader currently normalises the read-back nodata to a Python float, so this works today, but if any backend ever returns the sentinel as np.float32 (which is what the input fixture passes to to_geotiff) the isinstance branch silently falls through to win_val == full_val and NaN != NaN flips the test from "checked NaN equality" to "always fails". A more durable form is something like

    try:
        full_is_nan = bool(np.isnan(full_val))
    except (TypeError, ValueError):
        full_is_nan = False

    which handles both Python and numpy scalar NaNs without binding to a specific dtype.

  • The dask backend uses chunks=32 over a 256x256 raster with a window starting at (row_off=32, col_off=64). Both window offsets are exact multiples of 32, so the window aligns to chunk boundaries and the test does not exercise the case where the window splits a chunk. A second parametrize case at e.g. chunks=20 or a window offset of (33, 65) would cover the misaligned path without much extra runtime.

Nits (optional improvements)

  • _PIXEL_HEIGHT = -25.0, _ORIGIN_Y = 4001987.25 -- the values are well chosen to defeat integer-only window math, but a one-line comment on _ORIGIN_X = 500123.5 explaining "fractional origin so a windowed reader that drops the fractional part fails the equality assertion" would make the choice self-documenting.

  • The corpus uses three files: int16-deflate-stripped, float32-deflate-tiled, float32-none-stripped. A MinIsWhite cell is on the epic's representative-files list (see test_backend_pixel_parity_matrix_1813.py::_TIFF_FIXTURES). Out of scope for this PR if you want to keep the diff small, but consider adding it in a follow-up so this gate covers every dtype/layout the eager/dask matrix covers.

What looks good

  • Algebra: _expected_window_transform keeps the b and d skew terms so the formula generalises if a rotated-grid fixture is added later. The implementation matches T_full * Affine.translation(col_off, row_off) exactly for axis-aligned grids.
  • No rasterio oracle: the transform check pins the spec directly, as the issue asked.
  • Tuple equality (not approx) on the transform forces the reader to thread the algebraic identity all the way through, instead of re-deriving the origin from the y/x coord arrays where float rounding could creep in.
  • masked_nodata is correctly identified as data-dependent and asserted structurally (presence + bool type), with the rationale in a comment pointing at _attrs.py.
  • Temp filenames include the issue tag + a uuid so sibling rockout worktrees and xdist workers cannot collide.
  • @pytest.mark.release_gate is registered in setup.cfg (line 115).
  • The matching row was added to docs/source/reference/release_gate_geotiff.rst under "Local GeoTIFF read and write" with a clear one-line acceptance.

Checklist

  • Algorithm matches reference/paper (algebraic identity, not rasterio)
  • All implemented backends produce consistent results (eager + dask both pass)
  • NaN handling is correct (works today; see Suggestion 1 for a durability nit)
  • Edge cases covered (int sentinel, float NaN sentinel, no sentinel)
  • Dask chunk boundaries handled (32-aligned today; see Suggestion 2 for the misaligned case)
  • No premature materialization (no .compute(); coords are already eager on xarray)
  • Benchmark not needed (test-only PR)
  • README feature matrix update not needed (no new public function)
  • Docstrings present and accurate

Applies the four findings from the self-review on PR #2364:

- Robust NaN compare on the canonical-attrs check: drop the
  ``isinstance(x, float)`` gate in favour of a try/except around
  ``np.isnan(x)`` so a future backend that returns a numpy-scalar
  nodata cannot silently fall into the ``==`` branch where NaN != NaN
  flips the test from "checked equal" to "always fails".
- Add a chunk-misaligned window case ``(33, 65, 95, 191)`` alongside
  the chunk-aligned ``(32, 64, 96, 192)`` window so the dask reader
  has to split chunks instead of starting on a chunk boundary.
- Extend the corpus with a uint8 MinIsWhite stripped fixture so this
  release-gate row covers the same representative layout as
  ``test_backend_pixel_parity_matrix_1813.py``. Gated on tifffile
  availability via the same skip marker pattern. Because the
  MinIsWhite fixture is written without GeoTIFF tags, the
  transform-shift assertion handles the no-georef case by requiring
  that neither read carries a ``transform``; an invented transform on
  the windowed side would be the silent-wrongness this gate exists to
  catch.
- Document the ``_ORIGIN_X = 500123.5`` choice with a comment on why
  the fractional value matters for the exact-tuple equality.

Test count goes from 24 (3 files x 2 readers x 4 assertions) to 64
(4 files x 2 readers x 2 windows x 4 assertions); local run is still
under one second.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up): Add release-gate windowed-read coords + shifted-transform parity test (#2358)

Second pass after the round-1 follow-up commit (2d53ea29).

Blockers

None.

Suggestions

None.

Nits (optional)

  • The module header docstring (lines 8-12) still names three corpus cells (integer + nodata, float + NaN nodata, float no nodata). The follow-up commit added a fourth (uint8 MinIsWhite stripped). Update the docstring so the prose matches _CORPUS. One-line fix.

What looks good

  • All four findings from the first review are addressed.
  • NaN compare is now backend-agnostic via try: np.isnan(x); a future numpy-scalar nodata cannot silently slip into the == branch.
  • The chunk-misaligned window (33, 65, 95, 191) exercises the dask reader's chunk-split path; the aligned window keeps the boundary-aligned case covered.
  • MinIsWhite cell added with the same tifffile-gated marker pattern used by test_backend_pixel_parity_matrix_1813.py, and _assert_transform_shifted now handles the legitimate no-georef case (no transform on either side) without papering over a real attr-drop.
  • The _ORIGIN_X = 500123.5 fractional value is now self-documenting.

Checklist

  • Algorithm matches the algebraic spec (no rasterio oracle)
  • Eager and dask paths covered (numpy + dask+numpy)
  • NaN handling is robust (backend-agnostic via np.isnan)
  • Edge cases covered (int sentinel, float NaN, no sentinel, MinIsWhite no-georef)
  • Dask chunk boundaries handled (aligned + misaligned cases)
  • No premature materialization
  • Benchmark not needed (test-only)
  • README feature matrix update not needed (no new public function)
  • Docstrings present (header prose has a minor stale-count nit; see above)

@brendancol
Copy link
Copy Markdown
Contributor Author

CI status note

The Python 3.14 pytest jobs are red on this PR, but the four failing tests are pre-existing failures on main:

  • xrspatial/geotiff/tests/test_unsupported_features_2349.py::test_vrt_with_skewed_geotransform_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_unsupported_resample_alg_raises
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_srcrect_size_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_dstrect_size_rejected

All four are regex-match failures against RotatedTransformError / VRTUnsupportedError messages whose wording was tightened on main. The same four tests fail identically on main's 3.14 CI (run 26363918141, ubuntu-latest 3.14), and none of them touch any file this PR modifies (git diff origin/main HEAD on those paths is empty).

The Python 3.12 jobs and the new test file added by this PR pass on every backend matrix cell (eager + dask x 4 corpus files x 2 windows x 4 assertions = 64 cases; MinIsWhite cells skip cleanly when tifffile is not installed).

Flagging the red checks as out of scope for this PR -- the regex tightening lives in sibling tests and should land on its own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Release-gate: windowed-read coords + shifted transform parity (PR 2 of 5 of epic #2341)

1 participant