
geotiff: share lazy finalization helper across dask backends (PR C of #2162) #2205

Merged
brendancol merged 2 commits into main from issue-2178
May 20, 2026


@brendancol

Summary

Wave 2 of #2162. Both dask read paths had a 25-line validate-populate-stamp block that was identical except for a couple of comment differences. Replace each with one call to _finalize_lazy_read_attrs from #2177 (now on main) so a future fix lands in both backends at once.

There is one wrinkle. The helper's dtype argument is used for two things at once: the resolved graph dtype (which drives masked_nodata) and the caller's cast attr (which drives nodata_dtype_cast). The dask path has to keep those separate, because mask_nodata=True on an integer source auto-promotes the graph dtype to float64 without the caller asking, and that auto-promotion must not leak out as nodata_dtype_cast. The migration passes the resolved target_dtype / declared_dtype so masked_nodata is right, then fixes up nodata_dtype_cast at the call site to match the pre-helper contract.
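To make the leak concrete, here is a minimal sketch of the conflation and the call-site fixup. Everything in it is a stand-in: `resolve_graph_dtype`, `finalize_stub`, and `lazy_read_attrs` are hypothetical names, and the stub's rule for `masked_nodata` is a placeholder rather than what `_finalize_lazy_read_attrs` actually does. Only the shape of the fixup (pop when the caller passed no dtype, otherwise record the caller's dtype when nodata is present) follows the description above.

```python
import numpy as np


def resolve_graph_dtype(source_dtype, caller_dtype, mask_nodata):
    """Hypothetical stand-in: the dtype the lazy graph will produce."""
    if caller_dtype is not None:
        return np.dtype(caller_dtype)
    source = np.dtype(source_dtype)
    if mask_nodata and source.kind in "iu":
        # Integer arrays cannot hold NaN, so masking promotes to float64.
        return np.dtype("float64")
    return source


def finalize_stub(*, dtype, nodata):
    """Hypothetical stand-in for the shared helper: one dtype feeds both attrs."""
    attrs = {}
    if nodata is not None:
        attrs["nodata"] = nodata
        # Placeholder rule, not the library's: a float graph counts as masked.
        attrs["masked_nodata"] = np.dtype(dtype).kind == "f"
        attrs["nodata_dtype_cast"] = np.dtype(dtype).name
    return attrs


def lazy_read_attrs(source_dtype, *, dtype=None, nodata=None, mask_nodata=True):
    """Hypothetical call site: finalize with the graph dtype, then fix up."""
    graph_dtype = resolve_graph_dtype(source_dtype, dtype, mask_nodata)
    attrs = finalize_stub(dtype=graph_dtype, nodata=nodata)
    # Call-site fixup: only a cast the caller requested may surface.
    if dtype is None:
        attrs.pop("nodata_dtype_cast", None)
    elif nodata is not None:
        attrs["nodata_dtype_cast"] = np.dtype(dtype).name
    return attrs
```

In this sketch, calling the stub directly on an `int16` source with masking enabled stamps `nodata_dtype_cast` as `float64` even though no cast was requested; going through `lazy_read_attrs` removes it while leaving `masked_nodata` driven by the graph dtype.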

Preserved behavior:

Closes #2178. Depends on #2177 (merged).

Test plan

  • New test_lazy_finalization_parity_2162.py (19 tests) covering:
    • nodata_pixels_present absent on both backends.
    • nodata_dtype_cast matches across backends when caller forces a cast.
    • nodata_dtype_cast absent on both backends when the graph dtype was auto-promoted by masking.
    • georef_status matches across both backends for full / transform_only / crs_only / none / rotated_dropped.
    • Full attrs-dict parity (modulo backend markers) per fixture.
  • All 91 existing tests in test_nodata_lifecycle_attrs_2135.py, test_nodata_semantics_split_1988.py, test_georef_status_2136.py pass.
  • Full xrspatial/geotiff/tests/ run: 4571 passed, 45 skipped. One pre-existing failure (test_lowlevel_write_pushdown_2138.py::test_write_vs_to_geotiff_byte_parity_uint8[lz4]) reproduces on main and is unrelated.
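As an illustration of what a full attrs-dict parity check can look like, the sketch below diffs two attrs dicts key by key. It is not the code in `test_lazy_finalization_parity_2162.py`; `attrs_mismatches` and `ABSENT` are hypothetical names, and the NaN handling assumes an attrs value such as `nodata` may be a float NaN, which plain `==` would report as a mismatch.

```python
import math

# Hypothetical sentinel used to report a key that one side lacks.
ABSENT = "<absent>"


def _same(a, b):
    """Equality that treats two float NaNs as equal (plain == does not)."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def attrs_mismatches(left, right):
    """Return {key: (left_value, right_value)} for every key that differs."""
    mismatches = {}
    for key in sorted(set(left) | set(right)):
        lv = left.get(key, ABSENT)
        rv = right.get(key, ABSENT)
        if not _same(lv, rv):
            mismatches[key] = (lv, rv)
    return mismatches
```

A parity test would then assert that `attrs_mismatches(cpu_attrs, gpu_attrs)` is empty; returning the differing keys rather than a bare boolean keeps the failure message readable.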

…2162)

Both dask read paths had the same 25-line validate-populate-stamp block.
Replace each with one call to `_finalize_lazy_read_attrs` from #2177 so a
future fix lands in both backends at once.

The helper's `dtype` argument is used for two things at once: the resolved
graph dtype (for `masked_nodata`) and the caller's cast attr (for
`nodata_dtype_cast`). The dask path has to keep those separate, because
masking on an integer source auto-promotes to float64 without the caller
asking, and that auto-promotion must not leak out as `nodata_dtype_cast`.
Fix it up at the call site after the helper returns.

Closes #2178. Wave 2 of #2162.
github-actions bot added the performance label (PR touches performance-sensitive code) on May 20, 2026

@brendancol brendancol left a comment


PR Review: geotiff: share lazy finalization helper across dask backends (PR C of #2162)

Blockers

None.

Suggestions

  • dask.py:399-404 and gpu.py:1587-1591 carry an identical 4-line fixup block (if dtype is None: pop / elif nodata: set). Consider pulling it into a small helper in _attrs.py (e.g. _apply_caller_dtype_cast(attrs, *, caller_dtype, has_nodata)) so the two backends share the fixup and a future change lands in one place. Not blocking; the current shape is fine.

  • _attrs.py:_finalize_lazy_read_attrs docstring (already on main) flags the dtype conflation. Worth a follow-up issue to split it into graph_dtype + caller_dtype so the call-site fixup goes away entirely. Out of scope here.

Nits

  • test_lazy_finalization_parity_2162.py:81 declares _BACKEND_MARKER_KEYS: frozenset[str] = frozenset() and _strip_backend_markers becomes an identity transform. Either drop both and compare attrs directly, or keep the indirection. The comment defends the current shape, but the empty frozenset reads oddly at first glance.

  • test_lazy_finalization_parity_2162.py:266-275 (test_dtype_cast_absent_without_caller_dtype) asserts out.dtype == np.float64 on both backends, which is correct, but it does not exercise the dtype.kind != 'f' branch of the call-site fixup (elif nodata_attr is not None: attrs['nodata_dtype_cast'] = np.dtype(dtype).name). Consider adding a case with mask_nodata=False, dtype=np.int32 so the integer-cast branch is also pinned.

What looks good

  • The helper migration is one-for-one with the pre-helper inline code. Validate-then-populate-then-stamp order is preserved.
  • nodata_attr (the pre-MinIsWhite-inversion value) is correctly threaded to the helper in the CPU dask path, matching the pre-helper contract for attrs['nodata'].
  • The dtype conflation is recognised and the call-site fixup is the minimal correct fix for the pre-helper contract.
  • 19 new tests cover the five georef states, both dask backends, and the dtype-cast presence/absence cases. All pass.
  • The 91 existing tests in the three pinned regression files (test_nodata_lifecycle_attrs_2135.py, test_nodata_semantics_split_1988.py, test_georef_status_2136.py) still pass.
  • Full geotiff test suite passes (4571 tests; the one failure is pre-existing on main and unrelated).

Checklist

  • Refactor preserves the pre-helper attrs contract on both dask backends.
  • Both implemented backends (dask+numpy, dask+cupy) produce consistent results (full attrs-dict parity test).
  • NaN handling unchanged.
  • Edge cases covered: 5 georef states + cast/no-cast + int/float fixtures.
  • Dask chunk boundaries unchanged (no graph code changed).
  • No premature materialization or unnecessary copies (helper does not touch arrays).
  • [N/A] Benchmark exists or is not needed: pure refactor.
  • [N/A] README feature matrix: no new functions.
  • [N/A] Docstrings: no public API change.

- Factor the call-site fixup into `_apply_caller_dtype_cast` in
  `_attrs.py` so both dask backends share the logic. Follows up the
  reviewer's note that the two backends carried an identical 4-line
  block.
- Drop the empty `_BACKEND_MARKER_KEYS` indirection in the parity
  test; compare attrs dicts directly. The future-proofing comment
  was not pulling its weight with an empty frozenset.
- Add an integer-cast parity test pinning the `dtype.kind != 'f'`
  branch of the fixup (`mask_nodata=False, dtype=np.int32` on both
  backends).

Filed #2206 to track splitting the helper's `dtype` parameter into
`graph_dtype` + `caller_dtype` so the call-site fixup goes away
entirely. Out of scope for this PR per the wave-1 helper signature
freeze.

@brendancol brendancol left a comment


Follow-up PR Review: geotiff: share lazy finalization helper across dask backends (PR C of #2162)

Disposition of original findings

  • Suggestion 1 (factor fixup helper): Fixed. _apply_caller_dtype_cast lives in _attrs.py next to _finalize_lazy_read_attrs; both dask backends call it directly. The diff is small and the intent is clearer.
  • Suggestion 2 (split helper's dtype parameter): Deferred to #2206. The helper signature was frozen as part of wave 1; the split is the same idea, just one PR later.
  • Nit 1 (empty _BACKEND_MARKER_KEYS frozenset): Fixed. Removed the indirection; the parity test compares dicts directly. The docstring and one comment that referenced "backend markers" were also updated.
  • Nit 2 (missing integer-cast branch test): Fixed. Added test_dtype_cast_records_integer_target parametrized over both backends. The dtype.kind != 'f' path is now pinned.

Second-pass findings

None.

What still looks good

  • Both dask backends call _finalize_lazy_read_attrs + _apply_caller_dtype_cast in two lines each. The fixup is no longer duplicated.
  • The new helper docstring covers all three branches (caller_dtype None / set+has_nodata / set+no_nodata) so future readers do not have to derive the contract from the call site.
  • 21 parity tests pass (up from 19 after adding the integer-cast case).
  • 91 regression tests in the three pinned files still pass.
  • 4573 tests pass in the full geotiff suite. The same one pre-existing failure as before remains unrelated.

Checklist

  • All blockers fixed.
  • All suggestions either fixed or deferred to a tracked follow-up.
  • All nits fixed.
  • Tests still pass on both backends after the follow-up commit.

