Parent: #2162
Sibling in wave 1: #2175
Goal
Add two private helpers in `xrspatial/geotiff/_attrs.py` that capture the read-finalization pipelines duplicated across backends. This issue only adds the helpers and unit-tests them in isolation. No backend gets migrated here; migration happens in wave 2 (#2178 dask, #2179 eager) and wave 3 (#2180 VRT).
Adding helpers first lets all downstream migration PRs depend on a frozen signature instead of inventing slightly different shapes.
Scope
`_finalize_eager_read`
```python
def _finalize_eager_read(
    arr,
    *,
    geo_info,
    nodata,
    mask_sentinel,
    mask_nodata,
    dtype,
    window,
    name,
    allow_rotated=False,
    allow_unparseable_crs=False,
    attrs_in=None,
):
    """Validate, populate attrs, mask, cast, and build an eager DataArray."""
```
Pipeline:
- `_validate_read_geo_info(geo_info, window=window, allow_rotated=..., allow_unparseable_crs=...)`
- `_populate_attrs_from_geo_info(attrs, geo_info, window=window)`
- Mask `arr` using `mask_sentinel` when `mask_nodata` is True. Record `nodata_pixels_present` as a bool.
- Cast dtype; record `nodata_dtype_cast`.
- `_set_nodata_attrs(attrs, nodata, masked=..., pixels_present=..., dtype_cast=...)`
- Return `xarray.DataArray(...)` with coords from the helper.
Why `mask_sentinel` is a parameter: the three GPU eager sites derive it three different ways (`_mw_mask_nodata` local, `_cpu_fallback_geo._mask_nodata`, raw `nodata`). The helper must not read it off `geo_info`.
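The mask and cast steps of the eager pipeline can be sketched in isolation. This is a minimal illustration, not the helper itself: `mask_and_cast` is a hypothetical name, masking is assumed to replace sentinel pixels with NaN on float arrays only, and `nodata_dtype_cast` is assumed to be a bool that records whether the cast changed the dtype. The real semantics are whatever the existing backends already do.

```python
import numpy as np


def mask_and_cast(arr, *, mask_sentinel, mask_nodata, dtype):
    """Hypothetical stand-in for the mask and cast steps of the eager pipeline."""
    pixels_present = None  # stays None when masking was not requested
    if mask_nodata and mask_sentinel is not None:
        hits = arr == mask_sentinel
        pixels_present = bool(hits.any())
        # Assumption: only float arrays can hold NaN, so ints are left as-is.
        if pixels_present and np.issubdtype(arr.dtype, np.floating):
            arr = arr.copy()  # never mutate the caller's buffer
            arr[hits] = np.nan
    dtype_cast = False
    if dtype is not None and np.dtype(dtype) != arr.dtype:
        arr = arr.astype(dtype)
        dtype_cast = True
    return arr, pixels_present, dtype_cast


# The sentinel is passed in by the caller, not read off geo_info.
raw = np.array([[1.0, -9999.0], [3.0, 4.0]], dtype="float32")
out, present, cast = mask_and_cast(
    raw, mask_sentinel=-9999.0, mask_nodata=True, dtype="float64"
)
```

Passing `mask_sentinel` explicitly keeps the sketch indifferent to how each call site derived it, which is the property the real helper needs.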
`_finalize_lazy_read_attrs`
```python
def _finalize_lazy_read_attrs(
    *,
    geo_info,
    nodata,
    mask_nodata,
    dtype,
    window,
    allow_rotated=False,
    allow_unparseable_crs=False,
    attrs_in=None,
):
    """Validate and populate attrs for dask-style lazy reads.

    Returns the attrs dict. Does NOT build a DataArray (the caller
    assembles the dask graph).
    """
```
Pipeline:
- `_validate_read_geo_info(...)`
- `_populate_attrs_from_geo_info(attrs, geo_info, window=window)`
- `_set_nodata_attrs(attrs, nodata, masked=(mask_nodata and float dtype), pixels_present=None, dtype_cast=...)`
- Return `attrs`.
`pixels_present=None` is the documented dask contract from #2135. The signature accepts None explicitly. Computing pixels-present eagerly here would force `.compute()` and break the lazy contract.
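The two lazy-specific inputs to `_set_nodata_attrs` can be spelled out as a short sketch. Both functions below are hypothetical illustrations: `lazy_masked_flag` expands the `mask_nodata and float dtype` shorthand, and `record_pixels_present` assumes that `None` means "not computed" and therefore writes no `nodata_pixels_present` key. The real `_set_nodata_attrs` may handle this differently.

```python
import numpy as np


def lazy_masked_flag(mask_nodata, dtype):
    """True only when masking is requested and the output dtype is floating."""
    return bool(mask_nodata) and bool(np.issubdtype(np.dtype(dtype), np.floating))


def record_pixels_present(attrs, pixels_present):
    """Assumed behaviour: None means 'not computed', so no key is written."""
    if pixels_present is not None:
        attrs["nodata_pixels_present"] = bool(pixels_present)
    return attrs


float_masked = lazy_masked_flag(True, "float32")   # masking applies
int_masked = lazy_masked_flag(True, "int16")       # ints cannot hold NaN
lazy_attrs = record_pixels_present({}, None)       # key stays absent
```

Nothing here touches array data, so the sketch stays compatible with a dask graph that has not been computed.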
Tests
Add `xrspatial/geotiff/tests/test_finalization_helpers_2162.py`. Synthesize `GeoInfo` fixtures and assert:
- Eager helper populates `attrs['nodata']`, `attrs['nodata_pixels_present']`, `attrs['nodata_dtype_cast']`, and `attrs['georef_status']` correctly across float and int input dtypes.
- Eager helper respects `mask_nodata=False` (no masking, no `nodata_pixels_present`).
- Lazy helper populates the same attrs minus `nodata_pixels_present`.
- Both helpers propagate `_validate_read_geo_info` rejections for ambiguous geo_info.
- Both helpers handle `mask_sentinel != nodata` (the GPU minIsWhite inversion case).
Files
- `xrspatial/geotiff/_attrs.py` (add two helpers)
- `xrspatial/geotiff/tests/test_finalization_helpers_2162.py` (new)
Constraints
- No public API change.
- Helpers are private (underscore prefix).
- The eager helper returns a `DataArray`; the lazy helper returns an attrs dict. Wave 2 and wave 3 callers depend on these shapes.
- `_validate_read_geo_info` runs first. Partial attrs must not leak when validation fails.
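The validate-first constraint can be sketched as follows. Everything in this block is a hypothetical stand-in: `finalize_attrs_sketch`, `fake_validate`, and `fake_populate` are illustrative names, `geo_info` is a plain dict, and `attrs_in` is assumed to be an optional seed dict that the helper copies rather than mutates.

```python
def fake_validate(geo_info):
    """Stand-in for `_validate_read_geo_info`; rejects an ambiguous input."""
    if geo_info.get("ambiguous"):
        raise ValueError("ambiguous geo_info")


def fake_populate(attrs, geo_info):
    """Stand-in for `_populate_attrs_from_geo_info`."""
    attrs["crs"] = geo_info["crs"]


def finalize_attrs_sketch(geo_info, *, attrs_in=None):
    """Validate first, then fill a fresh dict so nothing partial escapes."""
    fake_validate(geo_info)        # raises before any attrs are touched
    attrs = dict(attrs_in or {})   # copy: the caller's dict is never mutated
    fake_populate(attrs, geo_info)
    return attrs


seed = {"source": "example.tif"}
try:
    finalize_attrs_sketch({"ambiguous": True, "crs": "EPSG:4326"}, attrs_in=seed)
    rejected = False
except ValueError:
    rejected = True

good = finalize_attrs_sketch({"crs": "EPSG:4326"}, attrs_in=seed)
```

After the rejected call, `seed` is unchanged, which is the behaviour the unit tests for both helpers would want to pin down.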
Out of scope
- Migrating any backend to call these helpers (waves 2 and 3).
- Changes to the existing shared helpers (`_validate_read_geo_info`, `_populate_attrs_from_geo_info`, `_set_nodata_attrs`).