Goal
Split the dtype parameter on _finalize_lazy_read_attrs into two so the call-site fixup in the dask backends can go away.
Context
_finalize_lazy_read_attrs (in xrspatial/geotiff/_attrs.py, added in #2177) takes one dtype argument that does two jobs:
- Resolved graph dtype: drives masked_nodata = bool(mask_nodata and np.dtype(dtype).kind == 'f').
- Caller-supplied cast attr: written as attrs['nodata_dtype_cast'] = np.dtype(dtype).name.
On the dask paths these are different values. When mask_nodata=True and the source is integer, the dask backends auto-promote the graph dtype to float64 without the caller asking for a cast. The auto-promotion is the right value for masked_nodata and the wrong value for nodata_dtype_cast.
#2178 worked around this by passing the resolved graph dtype to the helper and fixing up nodata_dtype_cast via a small _apply_caller_dtype_cast helper (one call per backend). The helper docstring on _finalize_lazy_read_attrs already flagged the conflation as deferred to the migration PR.
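The conflation can be reproduced with a toy stand-in for the dtype-dependent part of the helper. The name _dtype_attrs_single is hypothetical and the function only mirrors the two expressions quoted above; it is a sketch, not the real helper in _attrs.py.

```python
import numpy as np


def _dtype_attrs_single(dtype, mask_nodata):
    """Hypothetical stand-in: one dtype argument drives both values."""
    attrs = {}
    # Job 1: decide whether nodata was masked (needs the resolved graph dtype).
    attrs['masked_nodata'] = bool(mask_nodata and np.dtype(dtype).kind == 'f')
    # Job 2: record the cast (should reflect only what the caller asked for).
    attrs['nodata_dtype_cast'] = np.dtype(dtype).name
    return attrs


# Integer source, mask_nodata=True, caller passed no dtype=.
# The dask path auto-promotes the graph dtype to float64.
resolved_graph_dtype = np.dtype('float64')
attrs = _dtype_attrs_single(resolved_graph_dtype, mask_nodata=True)
```

Here masked_nodata comes out as True, which is correct, but nodata_dtype_cast is recorded as 'float64' even though the caller never requested a cast. That second value is what the post-call fixup from #2178 has to repair.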
Proposed change
Split the helper's signature:
    def _finalize_lazy_read_attrs(
        *,
        geo_info,
        nodata,
        mask_nodata,
        graph_dtype,        # was: dtype; drives masked_nodata
        caller_dtype=None,  # new; drives nodata_dtype_cast
        window,
        allow_rotated=False,
        allow_unparseable_crs=False,
        attrs_in=None,
    ) -> dict:
        ...
graph_dtype is the resolved dask graph dtype (what the helper currently computes masked from). caller_dtype is the caller's dtype= kwarg verbatim, or None when the caller did not pass one.
After the split:
- _apply_caller_dtype_cast becomes dead code; remove it.
- Both dask backends drop the post-helper fixup; the helper handles both attrs in one place.
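A minimal sketch of how the two dtype-driven values could be derived after the split. The function name _dtype_attrs_split is hypothetical, and omitting nodata_dtype_cast when caller_dtype is None is an assumption; the issue does not say what the helper should write in that case.

```python
import numpy as np


def _dtype_attrs_split(*, mask_nodata, graph_dtype, caller_dtype=None):
    """Hypothetical sketch: each dtype parameter drives exactly one value."""
    attrs = {}
    # graph_dtype is the resolved dask graph dtype, including auto-promotion.
    attrs['masked_nodata'] = bool(
        mask_nodata and np.dtype(graph_dtype).kind == 'f'
    )
    # Assumption: with no caller cast, the attr is left out entirely.
    # The None check matters because np.dtype(None) resolves to float64.
    if caller_dtype is not None:
        attrs['nodata_dtype_cast'] = np.dtype(caller_dtype).name
    return attrs


# Integer source auto-promoted to float64, caller passed no dtype=.
auto_promoted = _dtype_attrs_split(mask_nodata=True, graph_dtype='float64')

# Caller explicitly asked for float32.
explicit_cast = _dtype_attrs_split(
    mask_nodata=True, graph_dtype='float32', caller_dtype='float32'
)
```

With this shape the auto-promoted read still reports masked_nodata as True but carries no spurious cast attr, so the call site needs no fixup afterwards.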
Out of scope
The eager backends (#2179 wave 2 sibling, #2180 wave 3) do not have the auto-promotion issue because masking is folded into a single eager step; they would just pass caller_dtype=dtype and graph_dtype=resolved_dtype symmetrically.
Files
- xrspatial/geotiff/_attrs.py (_finalize_lazy_read_attrs signature, _apply_caller_dtype_cast removal)
- xrspatial/geotiff/_backends/dask.py (drop the fixup, rename kwarg at the call site)
- xrspatial/geotiff/_backends/gpu.py (same)
Acceptance criteria
- _finalize_lazy_read_attrs takes two dtype parameters.
- The dask backends call the helper once with no post-call fixup.
- Existing tests still pass.
- test_lazy_finalization_parity_2162.py still passes unchanged.