Description
rasterize() silently casts the default fill=np.nan to an integer when the resolved output dtype is integer, and emits no _FillValue/nodata/nodatavals attr to mark unwritten pixels.
The cast hands back data whose unwritten pixels look like a legitimate burn value. Downstream code that builds masks off _FillValue (the GeoTIFF writer, rioxarray, QGIS) has no sentinel to key off and treats unwritten cells as valid data.
Three paths trip this:
import numpy as np, xarray as xr
from shapely.geometry import box
from xrspatial.rasterize import rasterize
# A) like has integer dtype, default fill=nan
x = np.linspace(0.5, 9.5, 10); y = np.linspace(9.5, 0.5, 10)
like = xr.DataArray(np.zeros((10,10), dtype=np.int16), dims=['y','x'], coords={'y':y,'x':x})
r = rasterize([(box(2,2,8,8), 1.0)], like=like)
# r.values[0,0] == 0 -- looks identical to a real zero burn
# r.attrs == {} -- no nodata, no _FillValue, no nodatavals
# B) explicit integer dtype + default fill=nan
r = rasterize([(box(2,2,8,8), 1.0)], width=10, height=10,
              bounds=(0,0,10,10), dtype=np.int32)
# r.values[0,0] == -2147483648 (np.iinfo(int32).min)
# r.attrs == {}
# C) explicit uint8 dtype + default fill=nan
r = rasterize([(box(2,2,8,8), 1.0)], width=10, height=10,
              bounds=(0,0,10,10), dtype=np.uint8)
# r.values[0,0] == 0 <-- collides with a legitimate value-0 burn
# r.values[5,5] == 1
# r.attrs == {}
The numpy backend wraps out.astype(dtype) in warnings.catch_warnings() and suppresses the "RuntimeWarning: invalid value encountered in cast", so the user never sees any sign of the cast (xrspatial/rasterize.py:1193-1196). The dask+numpy backend does not suppress the warning, so the same call with chunks= emits the warning but still loses the sentinel. That is a separate backend inconsistency (Cat 5).
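The suppression pattern described above can be reproduced in isolation. The following is a minimal sketch using only NumPy; it is not the code at xrspatial/rasterize.py, and the array shape is arbitrary. The integer that NaN turns into depends on the platform and dtype, and whether a warning is emitted depends on the NumPy version, so the sketch prints both rather than asserting them.

```python
import warnings
import numpy as np

# A float buffer in which every pixel still holds the default fill (NaN).
out = np.full((2, 2), np.nan, dtype=np.float64)

# Unsuppressed cast: record whatever warnings NumPy chooses to emit.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud = out.astype(np.int32)

# Suppressed cast, mirroring the pattern the issue describes for the numpy backend.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    quiet = out.astype(np.int32)

print("warnings without suppression:", [str(w.message) for w in caught])
print("dtype after cast:", quiet.dtype)
print("value now sitting in an unwritten pixel:", quiet[0, 0])
```

Either way the result is an integer array with no record of which pixels were never written.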
Expected Behavior
When the user requests an integer dtype output (via dtype= or via like with integer dtype) and leaves fill at the default np.nan, one of:
1. Raise ValueError up front: "fill=np.nan cannot be represented in integer dtype ; pass an explicit integer fill (e.g. fill=0 or a sentinel like fill=-9999) or use a float dtype."
2. Or: pick a dtype-appropriate sentinel automatically (e.g. np.iinfo(dtype).min), emit it in nodata/_FillValue/nodatavals, and document the rule.
Option 1 is the safer default: the user has to make an explicit choice, and the call doesn't silently corrupt downstream masks. Option 2 changes behaviour for callers who currently rely on the silent cast (unlikely, but worth flagging).
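As a rough illustration of Option 1, the up-front check could look something like the sketch below. The helper name check_fill_for_dtype is hypothetical and does not exist in xrspatial. The error text is adapted from the message proposed above, with the dtype name interpolated as an assumption. Where the real check would live in rasterize() is left open.

```python
import numpy as np

def check_fill_for_dtype(fill, dtype):
    """Hypothetical helper: reject a NaN fill when the output dtype is integer."""
    dtype = np.dtype(dtype)
    # Only float-like fills can be NaN; integer fills pass straight through.
    fill_is_nan = isinstance(fill, (float, np.floating)) and np.isnan(fill)
    if np.issubdtype(dtype, np.integer) and fill_is_nan:
        raise ValueError(
            f"fill=np.nan cannot be represented in integer dtype {dtype.name}; "
            "pass an explicit integer fill (e.g. fill=0 or a sentinel like "
            "fill=-9999) or use a float dtype."
        )

check_fill_for_dtype(np.nan, np.float32)  # float output: NaN is representable
check_fill_for_dtype(-9999, np.int32)     # explicit integer sentinel: accepted

try:
    check_fill_for_dtype(np.nan, np.int32)
except ValueError as exc:
    print(exc)
```

Because the check only inspects fill and dtype, it could run before any output array is allocated, on the host or on a device.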
What I expected
rasterize([...], dtype=np.int32) with default fill=np.nan should either fail loudly or set attrs['_FillValue'] to whatever sentinel actually occupies unwritten pixels, so downstream tools can mask them.
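To make the second alternative concrete, here is a small sketch of what "set the attrs to the sentinel that actually occupies unwritten pixels" could mean for a consumer. The helper sentinel_attrs is hypothetical, and the np.iinfo(dtype).min rule is just the example rule mentioned in this issue. For unsigned dtypes that minimum is 0, which would reproduce the collision in case C, so a real rule would probably need a different choice there.

```python
import numpy as np

def sentinel_attrs(dtype):
    """Hypothetical helper: choose a sentinel for an integer dtype and describe it in attrs."""
    dtype = np.dtype(dtype)
    if not np.issubdtype(dtype, np.integer):
        raise TypeError("sentinel_attrs expects an integer dtype")
    sentinel = int(np.iinfo(dtype).min)
    attrs = {"_FillValue": sentinel, "nodata": sentinel, "nodatavals": (sentinel,)}
    return sentinel, attrs

# Simulate a 4x4 int32 raster with a 2x2 burned region of value 1.
sentinel, attrs = sentinel_attrs(np.int32)
values = np.full((4, 4), sentinel, dtype=np.int32)
values[1:3, 1:3] = 1

# A downstream consumer can now recover the unwritten pixels from attrs alone.
unwritten = values == attrs["_FillValue"]
print(int(unwritten.sum()), "of", values.size, "pixels are unwritten")
```

With empty attrs, as in cases A to C, the consumer has nothing to compare against and the mask cannot be built.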
Acceptance criteria
- Calling rasterize(..., dtype=int_dtype) (or like= with integer dtype) and leaving fill at the default np.nan raises a clear ValueError describing the conflict and pointing to the fix.
- All four backends (numpy, cupy, dask+numpy, dask+cupy) raise consistently before allocating the output.
- The error fires before any host or device allocation (same pattern as _check_output_dimensions).
- Tests cover the integer-dtype-with-NaN-fill case for every backend.