rasterize: silent NaN-to-integer fill cast loses sentinel, no _FillValue emitted #2504

@brendancol

Description

rasterize() silently casts the default fill=np.nan to an integer when the resolved output dtype is integer, and emits no _FillValue/nodata/nodatavals attr to mark unwritten pixels.

The cast hands back data whose unwritten pixels look like a legitimate burn value. Downstream code that builds masks off _FillValue (the GeoTIFF writer, rioxarray, QGIS) has no sentinel to key off and treats unwritten cells as valid data.

Three paths trip this:

import numpy as np, xarray as xr
from shapely.geometry import box
from xrspatial.rasterize import rasterize

# A) like has integer dtype, default fill=nan
x = np.linspace(0.5, 9.5, 10); y = np.linspace(9.5, 0.5, 10)
like = xr.DataArray(np.zeros((10,10), dtype=np.int16), dims=['y','x'], coords={'y':y,'x':x})
r = rasterize([(box(2,2,8,8), 1.0)], like=like)
# r.values[0,0] == 0  -- looks identical to a real zero burn
# r.attrs == {} -- no nodata, no _FillValue, no nodatavals

# B) explicit integer dtype + default fill=nan
r = rasterize([(box(2,2,8,8), 1.0)], width=10, height=10,
              bounds=(0,0,10,10), dtype=np.int32)
# r.values[0,0] == -2147483648 (np.iinfo(int32).min)
# r.attrs == {}

# C) explicit uint8 dtype + default fill=nan
r = rasterize([(box(2,2,8,8), 1.0)], width=10, height=10,
              bounds=(0,0,10,10), dtype=np.uint8)
# r.values[0,0] == 0   <-- collides with a legitimate value-0 burn
# r.values[5,5] == 1
# r.attrs == {}

The numpy backend wraps out.astype(dtype) in warnings.catch_warnings() and suppresses the RuntimeWarning: invalid value encountered in cast, so the cast is silent at the user level (xrspatial/rasterize.py:1193-1196). The dask+numpy backend does not suppress the warning, so the same call with chunks= emits the warning but still loses the sentinel. That difference is a separate backend inconsistency (Cat 5).
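
The suppression can be reproduced without xrspatial. This is a minimal numpy-only sketch, not the code in rasterize.py: cast_nan_filled is a hypothetical helper that fills a float buffer with NaN, casts it, and reports whatever warnings surfaced. The integer that NaN turns into depends on the platform, and older NumPy releases may not warn at all, so the loop prints results instead of asserting them.

```python
import warnings

import numpy as np


def cast_nan_filled(dtype, suppress):
    """Hypothetical helper: cast a NaN-filled float buffer to `dtype`.

    Returns the cast array and the text of any warnings that were recorded.
    """
    out = np.full((2, 2), np.nan, dtype=np.float64)
    with warnings.catch_warnings(record=True) as caught:
        # "ignore" mimics the suppression described above;
        # "always" lets the RuntimeWarning reach the caller.
        warnings.simplefilter("ignore" if suppress else "always")
        result = out.astype(dtype)
    return result, [str(w.message) for w in caught]


for dt in (np.int16, np.int32, np.uint8):
    result, messages = cast_nan_filled(dt, suppress=False)
    # The post-cast value is platform-dependent, so print it rather than assume it.
    print(np.dtype(dt).name, result[0, 0], messages)
```

With suppress=True the recorded list is always empty, which matches what a user of the numpy backend sees today.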

Expected Behavior

When the user requests an integer dtype output (via dtype= or via like with integer dtype) and leaves fill at the default np.nan, one of:

  1. Raise ValueError up front: "fill=np.nan cannot be represented in integer dtype ; pass an explicit integer fill (e.g. fill=0 or a sentinel like fill=-9999) or use a float dtype."
  2. Or: pick a dtype-appropriate sentinel automatically (e.g. np.iinfo(dtype).min), emit it in nodata/_FillValue/nodatavals, and document the rule.

Option 1 is the safer default: the user has to make an explicit choice, and the call doesn't silently corrupt downstream masks. Option 2 changes behaviour for callers who currently rely on the silent cast (unlikely, but worth flagging).
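
A rough sketch of what the option 1 check could look like. The name check_fill_for_dtype is hypothetical, and putting the dtype name into the message is my assumption; neither comes from the current code.

```python
import numpy as np


def check_fill_for_dtype(fill, dtype):
    """Hypothetical validator: reject a NaN fill for an integer output dtype."""
    dtype = np.dtype(dtype)
    if not np.issubdtype(dtype, np.integer):
        return  # float outputs can represent NaN, nothing to check
    if isinstance(fill, (float, np.floating)) and np.isnan(fill):
        # Including dtype.name in the message is an assumption of this sketch.
        raise ValueError(
            f"fill=np.nan cannot be represented in integer dtype {dtype.name}; "
            "pass an explicit integer fill (e.g. fill=0 or a sentinel like "
            "fill=-9999) or use a float dtype."
        )


check_fill_for_dtype(np.nan, np.float32)  # passes: float dtype
check_fill_for_dtype(-9999, np.int32)     # passes: explicit integer fill
```

Because the check only looks at fill and dtype, it can run once in the public entry point before any backend is selected.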

What I expected

rasterize([...], dtype=np.int32) with default fill=np.nan should either fail loudly or set attrs['_FillValue'] to whatever sentinel actually occupies unwritten pixels, so downstream tools can mask them.
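
For the second alternative, here is a numpy-only sketch of how a recorded sentinel lets downstream code mask unwritten pixels. allocate_with_sentinel is hypothetical, the plain dict stands in for DataArray.attrs, and np.iinfo(dtype).min is only the rule suggested above, not existing behaviour.

```python
import numpy as np


def allocate_with_sentinel(shape, dtype):
    """Hypothetical: pre-fill an integer raster and record the fill in attrs."""
    dtype = np.dtype(dtype)
    sentinel = int(np.iinfo(dtype).min)  # rule suggested in option 2
    data = np.full(shape, sentinel, dtype=dtype)
    attrs = {"_FillValue": sentinel, "nodata": sentinel}  # stand-in for .attrs
    return data, attrs


data, attrs = allocate_with_sentinel((4, 4), np.int32)
data[1:3, 1:3] = 1  # stand-in for burned pixels

# A downstream consumer can now separate written from unwritten cells.
masked = np.ma.masked_equal(data, attrs["_FillValue"])
print(masked.count(), "written pixels out of", data.size)
```

For unsigned dtypes np.iinfo(dtype).min is 0, which is the same collision as case C, so an automatic rule would need a different choice there (the dtype maximum, for instance).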

Acceptance criteria

  • Calling rasterize(..., dtype=int_dtype) (or like= with integer dtype) and leaving fill at the default np.nan raises a clear ValueError describing the conflict and pointing to the fix.
  • All four backends (numpy, cupy, dask+numpy, dask+cupy) raise consistently before allocating the output.
  • The error fires before any host or device allocation (same pattern as _check_output_dimensions).
  • Tests cover the integer-dtype-with-NaN-fill case for every backend.
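
The "before any allocation" criterion could be tested along these lines. rasterize_stub and counting_allocate are hypothetical stand-ins rather than xrspatial functions; a real test would parametrize over the four backends and call rasterize itself.

```python
import numpy as np


def rasterize_stub(shape, dtype, fill=np.nan, allocate=np.full):
    """Hypothetical stand-in for the entry point: validate first, then allocate."""
    dtype = np.dtype(dtype)
    if (np.issubdtype(dtype, np.integer)
            and isinstance(fill, float) and np.isnan(fill)):
        raise ValueError("fill=np.nan cannot be represented in an integer dtype")
    return allocate(shape, fill, dtype=dtype)


calls = []


def counting_allocate(shape, fill, dtype):
    """Spy allocator that records every allocation request."""
    calls.append(shape)
    return np.full(shape, fill, dtype=dtype)


for dt in (np.int16, np.int32, np.uint8):
    try:
        rasterize_stub((10, 10), dt, allocate=counting_allocate)
    except ValueError:
        pass  # expected: default NaN fill with an integer dtype
    else:
        raise AssertionError(f"no error raised for {np.dtype(dt).name}")

# The spy was never reached, so the error fired before allocation.
assert calls == []
```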

Labels: api (API design and consistency), bug (Something isn't working), input-validation (Input validation and error messages)
