Summary
polygonize(raster, return_type="geopandas") returns a GeoDataFrame whose
crs attribute is always None, even when the input xr.DataArray carries
CRS information via attrs['crs'], attrs['crs_wkt'], or rioxarray's
raster.rio.crs. Downstream spatial joins, overlays, and reprojections
silently lose georeferencing.
Reproducer
import numpy as np
import xarray as xr
from xrspatial.polygonize import polygonize

data = np.array([[0, 0, 1], [0, 4, 0], [0, 0, 0]], dtype=np.int32)
raster = xr.DataArray(
    data,
    dims=("y", "x"),
    attrs={"crs": "EPSG:4326"},
)
df = polygonize(raster, return_type="geopandas")
print(df.crs)  # -> None, expected EPSG:4326
Why this matters
A user who polygonizes a georeferenced raster and writes the GeoDataFrame
to a GeoPackage or Shapefile gets an output with no CRS. QGIS and other
GIS tools report "Unknown CRS" for that file and treat its coordinates as
if they were already in the project's CRS. Reprojections then silently
produce wrong coordinates.
Proposed fix
After building the GeoDataFrame in _to_geopandas, attach a CRS derived
from the input raster via the existing _detect_source_crs helper in
xrspatial/reproject/_crs_utils.py (which already checks attrs['crs'],
attrs['crs_wkt'], and raster.rio.crs). Plumb the raster through to
_to_geopandas so the helper can be called.
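The lookup order described above can be sketched as follows. This is a minimal illustration, not the actual _detect_source_crs implementation: resolve_crs is a hypothetical name, and the stand-in objects built with SimpleNamespace only mimic the attrs and rio attributes of an xr.DataArray so the sketch runs without xarray installed.

```python
from types import SimpleNamespace


def resolve_crs(raster):
    """Return the first CRS found on a raster-like object, or None.

    Hypothetical stand-in for the lookup order described in the issue;
    the real helper's signature and return type may differ.
    """
    attrs = getattr(raster, "attrs", None) or {}
    # Steps 1 and 2: plain attrs entries, checked in order.
    for key in ("crs", "crs_wkt"):
        value = attrs.get(key)
        if value:
            return value
    # Step 3: the rioxarray accessor, which may be absent entirely.
    rio = getattr(raster, "rio", None)
    return getattr(rio, "crs", None)


# Stand-in objects (assumptions for illustration, not real DataArrays).
with_attr = SimpleNamespace(attrs={"crs": "EPSG:4326"})
with_rio = SimpleNamespace(attrs={}, rio=SimpleNamespace(crs="EPSG:32633"))
without_crs = SimpleNamespace(attrs={})

print(resolve_crs(with_attr))
print(resolve_crs(with_rio))
print(resolve_crs(without_crs))
```

Inside _to_geopandas, a non-None result could then be applied with GeoDataFrame.set_crs, while a None result would leave the frame's crs unset as it is today.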
For return_type="geojson", GeoJSON (RFC 7946) requires WGS 84 coordinates
and has no crs member, so no CRS propagation is added on that path; for
return_type="spatialpandas", spatialpandas does not expose a CRS slot, so
the patch leaves it alone.
Category
Sweep: metadata
Category: Cat 1 (attrs preservation)
Severity: MEDIUM