Summary
zonal_stats, zonal_apply, and zonal_crosstab require a pre-rasterized zones DataArray. If you're working with vector data, you have to call rasterize() yourself first:
zones_raster = rasterize(gdf, like=values, column='zone_id')
result = stats(zones=zones_raster, values=values)
This should be one step:
result = stats(zones=gdf, values=values, column='zone_id')
# or via accessor
result = values.xrs.zonal_stats(gdf, column='zone_id')
Reason
Rasterize-then-stats is the most common two-step pattern in our notebooks and user code. The zonal functions should accept vector zones directly and handle the rasterization internally.
Proposal
When zones is a GeoDataFrame (or list of (geometry, value) pairs), the zonal functions call rasterize(zones, like=values, ...) internally before proceeding. The raster-input path doesn't change.
Changes
- zonal.py: Add _maybe_rasterize_zones() helper. Checks if zones is a GeoDataFrame or list-of-pairs; if so, calls rasterize(zones, like=values, column=column) and returns the rasterized result. Otherwise returns zones as-is.
- stats(), apply(), crosstab(), crop(): Call _maybe_rasterize_zones() at the top, before validation.
- Add a column parameter to each function (only used for vector input).
- accessor.py: Update zonal_stats, zonal_apply, zonal_crosstab, crop to pass through column.
- Forward rasterize kwargs (all_touched, chunks, etc.) via **rasterize_kw.
Usage
import geopandas as gpd
from xrspatial.zonal import stats
# elevation is an existing xarray.DataArray of values
gdf = gpd.read_file('districts.shp')
result = stats(zones=gdf, values=elevation, column='district_id')
# accessor style
result = elevation.xrs.zonal_stats(gdf, column='district_id')
# list-of-pairs (value is the zone ID); polygon_a and polygon_b are existing geometries
pairs = [(polygon_a, 1), (polygon_b, 2)]
result = stats(zones=pairs, values=elevation)
Drawbacks
- Implicit rasterization could surprise users who expect exact vector boundaries. The docstring should note that results depend on raster resolution.
- The column parameter only applies to vector input. Passing it with raster zones should raise.
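To illustrate the resolution dependence mentioned above, here is a small pure-Python sketch. It assumes a center-point rule (a cell belongs to the zone when its center falls inside the geometry), which is only one possible rasterization rule and may differ from what rasterize() does, for example with all_touched. The function name and grid extent are made up for the example.

```python
import math


def rasterized_area(radius, cell, half_extent=2.0):
    """Area of a circle after center-point rasterization on a square grid."""
    n = int(round(2 * half_extent / cell))  # number of cells per side
    inside = 0
    for row in range(n):
        y = -half_extent + (row + 0.5) * cell  # cell-center y coordinate
        for col in range(n):
            x = -half_extent + (col + 0.5) * cell  # cell-center x coordinate
            if x * x + y * y <= radius * radius:
                inside += 1
    return inside * cell * cell


coarse = rasterized_area(radius=1.0, cell=1.0)
fine = rasterized_area(radius=1.0, cell=0.5)
print(f"exact={math.pi:.3f} coarse={coarse:.3f} fine={fine:.3f}")
```

The same zone covers a different area at each cell size, so any statistic weighted by cell count (sum, count, mean near the boundary) shifts with the resolution of values.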
Alternatives
- Leave it as-is. Users call rasterize() explicitly. Simpler API, more boilerplate.
- Separate zonal_stats_vector() function. Avoids overloading zones but fragments the API.