Description
Author of Proposal: Community request
Reason or problem
You have a 30TB dask-backed raster. You want to see what it looks like. You can't .compute() it because you have 16GB of RAM. So users write throwaway slicing hacks that lose coordinates and mishandle NaN.
Proposal
Add a preview() function that downsamples a DataArray (or Dataset) to a target size in pixels (e.g. 1000x1000) without blowing memory.
Design:
Use xarray's coarsen with block averaging. For dask arrays this stays lazy: each chunk is reduced on its own, so peak memory is roughly the largest chunk plus the output array. NumPy and CuPy arrays already fit in memory, so they need no special memory handling (CuPy uses a different reduction, see Backend support).
Accepts both xr.DataArray and xr.Dataset. For Datasets, each data variable is independently downsampled via the existing @supports_dataset decorator, then collected back into a smaller Dataset. Same memory guarantees apply.
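A minimal sketch of the coarsen-based approach described above, not the proposed implementation. The function name preview_sketch, the height parameter, and the "y"/"x" dimension names are assumptions made for illustration; only width appears in the proposal.

```python
import math

import numpy as np
import xarray as xr


def preview_sketch(agg, width=1000, height=None, y="y", x="x"):
    """Hypothetical helper: block-average `agg` down to about `width` columns."""
    ny, nx = agg.sizes[y], agg.sizes[x]
    # Block size along x so the output has at most `width` columns
    fx = max(1, math.ceil(nx / width))
    # Assumption: with no explicit height, reuse fx to keep the aspect ratio
    fy = fx if height is None else max(1, math.ceil(ny / height))
    if fx == 1 and fy == 1:
        return agg  # already small enough
    # boundary="trim" drops the partial block at the far edge.
    # mean() skips NaN by default for float data, and the coordinates
    # are averaged per block, so the result keeps usable y/x labels.
    return agg.coarsen({y: fy, x: fx}, boundary="trim").mean()
```

On a dask-backed DataArray the same call should only build a task graph, with nothing computed until the result is plotted or .compute() is called.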
Backend support:
- NumPy: coarsen().mean()
- CuPy: stride-based subsampling (xarray coarsen has edge cases with cupy)
- Dask+NumPy: coarsen().mean(), lazy, won't OOM
- Dask+CuPy: coarsen().mean(), lazy, stays on GPU
Returns a small xr.DataArray (or xr.Dataset) with the coordinates downsampled to match.
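For the stride-based path listed for CuPy, one possible shape is sketched below. It is shown on a NumPy-backed array, and the assumption is that a CuPy-backed DataArray would index the same way. The helper name stride_subsample and the "y"/"x" dimension names are hypothetical.

```python
import math

import numpy as np
import xarray as xr


def stride_subsample(agg, width=1000, y="y", x="x"):
    """Hypothetical helper: keep every n-th pixel, with no averaging."""
    step = max(1, math.ceil(agg.sizes[x] / width))
    # isel with slices subsamples the data and its coordinates together,
    # so the result keeps matching y/x labels
    return agg.isel({y: slice(None, None, step), x: slice(None, None, step)})
```

Unlike block averaging, this picks single pixels, so a NaN at a sampled position shows up as NaN in the preview.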
Usage:
import xarray as xr
import xrspatial
big = xr.open_zarr("huge_dem.zarr")["elevation"] # 30TB dask array
small = xrspatial.preview(big, width=1000) # ~8MB output
small.plot()
# Also works with Datasets
ds = xr.open_zarr("huge_dem.zarr") # multiple variables
small_ds = xrspatial.preview(ds, width=1000)
Stakeholders and impacts
Anyone working with large dask rasters. No changes to existing functions.
Drawbacks
Block averaging blurs fine detail. That's the tradeoff with any downsampling for preview.
Alternatives
- canvas_like() exists but needs datashader and materializes the full array.
- [::stride_y, ::stride_x] slicing works but loses coordinates and doesn't average.
Unresolved questions
- Whether to support other aggregation methods (min, max, median) beyond mean.
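If other aggregations were supported, one possible shape is a method argument dispatched onto the coarsen object, which already exposes min, max, and median alongside mean. This is only a sketch: the method parameter, the coarsen_with name, and the "y"/"x" dimension names are assumptions, not part of the proposal, and behavior on dask-backed data (median in particular) is not checked here.

```python
import numpy as np
import xarray as xr

# Hypothetical set of supported reductions
ALLOWED_METHODS = {"mean", "min", "max", "median"}


def coarsen_with(agg, factor, method="mean", y="y", x="x"):
    """Hypothetical helper: reduce factor-by-factor blocks with `method`."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    coarsened = agg.coarsen({y: factor, x: factor}, boundary="trim")
    # Look up the reduction by name on the coarsen object and call it
    return getattr(coarsened, method)()
```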