polygon_clip: rasterize mask ignores dask chunks, materializes full array #1207

@brendancol

Describe the bug

clip_polygon() calls rasterize() without passing chunks, so the mask is always built as a full numpy array regardless of the input raster's backend. For a dask-backed input, this materializes the entire mask in RAM before wrapping it back into a dask array.

The code at polygon_clip.py:205-211 sets like=raster but never extracts the chunk structure. The rasterize() dispatch (rasterize.py:2158-2178) only uses _run_dask_numpy when chunks is explicitly provided, so the call falls through to _run_numpy and allocates a dense array.
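The fall-through can be illustrated with a toy model of that dispatch. This is only a sketch of the behavior described above, not the actual code in rasterize.py; `pick_backend` and its parameters are hypothetical names used for illustration.

```python
def pick_backend(like_is_dask, chunks=None):
    """Toy model of the dispatch described above (hypothetical helper)."""
    # Only an explicit `chunks` argument selects the lazy path.
    # The backend of `like` is never consulted.
    if chunks is not None:
        return "_run_dask_numpy"
    return "_run_numpy"


# A dask-backed `like` alone still lands on the eager path.
print(pick_backend(like_is_dask=True))
# Passing chunks explicitly is what switches to the dask path.
print(pick_backend(like_is_dask=True, chunks=(512, 512)))
```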

For a 30 TB raster, this means the mask alone would need hundreds of GB of RAM.

Expected behavior

When the input raster is dask-backed, clip_polygon() should pass the raster's chunk sizes to rasterize() so the mask stays lazy. The same applies to dask+cupy inputs.
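A minimal sketch of how the chunk sizes might be extracted, assuming the raster exposes its backing array as `.data` (as an xarray DataArray does) and that dask-backed arrays carry a `.chunks` tuple of per-axis block sizes. `mask_chunks` is a hypothetical helper, and the `(row_chunk, col_chunk)` return form is an assumption about what rasterize() accepts for `chunks`; the stand-in objects below only mimic the attributes used.

```python
from types import SimpleNamespace


def mask_chunks(raster):
    """Return (row_chunk, col_chunk) for a chunked raster, or None if eager.

    Hypothetical helper: assumes `raster.data.chunks` is a tuple of
    per-axis block sizes when the array is dask-backed, and is absent
    for numpy or cupy arrays.
    """
    chunks = getattr(raster.data, "chunks", None)
    if not chunks:
        return None
    # Use the leading block size of the last two (y, x) axes.
    return (chunks[-2][0], chunks[-1][0])


# Stand-ins for a dask-backed and an eager raster (illustration only).
dask_like = SimpleNamespace(
    data=SimpleNamespace(chunks=((512, 512, 256), (1024, 1024)))
)
eager_like = SimpleNamespace(data=SimpleNamespace())

print(mask_chunks(dask_like))
print(mask_chunks(eager_like))
```

In clip_polygon(), the result could then be forwarded as the `chunks` argument alongside `like=raster`, so that an eager input keeps the current behavior (`None`) and a chunked input takes the lazy path. Taking only the leading block size assumes regular chunking; an irregularly chunked raster might need the mask rechunked to match.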

Affected code

polygon_clip.py:205-211

Labels

bug (Something isn't working), oom (Out-of-memory risk with large datasets), performance (PR touches performance-sensitive code)