Description
sieve() on the numpy and cupy backends has no memory guard. _label_connected allocates parent, rank, and region_map_flat as int32 arrays plus a float64 result copy — about 20 bytes/pixel of working memory — before any check runs.
The dask paths guard via _available_memory_bytes() (lines 343-355, 366-381). The public sieve() API at line 489 dispatches numpy DataArrays straight into _sieve_numpy with no check. _sieve_cupy at line 308 calls data.get() then _sieve_numpy, inheriting the gap.
A 50000x50000 numpy raster asks for ~50 GB of host memory before anything errors out.
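The 50 GB figure follows directly from the 20 bytes/pixel working set described above; a quick sketch of the arithmetic (function name is illustrative, not from the codebase):

```python
# Back-of-envelope working-set estimate for the numpy sieve path:
# three int32 label arrays (parent, rank, region_map_flat) at 4 bytes
# each, plus one float64 result copy at 8 bytes, per pixel.
BYTES_PER_PIXEL = 3 * 4 + 8  # 20

def working_set_bytes(rows: int, cols: int) -> int:
    """Projected peak allocation for labeling an rows x cols raster."""
    return rows * cols * BYTES_PER_PIXEL

print(f"{working_set_bytes(50_000, 50_000) / 1e9:.0f} GB")  # prints "50 GB"
```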
Same asymmetric-guard pattern already fixed in cost_distance #1262, mahalanobis #1288, multispectral #1291, and kde #1287.
Expected behavior
sieve() raises MemoryError with a clear message on every backend when the projected working set exceeds available memory, matching the existing _sieve_dask behavior.
Proposed fix
Add _check_memory(rows, cols) and _check_gpu_memory(rows, cols) helpers (28 bytes/pixel, 50% threshold) and call them from _sieve_numpy and _sieve_cupy before the union-find allocations run.
Followup (separate)
The int32 indices in _label_connected silently truncate when n > 2^31 (rasters above ~46340x46340). The docstring notes the limit but nothing enforces it at runtime. The memory guard rejects rasters that large before the int32 issue triggers, so this is a documentation/clarity follow-up rather than an exploitable bug.