Merged
1 change: 1 addition & 0 deletions .claude/sweep-security-state.csv
@@ -33,6 +33,7 @@ polygon_clip,2026-04-27,,,,,"Clean. Module is a raster mask-and-clip wrapper --
proximity,2026-04-22,,,,,"Clean. Public APIs (proximity/allocation/direction) all call _validate_raster. GPU kernel _proximity_cuda_kernel has bounds guard at lines 359-360. Dask KDTree path has explicit memory guards (lines 897-903 result array, 1297-1312 unbounded distance fallback, 681-682 cache budget). Index math uses np.int64 for pan_near_x/pan_near_y, target_counts, y_offsets/x_offsets -- no int32 overflow risk. Target detection filters NaN via np.isfinite (lines 533, 657). _calc_direction guards x1==x2 & y1==y2 before arctan2. No file I/O. LOW (not flagged): line 1235 pad_y/pad_x omit abs() while line 437 uses it -- minor inconsistency, not exploitable."
rasterize,2026-04-21,1223,HIGH,1;2,,HIGH: unbounded out/written allocation in _run_numpy/_run_cupy driven by user-supplied width/height/resolution (no cap). MEDIUM (unfixed): _build_row_csr_numba total=row_ptr[height] is int32 and can wrap for very tall rasters with many long edges.
reproject,2026-04-17,,MEDIUM,1;3,,
resample,2026-04-28,1295,HIGH,1,,"HIGH (fixed #1295): resample() did not bound output dimensions derived from user-supplied scale_factor / target_resolution. _output_shape returns max(1, round(in_h * scale_y)), max(1, round(in_w * scale_x)) and was passed straight through to the eager numpy / cupy backends, where _run_numpy and _run_cupy / the _AGG_FUNCS numba kernels and _nan_aware_interp_np allocated np.empty / cupy.empty / map_coordinates buffers of that size with no memory check. scale_factor=1e9 on a 4x4 raster requested ~190 EB; target_resolution=1e-9 on a meter-scale raster did the same. Fixed by adding _available_memory_bytes() / _available_gpu_memory_bytes() helpers and _check_resample_memory(out_h, out_w) / _check_resample_gpu_memory(out_h, out_w) guards (12 B/cell budget covering float64 working buffer + float32 output + map_coordinates temporary), wired into resample() before backend dispatch. Eager numpy and cupy paths run the guard; dask paths skip it because per-chunk allocations are bounded by chunk size. Mirrors the kde / line_density (#1287), focal (#1284), geodesic (#1283), cost_distance (#1262), and diffuse (#1267) patterns. No other findings: _validate_raster called at line 698, scale_y > 0 / scale_x > 0 enforced, AGGREGATE_METHODS rejects scale > 1.0, identity fast path bypasses dispatch entirely, all numba kernels guard count > 0 before division, no CUDA kernels (cupy paths use cupy ufuncs + cupyx.scipy.ndimage), no file I/O, all backends cast to float64 before computation and float32 on output."
sieve,2026-04-28,1296,HIGH,1,,"HIGH (fixed #1296): sieve() on numpy and cupy backends had no memory guard. _label_connected allocates parent (int32, 4B/px), rank (int32, 4B/px, reused as root_to_id), region_map_flat (int32, 4B/px), plus a float64 result copy (8B/px) ~ 20 B/pixel of working memory before any check. The dask paths (_sieve_dask line 343 and _sieve_dask_cupy line 366) already raised MemoryError via _available_memory_bytes() at 28 B/pixel budget, but the public sieve() API at line 489 dispatched np.ndarray inputs straight into _sieve_numpy with no guard, and _sieve_cupy at line 308 transferred to host via data.get() then called _sieve_numpy, inheriting the gap. A 50000x50000 numpy raster requested ~50 GB silently. Fixed by extracting _check_memory(rows, cols) and _check_gpu_memory(rows, cols) helpers (mirrors cost_distance #1262 / mahalanobis #1288 / multispectral #1291 / kde #1287 pattern) at 28 B/pixel host budget plus 16 B/pixel GPU round-trip budget at 50% of available memory threshold. _check_memory wired into _sieve_numpy at the top before the float64 copy. _check_gpu_memory wired into _sieve_cupy before data.get(); it also calls _check_memory so the host budget still applies. Consolidated _available_memory_bytes definition (was duplicated). All 47 tests pass including 2 new memory-guard tests for the numpy backend (_sieve_numpy direct call + public sieve() API). No other findings: Cat 2 int32 indexing in _label_connected docstring acknowledges <2.1B pixel limit; the new memory guard rejects rasters that large before the int32 issue can trigger so this is a documentation/clarity follow-up rather than an exploitable bug. Cat 3 NaN handled via valid mask; Cat 4 no CUDA kernels; Cat 5 only /proc/meminfo read; Cat 6 _validate_raster called at line 478."
viewshed,2026-04-22,1229,HIGH,1,,"HIGH (fixed #1229): _viewshed_cpu allocated ~500 bytes/pixel of working memory (event_list 3*H*W*7*8 bytes + status_values/status_struct/idle + visibility_grid + lexsort temporary) with no guard. A 20000x20000 raster tried to allocate ~200 GB. Fixed by adding peak-memory guard mirroring the _viewshed_dask pattern (_available_memory_bytes() check, raises MemoryError with max_distance= hint). No other HIGH findings: dask path already guarded, _validate_raster is called, distance-sweep uses dtype=float64, _calc_dist_n_grad guards zero distance."
zonal,2026-04-22,1227,HIGH,1;2;6,,"HIGH (fixed #1227): _stats_cupy used `if nodata_values:` (truthy) so nodata_values=0 silently skipped the filter on the cupy backend, producing wrong stats vs every other backend. MEDIUM (unfixed): _strides uses np.int32 for stride indices -- can wrap for arrays > ~2B elements in the numpy path. MEDIUM (unfixed): hypsometric_integral() skips _validate_raster on zones/values; _regions_numpy has no memory guard (numpy-only path, bounded by caller-allocated input). MEDIUM (unfixed): _stats_numpy return_type='xarray.DataArray' allocates np.full((n_stats, values.size)) with no guard."
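
The zonal HIGH finding (fixed in #1227) comes down to a classic Python truthiness pitfall: `if nodata_values:` is False when the nodata value is 0, so the filter is silently skipped. A minimal standalone sketch of the bug class and its fix -- function names here are illustrative, not the actual xrspatial internals:

```python
import numpy as np


def masked_mean_buggy(values, nodata_value=None):
    # BUG: `if nodata_value:` evaluates falsy for nodata_value=0,
    # so zeros are never filtered -- the same class of bug that
    # _stats_cupy had before #1227.
    if nodata_value:
        values = values[values != nodata_value]
    return float(np.mean(values))


def masked_mean_fixed(values, nodata_value=None):
    # Correct: test for None explicitly so 0 is a valid nodata marker.
    if nodata_value is not None:
        values = values[values != nodata_value]
    return float(np.mean(values))


data = np.array([0.0, 0.0, 4.0, 8.0])
# buggy path with nodata=0 averages all four cells -> 3.0
# fixed path filters the zeros and averages (4, 8) -> 6.0
```

The same pattern applies to any optional numeric parameter where 0 (or an empty container) is a legal value: test identity against `None`, not truthiness.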
102 changes: 99 additions & 3 deletions xrspatial/resample.py
@@ -43,6 +43,90 @@
# epsilon.
_INTERP_DEPTH = {'nearest': 1, 'bilinear': 1, 'cubic': 10}

# Approximate working-set size per output cell for the eager backends:
# one float64 working buffer (8 B) plus a float32 output cell (4 B).
# scipy.ndimage.map_coordinates also allocates a temporary of the same
# size during higher-order spline evaluation; the 0.5 * available bound
# below leaves room for that.
_BYTES_PER_OUTPUT_CELL = 12


# -- Memory guard ------------------------------------------------------------

def _available_memory_bytes():
"""Best-effort estimate of available host memory in bytes."""
# Try /proc/meminfo (Linux)
try:
with open('/proc/meminfo', 'r') as f:
for line in f:
if line.startswith('MemAvailable:'):
return int(line.split()[1]) * 1024
except (OSError, ValueError, IndexError):
pass
# Try psutil
try:
import psutil
return psutil.virtual_memory().available
except (ImportError, AttributeError):
pass
# Fallback: 2 GB
return 2 * 1024 ** 3


def _available_gpu_memory_bytes():
"""Best-effort estimate of free GPU memory in bytes.

Returns 0 when CuPy / CUDA is unavailable or the query fails -- callers
treat that as a sentinel meaning "no GPU info, skip the guard".
"""
try:
import cupy as _cp
free, _total = _cp.cuda.runtime.memGetInfo()
return int(free)
except Exception:
return 0


def _check_resample_memory(out_h, out_w):
"""Raise MemoryError if the eager output buffer would exceed RAM.

The numpy and cupy-eager backends allocate a single (out_h, out_w)
float64 working buffer plus a float32 output before any actual work.
A user passing a huge ``scale_factor`` (or a tiny ``target_resolution``)
would otherwise OOM the process before this function returns.
"""
required = int(out_h) * int(out_w) * _BYTES_PER_OUTPUT_CELL
available = _available_memory_bytes()
if required > 0.5 * available:
raise MemoryError(
f"resample output of {out_h}x{out_w} would need "
f"~{required / 1e9:.1f} GB of working memory but only "
f"~{available / 1e9:.1f} GB is available. "
f"Use a smaller scale_factor / larger target_resolution, "
f"or pass a dask-backed DataArray for out-of-core processing."
)


def _check_resample_gpu_memory(out_h, out_w):
"""Raise MemoryError if the cupy-eager output buffer would exceed VRAM.

Skips the check (returns silently) when free GPU memory cannot be
queried -- the kernel will fail later at the cupy.empty boundary
anyway.
"""
available = _available_gpu_memory_bytes()
if available <= 0:
return
required = int(out_h) * int(out_w) * _BYTES_PER_OUTPUT_CELL
if required > 0.5 * available:
raise MemoryError(
f"resample output of {out_h}x{out_w} would need "
f"~{required / 1e9:.1f} GB of GPU working memory but only "
f"~{available / 1e9:.1f} GB is free on the active device. "
f"Use a smaller scale_factor / larger target_resolution, "
f"or pass a dask+cupy DataArray for out-of-core processing."
)


# -- Output-geometry helpers -------------------------------------------------

@@ -742,6 +826,21 @@ def resample(
out.name = name
return out

# -- memory guard for eager backends ------------------------------------
# Dask paths build per-chunk allocations lazily (chunk size already
# bounds peak memory). The eager numpy and cupy paths allocate the
# full (out_h, out_w) buffer up front and need an explicit guard.
in_h, in_w = agg.shape[-2:]
out_h, out_w = _output_shape(in_h, in_w, scale_y, scale_x)

is_dask = da is not None and isinstance(agg.data, da.Array)
is_cupy = cupy is not None and isinstance(agg.data, cupy.ndarray)
if not is_dask:
if is_cupy:
_check_resample_gpu_memory(out_h, out_w)
else:
_check_resample_memory(out_h, out_w)

# -- dispatch to backend -------------------------------------------------
mapper = ArrayTypeFunctionMapping(
numpy_func=_run_numpy,
@@ -752,9 +851,6 @@
result_data = mapper(agg)(agg.data, scale_y, scale_x, method)

# -- build output coordinates -------------------------------------------
in_h, in_w = agg.shape[-2:]
out_h, out_w = _output_shape(in_h, in_w, scale_y, scale_x)

ydim, xdim = agg.dims[-2], agg.dims[-1]
y_vals = np.asarray(agg[ydim].values, dtype=np.float64)
x_vals = np.asarray(agg[xdim].values, dtype=np.float64)
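
The guard's 12 B/cell budget and the ~190 EB figure cited in the state file are easy to sanity-check by hand. A standalone sketch of the arithmetic -- the helper name is hypothetical, but the rounding mirrors the `_output_shape` behavior described above:

```python
# 8 B float64 working buffer + 4 B float32 output per cell, matching
# the _BYTES_PER_OUTPUT_CELL constant in the diff above.
BYTES_PER_OUTPUT_CELL = 12


def required_bytes(in_h, in_w, scale_y, scale_x,
                   bytes_per_cell=BYTES_PER_OUTPUT_CELL):
    # Mirrors _output_shape: round, then clamp to at least one cell
    # per axis so degenerate scales still yield a valid shape.
    out_h = max(1, round(in_h * scale_y))
    out_w = max(1, round(in_w * scale_x))
    return out_h * out_w * bytes_per_cell


# scale_factor=1e9 on a 4x4 raster: 4e9 cells per axis ->
# 1.6e19 cells -> 1.92e20 bytes, i.e. ~190 exabytes.
exabytes = required_bytes(4, 4, 1e9, 1e9) / 1e18
```

With the 0.5-of-available threshold in `_check_resample_memory`, any request over half the reported free RAM raises `MemoryError` before the first `np.empty` call rather than OOM-killing the process.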
52 changes: 52 additions & 0 deletions xrspatial/tests/test_resample.py
@@ -378,3 +378,55 @@ def test_aggregate_parity(self, numpy_and_cupy_rasters, method):
cp_out = resample(cp_agg, scale_factor=0.5, method=method)
np.testing.assert_allclose(cp_out.data.get(), np_out.values,
atol=1e-5, equal_nan=True)


# ---------------------------------------------------------------------------
# Memory guard (#1295)
# ---------------------------------------------------------------------------

class TestMemoryGuard:
"""Reject scale factors that would OOM the eager backends."""

def test_huge_scale_factor_raises(self, grid_4x4):
# 4 * 1e9 ~= 4e9 cells per axis -> 1.6e19 cells -> ~190 EB
with pytest.raises(MemoryError, match="resample output of"):
resample(grid_4x4, scale_factor=1e9, method='nearest')

def test_huge_target_resolution_inverse_raises(self, grid_4x4):
# cellsize=1.0, target_resolution=1e-9 -> ~4e9 cells per axis
with pytest.raises(MemoryError, match="resample output of"):
resample(grid_4x4, target_resolution=1e-9, method='nearest')

def test_huge_scale_factor_aggregate_path_unaffected(self, grid_4x4):
# Aggregate methods reject scale > 1.0 with ValueError before
# the memory guard runs; confirm that error path still wins.
with pytest.raises(ValueError, match="only supports downsampling"):
resample(grid_4x4, scale_factor=1e9, method='average')

def test_normal_inputs_unaffected(self, grid_4x4):
# Sanity: a normal upsample call still works.
out = resample(grid_4x4, scale_factor=2.0, method='nearest')
assert out.shape == (8, 8)

def test_error_message_names_parameters(self, grid_4x4):
# The hint should point the user at the parameters they control.
with pytest.raises(MemoryError) as excinfo:
resample(grid_4x4, scale_factor=1e9, method='bilinear')
msg = str(excinfo.value)
assert "scale_factor" in msg
assert "target_resolution" in msg

def test_dask_path_skips_guard(self, grid_4x4):
# Dask backends build per-chunk allocations lazily -- the guard
# should not fire even for shapes that would OOM the eager path.
# We only check that the output graph builds; we never compute it.
if not dask_array_available():
pytest.skip("dask not installed")
import dask.array as da
dask_agg = grid_4x4.copy()
dask_agg.data = da.from_array(grid_4x4.data, chunks=2)
# scale_factor=100 -> 400x400 output, well within RAM budget
# but still exercises the dask dispatch. We just want the
# guard not to short-circuit a reasonable dask call.
out = resample(dask_agg, scale_factor=100.0, method='nearest')
assert out.shape == (400, 400)
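
The dask-skip rationale in the comments above (per-chunk allocations bound peak memory, so the guard would only produce false positives) can be made concrete with a toy model. This is a sketch under the simplifying assumption of square chunks materialized one at a time, not a description of dask's actual scheduler:

```python
def peak_bytes_eager(out_h, out_w, bytes_per_cell=12):
    # Eager numpy/cupy backends allocate the full output up front,
    # so peak working memory scales with the whole raster.
    return out_h * out_w * bytes_per_cell


def peak_bytes_dask(out_h, out_w, chunk=2048, bytes_per_cell=12):
    # A dask backend materializes roughly one output chunk at a time,
    # so peak working memory is capped by the chunk size regardless
    # of how large the overall output is.
    return min(out_h, chunk) * min(out_w, chunk) * bytes_per_cell
```

For a 400000x400000 output the eager model needs ~1.9 PB while the chunked model stays at ~50 MB per chunk, which is why `test_dask_path_skips_guard` expects the dask dispatch to proceed unchecked.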