2 changes: 2 additions & 0 deletions .claude/sweep-security-state.csv
@@ -24,6 +24,7 @@ hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(d
hydro,2026-04-17,,MEDIUM,1;3;6,,
kde,2026-04-27,1287,HIGH,1,,"HIGH (fixed #1287): kde() and line_density() accepted user-controlled width/height with no upper bound. The eager numpy and cupy backends allocated np.zeros((height, width), dtype=float64) (or cupy.zeros) up front (kde.py: _run_kde_numpy line 308, _run_kde_cupy line 314, line_density inline at line 706). width=1_000_000, height=1_000_000 requested ~8 TB of float64 (or VRAM on the GPU path) before any check ran. Fixed by adding local _available_memory_bytes() helper (mirrors convolution/morphology/bump pattern) and _check_grid_memory(rows, cols) that raises MemoryError when rows*cols*8 exceeds 50% of available RAM. Wired into kde() (skipped for dask paths since _run_kde_dask_numpy/_run_kde_dask_cupy build per-tile via da.from_delayed and are bounded by chunk size) and line_density() (single numpy backend, always guarded). Error message names width/height so the caller knows which knob to turn. No other HIGH findings: Cat 2 (no int32 flat-index math, numba range loops are int64), Cat 3 (bandwidth <= 0 rejected, Silverman fallback returns 1.0 when sigma==0, NaN coords clamp to empty range via min/max), Cat 4 (_kde_cuda has 'if r >= rows or c >= cols: return' bounds guard at line 254, no shared memory, each thread writes own pixel), Cat 5 (no file I/O), Cat 6 (template only used for shape/coords, output dtype forced to float64). MEDIUM (unfixed, Cat 6): _validate_template only checks DataArray + ndim; does not call _validate_raster, but template dtype does not affect compute correctness here."
mahalanobis,2026-04-27,1288,HIGH,1,,"HIGH (fixed #1288): mahalanobis() had no memory guard. Both _compute_stats_numpy/_compute_stats_cupy and _mahalanobis_pixel_numpy/_mahalanobis_pixel_cupy materialise float64 buffers of shape (n_bands, H*W) -- the np.stack at line 45/80, the reshape+transpose at line 184 (which forces a contiguous BLAS copy), the centered diff, and the diff @ inv_cov result are all live at peak. A 100kx100k 5-band raster projected to ~400 GB of host memory just for the stack. Fixed by adding _available_memory_bytes()/_available_gpu_memory_bytes() (mirroring cost_distance.py:261-292) plus _check_memory/_check_gpu_memory at 32 bytes/cell/band budget, and wiring them into the public mahalanobis() entry point before any np.stack runs. Eager paths (numpy, cupy) are guarded; dask paths skip the check because chunks are bounded by user-supplied chunksize. MEDIUM (unfixed, Cat 6): mahalanobis() does not call _validate_raster on each band -- validate_arrays only enforces matching shape and array-type, so boolean / non-numeric DataArrays silently coerce. Deferred to a separate PR per the security-sweep one-fix-per-PR policy. No other HIGH findings: Cat 2 (no int32 indexing, numpy default int64), Cat 3 (singular covariance raises a clean ValueError, dist_sq is clamped to 0 before sqrt to absorb numerical noise, NaN mask propagates correctly), Cat 4 (no CUDA kernels), Cat 5 (no file I/O beyond /proc/meminfo)."
mcda,2026-04-29,1311,HIGH,3,,"HIGH (fixed #1311): Cat 3: NaN/Inf weights silently pass _validate_weights (combine.py:35-39) and the owa order_weights check (combine.py:154-158) because abs(NaN-1.0) > 0.01 is False; the result is an all-NaN raster with no error. The same class of bug exists in ahp_weights (weights.py:94), where val<=0 lets NaN slip past. Fixed in #1311 with explicit np.isfinite checks (see the combine.py and weights.py diffs below). MEDIUM (unfixed, Cat 1): sensitivity._monte_carlo eagerly computes the full dask Dataset; combine.owa stacks all criteria via xr.concat without a size guard. MEDIUM (unfixed, Cat 3): sensitivity with n_samples=0 divides by zero; wpm permits zero base / negative weight without a bounds check. Cat 4 N/A: no CUDA kernels. Cat 5 N/A: no file I/O. Cat 2 N/A: no int32 index math."
morphology,2026-04-24,1256,HIGH,1,,"HIGH (fixed #1256): morph_erode/morph_dilate/morph_opening/morph_closing/morph_gradient/morph_white_tophat/morph_black_tophat accepted a user-supplied kernel with only shape/dtype/odd-size validation. Kernel dimensions drove np.pad/cp.pad on every backend and map_overlap depth on dask paths; a 99999x99999 kernel on a 1000x1000 raster would try to allocate ~80 GB of padded float64 memory with no warning. Fixed by adding local _available_memory_bytes() helper and _check_kernel_memory(rows, cols, ky, kx) that raises MemoryError before allocation when padded size exceeds 50% of available RAM; wired into _dispatch() so every public API entry point is guarded across all four backends. Mirrors bilateral #1236, convolution #1241, bump #1231. No other HIGH findings: Cat 2 (loop indices are Python ints, numba promotes to int64), Cat 3 (NaN propagation explicit via v!=v in both numpy and CUDA paths, tests verify), Cat 4 (GPU kernels _erode_gpu/_dilate_gpu have if i<rows and j<cols bounds guards, no shared memory), Cat 5 (no file I/O), Cat 6 (_validate_raster called in _dispatch, all backends cast to float64 before kernel)."
multispectral,2026-04-27,1291,HIGH,1,1293,"HIGH (fixed #1291): true_color() stacked three same-shape bands into an (H, W, 4) RGBA float64 cube on numpy/cupy backends with no memory check; a 100k x 100k true-color call would request ~320 GB before any error. Fixed by adding _available_memory_bytes / _available_gpu_memory_bytes helpers and _check_true_color_memory / _check_true_color_gpu_memory budget checks (24 bytes/pixel, 50% of available RAM/VRAM threshold) wired into _true_color_numpy and _true_color_cupy; mirrors the dasymetric/cost_distance/diffusion pattern. Dask paths skipped because they build the cube lazily. 151/151 tests pass including 4 new memory-guard tests. Other findings clean: 10 CUDA kernels all have bounds guards (per-pixel index math, no stencil); every per-index public function (NDVI/EVI/SAVI/ARVI/GCI/NDMI/NBR/NBR2) calls _validate_raster on each band and validate_arrays for shape match; division denominators in normalized-difference indices are guarded by NaN propagation; no int32 overflow paths; no file I/O. MEDIUM follow-up #1293 (Cat 6): true_color() does not call validate_arrays(r, g, b) to enforce equal band shapes -- separate PR per the one-fix-per-security-PR policy."
normalize,2026-04-27,,,,,"Clean. Both rescale and standardize handle the constant-raster failure mode explicitly in every backend: rescale guards data_range == 0, standardize guards std == 0. Empty-finite-mask case handled. NaN/Inf passthrough is explicit via np.isfinite. Tests cover constant rasters, all-NaN, single cell, inf passthrough, and cross-backend parity. Cat 1: only output-shape np.empty plus a finite-only copy in numpy/cupy paths (~3x input size at peak) -- standard pattern, no user-controlled amplifier. Cat 2: no flat-index math, no height*width arithmetic. Cat 3: division by zero and divide-by-NaN both guarded; integer-dtype path verified working (range scaling correct, in contrast to the perlin failure mode #1232). Cat 4 N/A: no CUDA kernels. Cat 5 N/A: no file I/O. Cat 6: _validate_raster called on inputs (lines 164, 303); _validate_scalar on numeric params; output uniformly np.float64."
@@ -39,6 +40,7 @@ sieve,2026-04-28,1296,HIGH,1,,"HIGH (fixed #1296): sieve() on numpy and cupy bac
sky_view_factor,2026-04-28,1299,HIGH,1,,"HIGH (fixed #1299): unbounded numpy/cupy allocation; fixed via _check_memory and _check_gpu_memory guards (16 B/pixel budget, 50% of available RAM/VRAM threshold). Dask paths skip the guard."
slope,2026-04-28,,,,,"Clean. slope() validates input via _validate_raster (line 383) and _validate_boundary (line 389). Cat 1: planar _cpu/_run_cupy allocate output matching input shape; geodesic paths build (3,H,W) float64 stacked array but are gated by _check_geodesic_memory(rows, cols) at line 410 (already fixed under geodesic audit, PR #1285). Cat 2: no int32 flat-index math; all loops 2D with range(). Cat 3: NaN propagates through arctan in planar kernels; geodesic delegates to _local_frame_project_and_fit which has explicit NaN guards and degenerate det check. Cat 4: _run_gpu (line 146) uses combined bounds+stencil guard 'i-di>=0 and i+di<H and j-dj>=0 and j+dj<W'; geodesic GPU kernels imported from geodesic.py and audited there; _geodesic_cuda_dims uses 16x16 blocks to avoid register spill. Cat 5: no file I/O. Cat 6: all backends cast explicitly to float32 (planar) or float64 (geodesic); lat/lon cast to float64 in _extract_latlon_coords."
terrain,2026-04-28,,MEDIUM,1;3,,"No CRITICAL/HIGH findings. generate_terrain accepts a template DataArray (agg) so the OUTPUT raster is already allocated by the caller before dispatch -- this provides a natural memory bound that distinguishes terrain from purely-generative modules like bump.py. _validate_raster is called at line 580. Cat 1 MEDIUM (unfixed): _gen_terrain (numpy line 41) and _terrain_gpu (line 237) allocate 5-7 same-shape float32 scratch arrays per call (x/y meshgrid, warp_x/warp_y when warp_strength>0, weight in ridged mode, w_noise in worley mode, plus a tmp buffer on GPU); for a 30000x30000 caller-allocated template (~7 GB float64) the marginal scratch footprint is ~14-28 GB without a memory check. The dask + worley path (line 197 _terrain_dask_numpy, line 466 _terrain_dask_cupy) calls dask.persist(raw_worley) which materialises the entire dask array in worker memory before computing min/max for the worley_norm_range pre-pass, defeating chunked processing when worley_blend>0. Cat 2 N/A: no flat-index math in this module (perlin/worley kernels live elsewhere). Cat 3 MEDIUM (unfixed): octaves is bounded >= 1 (line 594) but lacunarity and persistence are not validated; lacunarity**i with user-supplied octaves could overflow to inf for large octaves -- exploitable only via explicit large octaves which is also bounded by user. Division by w_ptp guarded at line 110 (numpy) and line 364 (gpu). Division by warp_norm at line 67 / 184 / 289 / 451 is safe because persistence**0=1 always contributes >= 1 to the sum and warp_octaves >=1 in practice (no validation but caller-controlled). Cat 4 N/A: no CUDA kernels owned by this module; perlin/worley GPU kernels are imported from sibling modules. Cat 5 N/A: no file I/O. Cat 6: _validate_raster runs on agg; noise_mode validated against {'fbm','ridged'}; numeric scalars (seed, zfactor, lacunarity, persistence, warp_strength, warp_octaves, worley_blend, worley_seed) accepted without _validate_scalar but mostly cosmetic since arithmetic naturally rejects bad types. MEDIUM not fixed per task instructions."
viewshed,2026-04-22,1229,HIGH,1,,"HIGH (fixed #1229): _viewshed_cpu allocated ~500 bytes/pixel of working memory (event_list 3*H*W*7*8 bytes + status_values/status_struct/idle + visibility_grid + lexsort temporary) with no guard. A 20000x20000 raster tried to allocate ~200 GB. Fixed by adding peak-memory guard mirroring the _viewshed_dask pattern (_available_memory_bytes() check, raises MemoryError with max_distance= hint). No other HIGH findings: dask path already guarded, _validate_raster is called, distance-sweep uses dtype=float64, _calc_dist_n_grad guards zero distance."
visibility,2026-04-28,,,,,"Clean. line_of_sight (line 190) and cumulative_viewshed (line 259) call _validate_raster; visibility_frequency delegates. Cat 1: cumulative_viewshed allocates int32 accumulator (4 B/px) but delegates per-observer to viewshed() which has 500 B/px memory guard at viewshed.py:1523-1531; viewshed will fail first on oversize rasters. _bresenham_line (line 35) and _los_kernel (lines 112-143) bounded by transect length (<=W+H+1). Cat 2: int64 throughout, no int32 overflow path. Cat 3: divisions in _los_kernel guarded (D==0 in _fresnel_radius_1 line 87, distance[i]==0 continue line 133, total_dist>0 check line 123); NaN elevation at observer cell would taint los_height but is a correctness not DoS concern. Cat 4: no CUDA kernels. Cat 5: no file I/O. Cat 6: elevations cast to float64 in _extract_transect line 79."
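The recurring fix across these rows (kde #1287, mahalanobis #1288, morphology #1256, multispectral #1291, sky_view_factor #1299) is the same two-helper memory-guard pattern: probe available RAM, then raise before the first large allocation. A minimal sketch of that pattern, using the helper names from the notes above — the exact xrspatial signatures, budgets, and error wording vary per module:

def _available_memory_bytes() -> int:
    """Best-effort available host RAM in bytes (0 if undeterminable)."""
    try:
        import psutil
        return psutil.virtual_memory().available
    except ImportError:
        pass
    try:
        # Linux fallback, mirroring the /proc/meminfo reads noted above.
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # kB -> bytes
    except OSError:
        pass
    return 0

def _check_grid_memory(rows: int, cols: int, bytes_per_cell: int = 8) -> None:
    """Raise MemoryError before allocation when the grid exceeds 50% of RAM."""
    needed = rows * cols * bytes_per_cell
    budget = _available_memory_bytes() // 2
    if budget and needed > budget:
        raise MemoryError(
            f"requested grid of height={rows}, width={cols} needs "
            f"~{needed / 2**30:.1f} GiB; reduce width/height or use a "
            f"dask-backed input"
        )

Per the kde row, the guard runs only on the eager numpy/cupy paths; dask paths skip it because per-tile allocation is bounded by chunk size.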
43 changes: 43 additions & 0 deletions xrspatial/mcda/combine.py
@@ -19,6 +19,30 @@ def _validate_criteria(criteria: xr.Dataset) -> None:
         raise ValueError("criteria Dataset has no variables")
 
 
+def _check_finite_weights(weights: dict[str, float], label: str) -> None:
+    """Reject weight dicts that contain NaN or infinite values.
+
+    Without this guard, a single NaN weight slips past the sum-to-1 check
+    (``abs(NaN - 1.0) > 0.01`` is False) and propagates through every
+    pixel, returning an all-NaN raster with no error.
+    """
+    bad = []
+    for name, value in weights.items():
+        try:
+            v = float(value)
+        except (TypeError, ValueError):
+            raise ValueError(
+                f"{label}[{name!r}] must be a real number, got {value!r}"
+            )
+        if not np.isfinite(v):
+            bad.append((name, v))
+    if bad:
+        details = ", ".join(f"{n!r}={v}" for n, v in bad)
+        raise ValueError(
+            f"{label} must be finite; got non-finite value(s): {details}"
+        )
+
+
 def _validate_weights(
     weights: dict[str, float], criteria: xr.Dataset
 ) -> None:
@@ -32,6 +56,7 @@ def _validate_weights(
         raise ValueError(
             f"Weights contain keys not in criteria Dataset: {extra}"
         )
+    _check_finite_weights(weights, "weights")
     total = sum(weights.values())
     if abs(total - 1.0) > 0.01:
         raise ValueError(
@@ -151,6 +176,24 @@ def owa(
             f"order_weights length ({len(order_weights)}) must match "
             f"number of criteria ({n})"
         )
+    # Reject NaN/Inf entries before the sum check, since
+    # ``abs(NaN - 1.0) > 0.01`` is False and would let the bad value through.
+    bad_ow = []
+    for i, value in enumerate(order_weights):
+        try:
+            v = float(value)
+        except (TypeError, ValueError):
+            raise ValueError(
+                f"order_weights[{i}] must be a real number, got {value!r}"
+            )
+        if not np.isfinite(v):
+            bad_ow.append((i, v))
+    if bad_ow:
+        details = ", ".join(f"[{i}]={v}" for i, v in bad_ow)
+        raise ValueError(
+            f"order_weights must be finite; got non-finite value(s): "
+            f"{details}"
+        )
     ow_sum = sum(order_weights)
     if abs(ow_sum - 1.0) > 0.01:
         raise ValueError(
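As an aside (not part of the diff), the failure mode the docstring describes is easy to reproduce: NaN makes every comparison return False, so the old tolerance check passed a poisoned dict straight through.

import numpy as np

weights = {"slope": 0.5, "aspect": float("nan")}
total = sum(weights.values())    # nan -- NaN is absorbing under addition
print(abs(total - 1.0) > 0.01)   # False: NaN fails every comparison, so the
                                 # old sum-to-1 check let the dict through
print(np.isfinite(total))        # False: the test the new guard relies on

# With the fix, _check_finite_weights(weights, "weights") raises:
# ValueError: weights must be finite; got non-finite value(s): 'aspect'=nan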
18 changes: 15 additions & 3 deletions xrspatial/mcda/weights.py
@@ -91,14 +91,26 @@ def ahp_weights(
                 f"Self-comparison ({a!r}, {b!r}) is not allowed; "
                 f"diagonal entries are always 1"
             )
-        if val <= 0:
+        try:
+            v = float(val)
+        except (TypeError, ValueError):
+            raise ValueError(
+                f"Comparison value must be a real number, got {val!r} "
+                f"for ({a!r}, {b!r})"
+            )
+        if not np.isfinite(v):
+            raise ValueError(
+                f"Comparison value must be finite, got {val} "
+                f"for ({a!r}, {b!r})"
+            )
+        if v <= 0:
             raise ValueError(
                 f"Comparison value must be positive, got {val} "
                 f"for ({a!r}, {b!r})"
             )
         i, j = idx[a], idx[b]
-        matrix[i, j] = val
-        matrix[j, i] = 1.0 / val
+        matrix[i, j] = v
+        matrix[j, i] = 1.0 / v
 
     # Principal eigenvector
     eigenvalues, eigenvectors = np.linalg.eig(matrix)
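A quick illustration (not from the PR) of why the old val <= 0 check alone was insufficient, and what the explicit finite check buys before the reciprocal is stored:

import numpy as np

for val in (float("nan"), float("inf")):
    print(val <= 0)          # False both times: NaN/Inf passed the old guard
    print(np.isfinite(val))  # False: the new check rejects both

print(1.0 / float("inf"))    # 0.0 -- an Inf entry would silently zero its
                             # reciprocal in the pairwise matrix

A NaN that reached the matrix would not even fail quietly downstream: np.linalg.eig raises LinAlgError on non-finite input, so the caller would see an opaque linear-algebra error instead of the clear validation message the fix now produces.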