Skip to content

kriging: unbounded N×N and grid×N allocations have no memory guard #1307

@brendancol

Description

@brendancol

Describe the bug

kriging() takes an arbitrary point count N and template grid size with no upper bound. Three eager allocations scale with these inputs and run before any guard.

Concrete sites in xrspatial/interpolate/_kriging.py:

  • _experimental_variogram line 73: np.triu_indices(n, k=1) allocates two int64 arrays of length N*(N-1)/2, plus dists and semivar of the same length. At N=50_000 that's about 20 GB before any computation runs.
  • _build_kriging_matrix lines 137-142: builds the N×N pairwise distance matrix and the (N+1)×(N+1) kriging matrix, then calls np.linalg.inv(). O(N²) memory and O(N³) time.
  • _kriging_predict line 178: builds a (grid_pixels, N+1) k0 matrix. A 5000×5000 template with 1000 points works out to about 200 GB.

A caller with a moderately large point set or template hits one of these silently. The process either OOMs or the cupy allocator raises a raw error that says nothing about which input was too large.

Same pattern as the fixes in #1287 (kde), #1295 (resample), #1296 (sieve), #1299 (sky_view_factor), and #1302 (landforms).

The dask backends still call _kriging_predict per chunk, so chunk size bounds the prediction allocation. The variogram and kriging-matrix builds run once on the full point set and need a guard regardless of backend.

Expected behavior

kriging() raises MemoryError before allocating, with a message that names the offending input (point count N or grid pixel count) so the user knows what to reduce.

Reproduce

import numpy as np
import xarray as xr
from xrspatial.interpolate import kriging

n = 100_000
x = np.random.rand(n)
y = np.random.rand(n)
z = np.random.rand(n)
template = xr.DataArray(np.zeros((10, 10)), dims=['y', 'x'],
                       coords={'y': np.arange(10), 'x': np.arange(10)})
kriging(x, y, z, template)

Fix sketch

  1. Estimate worst-case memory at the top of kriging(): max of variogram pair arrays (3 * N*(N-1)/2 * 8), kriging matrix (2 * (N+1)² * 8), and prediction k0 matrix (grid_pixels * (N+1) * 8).
  2. Compare against 0.8 * _available_memory_bytes() using the helper imported from xrspatial.zonal, same pattern as balanced_allocation.py.
  3. Raise MemoryError naming N and grid_pixels when the estimate exceeds the threshold.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinginput-validationInput validation and error messagesoomOut-of-memory risk with large datasets

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions