cost_distance: missing memory guard on CuPy backend #1262

@brendancol

Description

Describe the bug

PR #1253 added _check_memory(height, width) to the numpy path of cost_distance so that oversize rasters raise a clean MemoryError before the Dijkstra kernel allocates its ~37 bytes/pixel of working memory. The cupy backend runs the same allocation pattern with no such guard.

_cost_distance_cupy (xrspatial/cost_distance.py:349-409) allocates:

  • cp.full((height, width), cp.inf, dtype=cp.float64) at line 367 (8 bytes/pixel)
  • source_mask and passable boolean masks (~2 bytes/pixel)
  • cp.where(...).astype(cp.float32) output and intermediates (~12 bytes/pixel)

A 100000x100000 cupy raster requests ~80 GB just for the dist array, which exceeds GPU memory on every consumer card and most workstation cards. What you actually see is a cupy allocator error that mentions neither max_cost= nor dask. The numpy path raises a clean MemoryError with both hints; cupy should do the same.
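For scale (the ~22 bytes/pixel total just sums the allocations listed above):

H, W = 100_000, 100_000
pixels = H * W                      # 1e10 pixels
print(pixels * 8 / 1e9)             # dist alone: 80.0 GB
print(pixels * (8 + 2 + 12) / 1e9)  # full working set: ~220.0 GB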

The dask+cupy map_overlap path calls _cost_distance_cupy per chunk, so a guard inside _cost_distance_cupy also covers per-chunk allocations there. The dask+cupy unbounded fallback converts to dask+numpy and picks up the existing guard.

Expected behavior

cost_distance on an oversize CuPy raster raises MemoryError with a message pointing to max_cost= and dask backends, like the numpy path already does. No raw CUDA OOM, no opaque cupy allocator error.

Reproduce

import cupy as cp
import xarray as xr
import numpy as np
from xrspatial import cost_distance

# The dist grid alone is ~80 GB of float64; on most cards even the two
# inputs below (~160 GB) will not fit. Shrinking H and W so the inputs fit
# but the ~22 bytes/pixel working set does not reproduces the same failure.
H, W = 100_000, 100_000

src = cp.zeros((H, W), dtype=cp.float64)
src[0, 0] = 1.0  # single source cell at the origin
fric = cp.ones((H, W), dtype=cp.float64)  # uniform friction

coords = {'y': np.arange(H), 'x': np.arange(W)}
raster = xr.DataArray(src, dims=['y', 'x'], coords=coords)
friction = xr.DataArray(fric, dims=['y', 'x'], coords=coords)

cost_distance(raster, friction)  # cupy allocator error, not a clean MemoryError
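To see the gap directly, the same runtime query the proposed fix relies on reports free device memory (figures in the comment are illustrative for a consumer card):

free, total = cp.cuda.runtime.memGetInfo()
print(f'free GPU memory: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB')
# e.g. ~24 GB free against the ~80 GB request for dist alone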

Additional context

Found during a security audit of cost_distance. The module is used for routing and path planning, so an opaque OOM here can mask real failures in batch pipelines. The fix mirrors PR #1253: add a _check_gpu_memory helper using cupy.cuda.runtime.memGetInfo() and raise MemoryError before the first cp.full allocation when the working footprint exceeds 50% of free GPU RAM.
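A minimal sketch of that helper, assuming it keeps the shape-based interface of the numpy-side _check_memory; the ~22 bytes/pixel constant just sums the allocations listed above, and the message wording is illustrative, not a proposed exact string:

import cupy as cp

# ~8 (dist) + ~2 (boolean masks) + ~12 (float32 output + intermediates),
# per the allocation list above.
_GPU_BYTES_PER_PIXEL = 22

def _check_gpu_memory(height, width):
    free_bytes, _total_bytes = cp.cuda.runtime.memGetInfo()
    needed = height * width * _GPU_BYTES_PER_PIXEL
    if needed > 0.5 * free_bytes:
        raise MemoryError(
            f'cost_distance: {height}x{width} raster needs ~{needed / 1e9:.1f} GB '
            f'of GPU working memory but only {free_bytes / 1e9:.1f} GB is free. '
            f'Bound the search with max_cost=, or use a dask-backed raster '
            f'to process the input in chunks.'
        )

Calling this at the top of _cost_distance_cupy, before the first cp.full allocation, means the dask+cupy map_overlap path inherits the check per chunk, as noted above.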

Related:

Metadata


    Labels

    bug: Something isn't working
    gpu: CuPy / CUDA GPU support
    input-validation: Input validation and error messages
    oom: Out-of-memory risk with large datasets
    proximity tools: Proximity, allocation, direction, cost distance
