
geotiff: read_geotiff_gpu(chunks=...) is not actually out-of-core #1876

@brendancol


Describe the bug

read_geotiff_gpu(chunks=...) and open_geotiff(gpu=True, chunks=...) both eagerly decode the entire raster into a single CuPy array and only then call .chunk() on the resulting DataArray (xrspatial/geotiff/__init__.py:3328). The docstring at __init__.py:2740 advertises "out-of-core GPU pipelines" via chunks=, which implies lazy, per-chunk decoding.

For a raster that does not fit in GPU memory, the eager materialization runs out of memory before the user ever sees a Dask graph. The chunking applied afterward is purely cosmetic.
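A minimal reproducer sketch; the file path, sizes, and exact chunks= spelling are illustrative, and any raster larger than free GPU memory should trigger it:

```python
import xrspatial.geotiff as geotiff  # module path per the references above

# e.g. a ~40 GB GeoTIFF on a 16 GB GPU; sizes are illustrative
arr = geotiff.read_geotiff_gpu("large_raster.tif", chunks={"x": 4096, "y": 4096})
# Raises cupy.cuda.memory.OutOfMemoryError inside the eager decode,
# before .chunk() runs -- no Dask graph is ever constructed.
```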

Expected behavior

Either:

  1. Decode lazily by building a Dask graph that calls per-chunk GPU decoders, so peak GPU memory is bounded by chunk size.
  2. Be honest about it in the docstring and emit a runtime warning when a chunks= value is small relative to the raster, so users do not assume out-of-core semantics.

Option 2 is the pragmatic short-term fix; option 1 is real lazy GPU I/O and a larger project. Rough sketches of both follow.
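For option 1, one possible shape is a grid of dask.delayed tasks that each decode a single window on the GPU. The helper _decode_window_gpu below is hypothetical (no such per-window decoder exists in the codebase today); the graph construction around it is standard Dask:

```python
import cupy
import dask
import dask.array as da


def _decode_window_gpu(path, row_off, col_off, height, width):
    """Hypothetical per-window decoder: read and decode one window of the
    GeoTIFF directly into a (height, width) CuPy array."""
    raise NotImplementedError


def lazy_gpu_raster(path, shape, dtype, chunks=(4096, 4096)):
    """Build a Dask graph whose tasks each decode one window on the GPU,
    so peak GPU memory is bounded by the chunk size, not the raster size."""
    rows, cols = shape
    meta = cupy.empty((0, 0), dtype=dtype)  # tells Dask the chunks are CuPy
    grid = []
    for r in range(0, rows, chunks[0]):
        h = min(chunks[0], rows - r)
        row = []
        for c in range(0, cols, chunks[1]):
            w = min(chunks[1], cols - c)
            block = dask.delayed(_decode_window_gpu)(path, r, c, h, w)
            row.append(da.from_delayed(block, shape=(h, w), dtype=dtype, meta=meta))
        grid.append(row)
    return da.block(grid)  # lazy: nothing is decoded until .compute()
```

Wrapping the result in an xarray.DataArray would keep the current return type; the decode cost moves into the graph, where Dask can schedule it chunk by chunk.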
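For option 2, the guard is a few lines; the function name and warning category below are illustrative, not proposals for the final API:

```python
import warnings


def _warn_if_chunks_are_cosmetic(raster_nbytes, chunk_nbytes):
    """Warn when chunks= implies out-of-core behavior that the eager
    decode cannot deliver. Names and warning category are illustrative."""
    if chunk_nbytes < raster_nbytes:
        warnings.warn(
            "chunks= only re-chunks after the full raster has been decoded "
            "on the GPU; peak GPU memory is the size of the whole raster, "
            "not the chunk size.",
            ResourceWarning,
            stacklevel=3,
        )
```

Called from read_geotiff_gpu / open_geotiff once the raster shape is known, this makes the eager behavior explicit without changing it.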

Additional context

Severity: HIGH for users with rasters near or above GPU memory size. The promise in the docstring is doing the load-bearing work; until it is either delivered on or retracted, chunks= is a foot-gun.

Metadata

    Labels

    bug: Something isn't working
    documentation: Improvements or additions to documentation
    gpu: CuPy / CUDA GPU support
    high-priority
    oom: Out-of-memory risk with large datasets
