Streaming write of monolithic TIFF from dask input #1084

@brendancol

Author of Proposal: @brendan

Reason or Problem

to_geotiff calls .compute() on dask-backed DataArrays, materializing the
full raster into RAM before writing. For rasters that don't fit in memory, this
makes to_geotiff unusable unless you go through the VRT tiled output path
(see #1083).

Proposal

Add a streaming write path for monolithic .tif output from dask inputs.
Write tiles sequentially as each dask chunk computes, then seek back to patch
the IFD offset/bytecount arrays. This is how GDAL handles it internally.
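
The two-pass idea above can be sketched with the standard library alone. This is a minimal illustration, not the design for `_writer.py`: the function name `stream_tiled_tiff`, the fixed 16x16 tile size, and the uncompressed 8-bit grayscale layout are all assumptions made for the example, and the `tiles` iterator stands in for dask chunks that are computed one at a time. It writes a classic (32-bit offset) TIFF, so a real implementation for files above 4 GiB would likely need BigTIFF.

```python
import io
import struct

TILE = 16  # TIFF tile dimensions must be multiples of 16


def stream_tiled_tiff(fh, width, height, tiles):
    """Hypothetical helper: stream an uncompressed 8-bit grayscale tiled TIFF.

    `fh` must be seekable. `tiles` yields one TILE*TILE-byte block at a time
    in row-major tile order, so only one tile is held in memory at once.
    """
    n_tiles = (-(-width // TILE)) * (-(-height // TILE))  # ceil division
    if n_tiles < 2:
        # With a single tile the offset is stored inline in the IFD entry;
        # the sketch skips that case to stay short.
        raise ValueError("sketch assumes at least 2 tiles")

    n_tags = 10
    ifd_pos = 8                                   # IFD directly after header
    offsets_pos = ifd_pos + 2 + n_tags * 12 + 4   # count + entries + next-IFD
    counts_pos = offsets_pos + 4 * n_tiles

    def short(tag, value):          # type 3 = SHORT, value left-justified
        return struct.pack("<HHIHH", tag, 3, 1, value, 0)

    def long_(tag, count, value):   # type 4 = LONG, value or file offset
        return struct.pack("<HHII", tag, 4, count, value)

    # Pass 1a: header and IFD, with zero-filled offset/bytecount arrays.
    fh.write(struct.pack("<2sHI", b"II", 42, ifd_pos))
    fh.write(struct.pack("<H", n_tags))
    fh.write(b"".join([
        long_(256, 1, width),             # ImageWidth
        long_(257, 1, height),            # ImageLength
        short(258, 8),                    # BitsPerSample
        short(259, 1),                    # Compression: none
        short(262, 1),                    # Photometric: BlackIsZero
        short(277, 1),                    # SamplesPerPixel
        long_(322, 1, TILE),              # TileWidth
        long_(323, 1, TILE),              # TileLength
        long_(324, n_tiles, offsets_pos), # TileOffsets
        long_(325, n_tiles, counts_pos),  # TileByteCounts
    ]))
    fh.write(struct.pack("<I", 0))        # no further IFDs
    fh.write(b"\x00" * (8 * n_tiles))     # placeholders, patched below

    # Pass 1b: append each tile as it arrives, recording where it landed.
    offsets, counts = [], []
    for data in tiles:
        offsets.append(fh.tell())
        counts.append(len(data))
        fh.write(data)
    if len(offsets) != n_tiles:
        raise ValueError(f"expected {n_tiles} tiles, got {len(offsets)}")

    # Pass 2: seek back and overwrite the placeholder arrays.
    fh.seek(offsets_pos)
    fh.write(struct.pack(f"<{n_tiles}I", *offsets))
    fh.write(struct.pack(f"<{n_tiles}I", *counts))
    fh.seek(0, io.SEEK_END)
    return offsets, counts


# Usage: two constant-valued tiles for a 32x16 image, written to memory.
buf = io.BytesIO()
offsets, counts = stream_tiled_tiff(
    buf, 32, 16, (bytes([i]) * (TILE * TILE) for i in range(2))
)
```

Reserving the arrays up front keeps the IFD at a fixed position, so the patch step is two contiguous writes. With compression, the byte counts are only known after each tile is encoded, which is why they are recorded during the streaming pass rather than computed in advance.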

This is a follow-up to the VRT output work in #1083. The VRT path solves the
immediate problem for most users, but some workflows genuinely need a single
file.

Design: Not yet specced. Needs investigation into how the current
_writer.py assembles IFDs and whether the two-pass approach (write tiles,
then patch offsets) can be grafted onto the existing code without a rewrite.

Drawbacks

  • Adds complexity to the writer for a use case the VRT path already covers.
  • TIFF offset patching is fiddly and easy to get wrong.

Alternatives

Labels: enhancement (New feature or request), oom (Out-of-memory risk with large datasets), performance (PR touches performance-sensitive code)
