Where
xrspatial/geotiff/_writers/gpu.py:498 in write_geotiff_gpu's COG overview generation loop.
What
After each call to make_overview_gpu, the loop rewrites NaN cells back to the nodata sentinel via:
```python
current = make_overview_gpu(current, method=overview_resampling, nodata=nodata)
cumulative_factor *= 2
if (nodata is not None
        and np.dtype(str(current.dtype)).kind == 'f'
        and not np.isnan(float(nodata))):
    nan_mask = cupy.isnan(current)
    if bool(nan_mask.any().item()):
        current = current.copy()
        current[nan_mask] = np.dtype(
            str(current.dtype)).type(nodata)
```
make_overview_gpu returns a freshly allocated cupy buffer on every return path:
- 2D path: _block_reduce_2d_gpu returns the result of cupy.nanmean / cupy.nanmin / cupy.around(...).astype(...) / cropped[::2, ::2].copy() / cupy.asarray(cpu_result), all of which are fresh allocations.
- 3D path: cupy.stack(bands, axis=2), also a fresh allocation.
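The aliasing argument can be checked on the CPU with NumPy, whose reduction and slicing semantics CuPy mirrors. This is a sketch only: block_mean_2x2 and nearest_2x2 are hypothetical stand-ins, not the xrspatial implementation. It shows that a reduction returns a new array, while a bare strided slice is a view, which is why that one path needs its own .copy().

```python
import numpy as np

def block_mean_2x2(arr):
    """Hypothetical stand-in for a 2x2 nanmean block reduce."""
    h, w = arr.shape[0] // 2 * 2, arr.shape[1] // 2 * 2
    cropped = arr[:h, :w]                      # a view, no allocation yet
    blocks = cropped.reshape(h // 2, 2, w // 2, 2)
    return np.nanmean(blocks, axis=(1, 3))     # the reduction allocates a new array

def nearest_2x2(arr):
    """Hypothetical stand-in for the strided 'nearest' path."""
    h, w = arr.shape[0] // 2 * 2, arr.shape[1] // 2 * 2
    cropped = arr[:h, :w]
    return cropped[::2, ::2].copy()            # .copy() breaks the alias to arr

src = np.arange(36, dtype=np.float32).reshape(6, 6)
src[0, 0] = np.nan
mean_out = block_mean_2x2(src)
near_out = nearest_2x2(src)

print(np.shares_memory(mean_out, src))         # False
print(np.shares_memory(near_out, src))         # False
print(np.shares_memory(src[::2, ::2], src))    # True: a bare strided slice is a view
```

Because each returned buffer is already private to the caller, a second copy before mutating it buys nothing.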
Nothing else aliases the buffer between the make_overview_gpu return and the in-place NaN rewrite, so the explicit current = current.copy() allocates a second chunk-sized GPU buffer just to mutate it.
Why it matters
For an 8192x8192 float32 raster with 4 auto-generated overview levels, the extra allocations sum to roughly:
(2048x2048 + 1024x1024 + 512x512 + 256x256) * 4 bytes = 22,282,240 bytes ~= 21 MiB
per write (worst case, since the copy only happens on levels that actually contain NaN). Modest, but the pattern is identical to the one fixed in #1934 for _apply_nodata_mask_gpu: replace the copy() + indexed write with cupy.putmask(current, nan_mask, sentinel) so the existing buffer is mutated in place and one chunk-sized device allocation per overview level is skipped.
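The estimate can be reproduced with a few lines of arithmetic. The shapes below are copied from the estimate as given and are not derived from xrspatial's overview-level logic; extra_bytes is a hypothetical helper that charges one extra buffer per level, i.e. the copy() being removed.

```python
def extra_bytes(shapes, itemsize):
    """Sum one extra buffer per overview level (the copy() the fix removes)."""
    return sum(h * w * itemsize for h, w in shapes)

# Overview shapes as listed in the estimate; float32 -> 4 bytes per cell.
shapes = [(2048, 2048), (1024, 1024), (512, 512), (256, 256)]
total = extra_bytes(shapes, 4)

print(total)            # 22282240
print(total / 2**20)    # 21.25 (MiB)
```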
Suggested fix
```python
current = make_overview_gpu(current, method=overview_resampling, nodata=nodata)
cumulative_factor *= 2
if (nodata is not None
        and np.dtype(str(current.dtype)).kind == 'f'
        and not np.isnan(float(nodata))):
    nan_mask = cupy.isnan(current)
    if bool(nan_mask.any().item()):
        # make_overview_gpu already returned a fresh buffer, so mutate it in place
        cupy.putmask(current, nan_mask,
                     np.dtype(str(current.dtype)).type(nodata))
```
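The correctness contract of the substitution can be checked on the CPU, since np.putmask(a, mask, values) has the same in-place semantics as cupy.putmask. In this sketch NumPy stands in for CuPy, and the two helper names are hypothetical: the copy-based rewrite and the putmask rewrite produce identical values, but only the latter keeps the original buffer.

```python
import numpy as np

def rewrite_nan_copy(current, nodata):
    """Original pattern: copy, then indexed write (one extra allocation)."""
    nan_mask = np.isnan(current)
    if bool(nan_mask.any()):
        current = current.copy()
        current[nan_mask] = current.dtype.type(nodata)
    return current

def rewrite_nan_inplace(current, nodata):
    """Suggested pattern: mutate the existing buffer with putmask."""
    nan_mask = np.isnan(current)
    if bool(nan_mask.any()):
        np.putmask(current, nan_mask, current.dtype.type(nodata))
    return current

level = np.array([[1.0, np.nan], [np.nan, 4.0]], dtype=np.float32)
a = rewrite_nan_copy(level.copy(), -9999)
buf = level.copy()
b = rewrite_nan_inplace(buf, -9999)

print(np.array_equal(a, b))  # True: same values
print(b is buf)              # True: no new buffer was allocated
```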
The fix mirrors the in-place sentinel rewrite already used on the freshly-allocated GPU buffer at line 426 in the NaN->sentinel branch above, and the _apply_nodata_mask_gpu rewrite in #1934.
Severity
LOW. Sentinel-poisoning on a multi-level pyramid (a float raster with a non-NaN nodata sentinel whose overview levels contain NaN) is the only path that hits it, and the allocations are bounded by overview size (smaller than the full raster), but the fix is a one-line cupy.putmask substitution with the same correctness contract.