Skip to content

GPU float32 COG overview mean diverges from CPU writer (float32 vs float64 accumulation) #2983

@brendancol

Description

@brendancol

Describe the bug

The GPU COG writer's overview "mean" reduction doesn't match the CPU writer on float32 input. _block_reduce_2d_gpu in xrspatial/geotiff/_gpu_decode.py calls cupy.nanmean(blocks, axis=(1, 3)) on a float32 blocks array, so each block mean accumulates in float32. The CPU path (_reduce2x2_mean in _overview_kernels.py, via _block_reduce_2d) accumulates the same block in float64 and downcasts to float32 only on store.

So the two writers emit different overview bytes for float32 rasters. That contradicts the GPU writer's stated contract: _writers/gpu.py says it "produces the same overview bytes as the CPU writer".

To Reproduce

On a 64x64 float32 raster, 307 of 1024 overview pixels differ between the CPU and GPU mean reductions, max absolute difference about 1.2e-4. Only mean on float32 is affected. min, max, and median are exact and already match, and float64 input already matches because both paths accumulate in float64 there.

Expected behavior

GPU overview mean byte-identical to the CPU overview mean for the same input, per the documented parity contract.

Fix

Pass dtype=cupy.float64 to the nanmean call so the GPU accumulates in float64 and downcasts on the existing final astype(arr2d.dtype). Verified byte-identical to CPU across 64x64, odd 33x65, and 128x128 float32 inputs.

Found by the accuracy sweep.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinggeotiffGeoTIFF modulegpuCuPy / CUDA GPU support

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions