Describe the bug
The GPU COG writer's overview "mean" reduction doesn't match the CPU writer on float32 input. _block_reduce_2d_gpu in xrspatial/geotiff/_gpu_decode.py calls cupy.nanmean(blocks, axis=(1, 3)) on a float32 blocks array, so each block mean accumulates in float32. The CPU path (_reduce2x2_mean in _overview_kernels.py, via _block_reduce_2d) accumulates the same block in float64 and downcasts to float32 only on store.
So the two writers emit different overview bytes for float32 rasters. That contradicts the GPU writer's stated contract: _writers/gpu.py says it "produces the same overview bytes as the CPU writer".
To Reproduce
On a 64x64 float32 raster, 307 of 1024 overview pixels differ between the CPU and GPU mean reductions, max absolute difference about 1.2e-4. Only mean on float32 is affected. min, max, and median are exact and already match, and float64 input already matches because both paths accumulate in float64 there.
Expected behavior
GPU overview mean byte-identical to the CPU overview mean for the same input, per the documented parity contract.
Fix
Pass dtype=cupy.float64 to the nanmean call so the GPU accumulates in float64 and downcasts on the existing final astype(arr2d.dtype). Verified byte-identical to CPU across 64x64, odd 33x65, and 128x128 float32 inputs.
Found by the accuracy sweep.
Describe the bug
The GPU COG writer's overview "mean" reduction doesn't match the CPU writer on float32 input.
_block_reduce_2d_gpuinxrspatial/geotiff/_gpu_decode.pycallscupy.nanmean(blocks, axis=(1, 3))on a float32blocksarray, so each block mean accumulates in float32. The CPU path (_reduce2x2_meanin_overview_kernels.py, via_block_reduce_2d) accumulates the same block in float64 and downcasts to float32 only on store.So the two writers emit different overview bytes for float32 rasters. That contradicts the GPU writer's stated contract:
_writers/gpu.pysays it "produces the same overview bytes as the CPU writer".To Reproduce
On a 64x64 float32 raster, 307 of 1024 overview pixels differ between the CPU and GPU mean reductions, max absolute difference about 1.2e-4. Only
meanon float32 is affected.min,max, andmedianare exact and already match, and float64 input already matches because both paths accumulate in float64 there.Expected behavior
GPU overview mean byte-identical to the CPU overview mean for the same input, per the documented parity contract.
Fix
Pass
dtype=cupy.float64to thenanmeancall so the GPU accumulates in float64 and downcasts on the existing finalastype(arr2d.dtype). Verified byte-identical to CPU across 64x64, odd 33x65, and 128x128 float32 inputs.Found by the accuracy sweep.