Skip to content

[BUG] Numerical inaccuracy in summation based routines #379

Open
@krachyon

Description

@krachyon

Describe the bug

Bottleneck's implementation of algorithms containing a summation yields different results than numpy for floats. This seems to stem from the fact that numpy uses some sort of compensated summation algorithm to increase accuracy, while bottleneck uses a straight sum, e.g.:
bottleneck/src/reduce_template.c

FOR {
    const npy_DTYPE0 ai = AI(DTYPE0);
    if (!bn_isnan(ai)) {
        asum += ai;
    }

To Reproduce

import numpy as np
import bottleneck

# adding float32.eps to 2.f gives 2.f so e.g. Kahan-summation is needed to get result != 2.f
arr = np.hstack(([np.float32(2.)], np.repeat(np.finfo(np.float32).eps, 100000).astype(np.float32)))
print('numpy: ', np.nansum(arr))
print('bottleneck: ', bottleneck.nansum(arr))
numpy:  2.011919
bottleneck:  2.0

System:
Linux-5.11.11-arch1-1-x86_64-with-glibc2.33
Python 3.9.2 (default, Feb 20 2021, 18:40:11)
[GCC 10.2.0]
bottleneck 1.3.2

Expected behavior
As implementations can be switched due to non-obvious reasons (like a fallback to numpy routines in the case of non-native byteorder), results between bottleneck-routines and numpy should match.
If a complete match of results is not attainable, the documentation should state clearly that bottleneck does not always reproduce numpy results.

Additional context
astropy/astropy#11492

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions