
[hist] Improve precision of TAxis::FindFixBin / FindBin. #19033


Open
hageboeck wants to merge 3 commits into master from TAxis_numericalPrecision

Conversation

@hageboeck (Member) commented Jun 13, 2025

The bin computation in TAxis can suffer from floating-point inaccuracies, returning bins for which this assertion fails:

int bin = axis.FindBin(x);
assert(axis.GetBinLowEdge(bin) <= x && x < axis.GetBinUpEdge(bin));

This is of course surprising for users, and it sometimes sends data into the overflow bin even though the coordinate falls within the valid range of the axis. This PR adds two correction terms that fix these floating-point errors.
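
For illustration, here is a minimal sketch of this kind of post-hoc correction on top of the usual linear bin formula. It is an assumption about the general approach, not the exact code added in this PR, and it omits the under-/overflow handling that TAxis::FindFixBin performs:

#include <cassert>
#include <cstdio>

// Hypothetical helper: naive fixed-bin lookup on [xmin, xmax) with nbins bins,
// followed by a correction if rounding pushed the result off by one bin.
int FindFixBinCorrected(double x, double xmin, double xmax, int nbins)
{
   const double width = (xmax - xmin) / nbins;
   int bin = 1 + static_cast<int>(nbins * (x - xmin) / (xmax - xmin));
   if (x < xmin + (bin - 1) * width)   // rounded one bin too high
      --bin;
   else if (x >= xmin + bin * width)   // rounded one bin too low
      ++bin;
   return bin;
}

int main()
{
   const double xmin = 0.1, xmax = 17.3; // arbitrary example axis
   const int nbins = 100;
   const double width = (xmax - xmin) / nbins;
   for (int i = 0; i < 1000; ++i) {
      const double x = xmin + i * (xmax - xmin) / 1000.;
      const int bin = FindFixBinCorrected(x, xmin, xmax, nbins);
      // The invariant from the assertion above must hold for every point:
      assert(xmin + (bin - 1) * width <= x && x < xmin + bin * width);
   }
   std::puts("bin invariant holds for all sampled points");
}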

On a recent AMD CPU, there is no runtime difference, since the latency of the virtual call overshadows the additional computations for the correction. For 10M calls, running 100 times with perf, we arrive at:

             Uncorrected          Corrected
Instructions 250M                 360M
Cycles       216M                 218M
Run time     0.0495 +- 0.0003 s   0.0493 +- 0.0002 s

Note:
Several attempts at reordering the equations for better precision failed: there were always a few test cases in the included tests that didn't pass, so applying the correction terms appears to be the best solution.

This PR fixes the issues brought up here:

@hageboeck hageboeck self-assigned this Jun 13, 2025
@hageboeck hageboeck requested a review from lmoneta as a code owner June 13, 2025 12:46
@hageboeck hageboeck force-pushed the TAxis_numericalPrecision branch 2 times, most recently from 3eb7398 to 50e7024 Compare June 13, 2025 12:59
@ferdymercury (Collaborator) commented Jun 13, 2025

Thanks a lot Stephan!
Some questions from my side:

@hageboeck (Member, Author)

Thanks a lot Stephan! Some questions from my side:

Good questions, I'm checking! 🙂

github-actions bot commented Jun 13, 2025

Test Results

    19 files      19 suites   4d 3h 1m 0s ⏱️
 3 016 tests  3 013 ✅ 0 💤  3 ❌
55 735 runs  55 708 ✅ 0 💤 27 ❌

For more details on these failures, see this check.

Results for commit 901d041.

♻️ This comment has been updated with latest results.

@hageboeck hageboeck linked an issue Jun 17, 2025 that may be closed by this pull request
@hageboeck (Member, Author)

It can be closed, indeed.

It fixes the problem described there. I linked the issue.

  • Does this new implementation affect runtime performance when filling a histogram? If yes, maybe it's better to have a separate function FindBinHighRes or so. One could even think of using long-double.

It does not affect run time on a modern CPU, even though more instructions are issued. When you run the bare code to find the bin (not as part of a virtual function), you have the following behaviour:

             Uncorrected   Corrected
Instructions 8             19
Cycles       33            31

It doesn't matter how many instructions are issued. What's interesting is how many cycles they need to complete. Although that depends on the CPU in use, you see that the two versions are very close.

The killer argument, however, is that all of this happens inside a virtual function, which has its own latencies. When plugging the code into a virtual function and benchmarking 10M calls, we arrive at:

             Uncorrected          Corrected
Instructions 250M                 360M
Cycles       216M                 218M
Run time     0.0495 +- 0.0003 s   0.0493 +- 0.0002 s

So you see that more instructions are issued, but because the CPU is superscalar, these additional instructions have no noteworthy effect on the run time: the correction terms can be evaluated in parallel with the original computation.

This means that in TAxis, where the bin computation is part of a virtual function, we should not see a run-time difference.
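
For reference, here is a rough sketch of the kind of microbenchmark described above, with a stand-in axis class so it is self-contained; the class name, bin formula, and sample count are illustrative, not the actual harness or TAxis code:

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Illustrative stand-in for an axis with a virtual FindBin using the naive linear formula.
struct FakeAxis {
   double fXmin = 0., fXmax = 10.;
   int fNbins = 100;
   virtual ~FakeAxis() = default;
   virtual int FindBin(double x) const { return 1 + int(fNbins * (x - fXmin) / (fXmax - fXmin)); }
};

int main()
{
   std::mt19937_64 rng(12345);
   std::uniform_real_distribution<double> dist(0., 10.);
   std::vector<double> xs(10'000'000);
   for (double &x : xs)
      x = dist(rng);

   FakeAxis axis;
   const FakeAxis *a = &axis; // call through a pointer so the call stays virtual (modulo devirtualization)
   long long sum = 0;
   const auto t0 = std::chrono::steady_clock::now();
   for (double x : xs)
      sum += a->FindBin(x);
   const auto t1 = std::chrono::steady_clock::now();
   std::printf("checksum %lld, %.4f s for %zu calls\n", sum,
               std::chrono::duration<double>(t1 - t0).count(), xs.size());
}

In a real measurement one would also want to make sure the compiler cannot devirtualize the call, and read the instruction and cycle counts from perf rather than relying on wall-clock time alone.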

  • Is it certain that the resulting bin is only off by one? I can imagine (probably very weird) cases where the limited floating-point precision might make it off by two?

I did not manage to provoke a difference of more than one bin. I didn't set out to prove it mathematically, but I had a lot of difficulty even crafting a case that produced an "off by -1" error.

@ferdymercury (Collaborator) commented Jun 17, 2025

Awesome analysis, thanks a lot!!
Does this mean that I can close #17689 if you implement the same strategy in that part of the code?
And I'm not sure if you want to review this alleviation strategy: #17896

@ferdymercury (Collaborator)

Btw, this PR reminds me of this issue: https://its.cern.ch/jira/browse/ROOT-10179

@hageboeck hageboeck force-pushed the TAxis_numericalPrecision branch from b9849ac to 8e6acc0 Compare June 17, 2025 14:06
@hageboeck (Member, Author)

Awesome analysis, thanks a lot!! Does this mean that I can close #17689 if you implement the same strategy in that part of the code?

@ferdymercury I think for this PR, the x == max --> overflow problem remains. So you could probably just use:

- max = currentMax;
+ max = std::nextafter(currentMax, inf);

but that's to be tested.
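
For illustration, a minimal sketch of that idea; PadUpperEdge and the variable names are made up here, this is not code from either PR:

#include <cmath>
#include <cstdio>
#include <limits>

// Hypothetical helper: nudge the requested upper edge to the next representable
// double, so that x == xmax no longer falls into the overflow bin.
double PadUpperEdge(double xmax)
{
   return std::nextafter(xmax, std::numeric_limits<double>::infinity());
}

int main()
{
   const double xmax = 17.3;
   std::printf("%.17g -> %.17g\n", xmax, PadUpperEdge(xmax));
}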

Btw, this PR reminds me of this issue: https://its.cern.ch/jira/browse/ROOT-10179

Was this the correct link?

@ferdymercury (Collaborator) commented Jun 17, 2025

Was this the correct link?

It was about the question of whether performance is impacted. Maybe the more relevant one is the subissue https://its.cern.ch/jira/browse/ROOT-10185: this would have been easier for you if rootbench could be triggered from a PR.

 max = std::nextafter(currentMax, inf);
    

I had tried that and it didn't work. I really had to add
xmax = std::max(xmax + 1e-15*(xmax - xmin), std::nextafter(xmax,INFINITY));

For most cases, the 1e-15*(xmax - xmin) term is what really counts.
Or do you mean that just nextafter should be fine once your PR gets merged?
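
As a rough standalone illustration of why the width-proportional term can dominate a single-ULP nudge (the axis range here is just an example, not taken from either issue):

#include <cmath>
#include <cstdio>
#include <limits>

int main()
{
   const double xmin = 0., xmax = 1000.;
   // One ULP at xmax vs. a margin proportional to the axis width:
   const double ulpStep = std::nextafter(xmax, std::numeric_limits<double>::infinity()) - xmax;
   const double relMargin = 1e-15 * (xmax - xmin);
   std::printf("one ULP at xmax     : %g\n", ulpStep);   // roughly 1.1e-13
   std::printf("1e-15 * (xmax-xmin) : %g\n", relMargin); // 1e-12, about an order of magnitude larger here
}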

Related: #17896

Due to the floating-point subtraction x - xMin, the bin index
occasionally flows over to the next bin. This is particularly
annoying when it goes into the overflow, although the coordinate is
strictly smaller than the max of the axis.
By adding a correction term that can detect this case, overflows such as
the one discussed in https://root-forum.cern.ch/t/bug-or-feature-in-ttree-draw/62862
can be avoided.

An error in the other direction is possible, too, and fixed in this
commit:
https://root-forum.cern.ch/t/floating-point-rounding-error-when-filling-the-histogram/35189

Microbenchmarking the changed lines showed them to be the same speed with
gcc, and 40% slower with clang. Both changes are by far outweighed by the
virtual-call overhead, though.

This allowed removing the cautionary note on rounding errors added
in 1703c54, since that issue is fixed now.

Fix root-project#14091.
The tests are based on two issues brought up in the forum, a few cases
designed by hand to break the old algorithm, GitHub issue root-project#14091, the
tests from PR root-project#14105, and a test sampling 100 random points.
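
A minimal sketch of what such a random-sampling test could look like (GoogleTest is assumed, and the binning and seed are illustrative; this is not the actual test code from this PR):

#include <random>
#include "gtest/gtest.h"
#include "TAxis.h"

// For every sampled coordinate, the bin returned by FindBin must contain it.
TEST(TAxisFindBin, RandomPointsLandInTheirBin)
{
   TAxis axis(100, 0.1, 17.3); // illustrative binning
   std::mt19937_64 rng(42);
   std::uniform_real_distribution<double> dist(0.1, 17.3);
   for (int i = 0; i < 100; ++i) {
      const double x = dist(rng);
      const int bin = axis.FindBin(x);
      EXPECT_LE(axis.GetBinLowEdge(bin), x);
      EXPECT_LT(x, axis.GetBinUpEdge(bin));
   }
}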
@hageboeck hageboeck force-pushed the TAxis_numericalPrecision branch from 8e6acc0 to 901d041 Compare June 19, 2025 11:56
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Bug in TAxis::FindBin (Double_t x) ?
2 participants