
Improvement: Compare with more algorithms in benchmark #28

Open

LowLevelMahn opened this issue Jul 28, 2018 · 7 comments
Labels
C Affects the C implementation in ryu/.

Comments

@LowLevelMahn commented Jul 28, 2018

See Milo Yip's C++ double-to-string conversion benchmark:
https://github.com/miloyip/dtoa-benchmark

@ulfjack (Owner) commented Jul 28, 2018

We'd first have to support fixed-precision output, which the code does not do right now (although it shouldn't be hard to add, see also #27).

Generally speaking, this is not a benchmark repository. Maybe it would be better to add a benchmark for Ryu over there, rather than adding a lot of benchmarks here.
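
For context, here is a minimal sketch contrasting Ryu's shortest-output C API with a fixed-precision fallback. It assumes the d2s() entry point from ryu/ryu.h (a malloc'd, caller-freed string) and is illustrative only, not project code:

// Sketch: shortest output via Ryu vs. fixed-precision output via snprintf.
// Assumes Ryu's C API; the declaration is repeated here for self-containment.
#include <cstdio>
#include <cstdlib>

extern "C" char* d2s(double f);  // shortest round-trip output, from ryu/ryu.h

int main() {
  const double v = 0.3;

  char* shortest = d2s(v);   // e.g. "3E-1": shortest string that round-trips
  std::printf("shortest: %s\n", shortest);
  std::free(shortest);       // d2s() allocates; the caller frees

  // Fixed-precision output (a fixed number of digits, as dtoa-benchmark-style
  // suites often require) is what Ryu did not provide at the time of this issue.
  char fixed[64];
  std::snprintf(fixed, sizeof(fixed), "%.6e", v);
  std::printf("fixed:    %s\n", fixed);
  return 0;
}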

@LowLevelMahn (Author)

> Maybe it would be better to add a benchmark for Ryu over there, rather than adding a lot of benchmarks here.

I asked Milo about adding your function to his benchmark suite - maybe you can add a reference to his tests (once they are integrated).

ulfjack added the "C Affects the C implementation in ryu/." label on Aug 7, 2018
@StephanTLavavej (Contributor)
I benchmarked my to_chars() implementation powered by Ryu (which differs from upstream in that it's bounds-checked, follows sprintf() conventions for exponent style, and pays additional overhead for chars_format and centralized INF/NAN logic; Ryu itself is strictly faster) against dtoa_milo(). Ryu is 70% to 90% faster, and additionally rounds correctly. See: https://www.reddit.com/r/cpp/comments/9fiko6/fmt_version_52_released_with_up_to_5x_speedups/e5xmve6/

Perhaps this issue should be resolved, since this is indeed not a benchmark repository and it doesn't seem that doing anything else here would be a valuable use of time? While it appears that double-conversion is not the fastest possible Grisu implementation, it has the advantage of matching Ryu's correctly rounded output, and Ryu indeed appears to be the fastest in the world by a wide margin, so the choice of comparison algorithm seems unimportant (except for "promotional" purposes). In my optimization efforts, I've focused on improving Ryu's absolute performance; the relative performance (especially against sprintf()) is merely entertaining rather than useful.
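
As a rough illustration of the interface measured above (not the actual benchmark harness), the C++17 <charconv> overloads produce shortest round-trip output, and the round trip can be checked with std::from_chars. This requires a standard library with floating-point <charconv> support (MSVC at the time; newer GCC/Clang since):

// Sketch: shortest round-trip double-to-string via C++17 <charconv>.
#include <cassert>
#include <charconv>
#include <cstdio>
#include <string>

int main() {
  const double value = 0.1 + 0.2;  // 0.30000000000000004

  // Without a precision argument, to_chars emits the shortest string that
  // parses back to exactly the same double: the property Ryu guarantees.
  char buf[64];
  auto res = std::to_chars(buf, buf + sizeof(buf), value);
  assert(res.ec == std::errc());
  std::string s(buf, res.ptr);

  double parsed = 0.0;
  std::from_chars(s.data(), s.data() + s.size(), parsed);
  assert(parsed == value);  // exact round trip

  std::printf("%s\n", s.c_str());
  return 0;
}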

@plokhotnyuk (Contributor) commented Sep 14, 2018

I think there should be a set of parameterized benchmarks here, at least to track progress across changes and to drive experiments.

Jsoniter-scala (one of the first adopters of Ryu) has had a growing set of benchmarks from the beginning. They have helped a lot to drive improvements using different profilers and to spot regressions on new JVMs. Here are the results from the latest run:

https://plokhotnyuk.github.io/jsoniter-scala/

E.g. a lot of mitigation work was done to avoid a 3x slowdown regression with Graal, caused by a missing compiler optimization for division by a constant (a sketch of that optimization follows the issue link):

oracle/graal#593
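
For readers unfamiliar with the missing optimization: division by a compile-time constant can be strength-reduced to a widening multiply plus a shift, which ahead-of-time compilers and mature JITs do automatically and which otherwise has to be done by hand. A minimal C++ sketch of the idea for unsigned 32-bit division by 10 (the magic constant and shift are the standard ones compilers emit; this is not the actual jsoniter-scala/Graal workaround):

// Sketch: strength-reducing x / 10 into multiply + shift for 32-bit unsigned x.
// 0xCCCCCCCD == ceil(2^35 / 10); (x * 0xCCCCCCCD) >> 35 == x / 10 for all uint32_t x.
#include <cassert>
#include <cstdint>

static inline uint32_t div10(uint32_t x) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 35);
}

int main() {
  // Spot-check the identity on a few values, including the extremes.
  for (uint32_t x : {0u, 9u, 10u, 12345u, 4294967295u}) {
    assert(div10(x) == x / 10);
  }
  return 0;
}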

@erthink commented May 31, 2019

Take a look at my dtoa-benchmark fork:

$ git clone https://github.com/leo-yuriev/dtoa-benchmark
$ cd dtoa-benchmark
$ cmake . && cmake --build .

C/C++ results sorted by RMS:

$ ./d2a-benchmark | grep 'Bench' | LC_ALL=C sort -k 1.70
Benchmarking randomdigit null                 ... [min    1.500ns, rms    1.500ns, max    1.500ns]
Benchmarking randomdigit erthink              ... [min   22.700ns, rms   43.333ns, max   58.500ns]
Benchmarking randomdigit ryu                  ... [min   41.200ns, rms   55.637ns, max   65.700ns]
Benchmarking randomdigit milo                 ... [min   34.400ns, rms   58.060ns, max   71.800ns]
Benchmarking randomdigit emyg                 ... [min   36.100ns, rms   58.945ns, max   72.800ns]
Benchmarking randomdigit floaxie              ... [min   27.100ns, rms   71.038ns, max   94.200ns]
Benchmarking randomdigit grisu2               ... [min   74.700ns, rms   90.063ns, max  107.900ns]
Benchmarking randomdigit doubleconv           ... [min  100.500ns, rms  139.102ns, max  165.800ns]
Benchmarking randomdigit fmt                  ... [min  117.000ns, rms  147.098ns, max  171.800ns]
Benchmarking randomdigit fpconv               ... [min   99.900ns, rms  149.201ns, max  173.100ns]
Benchmarking randomdigit sprintf              ... [min  804.900ns, rms  902.667ns, max  980.300ns]
Benchmarking randomdigit ostrstream           ... [min 1192.300ns, rms 1292.424ns, max 1375.400ns]
Benchmarking randomdigit ostringstream        ... [min 1249.000ns, rms 1358.195ns, max 1451.100ns]
$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
...

@plokhotnyuk (Contributor) commented May 31, 2019

@leo-yuriev could you please update your charts for different numbers of digits, so that the results of ryu and erthink show up in them?

@erthink commented May 31, 2019

@plokhotnyuk, today I spent some time refining the benchmark to make it easier and more convenient to get results (I have pushed those changes for now), and I noticed that the results for 'ryu' and 'erthink' differ significantly between the benchmark's modes.

The "randomdigit" testcase: generates 1000 random double values, filtering out +/-inf and NaN, then truncates them to limited precision (1 to 17 decimal digits in the significand), and finally converts these numbers to ASCII (a generation sketch follows the table below).

Function        Min, ns     RMS, ns    Max, ns     Sum, ns   Speedup
erthink            21.0     183.412       60.0       733.5     21.00
ryu                42.1     237.773       74.7       966.6     15.94
milo               33.5     257.925       77.4      1046.7     14.72
floaxie            27.6     304.451       96.9      1198.3     12.86
emyg               38.4     296.706       86.3      1204.1     12.79
grisu2             71.2     369.677      107.5      1515.9     10.16
doubleconv         90.7     513.139      167.8      2085.4      7.39
fmt                95.7     523.818      150.4      2148.8      7.17
fpconv             99.2     649.942      180.6      2650.6      5.81
sprintf           828.5    3740.842      965.3     15405.2      1.00
ostrstream       1217.2    5354.421     1359.8     22062.9      0.70
ostringstream    1274.9    5628.034     1434.8     23186.3      0.66
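
As referenced above, here is a minimal C++ sketch of how such "randomdigit" input could be produced (random finite doubles truncated to a given number of significant decimal digits); the function name and the text-round-trip truncation are illustrative assumptions, not the fork's actual generator:

// Sketch: generate random finite doubles and truncate them to `digits`
// significant decimal digits, similar in spirit to the "randomdigit" testcase.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <random>
#include <vector>

std::vector<double> random_digit_values(int digits, std::size_t count) {
  std::mt19937_64 rng(42);
  std::vector<double> out;
  out.reserve(count);
  while (out.size() < count) {
    // Reinterpret a random 64-bit pattern as a double; keep only finite values.
    const uint64_t bits = rng();
    double d;
    std::memcpy(&d, &bits, sizeof(d));
    if (!std::isfinite(d)) continue;
    // Truncate to the requested number of significant digits via a text round trip.
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*e", digits - 1, d);
    out.push_back(std::strtod(buf, nullptr));
  }
  return out;
}

int main() {
  for (double d : random_digit_values(5, 3)) std::printf("%.17g\n", d);
  return 0;
}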

The "sequental" testcase: Actually it converts 10000 numbers with random mantissa for each decimal exponent in range 0...16. So, I think we can treats it as a random numbers with some known distribution/bias of an exponent.

Function        Min, ns     RMS, ns    Max, ns     Sum, ns   Speedup
ryu                14.7      90.250       25.8       367.8     21.48
erthink            18.0     135.354       46.4       543.9     14.53
milo               21.0     149.139       46.8       604.0     13.08
emyg               21.1     149.400       46.7       604.9     13.06
grisu2             55.0     285.169       81.0      1168.3      6.76
floaxie            32.8     300.912       95.8      1189.6      6.64
doubleconv         54.5     319.500      106.3      1297.9      6.09
fmt                81.7     413.348      114.7      1696.6      4.66
fpconv             90.4     539.581      156.5      2179.6      3.62
sprintf           230.0    2020.639      709.7      7901.0      1.00
ostrstream        585.7    3429.470     1062.9     13880.7      0.57
ostringstream     571.1    3480.388     1130.1     14048.8      0.56

So, my preliminary findings are:

  • erthink is faster when the value can be represented with 12 or fewer digits (I checked this by changing the range of this loop);
  • ryu is faster in all other cases.

P.S. The "RMS" in tables actually is a sqrt(sum of squares), i.e. has not normalized for number of values..
