Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The library is slower than double-conversion on small integers. #144

Open
alexey-milovidov opened this issue Jan 7, 2020 · 4 comments
Open

Comments

@alexey-milovidov
Copy link

alexey-milovidov commented Jan 7, 2020

About two times slower on numbers in range 0..9, about 1.5 times slower on numbers in range 0..999.

This is important because your library is pretending to be fastest:

At the time of this writing, these are the fastest known float-to-string conversion algorithms.

But to make the statement that something is fast, it's better to test on real applications and on real datasets.

In what applications there is a need to convert massive amounts of floating point numbers to strings? Some of this applications are using double because there is no other choice or because it is the reasonable default choice.

Example: JavaScript, Lua; JSON serialization, CSV and TSV.
In these applications, most of the doubles will be in fact, integers.

Another examples are time series databases. These databases store sensor values for monitoring. Various metrics can be stored in a single table. For example, server CPU usage is usually fractional, but the number of running processes in a server is integer. It's likely that the most of sensor values in some databases are, in fact, integers.

It means, that it's important to keep in mind the usage scenario when most of formatted floats are integers.

Look for more details here: ClickHouse/ClickHouse#8542
I have plugged ryu in ClickHouse and it shows very promising results except for one case.

Good news is that it's easy to fix with small impact on performance of formatting fractional floats!
Example is here: https://github.com/ClickHouse/ClickHouse/pull/8542/files#diff-5b2c0d6b450ea8c072f39c9625094f66R121
Just check that number is integer and do shortcut.

Some numbers on synthetic benchmark
million rows per second: Ryu, double-conversion, Ryu with shortcut for integers:

SELECT count() FROM system.numbers WHERE NOT ignore(toString(toFloat64(number % 2)))
18.81 26.85 123.33
SELECT count() FROM system.numbers WHERE NOT ignore(toString(toFloat64(number % 10)))
11.59 18.86 102.85
SELECT count() FROM system.numbers WHERE NOT ignore(toString(toFloat64(number % 100)))
10.25 16.08 93.77
SELECT count() FROM system.numbers WHERE NOT ignore(toString(toFloat64(number % 1000)))
10.87 14.92 86.98
SELECT count() FROM system.numbers WHERE NOT ignore(toString(toFloat64(number % 10000)))
10.95 13.57 82.56
@ulfjack
Copy link
Owner

ulfjack commented Jan 15, 2020

I had to block @expnkx for abusive behavior. I'm generally happy to add a special case for small integers. This has been on the todo list for a while, but I've been rather busy with other things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@ulfjack @alexey-milovidov and others