Large number of incorrect parsing #9

lemire · 2021-03-23T14:16:36Z

For https://github.com/fastfloat/fast_float, we have extensive tests. I ran them against FastDoubleParser and found many failures, which I have collected in this gist:

https://gist.github.com/lemire/641a34589c36747f6d24ed6d29ac75f0

The algorithm at https://github.com/fastfloat/fast_float handles all of these cases correctly.

You may refer to https://arxiv.org/abs/2101.11408 or to the C# port at https://github.com/CarlVerret/csFastFloat

wrandelshofer · 2021-03-23T17:29:47Z

Thank you for the test cases!
Most of them were caused by wrong treatment of unsigned longs in the code. I have fixed those.
I am going to look at the remaining ones next.
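
For readers unfamiliar with the pitfall: Java has no unsigned 64-bit type, so a 19-digit mantissa stored in a `long` can appear negative. The sketch below is only an illustration of that class of bug, not the actual code that was fixed; the class and method names are hypothetical.

```java
// Hypothetical illustration: a 19-digit mantissa may exceed Long.MAX_VALUE,
// so it must be compared with the unsigned helpers in java.lang.Long.
final class UnsignedMantissa {
    // Signed comparison: gives the wrong answer once the top bit of w is set,
    // because w is then seen as a negative number.
    static boolean atMostTwoPow53Signed(long w) {
        return w <= (1L << 53);
    }

    // Unsigned comparison: treats w as a value in [0, 2^64).
    static boolean atMostTwoPow53Unsigned(long w) {
        return Long.compareUnsigned(w, 1L << 53) <= 0;
    }
}
```

With `long w = Long.parseUnsignedLong("9999999999999999999")`, the signed check accepts `w` because its bit pattern reads as a negative number, while the unsigned check rejects it.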

wrandelshofer · 2021-03-23T18:34:05Z

Another class of inputs was parsed incorrectly because the code did not compensate for skipped digits when the input has more than 19 digits.
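
A minimal sketch of what such compensation can look like, assuming a simplified literal (digits and at most one decimal point, no sign, no exponent part). The class `TruncatedDecimal` and its fields are hypothetical and not taken from FastDoubleParser. Every skipped digit in front of the decimal point raises the decimal exponent by one; skipped fraction digits leave it unchanged.

```java
// Hypothetical sketch: keep at most 19 significant digits and adjust the
// decimal exponent for the integer digits that were skipped.
final class TruncatedDecimal {
    // 10^19 - 1 < 2^64, so any 19 decimal digits fit in an unsigned 64-bit long.
    static final int MAX_DIGITS = 19;

    final long mantissa;     // retained digits, to be read as an unsigned long
    final int exponent10;    // value is approximately mantissa * 10^exponent10
    final boolean truncated; // true if at least one digit was skipped

    private TruncatedDecimal(long mantissa, int exponent10, boolean truncated) {
        this.mantissa = mantissa;
        this.exponent10 = exponent10;
        this.truncated = truncated;
    }

    static TruncatedDecimal parse(String s) {
        long mantissa = 0;
        int exponent10 = 0;
        int retained = 0;
        boolean seenPoint = false;
        boolean truncated = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' && !seenPoint) {
                seenPoint = true;
                continue;
            }
            if (c < '0' || c > '9') {
                throw new NumberFormatException(s);
            }
            if (retained < MAX_DIGITS) {
                mantissa = mantissa * 10 + (c - '0');
                if (mantissa != 0) {
                    retained++; // leading zeros are not significant
                }
                if (seenPoint) {
                    exponent10--; // a retained fraction digit scales by 10^-1
                }
            } else {
                truncated = true;
                if (!seenPoint) {
                    exponent10++; // a skipped integer digit scales by 10^+1
                }
            }
        }
        return new TruncatedDecimal(mantissa, exponent10, truncated);
    }
}
```

For the 23-digit input `12345678901234567890123`, this keeps the mantissa `1234567890123456789` and reports a decimal exponent of 4 with the `truncated` flag set.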

lemire · 2021-03-23T18:59:32Z

For long inputs, the recommended approach is as follows:

https://github.com/fastfloat/fast_float/blob/main/include/fast_float/parse_number.h#L111-L116

Also in C# at

https://github.com/CarlVerret/csFastFloat/blob/0281cc1bd6f617fa4e1741e4f8c60ceb9fc33fdf/csFastFloat/FastDoubleParser.cs#L380-L390

See section 11 in https://arxiv.org/pdf/2101.11408.pdf
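
The idea behind the linked lines, as far as it can be summarized here, is this: if the mantissa was truncated to w, the true value lies between w * 10^q and (w + 1) * 10^q. Rounding is monotone, so if both bounds convert to the same double, that double is the correctly rounded result; otherwise a slower path is needed. The sketch below only illustrates this check. It uses Double.parseDouble as a stand-in for the fast conversion routine, and the class and method names are hypothetical.

```java
import java.util.OptionalDouble;

// Hypothetical sketch of the "w versus w + 1" check for truncated mantissas.
final class LongInputCheck {
    // Stand-in (an assumption for this sketch) for the fast conversion of
    // w * 10^q; w is read as an unsigned 64-bit value.
    static double toDouble(long w, int q) {
        return Double.parseDouble(Long.toUnsignedString(w) + "e" + q);
    }

    // Returns the result if truncation cannot have changed it,
    // or empty if the caller must take the slow path.
    static OptionalDouble resolveTruncated(long w, int q) {
        double low = toDouble(w, q);
        // w has at most 19 digits, so w + 1 <= 10^19 < 2^64 and cannot wrap.
        double high = toDouble(w + 1, q);
        return low == high ? OptionalDouble.of(low) : OptionalDouble.empty();
    }
}
```

For example, `resolveTruncated(9007199254740992000L, -3)` yields 2^53, whereas `resolveTruncated(9007199254740993000L, -3)` is empty, because the lower bound sits exactly on a rounding tie and the upper bound lies just above it.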

wrandelshofer · 2021-03-27T13:24:31Z

Thank you very much for the pointers to the corresponding code sections!

3be8131 now includes the code that you pointed to in https://github.com/fastfloat/fast_float/blob/main/include/fast_float/parse_number.h#L111-L116.

When that fails, the code currently falls back to java.lang.Double.parseDouble(), so that all test cases should pass now.

I am planning to replace all fallback calls to Double.parseDouble() in upcoming commits.

For decimal floating point literals, I believe it is best to port the parse_long_mantissa function from your C++ code, and everything that comes with it. Your class "decimal" appears to do what I need. (I tinkered with the class java.math.BigDecimal, but it round-trips through a String, which it then feeds into Double.parseDouble to do the conversion, which rather defeats the purpose.)

I am not sure how to implement the corresponding cases for hexadecimal floating point literals. In this case, we have a number of the form (-1)^sign * mantissa * 2^exponent. I tried the class java.math.BigInteger, but memory usage was too high. That's why in 3be8131 the code currently only tries Clinger's fast path, and then immediately falls back to Double.parseDouble.
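
A minimal sketch of a fast path for the hexadecimal case, in the spirit of Clinger's fast path and not the code from 3be8131: when the mantissa fits in 53 bits and the scaled result stays in the normal range, (-1)^sign * mantissa * 2^exponent is exactly representable, so no rounding decision is needed. The class and method names are hypothetical.

```java
import java.util.OptionalDouble;

// Hypothetical sketch: exact conversion of (-1)^sign * mantissa * 2^exponent2
// when no rounding can occur; otherwise the caller must fall back.
final class HexFastPath {
    static OptionalDouble tryFastPath(boolean negative, long mantissa, int exponent2) {
        // The mantissa is read as unsigned and must fit in 53 bits
        // (conservative: rejects everything from 2^53 upward).
        if ((mantissa >>> 53) != 0) {
            return OptionalDouble.empty();
        }
        if (mantissa == 0) {
            return OptionalDouble.of(negative ? -0.0 : 0.0);
        }
        // Binary exponent of the result once the mantissa is normalized.
        int msb = 63 - Long.numberOfLeadingZeros(mantissa);
        long resultExponent = (long) msb + exponent2;
        if (resultExponent < Double.MIN_EXPONENT || resultExponent > Double.MAX_EXPONENT) {
            return OptionalDouble.empty(); // subnormal or overflow: not handled here
        }
        // Both the cast and scalb are exact under the checks above.
        double magnitude = Math.scalb((double) mantissa, exponent2);
        return OptionalDouble.of(negative ? -magnitude : magnitude);
    }
}
```

Inputs with more than 53 mantissa bits, or with results outside the normal range, return empty and still need a rounding-aware path or the Double.parseDouble fallback.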

wrandelshofer · 2021-04-02T15:40:05Z

This is fixed in ac66003

This revision includes a port of the Decimal class from the C++ code in fast_float. However, it turns out that the code in the OpenJDK class jdk.internal.math.FloatingDecimal is at least one order of magnitude faster. So performance is better if the method parseRestOfDecimalFloatLiteralTheHardWay() in FastDoubleParser just calls Double.parseDouble().

lemire mentioned this issue Mar 23, 2021

Double.parseDouble(...) != FastDoubleParser.parseDouble(...) #7

Closed

wrandelshofer added a commit that referenced this issue Mar 23, 2021

Fixes many error cases reported in issue #9.

345820b

wrandelshofer closed this as completed Apr 2, 2021
