Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FastDoubleParser doesn't support all input formats as the default OpenJDK Float/Double parsers #19

Closed
grcevski opened this issue Jun 24, 2022 · 10 comments

Comments

@grcevski
Copy link

The FastDoubleParser was recently introduced in Jackson through this issue FasterXML/jackson-core#577 is 3-4x times faster compared to the version that's implemented in OpenJDK. This is fantastic news, since many numerical processing workloads would benefit from this.

However the OpenJDK Double/Float parsers support variety of input formats that the FastDoubleParser will fail on, therefore it can cause unexpected regressions when used.

For example, the FastDoubleParser will fail with a NumberFormatException on these example patterns (there are more to be found in the OpenJDK Double/Float tests):

1.1e-23f
0x.003p12f
0x1.17742db862a4P-1d

I think apart from the first one in this list, the rest are all hexadecimal if I'm not mistaken.

@pjfanning
Copy link
Contributor

@grcevski any number prefixed with 0x should be interpreted as hexidecimal - it is a very common convention to make it obvious the number is not plain decimal.

@wrandelshofer
Copy link
Owner

Oh - I didn't know that Double.parseDouble() can parse numbers with a trailing format specifier. I only looked at the lexical syntax rules that they gave in the Javadoc of Double.valueOf(String)

FastDoubleParser can parse the examples that you gave, if you remove the trailing "f" and "d" characters:

1.1e-23
0x.003p12
0x1.17742db862a4P-1

Thank you for the link to the Double/Float test suite. I am going to run it against FastDoubleParser and fix any errors. It is my goal, to have an implementation that can be used as a drop in for java.util.Double.parseDouble(String) and java.util.Float.parseFloat(String).

You are making me very curious though:
JSON does not support trailing format specifiers. Hexadecimal float literals are also not supported. So, it would be an error if jackson-core would let FastDoubleParser parse them. How do you handle these cases in jackson-core?

@wrandelshofer
Copy link
Owner

wrandelshofer commented Jun 24, 2022

Oops, jackson-core is an XML-Parser (not JSON).

https://www.w3.org/TR/xmlschema-2/#double

The lexical rules of XML schema neither allow trailing format specifiers nor hexadecimal float literals. So, I don't know why you would want support for them(?)

@grcevski
Copy link
Author

Thanks for the quick response @wrandelshofer, to be honest I didn't realize it's an issue with trailing format specifiers, if it does work without them I think we are good for jackson-core!

@pjfanning
Copy link
Contributor

Thanks everyone for the input. jackson-core is most commonly used for JSON parsing but there are jackson modules for other formats like XML and YAML (and lots more). JSON doesn't permit hex values - but there is an open issue where someone wants Jackson to support JSON5 which does allow hex values (https://json5.org/).

@wrandelshofer
Copy link
Owner

Okay, I am going to do the following:

  1. Add support for trailing format specifiers to the library.
  2. Run the JDK tests that you linked against FastDoubleParser and fix all bugs.
  3. Change the back-end classes so that they encode errors in a magic return value(which is a primitive long) instead of throwing an Exception.
  4. Make the back-end classes public, so that Jackson can use them directly.
  5. Maybe? Add an entry point that only supports the JSON "number" production.

What specification do you use for the JSON syntax? Is this "ECMA-404", "The JSON data interchange syntax" ?

@pjfanning
Copy link
Contributor

What specification do you use for the JSON syntax? Is this "ECMA-404", "The JSON data interchange syntax" ?

I think ECMA 404 is the right spec to use - confirmed it by looking at https://www.json.org/json-en.html

@cowtowncoder
Copy link

cowtowncoder commented Jun 24, 2022

@wrandelshofer Jackson actually handles lexical part (tokenization) according to JSON spec (or, for a small number of deviations, optionally allowing some alternate cases). Value that would be handed to FDP need not be further verified as long as accepted set of values is a superset, which I think is the case (JSON floating-point value definition is quite strict).
So @pjfanning is correct about values accepted by jackson-core with default settings.

So I think there is no need for alternate end points just for Jackson use. Of course it may be otherwise useful for different use cases, but no need from Jackson perspective.

@wrandelshofer
Copy link
Owner

So far, I did the following:

[done] Add support for trailing format specifiers to the library.
[done] Run the JDK tests that you linked against FastDoubleParser and fix all bugs.
[done] Change the back-end classes so that they encode errors in a magic return value(which is a primitive long) instead of throwing an Exception.
[not done] Make the back-end classes public, so that Jackson can use them directly.
[done] Maybe? Add an entry point that only supports the JSON "number" production.

@grcevski
Copy link
Author

This sounds great @wrandelshofer, I think this resolves the issue. We started using the new jackson 2.14 as soon as it was released, great speedup, I think with your changes this should be broadly applicable and maybe the JDK eventually makes this the default parsing algorithm :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants