Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigDecimal parser #24

Closed
pjfanning opened this issue Oct 24, 2022 · 12 comments
Closed

BigDecimal parser #24

pjfanning opened this issue Oct 24, 2022 · 12 comments

Comments

@pjfanning
Copy link
Contributor

Thanks for all the hard work on the double and float parsers. Would there be any chance that you could consider adding support for BigDecimal parsing? A lot of the low level parser could be reused.

@wrandelshofer
Copy link
Owner

Yeah, should be feasible. 🤔

@wrandelshofer
Copy link
Owner

I am working on it. 😅

So far, the code in the dev-Branch is able to parse BigDecimals that have a significand of up to 536,870,919 digits.

However, a BigDecimal can have up to 1,292,782,621 digits.
I haven't figured out how to parse that many digits yet. So, more work is needed.

@wrandelshofer
Copy link
Owner

I have now substantially refactored the code. It supports now BigDecimals with up to 1,292,782,621 digits.

The code uses 4 different algorithms:

  • less than 19 digits: directly computes a long value and then creates a BigDecimal
  • less than 128 digits: iteratively computes a BigSignificand(which is a mutable BigInteger) and then creates a BigDecimal
  • less than 1024 digits: recursively calls the first 2 algorithms
  • 1024 or more digits: recursively calls the first 2 algorithms in a RecursiveTask on the common ForkJoinPool

The API is provided by static methods in the class JavaBigDecimalParser.

BigDecimal b = JavaBigDecimalParser.parseBigDecimal("-1.5e3");

I have made different versions of the code that take advantage of new JDK features.
The branches are named java8, java11, java17, java19.
The main branch is just a copy of the java17 branch.
The dev branch contains code that runs with preview-features of Java 20. All other branches only use stable API methods.

The code still needs more testing, and some cleanup.

The performance should be at least the same as BigDecimal. On the java19 branch, I get the following results with the test file canada.txt:

java.math.BigDecimal        :   282.19 MB/s (+/- 7.2 % stdv) (+/- 1.0 % conf,    512 trials)    16.22 Mfloat/s    61.67 ns/f
JavaBigDecimalParser String :   403.56 MB/s (+/-10.6 % stdv) (+/- 1.0 % conf,   1088 trials)    23.19 Mfloat/s    43.12 ns/f
JavaBigDecimalParser char[] :   576.22 MB/s (+/-11.7 % stdv) (+/- 1.0 % conf,   1344 trials)    33.11 Mfloat/s    30.20 ns/f
JavaBigDecimalParser byte[] :   666.52 MB/s (+/- 8.7 % stdv) (+/- 1.0 % conf,    736 trials)    38.30 Mfloat/s    26.11 ns/f

Speedup JavaBigDecimalParser String vs java.math.BigDecimal: 1.43
Speedup JavaBigDecimalParser char[] vs java.math.BigDecimal: 2.04
Speedup JavaBigDecimalParser byte[] vs java.math.BigDecimal: 2.36

@wrandelshofer
Copy link
Owner

I am still finding bugs in the code, so please use it for evaluation & testing only.

I am pondering whether the JavaBigDecimalParser class should provide separate methods for single-threaded parsing and multi-threaded parsing. Multi-threaded parsing is about twice as fast, but requires a lot more memory.

For this reason, in the Java API they decided to provide two different multiplication methods for BigInteger:

BigIntgeger.multiply()
BigInteger.parallelMultiply()
Also see the interesting discussion in the pull request for parallelMultiply: openjdk/jdk#6409

Do you have any thoughts on this?

@wrandelshofer wrandelshofer changed the title BegDecimal parser BigDecimal parser Nov 17, 2022
@pjfanning
Copy link
Contributor Author

In jackson-core, the fast number parsing is not the default - standard JRE number parsing is used - we will continue to warn users about potential differences when using fast parsers.

Parallel thread parsing sounds generally useful but I suspect that it is not a priority on the jackson side. If you do add the support, we will consider supporting it.

@wrandelshofer
Copy link
Owner

wrandelshofer commented Nov 18, 2022

Is there a need for the non-throwing parser methods? These methods return null instead of throwing a NumberFormatException. The exception is very helpful for debugging.

@pjfanning
Copy link
Contributor Author

@wrandelshofer for the jackson-core use case, NumberFormatException is best solution for invalid inputs - we currently use JRE classes and would like the new parser code to behave similarly.

I can see the benefit of having alternative methods that do not throw exceptions. Exceptions are expensive (generating the stacktrace, etc).

@wrandelshofer
Copy link
Owner

wrandelshofer commented Nov 19, 2022

I have removed the non-throwing code. Debugging the code is very hard without having exceptions with stack trace.
I will take a look at this again, when I find the time to integrate a double parser into a tokenizer.

@wrandelshofer
Copy link
Owner

The dev branch contains now a JavaBigDecimalParser class that provides parseBigDecimal() and parallelParseBigDecimal() methods. You can choose which one you want to use, depending on your use case.

@wrandelshofer
Copy link
Owner

wrandelshofer commented Nov 19, 2022

Here is a performance comparison showing the speedup with parallel parsing against non-parallel parsing. The times are given in nanoseconds.

#Digits  java.math.BigDecimal JavaBigDecimalParser.parse JavaBigDecimalParser.parallelParse
1 13 16 11
10 24 20 17
100 496 699 552
1'000 14'949 7'077 6'279
10'000 1'186'260 308'202 198'469
100'000 116'006'219 10'185'539 3'970'292
1'000'000 11'915'720'193 324'529'829 94'821'977
10'000'000 1'322'734'437'937 7'606'351'419 2'518'137'053
100'000'000   220'240'847'844 82'159'761'598
trendlines  135.98e^(4.5812x) 264.41e^(3.4375x) 281.05e^(3.1973x)

The performance is exponential to the number of digits N: O(e^N)

EDIT 2022-12-23 In the trendlines, x is computed as follows: x = log10(#Digits) + 1

@pjfanning
Copy link
Contributor Author

@wrandelshofer I'm using v0.5.2 and have found that JavaBigIntegerParser,parseBigInteger(CharSequence str) accepts hex values like "AAAA" but new BigInteger(String) throws a NumberFormatException with "AAAA".

Would it be possible to support being able to disable hex support?

@wrandelshofer
Copy link
Owner

I am closing this issue, because there is now a BigDecimal parser in this project. 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants