Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Losing the battle to convert a string to double #129

Open
vinniefalco opened this issue Nov 16, 2019 · 15 comments
Open

Losing the battle to convert a string to double #129

vinniefalco opened this issue Nov 16, 2019 · 15 comments

Comments

@vinniefalco
Copy link
Contributor

Going from double to string is amazing using this lib, but I am having trouble efficiently and correctly parsing a string into a double. This is what I have, any ideas? Is there some ulfjack magic that could be worked here?

https://github.com/vinniefalco/json/blob/develop/include/boost/json/detail/number.ipp

@ulfjack
Copy link
Owner

ulfjack commented Nov 17, 2019

If you know your input is the "shortest" format of a floating-point value of a given width, then there is in fact a fast algorithm to parse these values. I have some code for that. I'll need to do a bit of work in order to commit it here, but it shouldn't be much (famous last words).

@plokhotnyuk
Copy link
Contributor

plokhotnyuk commented Nov 17, 2019

@ulfjack There is an amazing library for Rust that parses float/double values precisely and fast

I've ported the moderate path from it to our library for Scala

@ulfjack
Copy link
Owner

ulfjack commented Nov 17, 2019

I'm aware of the article. I think we can do better.

@ulfjack
Copy link
Owner

ulfjack commented Nov 17, 2019

I have pushed some code to the https://github.com/ulfjack/ryu/tree/parsing branch. There are still big holes in the code, but that's the general shape. Note that it can only handle up to ~18 decimal digits (which is fine for input that was formatted as shortest), and I need to double-check that the tables have enough bits for this usage.

@ulfjack
Copy link
Owner

ulfjack commented Nov 18, 2019

I pushed another commit filling in some of the missing pieces.

@ulfjack
Copy link
Owner

ulfjack commented Dec 4, 2019

I pushed a couple of fixes to the branch. Before it can be merged, I need to figure out error handling, make it work on Mac/Win, and double-check that the bit widths are ok. More tests would also be good, but IMO not a blocker for merging into master.

@plokhotnyuk
Copy link
Contributor

@ulfjack please consider using of SWAR techniques for parsing the decimal representation, like here: sirthias/borer#114

ulfjack added a commit that referenced this issue Dec 4, 2019
A limited implementation for up to 17-digit numbers.

See #129.

* Move some d2s intrinsics to d2s_intrinsics.h

* Rudimentary string to double parsing

* Implement more pieces for parsing

- parse exponents
- parse dot
- handle negative exponents
- handle rounding
- add some basic tests for these things

* Fix constants

* Use __builtin_clzll

Thanks to expnkx for finding this.

See #133.

* Allow exponent to start with '+'

* Correctly handle mantissa overflow

* Increase the table size

Denormal numbers require a larger table because the decimal exponent can go down to -326 (we're using the table inversely to how it's originally used for shortest).

* Handle most overflow/underflow situations

* Increase table size some more

* More tests close to the underflow

These are almost exactly halfway between 0 and the smallest representable float.

* Implement error handling

Mention the restrictions on the input, and that this implementation is experimental.

* Add another caveat to the documentation

* Make it compile with -DRYU_ONLY_64_BIT_OPS

This mode was not defining mulShift. We just use the same implementation as if it's not set.

* Use _BitScanReverse64 on Windows

* Rename mulShift to mulShift{32,64}

Since I moved the mulShift implementations to d2s_intrinsics, they conflict with the ones defined in f2s.c. This is only an issue when -DRYU_OPTIMIZE_SIZE is set, but I think it's better to disambiguate anyway.

* Rename max to max32 to avoid name conflicts on MSVC

* MSVC: Always use _BitScanReverse64

The HAS_64_BIT_INTRINSICS macro is false if RYU_ONLY_64_BIT_OPS is set, so using that is incorrect. Instead, only check for _MSC_VER. I can't seem to find a macro to check for existence of __builtin_clzll.
@ulfjack
Copy link
Owner

ulfjack commented Jan 15, 2020

@vinniefalco posted on the PR: "How about 0.X1E1000 where X is a thousand zeroes"

Looks like you found a bug!

@vinniefalco
Copy link
Contributor Author

We were able to fix our number parser:

https://github.com/vinniefalco/json/blob/5aae31dc74d055d84a7f13e438d80cdf6005c670/include/boost/json/detail/impl/number.ipp#L155

However, it fails to produce the same result as stod for some cases. It is off by just one bit in those cases, which I guess is what all of this complicated looking code in s2d.c is for, which is to identify the cases where it needs to round up or down by one bit to find the nearest double? And this is all because converting from a base 10 exponent to a base 2 exponent is lossy?

@plokhotnyuk
Copy link
Contributor

plokhotnyuk commented Jan 15, 2020

@vinniefalco
Copy link
Contributor Author

Well.... it would have been nice to read that before I struggled to write the code !!!

@nickaleks
Copy link

@vinniefalco @ulfjack I am currently looking at fast string-to-double conversion, and reading this discussion it's not clear whether current ryu implementation is correct, or whether json implementation is good to use. Can one of those solutions be used yet?

@vinniefalco
Copy link
Contributor Author

The JSON implementation seems to work (i.e. passes tests) but it only implements the "fast path" conversion. Which means it can be off by one bit for some strings. And note that it is not officially part of Boost yet, and is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
@plokhotnyuk @vinniefalco @ulfjack @nickaleks and others