Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

number_converter is slow #422

Closed
Crzyrndm opened this issue Nov 17, 2019 · 7 comments
Closed

number_converter is slow #422

Crzyrndm opened this issue Nov 17, 2019 · 7 comments
Assignees

Comments

@Crzyrndm
Copy link
Collaborator

Crzyrndm commented Nov 17, 2019

Related to #421
It was noticed during benchmarking that, at least with MSVC 16 (VS 2019), number_converter::stold made a very significant impact on load times (20% throughput was gained by switching to strtod). number_converter internally uses istringstream

strtod will break when the set locale uses ',' as the decimal point, so it isn't an option, but it shows that there is need for an alternative to the current string->double routine. strtod_l is a widely supported extension (POSIX 2002 (?)) that would fit the requirements

  • MSVC 2015+ uses _strtod_l in the PR, also non-standard, but documented and microsoft goes out of their way not to break back-compat
  • POSIX platforms are less clear. The main issue is the platforms can't seem to agree on which headers to include for the necessary bits. As such, they still use the original number_converter implementation

The best solution to this would be an external library which either just does locale independent double parsing (note: this is a hard problem, many libs have issues), or wraps all the platform nastiness away. A mediocre solution would be to detect known platforms and #ifdef in the appropriate strtod_l machinery leaving the slow number_converter as the fallback for unknown toolchains (like has been done in the linked PR).

@tfussell
Copy link
Owner

Thanks for looking into this. My original focus was on correctness over performance, but with such significant gains possible from small changes, this is probably worth looking more into. I always liked the nlohmann/json codebase as they have high quality code and many of the same problems as xlnt. Looks like their approach for this problem was basically to convert the decimal separator into a comma depending on the locale and then use strtod. Here's the PR nlohmann/json#450

@tfussell tfussell self-assigned this Nov 17, 2019
@paulharris
Copy link

paulharris commented Nov 17, 2019 via email

@Crzyrndm
Copy link
Collaborator Author

Crzyrndm commented Nov 18, 2019

@tfussell
That sounds much less hacky than what strtod_l very quickly turned into. I will see if I can whip something up

@paulharris
the primary issue with any external lib dealing with floats (parsing or formatting) is correctness. I know enough to know that I don't know enough, hence I'm not keen on straying outside C / std / system libraries.

The other option that is part of std is C++17's to_chars / from_chars, but only MSVC has implemented so far AFAIK. Hence that would still require preprocessor to detect (luckily, std feature macros are well documented). I will test std::from_chars and see if the trickery is worthwhile, but I don't expect it will be on the same order as dropping istringstream (it could be quite noticeable if we factor in having to scan for comma's though...)

@Crzyrndm
Copy link
Collaborator Author

Pushed the suggested change in #421 as it cleans up quite a hacky section
Somewhat surprisingly, for the happy case at least (parsing in a '.' locale), it is even faster than using _strtod_l was

@Crzyrndm
Copy link
Collaborator Author

Crzyrndm commented Nov 18, 2019

Testing MSVC std::from chars dropped another 1-2%, which would indicate to me that you're unlikely to see massive gains

as it happens, xlnt has issues with the locale not set to use '.' as the decimal point. There are a handful of locations during parsing where xml::parser::attribute<double> are used, which internally just use stringstream without imbuing the "C" locale, so...

@Crzyrndm
Copy link
Collaborator Author

Crzyrndm commented Nov 18, 2019

and fixing those pushed the bench maybe 1-2% the other way using the "de-DE" (german) locale. Should probably add a test so that doesn't break again... (EDIT: test added, test removed, CI doesn't know about loacels, and I can't be bothered right now)

@tfussell
Copy link
Owner

I think this is fixed now. Going to close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants