
Update README.md #72

Merged 1 commit into tgockel:master from jimon:patch-1 on Jun 27, 2016

Conversation

jimon
Contributor

@jimon jimon commented Jun 22, 2016

No offence, but the lib is actually very slow, so please don't mislead people; for more details see this benchmark: https://github.com/miloyip/nativejson-benchmark
@coveralls

coveralls commented Jun 22, 2016


Coverage remained the same at 95.629% when pulling 9a27b5f on jimon:patch-1 into 9e56032 on tgockel:master.

@tgockel
Owner

tgockel commented Jun 27, 2016

You're not wrong :-)

@tgockel tgockel merged commit 12988f1 into tgockel:master Jun 27, 2016
@jimon jimon deleted the patch-1 branch June 27, 2016 16:35
@venediktov
Contributor

I see different -O3, -O5 optimization levels used in the packages compared by the "benchmark guy".
I mean, is it really guaranteed that the same compiler flags and the same optimization levels are used for all of them?
I only briefly looked at the thirdparty folder, which references more than 50 libraries, and all of them have a Makefile or CMake config with different flags.
Apples vs. oranges?

@jimon
Contributor Author

jimon commented Jun 27, 2016

@tgockel Honestly, respect man!
Hope you manage to optimize the library in the long run 👍

@venediktov as far as I can see all tested libs are built with exactly the same arguments and linked in one binary, see build/premake5.lua there.

@venediktov
Contributor

@jimon do you know if those tests are performed on multiple threads?
http://www.cplusplus.com/forum/general/74026/
This library uses std::stringstream, and that is for sure not a performance-friendly class when it comes to threads.
Also, an observation about how those tests are written:
1.) https://github.com/miloyip/nativejson-benchmark/blob/master/src/tests/jsoncpptest.cpp
2.) https://github.com/miloyip/nativejson-benchmark/blob/master/src/tests/voorheestest.cpp
3.) https://github.com/miloyip/nativejson-benchmark/blob/master/src/tests/rapidjsontest.cpp

So, 1.) and 2.) are coded in a way that makes a copy of a std::string, while 3.) does not use std::string at all; the StringBuffer member of RapidjsonStringResult is passed in by reference and simply populated by the Writer.
I mean, to be fair, the interfaces should be the same to test performance.

I would suggest adding similar kinds of writers/readers to this library and then redoing the test by changing
https://github.com/miloyip/nativejson-benchmark/blob/master/src/tests/voorheestest.cpp

So, again, to me it's apples vs. oranges.

@tgockel
Owner

tgockel commented Jun 28, 2016

I can't read premake to save my life :-)

From what I've personally benchmarked, there are a few problems with performance, but the biggest one is the use of regular expressions in the parsing code. There is an issue for this, but I haven't gotten around to implementing it.

Another issue is the interaction I have between tokenization and parsing. While this is considered "good practice," the way I wrote the tokenizer eventually requires copying, which is stupid. I did this so that you wouldn't have to have the entire source input in memory at once, but this choice has been hugely detrimental to performance. I've been trying to get the best of both worlds (performance of an all-in-memory solution but not have to have it all in memory), but I haven't come up with a good solution for that.

Anyhow, the ideal function to use for performance is the parse overload which takes a string_view, as it uses a zero-copy std::streambuf... it still has to copy to the tokenizer, but it's an improvement. Still sucks, though.

Ultimately, it might be best to switch to using something like yacc/bison or Boost Spirit to do the parsing, since they're proven, solid and fast. I don't really care what the solution is, as long as there aren't any external dependencies for the end-user.

@venediktov
Contributor

venediktov commented Jun 28, 2016

@tgockel I would look at why this JSON, used in all the benchmarks,
https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/canada.json
is parsed by voorhees 10 times slower than
https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/citm_catalog.json

Even though the files differ in size by only about 20%, the only big difference in canada.json compared to citm_catalog.json is that it holds one huge polygon structure making up 99% of the file size.
This is just an observation, but it could be that some parts of the library have an issue and must be addressed.

@tgockel
Owner

tgockel commented Jun 28, 2016

With oprofile, the issue is that match_number (inside of token_patterns.cpp) takes all the time, since there are so many numbers. This wouldn't be a big deal if it weren't for my use of regex inside of that function (and if a number happens to hit on the edge of a tokenizer buffer, there are 3 regex passes). While this works, it's definitely the cause of the slowness. I'd like to fix it, but I don't want to break correctness... it's way easier for me to verify the correctness of a regular expression, and I'm not 100% confident my unit test suite covers all possible cases. It just needs a bit of time to fix. Most of my energy has been spent on the serialization stuff, since that was the biggest problem when I started this library :-)
