Research on using custom stack instead of call-stack in Reader #35

miloyip · 2014-06-30T16:23:55Z

Currently, ParseObject(), ParseArray() and ParseValue() call other parse functions recursively. This can cause a stack overflow if the JSON tree is very deep (e.g., a JSON document synthesized for a security attack).

Research replacing these recursive calls with a custom stack.
Evaluate the performance impact.
May add a configurable limit of tree depth.
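A minimal sketch of the idea, assuming a simplified checker that only tracks bracket nesting and does not validate scalars. Instead of recursing into each nested array or object, it pushes the expected closing bracket onto a heap-allocated std::vector. The `maxDepth` parameter stands in for the configurable depth limit. `CheckNesting` and `NestResult` are hypothetical names, not RapidJSON API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result codes for this sketch (not RapidJSON's ParseErrorCode).
enum class NestResult { kOk, kMismatch, kUnterminated, kTooDeep };

// Checks bracket nesting of a JSON-like text iteratively. Each '[' or '{'
// pushes its expected closer onto an explicit stack instead of making a
// recursive call, so deep input consumes heap memory, not call-stack frames.
NestResult CheckNesting(const std::string& json, std::size_t maxDepth) {
    std::vector<char> stack;  // expected closing brackets, innermost last
    bool inString = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        if (inString) {
            if (c == '\\') ++i;                 // skip the escaped character
            else if (c == '"') inString = false;
            continue;
        }
        switch (c) {
        case '"': inString = true; break;
        case '[':
        case '{':
            if (stack.size() == maxDepth) return NestResult::kTooDeep;  // configurable limit
            stack.push_back(c == '[' ? ']' : '}');
            break;
        case ']':
        case '}':
            if (stack.empty() || stack.back() != c) return NestResult::kMismatch;
            stack.pop_back();
            break;
        default: break;                          // scalars, commas, colons, whitespace
        }
    }
    return (stack.empty() && !inString) ? NestResult::kOk : NestResult::kUnterminated;
}
```

With this approach, a hostile input such as 100000 consecutive `[` characters is rejected with `kTooDeep` instead of exhausting the call stack.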

thebusytypist · 2014-07-08T16:19:44Z

Currently, the set of ParseErrorCode values is not complete.

With an explicit state-machine implementation (branch TransitionTable), every error can be captured at the transition into the error state.
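A minimal sketch of a table-driven state machine, assuming a toy grammar of flat arrays with `[`, `]`, `,` and an opaque value token. Every (state, token) pair has an entry in a transition table. Because each state knows what it expected next, the error code can be chosen at the moment of the transition into the error state. The states, tokens and error codes here are hypothetical. They are not the ones on the TransitionTable branch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tokens and states for the grammar: '[' (value (',' value)*)? ']'
enum Token { kLeftBracket, kRightBracket, kComma, kValue, kTokenCount };
enum State { kStart, kArrayInitial, kElement, kElementDelimiter, kFinish, kError, kStateCount };

// Hypothetical error codes, reported when a given state transitions to kError.
enum Error { kNone, kExpectArrayBegin, kExpectValueOrEnd, kExpectCommaOrEnd, kExpectValue, kExtraContent };

// Transition table: kTransition[state][token] -> next state.
const State kTransition[kStateCount][kTokenCount] = {
    //                 '['            ']'      ','                value
    /* Start     */ {kArrayInitial, kError,  kError,            kError},
    /* ArrayInit */ {kError,        kFinish, kError,            kElement},
    /* Element   */ {kError,        kFinish, kElementDelimiter, kError},
    /* Delimiter */ {kError,        kError,  kError,            kElement},
    /* Finish    */ {kError,        kError,  kError,            kError},
    /* Error     */ {kError,        kError,  kError,            kError},
};

// Error to report when leaving each state for kError (indexed by State).
const Error kErrorOnLeave[kStateCount] = {
    kExpectArrayBegin, kExpectValueOrEnd, kExpectCommaOrEnd, kExpectValue, kExtraContent, kNone};

struct Result { bool ok; Error error; std::size_t offset; };

// Drives the machine over a token sequence; no recursion, one table lookup per token.
Result Run(const std::vector<Token>& tokens) {
    State s = kStart;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        State next = kTransition[s][tokens[i]];
        if (next == kError) return {false, kErrorOnLeave[s], i};  // error captured here
        s = next;
    }
    if (s == kFinish) return {true, kNone, tokens.size()};
    return {false, kErrorOnLeave[s], tokens.size()};  // input ended too early
}
```

Because every failing path goes through the same transition into `kError`, no error case can be silently missed. Each state's entry in `kErrorOnLeave` covers it.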

thebusytypist · 2014-07-11T05:48:34Z

Progress report:

Most of the functions have been implemented, except "May add a configurable limit of tree depth". I am considering adding a configurable limit on the parsing stack size instead.

The state transitions are thoroughly unit-tested.

I also ran a brief performance test on the current implementation (36434b6).

On an i5 2400 machine with 4 GB RAM running Windows 7 x64, with a release32 build:

RapidJson.ReaderParse_DummyHandler_SSE42 (590 ms)
RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (543 ms)
RapidJson.ReaderParseIterative_DummyHandler_SSE42 (656 ms)
RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (503 ms)

I do not have a good explanation for these results. My guess is that the change in memory access pattern is the major cause.

A significant change is that the iterative parser pushes states onto the internal stack (GenericReader::stack_), which was previously used only for string parsing (GenericReader::ParseString).

I tried using a separate stack for the states, but saw no impact on performance.
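A minimal sketch of how one byte stack can hold both parser states and string characters, assuming all stored values are trivially copyable. The optional `maxBytes` cap illustrates the configurable stack-size limit mentioned above. `ByteStack`, `ParseState`, `Push`, `Pop` and `PopString` are hypothetical names. They are not the actual interface of GenericReader::stack_.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical parser states pushed by an iterative parser (illustrative only).
enum class ParseState : unsigned char { kInArray, kInObject };

// Hypothetical byte stack storing values of different types back to back,
// so parser states and string characters can share one buffer.
class ByteStack {
public:
    explicit ByteStack(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Pushes the raw bytes of a trivially copyable value.
    // Returns false instead of growing past maxBytes_ (a stack-size limit).
    template <typename T>
    bool Push(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "only raw bytes are stored");
        if (buf_.size() + sizeof(T) > maxBytes_) return false;
        const std::size_t old = buf_.size();
        buf_.resize(old + sizeof(T));
        std::memcpy(buf_.data() + old, &value, sizeof(T));  // memcpy avoids misaligned access
        return true;
    }

    // Pops the top value. The caller must pop the same type that was pushed
    // last and must ensure the stack holds at least sizeof(T) bytes.
    template <typename T>
    T Pop() {
        T value{};
        const std::size_t top = buf_.size() - sizeof(T);
        std::memcpy(&value, buf_.data() + top, sizeof(T));
        buf_.resize(top);
        return value;
    }

    // Pops the last n bytes as a string (e.g. characters pushed one by one
    // while scanning a JSON string), preserving their original order.
    std::string PopString(std::size_t n) {
        const std::size_t top = buf_.size() - n;
        std::string s(buf_.data() + top, n);
        buf_.resize(top);
        return s;
    }

    std::size_t Size() const { return buf_.size(); }

private:
    std::vector<char> buf_;
    std::size_t maxBytes_;
};
```

Because states and string bytes share one buffer, a string's characters must be popped before the state beneath them can be read. The LIFO order of parsing naturally satisfies this.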

pah · 2014-07-11T07:09:55Z

On an Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, Linux 64-bit, GCC 4.8, I see the following:

[ OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (550 ms)
[ OK ] RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (410 ms)
[ OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (803 ms)
[ OK ] RapidJson.ReaderParseIterative_DummyHandler_SSE42 (744 ms)

So it seems to be quite compiler-specific.

(The ordering of the tests could be made more convenient for comparing the variants, though.)

miloyip added the enhancement label Jul 1, 2014

miloyip assigned thebusytypist Jul 1, 2014

thebusytypist added a commit to thebusytypist/rapidjson that referenced this issue Jul 6, 2014

Try to resolve issue Tencent#35: implement iterative parsing.

3006fa7

miloyip added the performance label Jul 15, 2014

thebusytypist mentioned this issue Jul 17, 2014

Iterative Parsing (for issue #35) #76

Merged

miloyip added a commit that referenced this issue Jul 18, 2014

Merge pull request #76 from thebusytypist/TransitionTable

19a2279

Iterative Parsing (for issue #35)

miloyip closed this as completed Jul 28, 2014
