Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research on using custom stack instead of call-stack in Reader #35

Closed
3 tasks
miloyip opened this issue Jun 30, 2014 · 3 comments
Closed
3 tasks

Research on using custom stack instead of call-stack in Reader #35

miloyip opened this issue Jun 30, 2014 · 3 comments

Comments

@miloyip
Copy link
Collaborator

miloyip commented Jun 30, 2014

Currently, ParseObject(), ParseArray() and ParseValue() will recursively call other parse functions. This has potential stack overflow problem if a JSON tree is very deep (maybe a synthesized JSON for security attack).

  • Research changing these recursive call using custom stack.
  • Evaluate the performance impact
  • May add a configurable limit of tree depth.
@thebusytypist
Copy link
Contributor

Currently ParseErrorCode set is not complete.

By a explicit state machine implementation(branch TransitionTable), all errors can be captured on every transition to the error state.

@thebusytypist
Copy link
Contributor

Progress report:

Most of the functions have been implemented. Except that "May add a configurable limit of tree depth". I am considering adding a configurable limit on parsing stack size instead.

The state transitions are through unittested.

I also have a brief performance test for current implementation(36434b6).

On a i5 2400, 4GB Windows 7 x64 machine, with release32 build:

RapidJson.ReaderParse_DummyHandler_SSE42 (590 ms)
RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (543 ms)
RapidJson.ReaderParseIterative_DummyHandler_SSE42 (656 ms)
RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (503 ms)

I do not have a good explanation for these results. I guess change of memory access pattern may be the major cause.

A significant change is that the iterative parser will push state to the internal stack(GenericReader::stack_), which was only used for string parsing(GenericReader::ParseString) before.

I tried to separate the state stack, but did not see any impact on performance.

@pah
Copy link
Contributor

pah commented Jul 11, 2014

On an Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, Linux 64-bit, GCC 4.8, I see the following:

  • [ OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (550 ms)
    [ OK ] RapidJson.ReaderParseIterativeInsitu_DummyHandler_SSE42 (410 ms)
  • [ OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (803 ms)
    [ OK ] RapidJson.ReaderParseIterative_DummyHandler_SSE42 (744 ms)

So it seems to be quite compiler-specific.

(The ordering of the tests could be more convenient to compare the variants, though).

miloyip added a commit that referenced this issue Jul 18, 2014
@miloyip miloyip closed this as completed Jul 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants