RegularParser improvements #72
… characters, added opcode optimized function aliases
…orage, profiler reports over 143x performance improvement for RegularParser, a well spent Saturday afternoon
I found and fixed #70:
… small part of expected results, named captures based tokenizer was replaced with string matching which eats significant portion of previous performance gains, minor RegularParser and Processor improvements
6ef3344 to 2f52e54
Thanks for your efforts @thunderer. Unfortunately, the loop/out-of-memory error has returned with the latest commit. During this process I noticed that xdebug was partially to blame: with xdebug on (how I run 99% of the time) I get this error, but with it off I don't. I think a lot of other people run with xdebug on, in their dev environments at least, hence the number of issues reported. But with the latest commit, the original issue remains.
…alved, simplified lookahead() method body as type is always specified
… and restore it afterwards: there is no reliable way of detecting the current nesting level and safely manipulating its value in runtime to avoid process-breaking Error being thrown
…longer able to reproduce PCRE_JIT_STACKLIMIT_ERROR
@rhukster I think I've got both issues right this time, but this needs some explanation.

Nesting

You're right that nesting level errors are XDebug's "fault", as PHP itself does not have such a limit, and that's why I decided to disable the XDebug nesting limit while parsing and restore it afterwards.

Memory

The memory limit is a different story. It's possible to change it at runtime (just like in the nesting case above), but I don't think that's the right solution. I don't control the size or complexity of the target project's data, so I can only make my library use as little memory as possible, while being aware that users will always be able to construct an input large enough for it to fail. Your example from #71 uses ~47MiB, so it is well under the 128MiB limit reported in getgrav/grav-plugin-shortcode-core#53 and gantry/gantry5#2426. This issue should also be resolved now, but if you still experience memory limit issues, please provide the exact input and the amount of memory it takes to process the data in your environment, so we can compare the results and analyze where the differences come from. For large enough inputs the only solution is to increase the amount of available memory.
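For anyone gathering the numbers requested above, here is a minimal sketch of how the peak memory and processing time of a single run could be captured in plain PHP. The processInput() function is a hypothetical stand-in for the real call into the library, and the generated input is only a placeholder for the actual failing data.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the real workload; replace the body with the
// actual call that processes the shortcodes in your project.
function processInput(string $input): string
{
    return strtoupper($input);
}

// Placeholder input; use the exact data that triggers the problem instead.
$input = str_repeat('[sc]content[/sc] ', 10000);

$start = microtime(true);
$output = processInput($input);
$elapsed = microtime(true) - $start;

// Peak memory allocated from the system by this PHP process, in MiB.
$peakMiB = memory_get_peak_usage(true) / 1024 / 1024;

printf(
    "input: %d bytes, time: %.3fs, peak memory: %.1f MiB, memory_limit: %s\n",
    strlen($input),
    $elapsed,
    $peakMiB,
    ini_get('memory_limit')
);
```

Reporting the configured memory_limit next to the measured peak makes it easier to compare results between environments.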
Tested and can confirm the latest changes resolve the loop/memory issues I was seeing for large HTML and also with XDebug on. Looks good to me!
Results are based on Blackfire runs of the test data provided in #71. Current master finishes in 12.5s and uses 514MiB of memory.

Inlined the RegularParser::content() body inside RegularParser::shortcode(), which halves the nesting level requirement that caused the problems in #71 (RegularParser = Uncaught Error: Maximum function nesting level of '256' reached, aborting!).

… \Error being thrown in the middle of the parsing process, therefore I'm disabling xdebug.max_nesting_level by setting it to -1 and restoring its value afterwards. AFAIK PHP does not have a nesting level limit; it's strictly an XDebug mechanism, and the error is reproducible only when XDebug is enabled with its default nesting limit of 256. Disabling XDebug or increasing the nesting limit (the test data required 1024, or 512 after inlining content()) also avoids the error.

That's my kind of well spent Saturday afternoon (and Sunday).
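The disable-and-restore approach described above could look roughly like the following sketch. It is not the code from this PR: withoutNestingLimit() and depth() are hypothetical names used only for illustration, and the -1 value is taken from the description above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's actual code): runs $callback with
 * the XDebug nesting limit disabled and restores the previous value after.
 */
function withoutNestingLimit(callable $callback)
{
    // ini_set() returns the old value as a string, or false on failure,
    // for example when the XDebug extension is not loaded.
    $previous = ini_set('xdebug.max_nesting_level', '-1');

    try {
        return $callback();
    } finally {
        // Restore only if the setting existed and was actually changed.
        if ($previous !== false) {
            ini_set('xdebug.max_nesting_level', $previous);
        }
    }
}

// Deeply recursive stand-in for a recursive descent parser.
function depth(int $n): int
{
    return $n === 0 ? 0 : 1 + depth($n - 1);
}

$before = ini_get('xdebug.max_nesting_level');

$result = withoutNestingLimit(function () {
    return depth(1000);
});

$after = ini_get('xdebug.max_nesting_level');
```

The finally block restores the limit even when the callback throws, so code running after parsing sees the limit it was configured with.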