Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On large HTML files, RegularParser does not fire any handler. #70

Closed
devnix opened this issue Sep 24, 2018 · 12 comments
Closed

On large HTML files, RegularParser does not fire any handler. #70

devnix opened this issue Sep 24, 2018 · 12 comments
Assignees
Labels

Comments

@devnix
Copy link

devnix commented Sep 24, 2018

I've been implementing this library on a project in my company, which scraps a whole local Wordpress with httrack and then modifies the files in order to fit to our needs.

To make some parts dynamic, we're implementing shortcodes (which we translate to custom PHP code). I've been testing your library without any problems, but when I gave it a definitive file, it didn't triggered any handler, and the parser pushed PHP memory consumption over 800MB.

After several tests, I've decided to try the RegexParser. It parsed the files correctly, and using a riddiculous amount of time and memory compared with the standard one.

I can understand the extra time and memory consumption (uncer certain limits), but I don't get why the parser didn't saw any of my tags, I am the only one who suffered this issue?

BTW, thank you very much for your work, your library is awesome and I'm enjoying it a lot!

@thunderer
Copy link
Owner

Hi @devnix, thanks for your kind words! I'm happy that you like this library.

Can you send me a script to reproduce the issue along with the problematic file? Also, can you verify that you're using the latest version of Shortcode (v0.6.5)?

@thunderer thunderer self-assigned this Sep 24, 2018
@thunderer
Copy link
Owner

@devnix did you see my previous comment? Can you provide the information I need?

@devnix
Copy link
Author

devnix commented Sep 27, 2018

Yes, excuse me!

I have to get some time to prepare you a couple of files to reproduce the issue.

@thunderer
Copy link
Owner

No problem, I just wanted to know what is the current status. Send them my way when you're ready.

@thunderer
Copy link
Owner

Hi @devnix, did you have the chance to prepare the test data? It's fine if you need more time, just let me know if I can help you in any way. Meanwhile, you may be interested in #72, I found over 100x performance gain in RegularParser so there should be no penalty of running it vs other parsers.

@thunderer thunderer mentioned this issue Dec 8, 2018
6 tasks
thunderer added a commit that referenced this issue Dec 8, 2018
… small part of expected results, named captures based tokenizer was replaced with string matching which eats significant portion of previous performance gains, minor RegularParser and Processor improvements
@thunderer
Copy link
Owner

@devnix good news, during verification phase I managed to reproduce your issue and I'm happy to tell you that it's fixed! Once #72 will be checked by others and merged, I'm releasing a new version right away. Unfortunately the fix ate some performance gains I mentioned above, but the result is still very good - above 10x gains in both time and memory.

This also means that you don't need to prepare anything for me, but I'd be very grateful if you could report back and confirm what I've found. Thanks!

thunderer added a commit that referenced this issue Dec 9, 2018
… small part of expected results, named captures based tokenizer was replaced with string matching which eats significant portion of previous performance gains, minor RegularParser and Processor improvements
@devnix
Copy link
Author

devnix commented Dec 10, 2018

Wow, that's great! I'm afraid I'm very busy lately and I've tried to do more things that I've been able to, so it was difficult for me to prepare the test data removing all the sensible info to replicate the issue...

I think I can confirm you in the following days if the patch works. Should I use the latest version of master?

@thunderer
Copy link
Owner

@devnix Yes, please use the latest master. I'm currently preparing the next release so it would be great to have your confirmation before tagging it.

@devnix
Copy link
Author

devnix commented Dec 17, 2018

It seems like it worked well! I think I can close the issue and thank you a lot for your work, besides I could not give you all the help I would liked to.

@devnix devnix closed this as completed Dec 17, 2018
@thunderer
Copy link
Owner

@devnix v0.7.0 was just tagged, thanks for your cooperation!

@rhukster
Copy link

rhukster commented Apr 3, 2019

FYI, I still think there is an issue for large documents. I'm working on moving some Grav documentation to use a new prism.js shortcode, and it does not render the shortcodes at all with the Regular parser. Regex parser works in this case though:

https://raw.githubusercontent.com/getgrav/grav-learn/feature/learn4/pages/06.forms/01.blueprints/01.fields-available/docs.md

@thunderer
Copy link
Owner

@rhukster can you open a new issue for that? Please include an example code to reproduce the problem and describe the expected result, ie. which handlers are not fired. I'll be happy to look at it in the next few days. Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants