Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408
Comments
Potentially related ticket with some debug data inside: Ticket: Slack convo:
Another case here: enabling LLforBGcss causes a 503 on some pages. Ticket: Slack convo with collected data:
I spent some time looking into those regexes and discussed them with @CrochetFeve0251 too. Here are the conclusions:
Issue
How to reproduce
How to solve it
To do this, several ideas:
Proposed solution
Instead of doing:
I suggest the following approach:
1 - Run a regex to retrieve the background tag and the related properties only:
We should use the preg_match_all flag
This step should retrieve all background (and background-image) properties in the file. If there are matches, continue the process; otherwise, bail out.
2 - Create a reversed version of the CSS string: we will need to run the regex backward, so if the CSS string is
3 - For each
Warning! This needs to be adapted cautiously on the following points:
Here, we are using preg_match instead of preg_match_all to keep the regex duration as short as possible: we only need the first match starting from the background tag, which is the corresponding selector. No need to keep going after this. For each background tag, we then have the corresponding property and selector.
4 - There are cases where a single selector has several background tags and/or several background-image tags. In this case, we should only keep the last background tag and property and the last background-image tag and property. The current regex already does this. With this approach, we need to do it manually, based on the offset.
5 - Rebuild background_matches and background_image_matches to have the same format as the current version, so that the rest of the script can run.
Nice to have / To investigate
Estimated effort: S
How to validate:
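For illustration only, steps 1 to 5 of the proposed solution can be sketched as follows. This is a minimal sketch in Python rather than the plugin's PHP, and it makes several assumptions: the two patterns are simplified stand-ins (not the regexes from Extractor.php), the function name `extract_backgrounds` and its return shape are hypothetical, and CSS comments or braces inside values are not handled.

```python
import re

# Step 1: match only the declaration, never the selector.
# Simplified, assumed pattern: "background" or "background-image", then the
# value up to ";" or "}" (a ";" inside url(...) is tolerated).
DECLARATION = re.compile(
    r"(?<![\w-])(background(?:-image)?)\s*:\s*((?:url\([^)]*\)|[^;}])*)"
)

# Step 3: applied to the REVERSED stylesheet, so the opening "{" of the rule
# comes first and the (reversed) selector follows it.
SELECTOR_REVERSED = re.compile(r"\{([^{};]*)")


def extract_backgrounds(css):
    """Return (backgrounds, background_images), each mapping selector -> value."""
    reversed_css = css[::-1]  # step 2: build the reversed string once
    backgrounds, background_images = {}, {}

    for decl in DECLARATION.finditer(css):
        # Mirror the declaration offset into the reversed string, then take
        # only the FIRST match from there (the single-match idea of step 3).
        start = len(css) - decl.start()
        found = SELECTOR_REVERSED.search(reversed_css, start)
        if found is None:
            continue  # declaration is not inside a rule block

        selector = found.group(1)[::-1].strip()
        is_image = decl.group(1) == "background-image"
        target = background_images if is_image else backgrounds
        # Step 4: matches arrive in offset order, so a later declaration
        # overwrites an earlier one for the same selector.
        target[selector] = decl.group(2).strip()

    # Step 5 (reshaping into the format the rest of the script expects) is
    # left out because that format is not shown in this thread.
    return backgrounds, background_images
```

Because the selector is only searched for once a declaration has been found, and only up to the nearest opening brace, the cost of each lookup stays bounded by the size of one rule rather than the whole stylesheet. Selectors containing parentheses, such as `li:nth-child(even)`, pass through unchanged because the selector pattern only stops at `{`, `}` or `;`.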
Looks good to me, smart to do it in reverse!
I'm thinking of using something like this:
The first approach, as suggested by @MathieuLamiot, is faster, but we need the selector, not only the background and the properties.
To be honest, I am unable to tell what the functional difference is between this regex and the original one without spending 2 hours on regex101 trying to compare both... 😬 My regex fluency is far from ideal I guess :/ Could you explain what the functional differences are and why what has been removed can be removed safely? Thanks! Also, it seems several tests are failing, so there might be some regressions 🤔
The approach I suggested does provide both the selector and the properties, so that should not be an issue I think? Thanks for looking into this 🙏
The approach gives the property but no selector, see the example below. I'll modify this and try again to check the performance.
Regex diff: the old regex uses a more elaborate character class to match the various characters that might appear in a stylesheet selector (whitespace, special characters, etc.). This makes it complex and slow to execute. I'm still testing the new one to make sure that it performs well and is faster.
I think there is a misunderstanding here: the groomed approach is not just:
But it actually needs the 6 steps to be implemented to fully work!
I just tested the performances on gamma website:
From a performance point of view, this is working great. We still have to do CR and NRT.
@Khadreal, while testing this PR, I see there are lots of duplicates of this warning in the debug.log. Returning this to In Progress, if you could have a look. Thank you.
@hanna-meda, I pushed a fix to make sure those errors don't pop up in the log. Could you try again and let me know if it's okay?
@hanna-meda I noticed the branch wasn't updated after my last fix, so I've updated the branch now. Let me know how it goes with it.
@Khadreal, I noticed a regression here, regex isn't handling bg-images that contain CSS pseudo-classes such as (even) or (3n), like in these examples:
So this type of background image will not be lazyloaded (T41553 as reference).
@piotrbak, I know that since this is a regression we would usually return the ticket to In Progress, but as this ticket is long overdue on the board, do you think we can move forward with it as is, with a separate GH issue for the above? Or should we move it back to In Progress?
@hanna-meda We can't introduce a regression here. Basically, it's better to release once than to come back to the issue in the future and repeat all the tests again. My suggestion would be to share with @Khadreal all possible markups so that he can stress-test them using https://regex101.com
Moving it to blocked. Out of scope for the upcoming release.
Discussed with @Khadreal, might be a quick fix. So that we can decide whether or not to put it in the release, @hanna-meda, @Khadreal, can you please run as many test plans as possible on this issue today, and provide a clear list of what is OK and what is KO?
Thanks 🙏
Describe the bug
After investigating high CPU usage and profiling the code to find which parts consume the most time, we found that the following regexes are taking too much time (over 1.5 seconds on some servers):
wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php
Lines 41 to 43 in 29697ac
You can check an example for the first regex with long CSS styles:
https://regex101.com/r/Y2kVDS/1
To Reproduce
Steps to reproduce the behavior:
Expected behavior
We should load the page quickly.
Slack discussion: https://wp-media.slack.com/archives/C056ZJMHG0P/p1705319619398929
Also, @Miraeld mentioned that we may need to merge both regexes into one, but we first need to improve at least one of them so that it works properly with large CSS styles.