Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408

Closed
wordpressfan opened this issue Jan 24, 2024 · 18 comments · Fixed by #6448
Assignees
Labels
effort: [S] 1-2 days of estimated development time module: lazyload background css images priority: high Issues which should be resolved as quickly as possible type: enhancement Improvements that slightly enhance existing functionality and are fast to implement
Milestone

Comments

@wordpressfan
Copy link
Contributor

Before submitting an issue please check that you’ve completed the following steps:

  • Made sure you’re on the latest version Yes
  • Used the search feature to ensure that the bug hasn’t been reported before Yes

Describe the bug
After doing some investigation on high CPU usage and profiling the code to reach which parts are consuming that much time, We found that the following regexes are taking too much time (over 1.5 seconds in some servers):

$background_regex = '(?<selector>(?:[ \-,:\w.()\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background\s*:(?<property>[^;}]*)[^}]*}';
$background_image_regex = '(?<selector>(?:[ \-,:\w.()\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background-image\s*:(?<property>[^;}]*)[^}]*}';

You can check an example for the first regex with long CSS styles:
https://regex101.com/r/Y2kVDS/1

To Reproduce
Steps to reproduce the behavior:

  1. Add the CSS styles from here: https://regex101.com/r/Y2kVDS/1 into a page
  2. Enable our lazyload bg images option.
  3. Disable the preload
  4. Visit the page in incognito mode.
  5. You will find the page is taking too long time to load.

Expected behavior
We should load the page quickly.

Slack discussion: https://wp-media.slack.com/archives/C056ZJMHG0P/p1705319619398929

Also @Miraeld mentioned that we may need to merge both regexes in one but we need to enhance even one of them to work properly with large CSS styles.

@alfonso100
Copy link
Contributor

potentially related ticket with some debug data inside:

Ticket:
https://secure.helpscout.net/conversation/2489043067/470659/

Slack convo:
https://wp-media.slack.com/archives/C43T1AYMQ/p1706540720389929

@piotrbak piotrbak added type: enhancement Improvements that slightly enhance existing functionality and are fast to implement priority: high Issues which should be resolved as quickly as possible needs: grooming module: lazyload background css images labels Feb 1, 2024
@alfonso100
Copy link
Contributor

another case here. enabling LLforBGcss causes a 503 in some pages

ticket:
https://secure.helpscout.net/conversation/2491597524/471225?folderId=2683093

slack convo with collected data:
https://wp-media.slack.com/archives/C43T1AYMQ/p1706793488833259

@worldwildweb
Copy link
Contributor

@MathieuLamiot
Copy link
Contributor

MathieuLamiot commented Feb 6, 2024

I spent some time looking into those regexes, and discussed with @CrochetFeve0251 too. Here are the conclusions:

Issue
The root cause is likely to be the combination of nested repetition operators and the fact that this part of the regex is very unlikely to fail: (?<selector>(?:[ \-,:\w\.\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{. Therefore, the regex engine has a lot of matches up until the background and } substrings. When background is not found before a } the engine starts backtracking and there are a lot of possibilities for intermediary steps to retry because of the OR statement and the + operators. So the engine just gets lost into an exponential number of backtracking steps. (More on the topics here for instance.

How to reproduce

  • Activate the Lazylaod background CSS option on gamma.rocketlabsqa.ovh website.
  • Install/Activate Debug Bar and Debug Bar Slow Actions WP Rocket Edition .
  • Clear the cache.
  • Browse a few pages in incognito mode.
  • Come back as a logged-in user to access the Global panel of the Debug Bar Slow Actions. You will see the rocket_buffer and rocket_generate_lazyloaded_css actions taking several seconds on average.
    It should not take more than a 500ms (very max.). We should aim for something around 200ms.

How to solve it
To solve catastrophic backtracking, one must:

  • Make sure the regex fails as soon as possible
  • Reduce as much as possible the number of branches explored by the engine

To do this, several ideas:

  • We should not start to match anything until we actually see a background (or background-image) somewhere. Otherwise, we are spending time on useless selectors. -> Approach suggested in this issue, down below.
  • We should try to optimize the selector part of the regex to remove nested repetition operators and OR statement as much as possible. -> Should be tackled with a dedicated GH issue after.

Proposed solution
For now, let's just work on re-arranging the regex without modifying the way the rules are written.

Instead of doing:

$background_regex = '(?<selector>(?:[ \-,:\w.()\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background\s*:(?<property>[^;}]*)[^}]*}';
...
$background_matches       = $this->find( $background_regex, $content, 'mi' );

I suggest the following approach:

1 - Run a regex to retrieve the background tag and the related properties only:

$background_property_regex = 'background\s*:(?<property>[^;}]*)[^}]*}';
$background_property_matches       = $this->find( $background_regex, $content, 'mi' );

We should use the preg_match_all flag PREG_OFFSET_CAPTURE (see here) that will be needed later on. Be careful, it changes the structure for returned matches.

This step should allow to retrieve all background (and background-image) properties in the file.

If there are matches, continue the process, otherwise, bail out.

2 - Create a reversed version of the CSS string: we will need to regex backward so if the css string is abcdef, copy it in reverse to get $html_reversed = 'fedcba'. (Use strrev?)

3 - For each background_property_matches and background_image_property_matches, do something like the following:

$background_selector_regex = '(?<selector>(?:[ \-,:\w\.\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background'
$background_selector_match       = preg_match('/' . $background_selector_regex . '/mi', $html_reversed, $matches, offset=$offset_from_the_property_match

Warning! this needs to be adapted cautiously on the following points:

  • the regex should be reversed, because we are working on the reversed file. So I guess something like 'dnuorgkcab[^{}]*{\s?(?<reversed_selector>(?:[ \-,:\w\.\n\r^>[*"\'=\]#]|(?:\][^\]]+\[))+)' ? (I did not test it, just writing what seems correct.)
  • $offset_from_the_property_match should be the offset of the property match adjusted because we read the file in reverse (so string_size - original_offset - word_size ?)

Here, we are using pref_match instead of pref_match_all to reduce as much as possible the regex duration: we just need the first match starting from the background tag: it will be the corresponding selector. No need to keep going after this.

For each background tag, we have the corresponding property and selector.

4 - There are cases when a single selector has several background tags and/or several background-image tags. In this case, we should only keep the last background tag and property and the last background-image tag and property. The current regex already does this. With this approach, we need to do it manually, based on the offset.

5 - Rebuild background_matches and background_image_matches to have the same format as the current version, so that the rest of the script can run.

Nice to have / To investigate
We should be able to merge the process for background and the process for background-image: In step 1, we could look for background or background-image. But it will complexify a bit the offset computation in step 3 and maybe the categorization of matches across the two types for step 4 & 5. So, to consider.

Estimated effort: S

How to validate:

  • NRT: There are many templates available (see with the QA team) that have been used to validate the feature when releasing 3.15. They should be re-used (maybe automated with Rocket-E2E?) to make sure we did not introduce any regression here.
  • Performance: rocket_buffer action duration is impacted by the Lazyload background CSS option currently. It can be measured with the Debug Bar Slow Action WPR Edition with current and new version on a website suffering from the issue (see how to reproduce in the original GH issue)

@MathieuLamiot MathieuLamiot added the effort: [S] 1-2 days of estimated development time label Feb 6, 2024
@piotrbak piotrbak added this to the 3.15.10 milestone Feb 8, 2024
@Khadreal Khadreal self-assigned this Feb 9, 2024
@Tabrisrp
Copy link
Contributor

Looks good to me, smart to do it in reverse!

Khadreal added a commit that referenced this issue Feb 15, 2024
@Khadreal Khadreal mentioned this issue Feb 15, 2024
19 tasks
@Khadreal
Copy link
Contributor

I'm thinking of using something like this /(?P<selector>[^{}]+)\s*{\s*(?>[^{}]+{[^{}]*})*\bbackground\s*:\s*(?P<property>[^;}]+)/ which would be faster compared to the current one for the background regex.
It doesn't rely on the OR operator like the first which increase the complexity and slow down matching. It has a minimal look arounds.

The first approach as suggested by @MathieuLamiot is faster but we need the selector not only the background and the properties

@MathieuLamiot
Copy link
Contributor

MathieuLamiot commented Feb 15, 2024

To be honest, I am unable to tell what is the functional difference between this regex and the original one without spending 2 hours on regex101 trying to compare both... 😬 My regex fluency is far from ideal I guess :/ Could you explain what are the functional differences and why what has been removed can be removed safely? Thanks! As it seems several tests are failing, so there might be some regressions 🤔

The first approach as suggested by @MathieuLamiot is faster but we need the selector not only the background and the properties

The approach I suggested does provide both the selector and the properties, so that should not be an issue I think?

Thanks for looking into this 🙏

@Khadreal
Copy link
Contributor

The approach gives the property but no selector, see example below. I'll modify this and try again to see the performance
{ [0]=> array(3) { [0]=> string(29) "background: var(--color21); }" ["property"]=> string(15) " var(--color21)" [1]=> string(15) " var(--color21)" }

Regex diff: the old regex handles more elaborate character class to match various characters that might appear in a stylesheet selector like whitespace, special characters, etc this make it complex and takes time during execution. I'm still testing the new one to make sure that it performs well and faster.
I'll look into the test why it's failing.

@MathieuLamiot
Copy link
Contributor

MathieuLamiot commented Feb 15, 2024

The approach gives the property but no selector, see example below.

I think there is a misunderstanding here: the groomed approach is not just:

1 - Run a regex to retrieve the background tag and the related properties only:

But it actually needs the 6 steps to be implemented to fully work!
Step 1 will indeed only give you properties. Then the following steps should allow to get the related selector.

@MathieuLamiot
Copy link
Contributor

I just tested the performances on gamma website:

  • develop: ~4.4s for rocket_generate_lazyloaded_css on average
  • this branch: 90ms

From a performance point of view, this is working great. We still have to do CR and NRT.

@hanna-meda hanna-meda self-assigned this Feb 28, 2024
@hanna-meda
Copy link

@Khadreal, while testing this PR, I see there are lots of duplicates of this warning in the debug.log
on rocketlabs (PHP version 7.4.33)
[05-Mar-2024 09:53:52 UTC] PHP Notice: Undefined offset: 0 in /var/www/rocketlabsqa.ovh/htdocs/wp-content/plugins/wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php on line 115
on new.rocketlabs (PHP version 8.3.2)
[04-Mar-2024 06:23:02 UTC] PHP Warning: Undefined array key 0 in /var/www/new.rocketlabsqa.ovh/htdocs/wp-content/plugins/wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php on line 112 [04-Mar-2024 06:23:02 UTC] PHP Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated in /var/www/new.rocketlabsqa.ovh/htdocs/wp-content/plugins/wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php on line 112

Returning this to In Progress, if you could have a look. Thank you.

@Khadreal
Copy link
Contributor

Khadreal commented Mar 6, 2024

@hanna-meda, I pushed a fix to make sure those error don't pop up in the log, could you try again and let me know if it's okay

@hanna-meda
Copy link

@Khadreal, seems these kinds of warnings are still coming in in great numbers in the log. If you can have another look.

debug.log

@Khadreal
Copy link
Contributor

Khadreal commented Mar 7, 2024

@hanna-meda I noticed the branch wasn't updated after my last fix, I've updated the branch now. Let me know how it goes with it again.
Thanks

@hanna-meda
Copy link

@Khadreal, I noticed a regression here, regex isn't handling bg-images that contain CSS pseudo-classes such as (even) or (3n), like in these examples:

li:nth-child(even) { background: url(/wp-content/rocket-test-data/images/underline.png) no-repeat;
.external dd:nth-last-of-type(3n) { background: url(/wp-content/rocket-test-data/images/img_nature.jpg) no-repeat;
li:nth-child(even) {   background: url(/wp-content/rocket-test-data/images/fetchpriority.jpg) no-repeat;
dd:nth-last-of-type(3n) {   background: url(/wp-content/rocket-test-data/images/flowers.jpg) no-repeat;

so this type of background images will not be lazyloaded (T41553 as reference)

@piotrbak, I know that this being regression we usually return the ticket to In Progress, but as this ticket is long overdue its time on the board, would you think we can move forward with it as is, having a separate GH for the above issue? Or we move it back to In Progress?

cc @MathieuLamiot

@piotrbak
Copy link
Contributor

@hanna-meda We can't introduce regression here, basically, it's better to release once then come back to the issue in the future and repeat all the tests again.

My suggestion would be to share with @Khadreal all possible markups that he'd be able to stress and test using https://regex101.com

@MathieuLamiot MathieuLamiot removed this from the 3.15.10 milestone Mar 12, 2024
@MathieuLamiot
Copy link
Contributor

Moving it to blocked. Out of scope for the upcoming release.

@MathieuLamiot
Copy link
Contributor

Discussed with @Khadreal, might be a quick fix. To be able to handle whatever decision we'll have to put it in the release or not, @hanna-meda, @Khadreal can you please run as many test plans as possible on this issue today, and provide a clear list of what is OK and what is KO.
This will allows:

  • To decide properly if hte behavior is acceptable or not
  • In case we postpone, to know precisely what needs to be fixed later on.

Thanks 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: [S] 1-2 days of estimated development time module: lazyload background css images priority: high Issues which should be resolved as quickly as possible type: enhancement Improvements that slightly enhance existing functionality and are fast to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants