Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408

wordpressfan · 2024-01-24T09:41:08Z

Before submitting an issue please check that you’ve completed the following steps:

Made sure you’re on the latest version Yes
Used the search feature to ensure that the bug hasn’t been reported before Yes

Describe the bug
After doing some investigation on high CPU usage and profiling the code to reach which parts are consuming that much time, We found that the following regexes are taking too much time (over 1.5 seconds in some servers):

wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php

Lines 41 to 43 in 29697ac

    
           $background_regex = '(?<selector>(?:[ \-,:\w.()\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background\s*:(?<property>[^;}]*)[^}]*}'; 
        
           $background_image_regex = '(?<selector>(?:[ \-,:\w.()\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background-image\s*:(?<property>[^;}]*)[^}]*}';

You can check an example for the first regex with long CSS styles:
https://regex101.com/r/Y2kVDS/1

To Reproduce
Steps to reproduce the behavior:

Add the CSS styles from here: https://regex101.com/r/Y2kVDS/1 into a page
Enable our lazyload bg images option.
Disable the preload
Visit the page in incognito mode.
You will find the page is taking too long time to load.

Expected behavior
We should load the page quickly.

Slack discussion: https://wp-media.slack.com/archives/C056ZJMHG0P/p1705319619398929

Also @Miraeld mentioned that we may need to merge both regexes in one but we need to enhance even one of them to work properly with large CSS styles.

alfonso100 · 2024-01-31T20:24:31Z

potentially related ticket with some debug data inside:

Ticket:
https://secure.helpscout.net/conversation/2489043067/470659/

Slack convo:
https://wp-media.slack.com/archives/C43T1AYMQ/p1706540720389929

alfonso100 · 2024-02-01T21:17:36Z

another case here. enabling LLforBGcss causes a 503 in some pages

ticket:
https://secure.helpscout.net/conversation/2491597524/471225?folderId=2683093

slack convo with collected data:
https://wp-media.slack.com/archives/C43T1AYMQ/p1706793488833259

worldwildweb · 2024-02-06T05:20:08Z

Related hs: https://secure.helpscout.net/conversation/2493221311/471532 and slack: https://wp-media.slack.com/archives/C43T1AYMQ/p1706883844720379

MathieuLamiot · 2024-02-06T11:28:31Z

I spent some time looking into those regexes, and discussed with @CrochetFeve0251 too. Here are the conclusions:

Issue
The root cause is likely to be the combination of nested repetition operators and the fact that this part of the regex is very unlikely to fail: (?<selector>(?:[ \-,:\w\.\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{. Therefore, the regex engine has a lot of matches up until the background and } substrings. When background is not found before a } the engine starts backtracking and there are a lot of possibilities for intermediary steps to retry because of the OR statement and the + operators. So the engine just gets lost into an exponential number of backtracking steps. (More on the topics here for instance.

How to reproduce

Activate the Lazylaod background CSS option on gamma.rocketlabsqa.ovh website.
Install/Activate Debug Bar and Debug Bar Slow Actions WP Rocket Edition .
Clear the cache.
Browse a few pages in incognito mode.
Come back as a logged-in user to access the Global panel of the Debug Bar Slow Actions. You will see the rocket_buffer and rocket_generate_lazyloaded_css actions taking several seconds on average.
It should not take more than a 500ms (very max.). We should aim for something around 200ms.

How to solve it
To solve catastrophic backtracking, one must:

Make sure the regex fails as soon as possible
Reduce as much as possible the number of branches explored by the engine

To do this, several ideas:

We should not start to match anything until we actually see a background (or background-image) somewhere. Otherwise, we are spending time on useless selectors. -> Approach suggested in this issue, down below.
We should try to optimize the selector part of the regex to remove nested repetition operators and OR statement as much as possible. -> Should be tackled with a dedicated GH issue after.

Proposed solution
For now, let's just work on re-arranging the regex without modifying the way the rules are written.

Instead of doing:

$background_regex = '(?<selector>(?:[ \-,:\w.()\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background\s*:(?<property>[^;}]*)[^}]*}';
...
$background_matches       = $this->find( $background_regex, $content, 'mi' );

I suggest the following approach:

1 - Run a regex to retrieve the background tag and the related properties only:

$background_property_regex = 'background\s*:(?<property>[^;}]*)[^}]*}';
$background_property_matches       = $this->find( $background_regex, $content, 'mi' );

We should use the preg_match_all flag PREG_OFFSET_CAPTURE (see here) that will be needed later on. Be careful, it changes the structure for returned matches.

This step should allow to retrieve all background (and background-image) properties in the file.

If there are matches, continue the process, otherwise, bail out.

2 - Create a reversed version of the CSS string: we will need to regex backward so if the css string is abcdef, copy it in reverse to get $html_reversed = 'fedcba'. (Use strrev?)

3 - For each background_property_matches and background_image_property_matches, do something like the following:

$background_selector_regex = '(?<selector>(?:[ \-,:\w\.\n\r^>[*"\'=\]#]|(?:\[[^\]]+\]))+)\s?{[^{}]*background'
$background_selector_match       = preg_match('/' . $background_selector_regex . '/mi', $html_reversed, $matches, offset=$offset_from_the_property_match

Warning! this needs to be adapted cautiously on the following points:

the regex should be reversed, because we are working on the reversed file. So I guess something like 'dnuorgkcab[^{}]*{\s?(?<reversed_selector>(?:[ \-,:\w\.\n\r^>[*"\'=\]#]|(?:\][^\]]+\[))+)' ? (I did not test it, just writing what seems correct.)
$offset_from_the_property_match should be the offset of the property match adjusted because we read the file in reverse (so string_size - original_offset - word_size ?)

Here, we are using pref_match instead of pref_match_all to reduce as much as possible the regex duration: we just need the first match starting from the background tag: it will be the corresponding selector. No need to keep going after this.

For each background tag, we have the corresponding property and selector.

4 - There are cases when a single selector has several background tags and/or several background-image tags. In this case, we should only keep the last background tag and property and the last background-image tag and property. The current regex already does this. With this approach, we need to do it manually, based on the offset.

5 - Rebuild background_matches and background_image_matches to have the same format as the current version, so that the rest of the script can run.

Nice to have / To investigate
We should be able to merge the process for background and the process for background-image: In step 1, we could look for background or background-image. But it will complexify a bit the offset computation in step 3 and maybe the categorization of matches across the two types for step 4 & 5. So, to consider.

Estimated effort: S

How to validate:

NRT: There are many templates available (see with the QA team) that have been used to validate the feature when releasing 3.15. They should be re-used (maybe automated with Rocket-E2E?) to make sure we did not introduce any regression here.
Performance: rocket_buffer action duration is impacted by the Lazyload background CSS option currently. It can be measured with the Debug Bar Slow Action WPR Edition with current and new version on a website suffering from the issue (see how to reproduce in the original GH issue)

Tabrisrp · 2024-02-12T20:36:38Z

Looks good to me, smart to do it in reverse!

Khadreal · 2024-02-15T14:04:18Z

I'm thinking of using something like this /(?P<selector>[^{}]+)\s*{\s*(?>[^{}]+{[^{}]*})*\bbackground\s*:\s*(?P<property>[^;}]+)/ which would be faster compared to the current one for the background regex.
It doesn't rely on the OR operator like the first which increase the complexity and slow down matching. It has a minimal look arounds.

The first approach as suggested by @MathieuLamiot is faster but we need the selector not only the background and the properties

MathieuLamiot · 2024-02-15T14:10:42Z

To be honest, I am unable to tell what is the functional difference between this regex and the original one without spending 2 hours on regex101 trying to compare both... 😬 My regex fluency is far from ideal I guess :/ Could you explain what are the functional differences and why what has been removed can be removed safely? Thanks! As it seems several tests are failing, so there might be some regressions 🤔

The first approach as suggested by @MathieuLamiot is faster but we need the selector not only the background and the properties

The approach I suggested does provide both the selector and the properties, so that should not be an issue I think?

Thanks for looking into this 🙏

Khadreal · 2024-02-15T14:29:47Z

The approach gives the property but no selector, see example below. I'll modify this and try again to see the performance
{ [0]=> array(3) { [0]=> string(29) "background: var(--color21); }" ["property"]=> string(15) " var(--color21)" [1]=> string(15) " var(--color21)" }

Regex diff: the old regex handles more elaborate character class to match various characters that might appear in a stylesheet selector like whitespace, special characters, etc this make it complex and takes time during execution. I'm still testing the new one to make sure that it performs well and faster.
I'll look into the test why it's failing.

MathieuLamiot · 2024-02-15T14:48:40Z

The approach gives the property but no selector, see example below.

I think there is a misunderstanding here: the groomed approach is not just:

1 - Run a regex to retrieve the background tag and the related properties only:

But it actually needs the 6 steps to be implemented to fully work!
Step 1 will indeed only give you properties. Then the following steps should allow to get the related selector.

MathieuLamiot · 2024-02-26T09:29:29Z

I just tested the performances on gamma website:

develop: ~4.4s for rocket_generate_lazyloaded_css on average
this branch: 90ms

From a performance point of view, this is working great. We still have to do CR and NRT.

hanna-meda · 2024-03-05T10:16:22Z

@Khadreal, while testing this PR, I see there are lots of duplicates of this warning in the debug.log
on rocketlabs (PHP version 7.4.33)
[05-Mar-2024 09:53:52 UTC] PHP Notice: Undefined offset: 0 in /var/www/rocketlabsqa.ovh/htdocs/wp-content/plugins/wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php on line 115
on new.rocketlabs (PHP version 8.3.2)
[04-Mar-2024 06:23:02 UTC] PHP Warning: Undefined array key 0 in /var/www/new.rocketlabsqa.ovh/htdocs/wp-content/plugins/wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php on line 112 [04-Mar-2024 06:23:02 UTC] PHP Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated in /var/www/new.rocketlabsqa.ovh/htdocs/wp-content/plugins/wp-rocket/inc/Engine/Media/Lazyload/CSS/Front/Extractor.php on line 112

Returning this to In Progress, if you could have a look. Thank you.

Khadreal · 2024-03-06T07:45:23Z

@hanna-meda, I pushed a fix to make sure those error don't pop up in the log, could you try again and let me know if it's okay

hanna-meda · 2024-03-06T10:13:38Z

@Khadreal, seems these kinds of warnings are still coming in in great numbers in the log. If you can have another look.

debug.log

Khadreal · 2024-03-07T12:17:17Z

@hanna-meda I noticed the branch wasn't updated after my last fix, I've updated the branch now. Let me know how it goes with it again.
Thanks

hanna-meda · 2024-03-12T10:44:45Z

@Khadreal, I noticed a regression here, regex isn't handling bg-images that contain CSS pseudo-classes such as (even) or (3n), like in these examples:

li:nth-child(even) { background: url(/wp-content/rocket-test-data/images/underline.png) no-repeat;
.external dd:nth-last-of-type(3n) { background: url(/wp-content/rocket-test-data/images/img_nature.jpg) no-repeat;
li:nth-child(even) { background: url(/wp-content/rocket-test-data/images/fetchpriority.jpg) no-repeat;
dd:nth-last-of-type(3n) { background: url(/wp-content/rocket-test-data/images/flowers.jpg) no-repeat;

so this type of background images will not be lazyloaded (T41553 as reference)

@piotrbak, I know that this being regression we usually return the ticket to In Progress, but as this ticket is long overdue its time on the board, would you think we can move forward with it as is, having a separate GH for the above issue? Or we move it back to In Progress?

cc @MathieuLamiot

piotrbak · 2024-03-12T11:00:46Z

@hanna-meda We can't introduce regression here, basically, it's better to release once then come back to the issue in the future and repeat all the tests again.

My suggestion would be to share with @Khadreal all possible markups that he'd be able to stress and test using https://regex101.com

MathieuLamiot · 2024-03-12T11:33:38Z

Moving it to blocked. Out of scope for the upcoming release.

MathieuLamiot · 2024-03-12T12:30:20Z

Discussed with @Khadreal, might be a quick fix. To be able to handle whatever decision we'll have to put it in the release or not, @hanna-meda, @Khadreal can you please run as many test plans as possible on this issue today, and provide a clear list of what is OK and what is KO.
This will allows:

To decide properly if hte behavior is acceptable or not
In case we postpone, to know precisely what needs to be fixed later on.

Thanks 🙏

piotrbak added type: enhancement Improvements that slightly enhance existing functionality and are fast to implement priority: high Issues which should be resolved as quickly as possible needs: grooming module: lazyload background css images labels Feb 1, 2024

MathieuLamiot added the effort: [S] 1-2 days of estimated development time label Feb 6, 2024

MathieuLamiot mentioned this issue Feb 6, 2024

Optimize Lazyload Background CSS regex finding selectors #6433

Open

piotrbak added this to the 3.15.10 milestone Feb 8, 2024

Khadreal self-assigned this Feb 9, 2024

Tabrisrp removed the needs: grooming label Feb 12, 2024

Khadreal added a commit that referenced this issue Feb 15, 2024

:chore: Change regex for background #6408

4204f8c

Khadreal added a commit that referenced this issue Feb 15, 2024

Modify regex #6408

2dd4775

Khadreal mentioned this issue Feb 15, 2024

Background cpu#6408 #6448

Merged

19 tasks

hanna-meda self-assigned this Feb 28, 2024

MathieuLamiot removed this from the 3.15.10 milestone Mar 12, 2024

MathieuLamiot added this to the 3.15.10 milestone Mar 12, 2024

hanna-meda closed this as completed in #6448 Mar 13, 2024

wordpressfan mentioned this issue Mar 14, 2024

Release v3.15.10 #6482

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408

Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408

wordpressfan commented Jan 24, 2024

alfonso100 commented Jan 31, 2024

alfonso100 commented Feb 1, 2024

worldwildweb commented Feb 6, 2024

MathieuLamiot commented Feb 6, 2024 •

edited

Tabrisrp commented Feb 12, 2024

Khadreal commented Feb 15, 2024

MathieuLamiot commented Feb 15, 2024 •

edited

Khadreal commented Feb 15, 2024

MathieuLamiot commented Feb 15, 2024 •

edited

MathieuLamiot commented Feb 26, 2024

hanna-meda commented Mar 5, 2024

Khadreal commented Mar 6, 2024

hanna-meda commented Mar 6, 2024

Khadreal commented Mar 7, 2024

hanna-meda commented Mar 12, 2024

piotrbak commented Mar 12, 2024

MathieuLamiot commented Mar 12, 2024

MathieuLamiot commented Mar 12, 2024

Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408

Lazyload background images regexes are taking too long time to load and may timeout, consuming memory and CPU #6408

Comments

wordpressfan commented Jan 24, 2024

alfonso100 commented Jan 31, 2024

alfonso100 commented Feb 1, 2024

worldwildweb commented Feb 6, 2024

MathieuLamiot commented Feb 6, 2024 • edited

Tabrisrp commented Feb 12, 2024

Khadreal commented Feb 15, 2024

MathieuLamiot commented Feb 15, 2024 • edited

Khadreal commented Feb 15, 2024

MathieuLamiot commented Feb 15, 2024 • edited

MathieuLamiot commented Feb 26, 2024

hanna-meda commented Mar 5, 2024

Khadreal commented Mar 6, 2024

hanna-meda commented Mar 6, 2024

Khadreal commented Mar 7, 2024

hanna-meda commented Mar 12, 2024

piotrbak commented Mar 12, 2024

MathieuLamiot commented Mar 12, 2024

MathieuLamiot commented Mar 12, 2024

MathieuLamiot commented Feb 6, 2024 •

edited

MathieuLamiot commented Feb 15, 2024 •

edited

MathieuLamiot commented Feb 15, 2024 •

edited