Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUCSS - the wpr_rucss_used_css table can grow big on large websites #4802

Closed
4 tasks
vmanthos opened this issue Mar 10, 2022 · 50 comments · Fixed by #5043
Closed
4 tasks

RUCSS - the wpr_rucss_used_css table can grow big on large websites #4802

vmanthos opened this issue Mar 10, 2022 · 50 comments · Fixed by #5043
Assignees
Labels
effort: [M] 3-5 days of estimated development time module: remove unused css priority: high Issues which should be resolved as quickly as possible type: enhancement Improvements that slightly enhance existing functionality and are fast to implement
Milestone

Comments

@vmanthos
Copy link
Contributor

Before submitting an issue please check that you’ve completed the following steps:

  • Made sure you’re on the latest version ✅
  • Used the search feature to ensure that the bug hasn’t been reported before ✅

Describe the bug

The wpr_rucss_used_css table's size depends on the size of the used CSS for each page, and also the number of pages on a site.

On large websites that have a lot of URLs, it can grow out of proportion. In one case, that table's size reached 18.4Gb.

After internal discussion, I'm creating this GitHub issue to monitor these cases, and see if there is anything we can do in the future.

To Reproduce
Not relevant.

Additional context

This is different from #4161 which will be resolved with the new implementation of the feature.

Tickets:
https://secure.helpscout.net/conversation/1810416819/330445/
https://secure.helpscout.net/conversation/1657346732/300522
https://secure.helpscout.net/conversation/1670431889/303126
https://secure.helpscout.net/conversation/1665033380/301876
https://secure.helpscout.net/conversation/1645065559/297803

Backlog Grooming (for WP Media dev team use only)

  • Reproduce the problem
  • Identify the root cause
  • Scope a solution
  • Estimate the effort
@piotrbak piotrbak added module: remove unused css type: enhancement Improvements that slightly enhance existing functionality and are fast to implement labels Mar 10, 2022
@piotrbak
Copy link
Contributor

@vmanthos It would be good to know how many URLs are there. For 18.4GB it would be around 480 000 URLs when the Separate Cache for Mobiles is disabled or 240 000 when it's enabled. Is this accurate more or less?

@vmanthos
Copy link
Contributor Author

@piotrbak One sub-sitemap of the specific site had 6 other sub-sitemaps containing more than 17.000 URLs each. We are talking about hundreds of thousands of URLs in this case.

But the size of the table can be an issue even if it doesn't reach that size. Here is the comment from a customer:

Many shared servers limit the database to 1GB.
In this case it happens in this way and when exceeding it, write crashes occur or sometimes the database crashes completely generating a lot of inconveniences.

Ticket: https://secure.helpscout.net/conversation/1814909181/331377/

@nb4cks
Copy link

nb4cks commented Mar 22, 2022

@piotrbak I encountered the same problem. My website has about 16w product pages. I use the editor of elementor pro. I checked wp_wpr_rucss_used_css in woocommerce. Now it has only been about a week and it has a size of 14G. Although I am a dedicated server , but this can not support long-term operation, I am very worried that the server will not be able to withstand the crash at any time, can you solve it as soon as possible or give a solution, I also use the compressed mobile phone side of wp-rocket and the useless css of the computer side and js, I have been looking for solutions everywhere recently, but so far, I don't seem to have found a solution!

@DahmaniAdame
Copy link
Contributor

DahmaniAdame commented Mar 22, 2022

Possible fix:

The problem might be prevented if we stored the Used CSS in files instead of the database. 

Styles will still be added inline to the page like it's currently done. It will just be fetched from a file instead of the database by refactoring this function to look for if a page's Used CSS exists, and fetch its content if it does -

private function get_used_css_markup( UsedCSS_Row $used_css ): string {
$used_css_contents = $this->handle_charsets( $used_css->css, false );
return sprintf(
'<style id="wpr-usedcss">%s</style>',
$used_css_contents
);
}

Using naming conventions for files (like we do for cache), we won't need to connect to the database to find and fetch the content (less overhead on the database). 

On servers that support Gzip (most servers nowadays do), we can consider storing the Used CSS as gzipped files directly. It will give larger gains in disk space vs. the full size stored on the database. 80% size gain on average. But it will have an overhead to uncompress the css.gz to extract the content.

@DahmaniAdame
Copy link
Contributor

Related - https://secure.helpscout.net/conversation/1824377026/333287/

Used CSS average 200kb. 
​35k items. 
​That's 7Gb worth of data on the database.

@piotrbak piotrbak added priority: high Issues which should be resolved as quickly as possible needs: grooming labels Mar 24, 2022
@alfonso100
Copy link
Contributor

related - https://secure.helpscout.net/conversation/1837014809/335667
Used CSS is 150-180kb
6300kb posts,
around 1.2gb of database storage

@vmanthos
Copy link
Contributor Author

Related: https://secure.helpscout.net/conversation/1851789174/337990/

Used CSS is ~ 0.85MB/URL and when I checked ~660 rows were there in the table which weighted ~560MBs.

@vmanthos
Copy link
Contributor Author

Related: https://secure.helpscout.net/conversation/1844337435/336865

The site has over 20k posts and the used CSS is about ~100kb for each URL.
If the used CSS is created for all URLs it would occupy ~2GB.

@alfonso100
Copy link
Contributor

related: https://secure.helpscout.net/conversation/1851953603/338040?folderId=2683093

Used CSS is ~320kb
2000 posts + Separate cache for mobile devices
around 1.25gb of database storage

@NataliaDrause
Copy link
Contributor

@alfonso100
Copy link
Contributor

@adnanwk
Copy link

adnanwk commented Apr 26, 2022

There has to be an expiry of some sort to discard cache of for those posts which are not active. Probably it should include a setting for number of days from the last-viewed time.

Mine is a current affairs website. For me, 7 days is a good enough time. But there are sites which aren't updated so frequently, and their contents remain relevant, they may desire a much longer period.

@piotrbak
Copy link
Contributor

@Tabrisrp whenever you're available, we need to think about the moving data to filesystem here. Of course, using compression where we can, etc. Let's have a discussion about the approach we'll take.

@vmanthos
Copy link
Contributor Author

Related ticket: https://secure.helpscout.net/conversation/1864520160/340083/

There are ~6.500 URLs and the database size is ~1.5GB.

@webtrainingwheels
Copy link

https://secure.helpscout.net/conversation/1866269415/340397?folderId=377611
7900+ URLs - ~2.2 Gb in the database

@DahmaniAdame
Copy link
Contributor

@engahmeds3ed
Copy link
Contributor

I think we have two options to think of here to possibly solve this issue as follows:-

  1. Before saving the css into the DB we can combare its hash with the other rows' hash and only save something like row::ID replacing the ID with the reference ID and save also the real hash then when clearing the reference row, this will lead to remove all rows with the same hash. This will prevent any duplicate rows in DB but has some risks that we need to think of before implementation.

  2. Move the used css storage from DB to filesystem so in the CSS column we will have the filesystem path

@DahmaniAdame
Copy link
Contributor

DahmaniAdame commented May 1, 2022

@engahmeds3ed option 1 will still fail to fix the issue in sites with high number of pages with little ratio of duplicates.

Option 2 is more efficient and would fit most shared hosting environments where file storage is unlimited and database storage has usually a cap.

It will also give the flexibility to store CSS gziped when possible, which should take less space.

The hash part is still valid. Maybe it could be used for the stored CSS files names to avoid duplicates?

The hash will be referenced on the Used CSS table to manage the interaction with SaaS and picking the right used CSS files to inject.

This will ultimately result in storing less Used CSS files. And will also result in storing less on the used CSS tables as the hash will weight less than the full filesystem path.

@Tabrisrp
Copy link
Contributor

Tabrisrp commented May 3, 2022

Some thoughts:

  • Storing on the filesystem seems to be the way to go. We could add back the used-css folder in which we would store the files
  • The files can be named using a hash generated base on the content, which would allow to re-use the same file for different URLs that have the same used CSS
  • In the DB, we replace the CSS content by the path to the corresponding used CSS file
  • On update to the version we implement this change, we will have to truncate the table to reset it
  • Compressing the files will save space, it is worth the trade-off with the processing time
  • How should we handle clearing of the files, for example if a file is used for multiple URLs?

@vmanthos
Copy link
Contributor Author

vmanthos commented May 4, 2022

@piotrbak
Copy link
Contributor

piotrbak commented May 4, 2022

@Tabrisrp In terms of hashing the filenames, we'd be storing them all in the same directory. If we had 70k or 100k CSS files there, wouldn't it slow down operations on those files?

@CrochetFeve0251
Copy link
Contributor

CrochetFeve0251 commented May 5, 2022

@piotrbak using the url as a path the same we do with cache can solve the problem? (I saw we got the url inside the table we can use for that)

@piotrbak
Copy link
Contributor

piotrbak commented May 5, 2022

@CrochetFeve0251 Yes, but then sharing the same CSS file between different posts will be a bit harder/misleading

@CrochetFeve0251
Copy link
Contributor

CrochetFeve0251 commented May 5, 2022

@piotrbak how about a tree system.
First we search the hash file at the first level for common to all pages or inside homepage.
If it is not there then we search a level lower the hash if it is common to that group of page. (Normally common css is between group of page if it is not the template for exemple the blog or the products or the categories)
If it is not there we go a level lower again and search if the hash is here until we arrive to the level of the page that contain the css only if it is unique to the page.

Example:
URL: /en/blog/test
First we search at the / folder the hash.
if it is not we search in /en.
Then if it is still not there we search in /en/blog.
And finally if it is unique to the page it will be in /en/blog/test.

@DahmaniAdame
Copy link
Contributor

How about we create subfolders from the hash itself?

Let's say this is the hash of the generated RUCSS - 5960d662a60f0fc3761434de3559caef

We can use the left characters as sub-folders.

The file structure would be something like: wp-content/cache/rucss/5/9/6/0d662a60f0fc3761434de3559caef

We can add as many subfolders as needed depending on how many pages the website has or using a filter.

The folders won't always have the same files count, but it can avoid putting everything on the same folder without over complicating things to manage it.

@viobru
Copy link
Contributor

viobru commented May 6, 2022

@vmanthos
Copy link
Contributor Author

vmanthos commented May 9, 2022

@vmanthos
Copy link
Contributor Author

vmanthos commented May 9, 2022

@alfonso100
Copy link
Contributor

@Tabrisrp
Copy link
Contributor

Scope a solution ✅

In Engine\Optimization\RUCSS\Admin\Subscriber, add a new method to truncate the used CSS table when updating to the version adding this change.

Create a new Engine\Optimization\RUCSS\Controller\Filesystem class, that will manage the paths and reading/writing/deleting the used CSS in the files. It will also manage the creation of the cache/wp-rocket/used-css/ folder for the current website if it doesn't exist.

This class will be used as a dependency for the Controller\UsedCSS class.

Update the Controller\UsedCSS class to use the filesystem when saving/getting the used CSS.

Add/Update tests to match all the changes.

Estimate the effort ✅

Effort [M]

@Tabrisrp Tabrisrp added effort: [M] 3-5 days of estimated development time and removed needs: grooming labels May 10, 2022
@Tabrisrp Tabrisrp self-assigned this May 10, 2022
@NataliaDrause
Copy link
Contributor

@piotrbak piotrbak added this to the 3.11.4 milestone May 22, 2022
@viobru
Copy link
Contributor

viobru commented May 23, 2022

@alfonso100
Copy link
Contributor

@bwafels
Copy link

bwafels commented May 23, 2022

would it be possible to have this fix in 3.11.3? My host is giving me a temporary free database upgrade.

@NataliaDrause
Copy link
Contributor

@NataliaDrause
Copy link
Contributor

@alfonso100
Copy link
Contributor

@NataliaDrause
Copy link
Contributor

@vmanthos
Copy link
Contributor Author

@dbarproductions
Copy link

This table is 46GB on my client's site that has over 100,000 WooCommerce products in it. Even the postmeta table for that many products is only 2.4GB. I only just discovered this issue when trying to clone the database to my local dev environment (they installed the WP Rocket plugin on their own) and saw how huge it was. It's going to use all their disk space very quickly with the nightly backups.

@bwafels
Copy link

bwafels commented Jun 15, 2022

@dbarproductions My host also makes daily automatic DB backups. Normally they keep 14 days of backup but I see that in my case they only have 7 days. They probably are trying to keep it within the limits that way for a little longer.

@DahmaniAdame
Copy link
Contributor

@dbarproductions + @bwafels the issue will soon be fixed as storing the used CSS will move from the database to the filesystem.

@worldwildweb
Copy link
Contributor

@exonianp
Copy link

exonianp commented Jul 13, 2022

For a site with 10k products (of 50 attributes) the table is 5GB.

I do not think that creating CSS for each and every page/product does the trick, regardless of the storage method. Obviously when in filesystem we will not have the increased memory requirements that we are now facing. Our site, consumes 15GB of RAM even when "idle" - thanks to your huge table that is 8 (eight) times the size of the rest of the content.

What you have to do gentlemen is to follow TagDiv's lead and just create ONE optimal CSS for each page template NOT ONE PER SINGLE PAGE. eg. ONE CSS file for the product page, one for the category page, ONE for the homepage etc. Assuming obviously the "worst case scenario" e.g. a page that included related products, cross-sells, upsells etc. and just apply this ONE template to all product pages.

If unsure, just allow the end users to run the optimizer manually to their... most complex product page, as tagdiv does per template. I really love to use your great work in CSS optimization in another project. Sadly is over 150k products and your architecture makes it prohibitive.

Other than that just keep up the good work in offering the best optimization in the market!

@GeekPress
Copy link
Contributor

What you have to do gentlemen is to follow TagDiv's lead and just create ONE optimal CSS for each page template

Sorry but it's a no-go ;)

Why?
Each page can be totally different by using Gutenberg with a ton of different blocks between pages. It's exactly the same with pages created with Page Builders.

Check our website for a very good example. All our pages are totally different and don't contain the same blocks, and so they need a different Used CSS for each of them.

By the way, the issue is fixed since WP Rocket 3.11.4.

@wp-media wp-media locked and limited conversation to collaborators Jul 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
effort: [M] 3-5 days of estimated development time module: remove unused css priority: high Issues which should be resolved as quickly as possible type: enhancement Improvements that slightly enhance existing functionality and are fast to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.