Full Page Cache issues #606
Comments
I just found another issue - and this one is big (description removed as it's security related --Traumflug) |
Please edit your comment so that one can't find out how to reproduce this issue until it's fixed. (description removed as it's security related --Traumflug) |
For those who don't follow commits closely, @getdatakick contributed an entire series of commits, which should fix the security issue as well as some of the issues mentioned in the first post here. These commits are: |
Another issue is related to hook parameters. Some hooks expect parameters, but when FPC executes a non-cached hook, it just calls it without parameters. That, of course, starts a chain reaction of errors. I have no idea how to fix this properly. We could have a helper function for every hook that expects parameters. This function could create a context and call the hook with the expected parameters. But that would, of course, not work for custom hooks (some modules register their own hooks). |
These hook placeholders in cached HTML could also store hints for required parameters. Before investing more effort into these issues I'd love to see some performance measurements. How much faster is such a fully cached page actually, especially on front, category and product pages? How much disk space do all the cached pages in all their variants require (I didn't investigate this yet, but with 10,000 products we might have millions of cache files)? How often do we get cache hits without cache warming? Is cache warming actually possible? Merchants are pretty keen on this Full Page Cache, so playing with the idea of ditching it means we need solid numbers showing that it's not reliably doable, doesn't give the expected performance improvements and/or that there are other opportunities giving equally fast or even faster page loads. |
It's not so much about speed; it is more about how many users the website can handle under load. Notice how the full page cache took the number of users from 270 to 563 before the website became unresponsive. https://thirtybees.com/blog/thirty-bees-caches-tested/ It does help with normal page loads as well. Maybe I should do a test on that as well. |
The question is whether that's enough. For things like a product object or an array of product objects, it's probably doable. We know how to create these objects, so we can indeed store some hints. But what about scalar values? There is no way (in general) to recreate them. We can, of course, store those scalar values directly inside the template. But is that correct? It will probably work 95% of the time, as those scalar values are probably derived from the rendered entity, but not always. As an example, imagine you want to display a random product on your home page. You would use some hook to render this product, and this hook takes a product id as a parameter. To do this, one would put something like this into the homepage/index template: {hook h='displayProductSummary' productId=$randomProductId} Now, if this hook is cached/green, the home page will obviously always display the same product. Not random. If we make it non-cached, then we would store the $randomProductId value inside the rendered template, and then on every page render we would call the hook with that same stored value. I know this is a weird scenario and that probably nobody would implement this functionality this way. Anyway, I think it demonstrates the problem with this approach nicely. There's really not much we can do about this, because |
So I took my default installation, turned off debugging, turned on profiling and tried the cache settings. These screenshots were taken while loading the product page "Honey", reloaded three times, some 5 seconds apart. Server side cache off, full page cache off:
Server side cache on (filesystem, depth 1), full page cache off:
Server side cache on, full page cache on, all hooks red:
Server side cache on, full page cache on, all hooks green:
As one can see, FPC gives no advantage on the first load and just 10% with hooks served on subsequent loads. Only skipping all the hooks gives a significant advantage. Before measuring on the command line I tried the built-in profiling tool, but this apparently doesn't work with FPC on. |
These numbers are what one would expect for a test in isolation. The question is how this behaves under heavy load. I'd assume the numbers will be quite different, since FPC will bypass the DB. @Dh42 you have some experience in this kind of testing, right? |
I do. For that link I posted, https://thirtybees.com/blog/thirty-bees-caches-tested/, I actually tested a 2gb vultr vps using locust.io. I didn't test the initial loading, I just tested the ability to handle high loads. Since the database is bypassed, it effectively doubles what the server can handle before falling offline. |
One can do this with 20 parallel requests as well, with about equivalent results. Script: Server side cache off, full page cache off:
Server side cache on (filesystem, depth 1), full page cache off:
Server side cache on, full page cache on, all hooks red:
Server side cache on, full page cache on, all hooks green:
|
Taking my simple measurements into account, it looks more like the performance improvement isn't a result of bypassing the database, but of bypassing all the hooks. At the time of your measurements, hooks were off by default (red). Now green means off. |
That could be the case, but I did a full test, hitting pages that would make database requests. I did not just test the home page; I tested the home page, a search page, the category page, and the product page. So at least on the search page database requests were made. |
Can I get an update on this bug from @getdatakick and/or @Traumflug ? I'm a little bit unclear on whether full page caching is usable now. I can volunteer some testing if it would help. The 1.0.7 release notes mention that full page cache is working; however, this bug is still open. From the conversation here it seems that only the security issue was fixed, while more problems remain. As far as the importance of full page caching goes, I will show some results from tests run on a site I'm working with: a particularly bad product page, a category page, and the front page. Preliminary testing for us shows massive improvements. Again, I can do more tests if it's helpful. It's worth mentioning, though, that these tests were done on a development site with little optimization, and FPC has more of an effect since it's the only caching or optimization in place. |
It's still broken; it gives a 500 error on some pages. It's not stable and it's not recommended to use it. |
As therampagerado said, it gives a 500 error in some situations. |
The problems with parameter passing should be fixed by aaa1730 |
I'm submitting this issue to present the results of my investigation into full page cache
Tasks
How full page cache works
When full page cache is enabled for a page (controller), the content of the page is stored
in the cache and served to all visitors with the same $cacheKey.
We can also choose which display hooks should be considered static and which dynamic.
If we make a hook dynamic (green), the hook output is wrapped inside HTML
comment delimiters that contain the hook name.
When we serve the cached version of the page, all dynamic hooks are executed, and up-to-date content is injected into the HTML page using these comment delimiters.
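To make the mechanism concrete, here is a minimal Python sketch of the wrap-and-refresh idea. It is not the thirty bees implementation: the delimiter format `<!--[hook:NAME]-->` and the functions `wrap_dynamic` and `refresh` are made up for illustration, and the real markers may look different.

```python
import re

# Hypothetical delimiter format; the real thirty bees markers may differ.
PATTERN = re.compile(
    r"<!--\[hook:(?P<name>\w+)\]-->(?P<body>.*?)<!--\[/hook:(?P=name)\]-->",
    re.DOTALL,
)


def wrap_dynamic(hook_name, output):
    """Wrap the output of a dynamic hook in comment delimiters."""
    return f"<!--[hook:{hook_name}]-->{output}<!--[/hook:{hook_name}]-->"


def refresh(cached_html, render_hook):
    """Re-render every delimited hook and inject the fresh output."""
    def replace(match):
        name = match.group("name")
        # Keep the delimiters so the page can be refreshed again later.
        return wrap_dynamic(name, render_hook(name))

    return PATTERN.sub(replace, cached_html)


# Page as it was stored in the cache, with stale cart content.
cached = "<body>" + wrap_dynamic("displayCart", "0 items") + "<p>static</p></body>"
# Serving the cached page: only the delimited part is recomputed.
fresh = refresh(cached, lambda name: "3 items")
```

Everything outside the delimiters is served as stored; only the delimited spans are recomputed on each request.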
Cache Key
Cache key consists of these parts:
utm_source=
can return the same cached version (these parameters are used by the Google Analytics JavaScript code only)
Problems
While this somehow works, there are many problems that should / need to be addressed.
Cache key
The first problem is in the cache key. For example, it does not contain the customer group. So if we have different prices for different groups, we can serve wrong content! What else should be part of the cache key?
Another small issue is with the implementation itself - it's quite brittle, implemented in two places slightly differently. This should be consolidated.
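A consolidated key builder could look roughly like the Python sketch below. This is only an illustration under assumptions: `build_cache_key` and `IGNORED_PARAM_PREFIXES` are hypothetical names, treating every `utm_` parameter as ignorable is my assumption (only `utm_source=` is mentioned above), and a real key would need more parts than the customer group.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: all utm_* parameters are analytics-only and can be ignored.
IGNORED_PARAM_PREFIXES = ("utm_",)


def build_cache_key(url, customer_group_id):
    """Single place that derives the cache key for a request."""
    parts = urlsplit(url)
    # Drop tracking parameters and sort the rest so order does not matter.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(IGNORED_PARAM_PREFIXES)
    )
    raw = "|".join([parts.path, urlencode(params), f"group={customer_group_id}"])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Same page, with and without a tracking parameter, same customer group.
key_tracked = build_cache_key("/shop/honey?utm_source=mail&id=3", 1)
key_plain = build_cache_key("/shop/honey?id=3", 1)
# Same page for a different customer group (possibly different prices).
key_other_group = build_cache_key("/shop/honey?id=3", 2)
```

With one function used by both the store and the lookup path, the two code paths cannot drift apart, and adding another key part is a one-line change.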
Hook name in delimiter
The HTML comment delimiter contains the hook name, and this causes a problem when we replace the content with a fresh one. The reason is that hooks can have aliases; for example, displayHeader has the alias Header.
So the HTML content can contain a delimiter written with one of these names, while the code that performs the replacement looks for the other.
These do not match, so thirty bees will return the stale/original version of the hook output.
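One possible fix is to normalize hook names to a canonical form both when writing and when looking up delimiters. The Python sketch below shows the idea only; the `ALIASES` table, the marker format and all function names are hypothetical, and only the displayHeader / Header pair comes from this issue.

```python
# Hypothetical alias table; only the Header -> displayHeader pair is from the issue.
ALIASES = {"Header": "displayHeader"}


def canonical(name):
    """Map an alias to its canonical hook name; other names pass through."""
    return ALIASES.get(name, name)


def open_marker(name):
    return f"<!--[hook:{canonical(name)}]-->"


def close_marker(name):
    return f"<!--[/hook:{canonical(name)}]-->"


def replace_hook(html, name, fresh):
    """Replace the content between the markers of one hook, if present."""
    start, end = open_marker(name), close_marker(name)
    i = html.find(start)
    if i == -1:
        return html
    j = html.find(end, i + len(start))
    if j == -1:
        return html
    return html[: i + len(start)] + fresh + html[j:]


# The page was rendered via the alias, the refresh uses the full name.
page = open_marker("Header") + "old" + close_marker("Header")
result = replace_hook(page, "displayHeader", "new")
```

Because both sides go through `canonical`, it no longer matters which of the two names the template or the refresh code happens to use.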
All green hooks are always executed
This is quite stupid. When we are serving the cached version, we go over all green / dynamic hooks and execute every one of them (even if it was not used in the cached page).
Not only is this a waste of resources, it can also create various strange bugs. For example, thirty bees can execute the hook displayPayment on a category page, and this can very easily result in a 500 error, because developers expected this hook to be executed only in the context of the Order controller.
To fix this problem, we should somehow parse the cached page, retrieve the list of hooks to be refreshed, and execute only those.
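A rough Python sketch of that parsing step, assuming a made-up opening marker of the form `<!--[hook:NAME]-->` (the real delimiter format may differ; `hooks_in_page` and `hooks_to_execute` are hypothetical helpers):

```python
import re

# Hypothetical opening-marker format; closing markers do not match this pattern.
MARKER = re.compile(r"<!--\[hook:(\w+)\]-->")


def hooks_in_page(cached_html):
    """Return the hook names found in the cached page, in order, without duplicates."""
    seen = []
    for name in MARKER.findall(cached_html):
        if name not in seen:
            seen.append(name)
    return seen


def hooks_to_execute(cached_html, dynamic_hooks):
    """Only hooks that are both dynamic and present in this page get executed."""
    return [name for name in hooks_in_page(cached_html) if name in dynamic_hooks]


category_page = (
    "<div><!--[hook:displayLeftColumn]-->menu<!--[/hook:displayLeftColumn]--></div>"
)
dynamic = {"displayLeftColumn", "displayPayment"}
to_run = hooks_to_execute(category_page, dynamic)
```

Here displayPayment is marked dynamic but never appears in the cached category page, so it is not executed at all.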
Merchants have no way to tell which hook should be cached or not
This is a big problem.
Merchants have no clue if the hook content should be cached or not. The only way to tell for sure is to look into the code, and determine if the function is pure or not (if we use functional programming terminology).
What makes this even worse is the fact that all hooks are by default considered pure -- they are all static/red.
To fix this, we should have some additional metadata, so that module developers could tell the tb engine how to treat each hook. It should be the responsibility of module developers to decide whether the hook content should be cached or not.
Obviously, this would work with native modules only. Legacy / ps16 modules would not be aware of this metadata, and the merchant would still need to decide. In this case, it would be best if all hooks were dynamic/green by default.
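The resolution order could be sketched like this in Python. It is an illustration only: `CachePolicy`, `resolve_policy` and the shape of the metadata are hypothetical, and no such API exists in thirty bees today.

```python
from enum import Enum


class CachePolicy(Enum):
    STATIC = "static"    # output can be stored with the page
    DYNAMIC = "dynamic"  # output must be re-rendered on every request


def resolve_policy(hook_name, module_metadata, merchant_choice=None):
    """Decide how a hook is treated.

    1. Metadata declared by the module developer wins.
    2. Otherwise (legacy / ps16 module) use the merchant's setting, if any.
    3. Otherwise fall back to dynamic, the safe default.
    """
    declared = module_metadata.get(hook_name)
    if declared is not None:
        return declared
    if merchant_choice is not None and hook_name in merchant_choice:
        return merchant_choice[hook_name]
    return CachePolicy.DYNAMIC


# Hypothetical native module that declares one of its hooks as static.
native_metadata = {"displayFooter": CachePolicy.STATIC}
# Hypothetical legacy module without any metadata.
legacy_metadata = {}

footer_policy = resolve_policy("displayFooter", native_metadata)
legacy_policy = resolve_policy("displayCart", legacy_metadata)
merchant_policy = resolve_policy(
    "displayBanner", legacy_metadata, {"displayBanner": CachePolicy.STATIC}
)
```

Defaulting to dynamic costs some performance for legacy modules, but it avoids serving stale output from a hook nobody has classified.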
Side effects
This is another big problem. Display hooks can have side effects, such as registering new css/javascript files, adding javascript definitions, etc. Example:
Even if we mark this as a dynamic/green hook, it will not work as expected. This hook does not produce any output; it only registers a new javascript file. These files are added to the html output elsewhere -- but that never happens when we serve cached content.
It's very hard to fix this.
Colors
Maybe it is only me, but I would expect that green means that the hook is cached, and red means hook that is executed on every page request.