Ability to cache GET (Query String) requests, but ignore the parameter value possible? #639
@lkraav Requests that include a query variable (e.g., You could enable GET Request caching (ZenCache → Plugin Options → GET Requests) and tell ZenCache to cache those pages anyway, however if you have a lot of variations of the same URL (e.g., ZenCache has no way of knowing what a "default value" is for a page with a query string... i.e., there's no way to tell ZenCache "if the URL contains |
Correct, this is exactly the problem. Caching even hurts performance here (probably near-negligible but still), because the page will always be built from scratch, and additionally the useless cache output will be built.
I'm thinking you're overly pessimistic here. Certainly a configurable query parameter parsing algorithm can be devised that will arrive at the correct conclusion about which cache file to build or serve. CloudFlare, for example, has the "ignore query string" caching feature built in (https://support.cloudflare.com/hc/en-us/articles/200168256), even though, yes, by their own admission they get their cache result from the origin by stripping the query parameter. But there's no reason why this can't be replicated locally. If the decision machine had a configurable list of query parameters to ignore, it could strip those from the incoming request when making the final routing decision about the cache file. This still allows other query parameters to stay around and cause a non-cached reply, if needed. And if, let's say, AdWords and other similar systems are in massive use, you can surely imagine the wasted global resources here. Any arguments against the algorithm proposed above? EDIT: I have one argument: another textarea in the seemingly endless river of Configuration textareas :) But perhaps this is a small price to pay for the global savings gained? |
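The stripping idea proposed above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not ZenCache's PHP code; the function name `cache_key_url` and the contents of `IGNORED_PARAMS` are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical ignore list; in a real plugin this would come from configuration.
IGNORED_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign"}


def cache_key_url(url, ignored=IGNORED_PARAMS):
    """Return the URL used to locate the cache file.

    Ignored parameters are dropped; every other parameter is kept, so it can
    still produce a distinct cache entry (or a non-cached reply).
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    # The fragment never reaches the server, so it is left out of the key.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this sketch, `http://example.com/page/?gclid=abc123` maps to the same key as `http://example.com/page/`, while `?gclid=abc&product_id=5` keeps `product_id=5` in the key.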
That sounds doable, yes. A new textarea inside the GET Requests panel that allows you to list query parameters that ZenCache should consider "safe to cache", and if ZenCache receives a request from a URL that contains only that query parameter, it could silently drop it and cache the page as if the parameter wasn't present. It does seem like a rather odd use-case, and prone to accidentally creating a static cache of a page that should have been dynamic (if the site owner does not properly configure the exclusion), but from a ZenCache perspective it doesn't seem that difficult to do. Again, my feeling is if you have a query string present, you probably want to use it. If you don't want to use it, then why not simply exclude the query string when linking to the page? @jaswsinc any feedback here? |
I would propose that the algorithm should not consider the subject caching strategy when it's the only query parameter. There seems to be no reason why there couldn't be an arbitrary number of query parameters in an incoming request working out with this enhanced decision strategy. ZenCache would
For this example, |
I agree that it would be nice if ZC could be configured to strip specific query string variables from its caching algorithm. Your example, Google Analytics, is a use case where @raamdev writes...
I give this a 👍. I think it's a good idea to allow for a configuration where you could define query string variables to exclude. I think that even warrants a new UI config panel in ZenCache myself. Aside: In addition to this, I recently observed that CloudFlare has an algorithm that can sort the query string variables by key, ordering them in the same way each time, before they calculate the cache location for that page. This is something we could consider doing also. Example query string:
... is always sorted, and then represented internally as:
That way, if you happen to run ZenCache with GET requests enabled (not recommended), ZC could be smarter about how it determines the cache location. Sorting the variables removes most of the permutations that would otherwise need to be considered. |
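As a rough illustration of the sorting idea, the sketch below normalizes a query string so that any ordering of the same variables yields the same cache location. It is written in Python purely for illustration; `normalize_query` is a hypothetical helper, not part of ZenCache or CloudFlare.

```python
from urllib.parse import parse_qsl, urlencode


def normalize_query(query):
    """Return the query string with its variables sorted by key (then value)."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode(sorted(pairs))
```

Under this sketch, `b=2&a=1&c=3` and `c=3&a=1&b=2` both normalize to `a=1&b=2&c=3`, so they would share one cache entry instead of producing two.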
Short-Term Solution
A decent short-term solution is to add the So if you have a URL which contains query string variables, and you still want it cached, add that onto the end of the URL and it tells ZenCache that it's OK to do so.
|
I was thinking over this suggestion, and believe there's a flaw to the approach of designating certain query parameters to be cached as well as @jaswsinc I don't think the optimum solution is allowing individual query string URL variations to be cached as separate files. Instead, I think you should serve the non-GET/non-query string version of the cached file that's on disk. As it stands with the current set of suggestions, if a person visited: ZenCache will create a static cache file.
But if a person visited the following:
ZenCache would create three more cache files here:
But that's only workable if those query string URLs are common to many distinct visitors, i.e., if many people are hitting those same URLs. For instance, it would be fine for something like:
You'd want those to be cached, as many people will be hitting those exact same URLs. But with The difference between the two scenarios is the "product_id" approach is controlled by the website owner/admin, and is predictable. The For instance, here's how Google AdWords describes "utm_term":
Allowing such pages to be cached could result in an explosion of one-off cache files all containing the exact same HTML content. So, I propose pushing it further. For certain designated query string parameters, do not create a So, a visit to any of these pages:
Would only serve this file:
On top of that, it would be great if the "filter" could be an "any" filter. So, if I specify to serve the main cache file version if a query string contains "any" of the following:
The single, primary cache file will be served, even if the URL contains a bunch of other nonsense parameters. Because, who knows what extra stuff can end up being tacked on. Or think of it this way: If "any" of the specified query strings exist in the URL, strip the entire query string (including hashes/anchors) and just serve the non-query string version. Because, if I'm running an ad campaign, the tracking of those query strings is being done asynchronously, via JavaScript. And the single, main cached file version would suffice for any and all URLs, query string, or no query string. I think this would be the majority use case, but if there are those that actually need separate Thanks. |
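The "any" filter described above could look roughly like this. It is a minimal Python sketch under stated assumptions: `TRIGGER_PARAMS` and `cache_lookup_url` are hypothetical names chosen for the example, and the real plugin would implement this in PHP.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl

# Hypothetical list of tracking parameters that trigger the filter.
TRIGGER_PARAMS = {"utm_source", "utm_term", "gclid"}


def cache_lookup_url(url, triggers=TRIGGER_PARAMS):
    """If ANY trigger parameter is present, drop the whole query string and
    fragment so the primary cache file is served; otherwise leave the URL alone.
    """
    parts = urlsplit(url)
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if names & triggers:
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return url
```

The design trade-off is that any other parameter riding along with a trigger (for example `foo=bar` next to `utm_term`) is discarded too, which is exactly the behaviour requested here but would be wrong for parameters that change the page content.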
You're correct @jaswsinc, Either way, everything written above (although some concepts are repeated by now) is correct. I like the alphabetic ordering idea - that's indeed the optimal way to go about it. |
@bridgeport writes...
Right. That was my understanding of how this would work: You would have a "Query Vars To Ignore" section inside the GET Requests panel that would allow you to define a series of query variables that should be ignored (stripped) from the request prior to caching or loading an existing cache file. @bridgeport writes...
That's an interesting idea and I agree that should be possible. We could probably achieve this by using the Watered-Down Regex Syntax (recently implemented in other areas of the plugin) inside the new "Query Vars To Ignore" panel: For example, to remove only the
Whereas if you want ZenCache to strip all query vars from the request whenever it finds
Putting those two together, the "Query Vars To Ignore" box might contain this:
@jaswsinc writes...
That sounds like a great idea! |
How's this doing? My clients are still struggling with caching campaign pages properly. It would be a useful incremental win if simply a query-string free cached version could be opted into serving, until the more complex dynamics described in above messages can be fully implemented. |
@lkraav Thank you for the ping here. I'll see if we can work this into the next development cycle (the current cycle ends today and enters a 1 week testing phase). @jaswsinc writes...
I'd rather put this inside the existing GET Requests panel for now. If you could outline this one when you get a chance, that would be great! 😁 |
How are we doing on this for fall release? The number of sites where I could use this keeps increasing. |
Outline

New Option
After this line add:
'exclude_get_vars' => '', // Comma-delimited list of query string variables.

New Constant
After this line add:
if (!defined('COMET_CACHE_EXCLUDE_GET_VARS')) {
    /**
     * HTTP GET variable exclusions.
     *
     * @since 16xxxx GET variable exclusions.
     *
     * @var string A comma-delimited list of query string variable names.
     */
    define('COMET_CACHE_EXCLUDE_GET_VARS', '%%COMET_CACHE_EXCLUDE_GET_VARS%%');
}

Refactor Conditional Utility
Refactor this method so that it considers

Refactor Cache Path Generation
Replace this line with code that can parse the query string variables (see:

New UI Option
A new |
@jaswsinc What do you think about supporting WREGX in the query var exclusions? |
@lkraav writes...
We'll work on getting this into the next release. :-) |
@raamdev writes...
Sounds like a good idea to me also. |
Thanks a ton guys. Maybe to ease the workload for you, this could be a developer-only feature to start, without a UI. This allows some testing, calibrating etc. before spending time on bolting on a full user-friendly interface. |
PS if there would be a shared branch patch queue to work with and test, that wouldn't have any other modifications (avoid breakage), I could test this feature in a few real world environments very quickly, and possibly even provide patches based on what I see. Not sure if the risk is appropriate piling too many "in-development" changes on at one time in live environments, hence would be more comfortable with a well contained feature branch. |
@lkraav Thanks for your willingness to contribute! Once work on this issue is underway, you'll get a notification here about the new feature branch and you can follow along with the Pull Request. |
PS looks like there's more people thinking about sorting GET parameters for caching https://github.com/wandenberg/nginx-sorted-querystring-module |
@lkraav Thanks for the heads up! This issue didn't quite make it into the current development cycle (which closed last week in preparation for an official release this week), but this issue will be on docket for the next release. :-) |
Oh shoot. I just launched a site where this is really needed. I think I'm going to poke around in the codebase and see if I can come up with a hold me over patch. |
@lkraav Sorry we couldn't work this in sooner. Jason's outline above is a great place to start if you want to give this a stab on your own. If you'd like to submit a working PR to the Comet Cache Pro repo, that would help us get this worked into the next version more quickly. :-) |
Perfect
|
@lkraav The build environment takes some time setting up. It would be easier to just work from the latest public release of Comet Cache Pro (or the latest RC release), modifying files and testing as you go. |
Work on this issue has begun. |
Next Release Changelog:
|
Fantastic, guys 👍 Going to put this through some good testing cycles shortly. |
Confirmed Working
Set GET Variable to ignore Base Page : Visiting Visiting |
Comet Cache v161119 has been released and includes changes from this GitHub Issue. See the v161119 announcement for further details. This issue will now be locked to further updates. If you have something to add related to this GitHub Issue, please open a new GitHub Issue and reference this one (#639). |
I would like to cache AdWords campaign responses, which will receive tons of people with ?gclid=<id> requests, where <id> is an endless number of random hashes. It makes no sense to cache these, because essentially they're one-time use. So I would need to return a response that's general for just the gclid= parameter, regardless of the exact value. http://zencache.com/kb/ doesn't seem to indicate it's possible at the moment. Am I missing anything?