Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Important option missing: Auto-Purge "RSS Feeds" Too? #182

Closed
sarangandk opened this issue May 29, 2014 · 32 comments
Closed

Important option missing: Auto-Purge "RSS Feeds" Too? #182

sarangandk opened this issue May 29, 2014 · 32 comments
Assignees
Milestone

Comments

@sarangandk
Copy link

In quick cache Version 140520..

As there options like "Auto-Purge "Author Page" Too?", Auto-Purge Designated "Home Page" Too? etc. a very important option is missing.

That is an option to set to refresh/clear the caches for RSS Feeds when making any changes in posts, adding new posts etc.

Because I see when I add new posts to WordPress, the RSS feed caches are not cleared automatically and we need to clear that manually everytime.

In my sites I have authors, who have not access to Quick Cache settings to clear the caches when they add new posts.

Please add that option ASAP.

Thanks in advance.

@raamdev raamdev added this to the Future Release milestone Jun 2, 2014
@raamdev
Copy link
Contributor

raamdev commented Jun 2, 2014

@sarangandk Thank you very much for reporting this important bug! The development window has closed on the current release, but I'm going to mark this is a priority item to work on for the following release (after the one coming out this week, for which development has already been halted).

@raamdev
Copy link
Contributor

raamdev commented Jul 13, 2014

I'm looking at this again and it looks like this feature is actually going to be a bit challenging to implement. That said, I do agree that this is a priority item and needs to be added ASAP. In fact, it's as important as the auto-purge posts page and home page feature, as the publishing of a new post would currently render cached feeds out of date.

Feed Caching is disabled by default in Quick Cache, but if a site owner enables this feature, it is currently broken, as it will result in outdated feeds.


Scenarios where Feed Caches should be purged

Auto-Purge Comment Post Cache

When auto_purge_comment_post_cache() or auto_purge_comment_transition() are called, they should also purge any cached comment feed associated with that post, e.g., http://example.com/sample-post/feed/, along with the site-wide comment feed, e.g., http://example.com/comments/feed/.

Auto-Purge Home Page Cache

When auto_purge_home_page_cache() is called, the primary site feed cache should be cleared, e.g., http://example.com/feed/.

Auto-Purge Posts Page Cache

When auto_purge_posts_page_cache() is called, the primary site feed cache should be cleared, e.g., http://example.com/feed/.

Auto-Purge Post Terms Cache

When auto_purge_post_terms_cache() is called, the feed cache associated with each term should be cleared, e.g., http://example.com/category/Uncategorized/feed/ and http://example.com/tag/mytag/feed/.

Clearing the term cache will actually be quite tricky because you can build multi-term feeds on the fly. For example, to create a feed that includes all posts with tag1 and tag2, you could simply access http://example.com/tag/tag1,tag2/feed/. That currently generates a cache file at example-com/tag/tag1-tag2/feed.html. The purging routine would need to consider this when purging the cache for a given term.


Because this is going to require quite a bit of work, I'm going to push this issue to the Future Release milestone, so that I can push out in the Next Release all of the bugfixes that are ready to go out right now.

This issue will be a priority item for the next release cycle.

@raamdev raamdev modified the milestones: Future Release, Next Release Jul 13, 2014
@jaswrks jaswrks self-assigned this Aug 6, 2014
@jaswrks
Copy link

jaswrks commented Aug 6, 2014

Assigning this to myself.

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 6, 2014
@jaswrks
Copy link

jaswrks commented Aug 7, 2014

@raamdev If auto_purge_post_cache() is called upon, what should be cleared then in terms of the feeds? It seems like this should clear the sitewide feed; i.e. /feed/, and any associated term-related feeds too. Do I have that right? Missing anything for that circumstance?

I think we need to consider author-related feeds too?

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 7, 2014
@jaswrks
Copy link

jaswrks commented Aug 7, 2014

OK, I have the ones you listed covered now.

I am also adding feed purging for auto_purge_post_cache(), to include:
  • The sitewide feed; i.e. /feed/
  • Term feeds associated with the post.
  • Author feeds associated with the post.
I am also adding feed purging for auto_purge_author_page_cache()
  • The sitewide feed; i.e. /feed/
  • Author feeds associated with the post ID passed to that routine.

I have the event handlers in place now to deal with these scenarios. What is left is to actually write the central routine that deals with feed purging. Before I start on that, can you please help me review this and let me know if there is anything I am missing?

It might be worth grabbing this branch and running a search for auto_purge_xml_feeds_cache to see if you spot anything that is missing from the current event handler configuration. Of course we can always come back in later, but if you don't mind having a quick look it would help me out.

@raamdev
Copy link
Contributor

raamdev commented Aug 8, 2014

can you please help me review this and let me know if there is anything I am missing?

I have reviewed the code you committed to the feature/182 branch and it looks good. All of the handlers look correct and I don't see anything missing.


Regarding this comment in wpsharks/comet-cache-pro@1ecaf51:

// @TODO @raamdev I think it would a good idea to reverse the logic here to match other routines.
//    Perhaps we could check for a negated condition and stop before proceeding; instead of the other way around.

I agree. Feel free to flip that around as part of this issue if you'd like.

@jaswrks
Copy link

jaswrks commented Aug 8, 2014

Roger that. Thanks!

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
@jaswrks
Copy link

jaswrks commented Aug 8, 2014

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 8, 2014
@jaswrks
Copy link

jaswrks commented Aug 9, 2014

@raamdev PRs submitted for review. My tests indicate this is working correctly in all cases. However, I would suggest further testing by another pair of eyes. That said, I feel it's ready to be merged into the dev branch. Here a couple of screenshots taken from the pro version.

2014-08-09_01-46-39

2014-08-09_01-47-03


There is one minor limitation that I noted here. It is related to mixed feed variations for term-related feeds that are using query string variables.

While this implementation does support feed URLs that use query string variables (in case a site owner is caching feeds, and also caching GET requests) that will be fine. This routine can handle it. However, mixed feeds (e.g. ?cat=123,456) cannot be fully analyzed by the purging routine since QC uses an MD5 hash of the query string variables. For this reason, the consideration for mixed feeds is limited to SEO-friendly feed URLs; which I noted in the source code.

@jaswrks
Copy link

jaswrks commented Aug 9, 2014

@raamdev This has been enabled in both the lite/pro versions. However, there is no ability turn off feed-related Dashboard notices for feed purging, and there is no ability to disable feed purging either; in the lite version.

Actually, if feed caching is disabled in the lite version, feed purging will not run. That would be one way to do it in the free version; even though there isn't a separate config option for disabling feed purging in the lite version.

@raamdev
Copy link
Contributor

raamdev commented Aug 9, 2014

This has been enabled in both the lite/pro versions. However, there is no ability turn off feed-related Dashboard notices for feed purging, and there is no ability to disable feed purging either; in the lite version.

Actually, if feed caching is disabled in the lite version, feed purging will not run. That would be one way to do it in the free version; even though there isn't a separate config option for disabling feed purging in the lite version.

Perfect. That's exactly what I was thinking would be the difference between Lite and Pro with regards to this feature. Thank you.

PRs submitted for review. My tests indicate this is working correctly in all cases. However, I would suggest further testing by another pair of eyes.

Thank you. I will review these today.

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 9, 2014
@jaswrks
Copy link

jaswrks commented Aug 9, 2014

Further testing shows a possible issue.

  • Purging feeds on a site that is using SEO-friendly permalinks, does not pick up feed URLs that use the alternative query string format at all. Hmm, that is something we could look at either now or in the future. Not sure how @raamdev feels about this.

On the one hand, feed URLs probably should not be cached at all. If they are, GET requests shouldn't be cached. Therefore, I don't see cached feed URLs with a query string being a common occurrence at all; i.e. this is an edge case in my view.

On the other hand, it would be nice if QC could cover this format, even when the permalink structure is set to one of the SEO-friendly options. Just in case a site owner is using SEO-friendly permalinks, is also allowing GET requests to be cached, and they are also allowing feed URLs to be cached that use the query string variations. Not sure why that would that ever occur. As I said, edge case. lol

As it stands now, QC will handle feed URLs using the query string format, but only if SEO-friendly permalinks are NOT in use. i.e. if the WP core function get_feed_link() returns the query string format because the site is not using a permalink structure. That's the only time this purging routine will look at the query string formations; e.g. ?feed=rss2

To clarify further, the purging routine is looking at one format or the other, based on the way WordPress has been configured. If WordPress is configured with an SEO-friendly permalink structure, we look at URLs like /comments/feed/. If WordPress is using default permalinks, we look at feed URLs such as /?feed=rss2. Under normal circumstances, these wouldn't be cached anyway, because it's generally a bad idea to cache URLs with a query string. The option to cache GET requests is disabled by default in Quick Cache and is generally discouraged.

@raamdev
Copy link
Contributor

raamdev commented Aug 10, 2014

I agree that we should support SEO-friendly permalinks + auto-purging for feed URLs with query strings (e.g., /?feed=rss2) when GET Request caching is also enabled.

I've seen query strings used quite frequently with regards to WordPress feeds, especially if a site owner has done any customization to the way feeds behave.

For example, on my personal site I have it setup so that I can request a feed of all posts with format "Aside" using /feed/?rss-post-format-aside=true. It may even be true that some WordPress plugins that change the behavior of feeds also use query strings (e.g., plugins that provide a "protected" feed use a query string to pass the feed "key").

All of that said, I'm quite happy to treat this as a separate "bug", to work on for a future release. I'm more concerned with getting feed purging for SEO-friendly URLs pushed out (which is likely a far more common scenario).


To recap:

The scenario that we need to address is the one where the site owner is using SEO-Friendly permalinks, has GET Request caching enabled, and is using Feed URLs that contain query strings.

In that scenario, the cache files generated for those Feed URLs (with a query string) would not be purged by this new auto purge routine and instead would only get purged when they expire or when the cache is manually cleared.


I think the best solution here would be for the feed purging routine to simply look for both SEO-friendly cache formations and query string formations when GET Request caching is enabled (that's the only scenario where both types might get cached).

When GET Request caching is disabled (the default), the feed purging routine should only consider SEO-friendly cache formations (which is what the new routine currently takes care of as of this comment).

@jaswrks
Copy link

jaswrks commented Aug 10, 2014

For example, on my personal site I have it setup so that I can request a feed of all posts with format "Aside" using /feed/?rss-post-format-aside=true. It may even be true that some WordPress plugins that change the behavior of feeds also use query strings (e.g., plugins that provide a "protected" feed use a query string to pass the feed "key").

Yes, I agree with this too. I also realized that when/if this occurs, the only they would ever be cached (i.e. the query string variations) is if the site owner is also allowing GET requests to be cached too. If they are using SEO-friendly permalinks, there is very little reason to allow GET requests to be cached. Making this case even less likely to occur.

I would agree with treating this as a separate issue; i.e. as an improvement to come later. What I can do in this issue, is add the query string variations for those which are the most common. Still, I think a separate issue could be created so that we can come back and take a much closer look at this specific scenario again in the future.

I will post another commit or two soon to help close the gap a little.

jaswrks pushed a commit that referenced this issue Aug 10, 2014
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 10, 2014
@jaswrks
Copy link

jaswrks commented Aug 10, 2014

Found a minor bug in build_cache_path() in this last round of changes. I don't think it has any impact on the current release. However, with these feed URLs, WordPress generates them with &amp; instead of &. Presumably because they are normally used by themes for <link rel="alternate"> tags.

I updated build_cache_path() to help catch this.

@jaswrks
Copy link

jaswrks commented Aug 10, 2014

Ugh. I just added the query string variations for a case where fancy permalinks are in use; just to cover all the bases. Only to test further and discover that WordPress redirect_canonical() tries to alter these anyway; trying to use /feed/ if at all possible, even if a site owner tries to use ?feed=. lol

There are still variations to consider though. Back to the drawing board on that one.

@jaswrks
Copy link

jaswrks commented Aug 10, 2014

WordPress redirect_canonical() completely rewrites the following into SEO-friendly versions.

  • /?feed=[feed]
  • /?feed=comments-[feed]
  • /?feed=[feed]&p=[post ID]

Author feeds are partially rewritten. Same with term-related feeds (partially rewritten).
So really, all we need to cover here are author variations and term variations.

jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 10, 2014
…endly permalinks. We now consider `redirect_canonical()` as well. See wpsharks/comet-cache#182
jaswrks pushed a commit that referenced this issue Aug 10, 2014
…endly permalinks. We now consider `redirect_canonical()` as well. See #182
jaswrks pushed a commit to wpsharks/comet-cache-pro that referenced this issue Aug 10, 2014
jaswrks pushed a commit that referenced this issue Aug 10, 2014
@jaswrks
Copy link

jaswrks commented Aug 10, 2014

@raamdev I've got the query string variations covered for authors and terms (in every way I found possible); i.e. whenever GET request caching is enabled, feed caching is enabled, and the WordPress-generated feed URLs are using an SEO-friendly format. After some initial testing, we are now also considering redirect_canonical() since it has a role to play with feed URLs too.

That covers just about all the bases. A second round of testing looks pretty good. Let me know if you spot anything that needs further attention.

@jaswrks
Copy link

jaswrks commented Aug 11, 2014

For example, on my personal site I have it setup so that I can request a feed of all posts with format "Aside" using /feed/?rss-post-format-aside=true. It may even be true that some WordPress plugins that change the behavior of feeds also use query strings (e.g., plugins that provide a "protected" feed use a query string to pass the feed "key").

I don't see it being possible to detect things like this. Query strings are not something that should ever be cached in my view. The only partially-legitimate reason for us to support the caching of GET requests, is to support the default WordPress behavior where fancy permalinks are disabled. For that matter, I think WordPress should not even have this option for site owners. I can't think of a single good reason to support query strings in permalinks. Anyway, that's another discussion.

Opinions aside, QC currently caches GET requests (i.e. if a site owner chooses to turn this on and goes against the advice presented in the Dashboard); each query string is converted to an MD5 hash via build_cache_path(). This is one-way encryption. Therefore, attempting to find cache entries later that are associated with specific query string variables is impossible at present. We can look for specific URLs that contain known query string variables in a specific order; but that's about as far as I know how to go on this topic. I'd be happy to hear your ideas though. If there is something I am missing please let me know.

I still agree that we should set aside a separate issue to reference the limitations mentioned in this issue. That way if something changes or improves in the future we can come back and take a closer look.


Regarding /feed/?rss-post-format-aside=true. The proper way to handle this in WordPress (that is, if you wanted it to be cached), is to setup a rewrite rule and use an SEO-friendly format instead of a query string. Any URL with a query string, should, IMO, be considered dynamic. Thus, it should not be cached. http://codex.wordpress.org/Rewrite_API/add_rewrite_rule

I think Quick Cache should continue to discourage the caching of GET requests for the same reason that search engines discourage them. There are too many possibilities. Even the order of query string variables can make a difference in the way a server handles a request. It will certainly make a difference in the format of the URL.


I even think it would be OK if at some point we decided to remove support for GET requests in Quick Cache entirely; i.e. if you want to cache your site, enable fancy permalinks.

@jaswrks
Copy link

jaswrks commented Aug 11, 2014

I think the best solution here would be for the feed purging routine to simply look for both SEO-friendly cache formations and query string formations when GET Request caching is enabled (that's the only scenario where both types might get cached).

This was addressed in the most recent commits. We now detect known query string formats that occur after redirect_canonical() does its thing; even when SEO-friendly permalinks are in use.

@raamdev
Copy link
Contributor

raamdev commented Aug 11, 2014

@jaswsinc Thanks for all your work on this! It sounds like it turned out to be a much better PITA than it seemed. :(

QC currently caches GET requests (i.e. if a site owner chooses to turn this on and goes against the advice presented in the Dashboard); each query string is converted to an MD5 hash via build_cache_path(). This is one-way encryption. Therefore, attempting to find cache entries later that are associated with specific query string variables is impossible at present. We can look for specific URLs that contain known query string variables in a specific order; but that's about as far as I know how to go on this topic. I'd be happy to hear your ideas though. If there is something I am missing please let me know.

So that means it's not currently possible to selectively purge cached GET Requests, correct? The only way to clear cached GET Requests is to let the cache files expire on their own or to manually clear the cache. Do I have that correct?

If that's correct, I think that's fine. I haven't heard any reports of site owners needing to selectively clear cached GET Requests, so we can worry about that if/when someone submits a feature request.

I think the best solution here would be for the feed purging routine to simply look for both SEO-friendly cache formations and query string formations when GET Request caching is enabled (that's the only scenario where both types might get cached).

This was addressed in the most recent commits. We now detect known query string formats that occur after redirect_canonical() does its thing; even when SEO-friendly permalinks are in use.

So that covers all scenarios for Feed caching where standard query strings might be used (i.e., the /?feed= query strings that redirect_cononical() rewrites) but what is still not covered is a scenario where both GET Request caching and SEO-friendly permalinks are in use and a site owner has set up custom query strings (like I have on my personal site with /feed/?rss-post-format-aside=true), is that correct?

In such a scenario, where custom feed query strings are in use and GET Request caching is enabled, and SEO-friendly permalinks are enabled, those feed URLs with custom query strings will get cached, but will not be purged via the automatic feed purging routines. Those cache files would only get purged when they expire on their own or when a site owner manually clears the cache. Is that correct?

@jaswrks
Copy link

jaswrks commented Aug 11, 2014

In such a scenario, where custom feed query strings are in use and GET Request caching is enabled, and SEO-friendly permalinks are enabled, those feed URLs with custom query strings will get cached, but will not be purged via the automatic feed purging routines. Those cache files would only get purged when they expire on their own or when a site owner manually clears the cache. Is that correct?

Yep, that's right. Feed URLs with custom query strings would not be identified as being feed URLs by Quick Cache. A manual cache clear would be needed to purge these at present.

@jaswrks
Copy link

jaswrks commented Aug 11, 2014

So that means it's not currently possible to selectively purge cached GET Requests, correct? The only way to clear cached GET Requests is to let the cache files expire on their own or to manually clear the cache. Do I have that correct?

Correct. It's possible to identify known patterns and purge cached GET requests for something like ?feed=atom&p=2. We do that just fine. However, custom query string variables (which can appear in any order) are not predictable.

@jaswrks
Copy link

jaswrks commented Aug 11, 2014

It sounds like it turned out to be a much better PITA than it seemed. :(

Yep, this was a bit of work. I learned a few things about WP feeds out of this though.

@raamdev
Copy link
Contributor

raamdev commented Aug 11, 2014

@jaswsinc So are PRs #271 and wpsharks/comet-cache-pro#73 ready to go, as far as you can tell? I'll be sure to test this thoroughly during the release candidate period.

@jaswrks
Copy link

jaswrks commented Aug 11, 2014

Yes, these are both ready to merge. I'm not aware of any remaining issues at this time.

@raamdev
Copy link
Contributor

raamdev commented Aug 12, 2014

Closed by PRs #271 and wpsharks/comet-cache-pro#73. Thanks @jaswsinc!

@raamdev raamdev closed this as completed Aug 12, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants