Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Option to automatically purge sitemap cache files #169

Closed
raamdev opened this Issue May 21, 2014 · 27 comments

Comments

Projects
None yet
2 participants
Owner

raamdev commented May 21, 2014

Quick Cache will cache sitemap files (e.g., /sitemap.xml) generated by sitemap generation plugins, however the sitemap cache files never get purged when a new post is created or the sitemap changes in some way.

There should be an option to automatically purge the sitemap cache files, with perhaps a way of specifying the name of the sitemap file whose cache should be purged (some plugins generate sitemap files with names other than sitemap.xml).

Reported here: http://wordpress.org/support/topic/automatically-clear-the-cache-for-sitemaps?replies=1#post-5551760

@raamdev raamdev added this to the Future Release milestone May 21, 2014

Owner

jaswrks commented Jun 9, 2014

@raamdev Would you like me to work on this?

Owner

raamdev commented Jun 9, 2014

@jaswsinc Yes, please. That would be great. Thank you!

@jaswrks jaswrks self-assigned this Jun 9, 2014

Owner

jaswrks commented Jun 9, 2014

Assigning this to me. Thanks.

Owner

jaswrks commented Jun 12, 2014

@raamdev Will this be available in both lite/pro?

Owner

raamdev commented Jun 12, 2014

@jaswsinc In LITE, this option should be Enabled and the sitemap (/sitemap.xml and maybe /sitemap/) should purged when appropriate (e.g., when a new post is published). If you want to disable this feature or specify the sitemap filename or path, you'll need Pro.

In Pro, there should be an option to disable this feature and a way of specifying the name of the sitemap file.

Do you know if it's common to have multiple sitemap files? If so, maybe Pro should be a multi-line text box where several sitemap URIs can be specified...

Also, we need to consider Multisite installs. (Related: #201)

Owner

jaswrks commented Jun 14, 2014

Do you know if it's common to have multiple sitemap files? If so, maybe Pro should be a multi-line text box where several sitemap URIs can be specified...

Yes, it is common for there to be a single sitemap index file; but then several other sitemap files for each type of content; i.e. archives, posts, pages, custom post types, etc. In practice, most of these files will begin with sitemap, so I think it would be a good idea to just purge any file that contains ^sitemap(.*?)\-xml using our recursive regex iterator.

Owner

jaswrks commented Jun 14, 2014

If you want to disable this feature or specify the sitemap filename or path, you'll need Pro.

Got it. Thanks.

Owner

jaswrks commented Jun 14, 2014

Referencing: websharks#105

Owner

jaswrks commented Jun 14, 2014

Also, we need to consider Multisite installs. (Related: #201)

Got it, thanks.

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 14, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 14, 2014

Owner

jaswrks commented Jun 14, 2014

NOTE: My work in this PR includes a merge of my work on #130 and #151. I felt it necessary to pull my changes from those other two feature branches before beginning work on this issue. Just to keep things sane until we have the work from those other two merged into the dev branch.

jaswrks pushed a commit that referenced this issue Jun 17, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 17, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 17, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 17, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 17, 2014

Owner

jaswrks commented Jun 17, 2014

@raamdev Child blog caching is now supported in the dev branch following my work here in #169

Work has been completed in this PR and is ready to be merged in. The only remaining issue is actually related to a bug in the Google XML Sitemaps plugin. I'm still waiting on a response from the developer of that plugin; see #105 for further details.

  • Consider multisite installs and resolve #201 also.
  • Integrate UI configuration pane into pro version.
  • Await a resolution in #105.
  • Tests against this feature.

2014-06-16_17-10-18

2014-06-16_16-50-17

Owner

raamdev commented Jun 17, 2014

@jaswsinc Awesome. Thanks! I see we're still awaiting a resolution for #105. Is this PR ready to be merged, or are we waiting on that?

Also, regarding the XML Sitemap Patterns section of the UI: Would you mind adding a sentence in there that clarifies how one would specify a pattern for a path that includes sub-directories (e.g., /blog/sitemap*xml; I assume that's correct, but I'm not entirely sure, so site-owners may also be unsure).

Is /blog/sitemap*xml matched by sitemap*xml or does one need to specify the /blog/ portion in the pattern as well?

Owner

raamdev commented Jun 17, 2014

Also, if you want to just update PR #218 with the above requested changes to the UI, that's fine with me. I'll just close #217 and merge #218.

Owner

raamdev commented Jun 17, 2014

Ah, I see now after re-reading it a few times that you specifically say the patterns should match "the cache file representation", so that means /blog/sitemap*xml would need to be used. Still, it's a bit unclear at first read and a single additional sentence would go a long way to improving those docs, IMO. :)

Owner

jaswrks commented Jun 17, 2014

Awesome. Thanks! I see we're still awaiting a resolution for #105. Is this PR ready to be merged, or are we waiting on that?

I think we can call that a separate issue at this point. It's definitely related to our work here, but Arnee wrote me back and says that he's going to fix that issue in the Google XML Sitemaps plugin soon. It's out of our hands now in my view.

Owner

jaswrks commented Jun 17, 2014

Ah, I see now after re-reading it a few times that you specifically say the patterns should match "the cache file representation", so that means /blog/sitemap*xml would need to be used. Still, it's a bit unclear at first read and a single additional sentence would go a long way to improving those docs, IMO. :)

Right, the blog/ would need to be there if it was somehow nested into a sub-directory.

I was torn about how to deal with this. Maybe you can offer a suggestion about this. The cache file representation is where I'm a little stuck. That's how it's purged obviously, but that's more difficult to understand from the UI perspective.

Now that I've thought about it a little more, I think this maybe should be redesigned to just accept a path which leads to the sitemap(s); i.e. instead of asking for the cache representation, we just ask them for the URI which leads to the sitemap(s); and allow them to use a wildcard to help cover many in one shot. What do you think?

Same concept really, but we'll convert the URI into the proper format internally.

Owner

jaswrks commented Jun 17, 2014

I think there is another open issue related to the clearing of feeds. What we end up with here might also help us implement the feed purging too. Although, feeds are more standardized within WordPress. Still, it might be nice to offer a similar UI for auto-purging feeds.

Owner

raamdev commented Jun 17, 2014

I think this maybe should be redesigned to just accept a path which leads to the sitemap(s);

So, a URL? E.g., the site owner might specify something like this:

http://example.com/sitemap*xml
http://example.com/blog/archive/sitemap*xml

If so, then yes, I agree that's probably a little more clear. The site owner will likely know where his sitemap file is located anyway, since he'll probably have submitted the URL to a search engine for indexing.

Owner

jaswrks commented Jun 17, 2014

I was thinking that URIs would be more portable, but we can definitely support full URLs too.

Owner

raamdev commented Jun 17, 2014

Either one works, as long as it's clear in the docs. :)

Owner

jaswrks commented Jun 18, 2014

I'm just looking at this again. I think URIs are better here. In a multisite environment we want the http://domain/path to get swapped out dynamically depending on which blog we're dealing with.

Owner

raamdev commented Jun 18, 2014

Hmmm. Yeah, that makes sense. Good catch!

@raamdev raamdev added the enhancement label Jun 18, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 18, 2014

Owner

jaswrks commented Jun 18, 2014

Great, I think this is good now. Seems to be working properly in my tests. Here is an updated shot of the UI. The additional notes below the field regarding multisite are displayed only in multisite networks.

2014-06-17_18-26-17

Owner

raamdev commented Jun 18, 2014

Perfect! Thank you.

Owner

raamdev commented Jun 18, 2014

Closed by #217.

@raamdev raamdev closed this Jun 18, 2014

Owner

jaswrks commented Jun 21, 2014

Reopening this issue after @raamdev found a bug.

A bot/crawler will look for a sitemap to live under the web address of home_url() when it differs from site_url(). We should change calls to site_url() over to home_url() and/or network_home_url() in both the menu page and also in the auto_cache class.

@jaswrks jaswrks reopened this Jun 21, 2014

jaswrks pushed a commit to websharks/comet-cache-pro that referenced this issue Jun 21, 2014

Owner

raamdev commented Jun 21, 2014

@raamdev raamdev closed this Jun 21, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment