ACE URL List Processing Issues #480
The delay is between requests to cache a URL. A 5 second (5000ms) delay means the Auto-Cache Engine will cache a URL, then wait 5 seconds, then cache another URL, etc. That means you could cache (at most) 12 URLs per minute with a 5000ms delay, likely far less considering the added time it takes to cache each request (if some pages are taking 3-4 seconds to cache, you might be getting 4 or 5 URLs cached each minute with a 5000ms delay).
Looking at the code, I see that there's a line that does
Those are likely a result of other users accessing the page or SE crawlers.
Yes, we have a GitHub Issue open for this here: #292
No, at most the Auto-Cache Engine would be able to cache 480 URLs in 40 minutes (40*60/5), but that's not even taking into account the amount of time it takes to load each URL. Around 100 URLs in 40 minutes sounds right to me if you have a 5000ms delay. If you want to cache URLs faster, you need to lower the delay, which will require more system resources.
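The throughput bounds discussed above are plain arithmetic: each URL costs at least the configured delay, plus however long the page takes to load and cache. A quick sanity check (this is just math from the numbers in this thread, not plugin code):

```python
# Rough Auto-Cache Engine throughput arithmetic (sketch based on the
# numbers discussed in this thread, not on the plugin source).

def max_urls(run_minutes: float, delay_ms: float, page_seconds: float = 0.0) -> int:
    """Upper bound on URLs cached in one run: each URL costs the
    configured inter-request delay plus the page's own load/cache time."""
    per_url = delay_ms / 1000.0 + page_seconds
    return int(run_minutes * 60 // per_url)

print(max_urls(1, 5000))       # delay alone: 12 URLs/minute at most
print(max_urls(40, 5000))      # 480 URLs in a 40-minute window (40*60/5)
print(max_urls(1, 5000, 4.0))  # with ~4 s per page, closer to 6/minute
```

This is why ~100 URLs cached in 40 minutes is plausible with a 5000ms delay once real page-load times are factored in.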
Hmm, that shouldn't be happening. The Auto-Cache Engine essentially just "visits" each URL, as if it were a normal visitor visiting the URL, which results in ZenCache caching the page (if necessary) as it would if a normal visitor visited an uncached page. How are you implementing DONOTCACHEPAGE?
The Auto-Cache Engine should run and cache the URLs listed in the Other URLs box, regardless of whether or not there's a Sitemap URL.
Correct, it is randomized. I believe there is an issue open about using Sitemap priority (#443), but for now, URLs are randomized in order to prevent a top-to-bottom approach in the stateless crawl process. Consider a site with 5K, 10K, or 100K pages. The crawler runs for only a few minutes at a time (based on configuration). Since there is no state-tracking in the current release, going from top-to-bottom (or in any specific order) could result in some pages never being cached, as the crawler would always start from the top. Even if those pages at the top have already been cached (which we detect), it still takes time to check each of them in a specific order. Thus, we randomize the order to avoid this. A solution (it could be a part of the work in #443) would be to add state-tracking; i.e., for the ACE to record where it left off. Ideally, this could be coupled with priorities being read from the Sitemap also.
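The stateless, time-budgeted crawl described above can be sketched as follows. This is an illustrative model of the behavior the developer describes, not the plugin's actual code; the function and parameter names are made up for this example:

```python
import random
import time

def crawl_once(urls, is_cached, cache, delay_ms=5000, budget_seconds=900):
    """One stateless ACE-style pass (illustrative sketch, not plugin code):
    shuffle so repeated runs don't always start from the top of the list,
    stop when the time budget runs out, and sleep the configured delay
    between requests."""
    deadline = time.monotonic() + budget_seconds
    order = urls[:]
    random.shuffle(order)  # randomization keeps the tail from being starved
    cached = []
    for url in order:
        if time.monotonic() >= deadline:
            break          # no state is saved; the next run reshuffles
        if is_cached(url):
            continue       # already-cached pages are skipped, but checking
                           # them in a fixed order would still cost time
        cache(url)
        cached.append(url)
        time.sleep(delay_ms / 1000.0)
    return cached
```

Because nothing is persisted between passes, a fixed top-to-bottom order would make pages near the bottom depend on every run getting through the whole top of the list first; shuffling makes every page equally likely to be reached in some run.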
I understand randomizing for the xml sitemap urls. That could make sense. However, in the case of the other urls list, I think most site owners would have some method and reason for creating that list in the first place, and would, like I did, assume that list would a) take priority over xml sitemap files, and b) be processed in the order listed. I can see where some urls might be in both lists, in which case the site owner wants those urls cached first, possibly due to very high access rates for those pages.

In my case, my site is currently over 7,000 pages, so I only want specific urls pre-cached, for several reasons, including: 1) some pages have very low access rates and already acceptable/quick load times, so they don't need to be pre-cached; and 2) other pages have very high access rates (the home page and some others) and/or get SE crawled daily, so I need them cached quickly after a purge so that Google and other SE crawlers always see fast page load times, which is good for SEO.

My initial thought on sequential processing of the other urls list would be to load the urls into a dedicated MySQL table that is dropped and re-created whenever ZC options are saved. That table would then be accessed in index (row-creation) order by ACE as it starts each pre-cache cycle, picking up wherever it left off last time. Unlike an XML sitemap, which is (or could be) very dynamic as posts are created, the other urls list should be relatively stable, updated manually by the site owner only when critical pages are created that need to be added to the list.
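The ordered-table-with-resume idea proposed above can be sketched like this, using SQLite as a stand-in for MySQL. The table and column names are invented for this example; this is a model of the proposal, not anything in ZenCache:

```python
import sqlite3

# Sketch of the proposal: the "Other URLs" list is loaded into a table in
# listed order whenever options are saved, and each pre-cache cycle resumes
# from the row id where the previous cycle stopped. Schema names are
# hypothetical.

def save_url_list(db, urls):
    """Drop and re-create the URL table and resume pointer on options save."""
    db.executescript("""
        DROP TABLE IF EXISTS ace_urls;
        CREATE TABLE ace_urls (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT);
        DROP TABLE IF EXISTS ace_state;
        CREATE TABLE ace_state (last_id INTEGER);
    """)
    db.executemany("INSERT INTO ace_urls (url) VALUES (?)", [(u,) for u in urls])
    db.execute("INSERT INTO ace_state (last_id) VALUES (0)")

def next_batch(db, batch_size):
    """Return the next URLs in listed order, advancing the resume pointer."""
    (last_id,) = db.execute("SELECT last_id FROM ace_state").fetchone()
    rows = db.execute(
        "SELECT id, url FROM ace_urls WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size)).fetchall()
    if rows:
        db.execute("UPDATE ace_state SET last_id = ?", (rows[-1][0],))
    return [url for _, url in rows]

db = sqlite3.connect(":memory:")
save_url_list(db, ["/", "/user/", "/user/sitemap/"])
print(next_batch(db, 2))  # ['/', '/user/'] -- first cycle starts at the top
print(next_batch(db, 2))  # ['/user/sitemap/'] -- resumes where it left off
```

With a resume pointer like this, a strict listed order no longer risks starving the bottom of the list, which addresses the developer's reason for randomizing in the stateless design.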
Re: "How are you implementing DONOTCACHEPAGE?" Is that a question for me or the ZC developer? I just added a line:

After 24 hours, the 1500 urls in my other urls list were still not all pre-cached. I just reduced the http delay to 2500ms, which I hope will get all of them cached within 24 hrs of a full cache purge. I only do that when I make significant plugin programming changes and/or theme changes, which I have been doing on quite a few sites lately to get them fully mobile-friendly.
I agree. You already opened a feature request for this (#481) so we can move further discussion on that topic to that GitHub Issue.
If you are adding that line inside a theme file that generates the pages, you should be fine. I'm not aware of any issue related to the Auto-Cache Engine and DONOTCACHEPAGE.
Using ZC Pro: Version 150409.
On my very large site, I have cleared/blanked the ZenCache Pro ACE parameter for an XML Sitemap URL because I don't want ACE to automatically process all of the urls in the sitemap. I only want it to auto-cache specific urls in the urls list, which contains about 1500 urls, formatted like:
/
/user/
/user/copyright-notice/
/user/privacy-policy/
/user/sitemap/
... etc.
Please confirm that this is the proper format, and that http://www.domainname.com is not also required for each url. If it is required, then that should be in the parameter instructions.
My assumption with this approach is that ACE should auto-cache only the files I specified, and in the order I specified them, which implements a priority scheme by which urls I want auto-cached right away are at the top of the list, then other lower priority urls following. And if a user accesses a url not on the ACE urls list, it will be cached as usual, but not by ACE.
I also have a 5000ms http request delay configured, which I have always assumed is one request per page url, and not a delay between requests for each resource (js, css, etc) that might be used by a page url. Some of the pages in my urls list can take 3-4 seconds for initial render and cache. This delay should equate to about 20 urls per minute, and 300 urls in a 15 minute ACE cycle.
However, what I am seeing is this: