Partition into multiple sitemap XML files for performance #9

Closed
sminnee opened this Issue Mar 31, 2012 · 1 comment

@sminnee
sminnee commented Mar 31, 2012

From http://open.silverstripe.org/ticket/5061

At the moment, a website using the googlesitemaps module regenerates the entire sitemap file whenever any page it contains is changed or a new page is created. This becomes a problem on websites with several thousand pages.

Partitioning the sitemap into multiple files can relieve this, since some of those files might never need to be regenerated. A good partitioning scheme seems to be by year, based on the Created date.
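As a rough illustration of the year-based scheme (not the module's code, which is PHP), here is a minimal Python sketch. The function names, the `(url, created)` tuple shape and the example.com URLs are all made up for the example; the only idea taken from the proposal is bucketing by the year of the Created date so that only the affected year's file is rebuilt.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(pages):
    """Group (url, created) pairs into one bucket per creation year.

    `pages` is a hypothetical iterable of (url, created_date) tuples.
    """
    buckets = defaultdict(list)
    for url, created in pages:
        buckets[created.year].append(url)
    return dict(buckets)


def partitions_to_regenerate(changed_pages):
    """Return the years whose sitemap file contains a changed page.

    Every other year's file can be left untouched.
    """
    return sorted({created.year for _url, created in changed_pages})


# Hypothetical sample data.
pages = [
    ("https://example.com/a", date(2010, 5, 1)),
    ("https://example.com/b", date(2011, 2, 9)),
    ("https://example.com/c", date(2011, 8, 30)),
]

by_year = partition_by_year(pages)
stale_years = partitions_to_regenerate([pages[1]])
```

With this layout, editing a page created in 2011 only marks the 2011 partition as stale; the 2010 file never needs to be rebuilt unless one of its own pages changes.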

See a reference implementation from Hamish:
http://trac.silverstripe.com/silverstripe/browser/projects/peoplebeforeprofit/trunk/mysite/code/Sitemap.php
http://trac.silverstripe.com/silverstripe/browser/projects/peoplebeforeprofit/trunk/mysite/_config.php#L66

It's specific to Google News entries, but the year partitioning and routing could easily be reused.

From Google's recommendations on http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184

A Sitemap file can contain no more than 50,000 URLs and be no larger than 10MB when uncompressed. If your Sitemap is larger than this, break it into several smaller Sitemaps. These limits help ensure that your web server is not overloaded by serving large files to Google.

Also for reference - http://www.sitemaps.org/protocol.php#index
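To make the limit and the index format concrete, here is a hedged Python sketch (again, not the module's implementation). It splits a URL list into chunks of at most 50,000 entries and builds a sitemap index pointing at one file per chunk. The helper names and the `sitemap-N.xml` naming are assumptions for illustration; the `sitemapindex` element and namespace follow the sitemaps.org protocol linked above.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit quoted above


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing the given sitemap files."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc in sitemap_urls:
        # escape() protects &, < and > inside the URL.
        lines.append(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical site with 120,000 pages.
urls = [f"https://example.com/page/{i}" for i in range(120_000)]
parts = list(chunk(urls))

# Hypothetical file naming: sitemap-1.xml, sitemap-2.xml, ...
index = build_sitemap_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(parts) + 1)]
)
```

This sketch only enforces the URL count; a real implementation would also need to check the 10MB uncompressed size limit for each file.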

@dhensby
Contributor
dhensby commented Jul 27, 2012

+1 for this.

It would not only be good for performance; my sites running with 64MB of RAM can't really produce sitemaps with more than about 1.5k entries before running out of memory. The limit of 50k URLs / 10MiB needs to be adhered to anyway, but reducing memory use during generation would be great.
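One way to keep memory use flat, sketched here in Python as an assumption rather than a description of how the module works, is to write each entry to the output as it is produced instead of building the whole document in memory first. `write_sitemap` and `page_urls` are hypothetical names; in practice `out` would be a file or HTTP response stream rather than a `StringIO`.

```python
import io
from xml.sax.saxutils import escape


def write_sitemap(out, urls):
    """Write a <urlset> document to `out` one entry at a time.

    `urls` can be any iterable, including a generator, so only one
    URL needs to be held in memory at once. Returns the entry count.
    """
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    count = 0
    for url in urls:
        out.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        count += 1
    out.write("</urlset>\n")
    return count


def page_urls(n):
    """Hypothetical stand-in for a lazy database query over pages."""
    for i in range(n):
        yield f"https://example.com/page/{i}"


# StringIO is used here only so the example is self-contained.
buffer = io.StringIO()
written = write_sitemap(buffer, page_urls(2000))
```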

@wilr wilr added a commit that closed this issue Jan 15, 2013
@wilr API Implement sitemap.xml partitioning (Fixes #9)
Misc upgrade of module code so that sitemap.xml provides an index sitemap file based on the standards. Moved configuration vars to the Config API.

Considering how large a change this is, I've branched a 1.0 release off in GitHub.
8bbc14e
@wilr wilr closed this in 8bbc14e Jan 15, 2013