Skip to content

Conversation

@JohnONolan
Copy link
Member

@JohnONolan JohnONolan commented Jan 7, 2026

fixes https://linear.app/ghost/issue/BER-3152

Summary

  • Fixed issue where paginated sitemaps (e.g., /sitemap-posts-2.xml) returned 404 for sites with more than 50k posts
  • The static-theme middleware now correctly allows all paginated sitemap URLs to fall through to Ghost's sitemap handler

Problem

Sites with more than 50,000 posts get paginated sitemaps. The sitemap index (/sitemap.xml) correctly contains links to paginated files like /sitemap-posts-2.xml. However, when browsers or crawlers request these paginated URLs, they receive a 404 error.

Root Cause: The static-theme.js middleware had a hardcoded list of fallthrough files that only included the first page of each sitemap type (e.g., /sitemap-posts.xml). Paginated URLs like /sitemap-posts-2.xml were not in this list, so the middleware tried to serve them from the theme folder, failed, and returned 404 instead of passing the request to Ghost's sitemap handler.

Solution

Replaced the hardcoded sitemap URL list with a regex pattern that matches all paginated sitemap URLs: /^\/sitemap-(posts|pages|tags|authors|users)(-\d+)?\.xml$/

Test plan

  • Added unit tests for paginated sitema...[Response truncated]

Note

Improves sitemap handling in the static theme middleware to ensure Ghost-generated sitemaps are served correctly.

  • Introduces isFallthroughFile to centralize fallthrough logic for robots.txt, sitemap.xml, sitemap.xsl, and sitemap-{posts|pages|tags|authors|users}(-page).xml
  • Updates forwardToExpressStatic to use isFallthroughFile instead of a hardcoded list
  • Adds unit tests covering paginated sitemap variants (e.g., sitemap-posts-2.xml, sitemap-posts-99.xml, sitemap-tags-3.xml, sitemap-authors-2.xml) and validates fallthrough remains false for other static files

Written by Cursor Bugbot for commit 266777e. This will update automatically on new commits. Configure here.

fixes https://linear.app/ghost/issue/BER-3152

The static-theme middleware had a hardcoded list of sitemap URLs that
should fall through to Ghost's sitemap handler. This list only included
the first page of each sitemap type (e.g., /sitemap-posts.xml) but not
paginated sitemaps (e.g., /sitemap-posts-2.xml).

For sites with more than 50k posts, the sitemap index correctly links to
paginated files, but requests for those files were being blocked by the
static-theme middleware because they weren't in the fallthrough list.

The fix replaces the hardcoded list with a regex pattern that matches
all paginated sitemap URLs (sitemap-{type}-{page}.xml).
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 7, 2026

Walkthrough

The changes introduce a new helper function isFallthroughFile to the static-theme middleware that centralizes logic for determining which request paths should fall through to the next middleware. This replaces previous hardcoded path checks. The function now supports paginated sitemap patterns (sitemap-{type}(-{page}).xml) in addition to robots.txt, sitemap.xml, and sitemap.xsl. Corresponding tests are added to verify fallthrough behavior for the new paginated sitemap paths while maintaining coverage for existing paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main fix: handling paginated sitemaps that were returning 404 errors on large sites.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about the problem, root cause, and solution for the sitemap handling issue.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ghost/core/test/unit/frontend/web/middleware/static-theme.test.js (1)

429-488: Consider adding tests for remaining sitemap types.

While the current tests provide good coverage for the main use cases, consider adding tests for:

  1. The pages type: The regex includes pages, but there's no test for /sitemap-pages-2.xml
  2. The users type: If users is a valid sitemap type, add a test for /sitemap-users-2.xml
  3. The /sitemap.xsl file: This is in the hardcoded fallthrough list but lacks a dedicated test
  4. Negative test case: Verify that invalid patterns like /sitemap-invalid-2.xml correctly receive fallthrough: false
📝 Example test cases for additional coverage

Add these tests to the "fallthrough behavior" describe block:

it('should set fallthrough to true for /sitemap.xsl', function (done) {
    req.path = '/sitemap.xsl';

    staticTheme()(req, res, function next() {
        activeThemeStub.calledTwice.should.be.true();
        expressStaticStub.called.should.be.true();

        const options = expressStaticStub.firstCall.args[1];
        options.should.have.property('fallthrough', true);

        done();
    });
});

it('should set fallthrough to false for invalid sitemap types like /sitemap-invalid-2.xml', function (done) {
    req.path = '/sitemap-invalid-2.xml';

    staticTheme()(req, res, function next() {
        activeThemeStub.calledTwice.should.be.true();
        expressStaticStub.called.should.be.true();

        const options = expressStaticStub.firstCall.args[1];
        options.should.have.property('fallthrough', false);

        done();
    });
});
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d1b9a8 and 266777e.

📒 Files selected for processing (2)
  • ghost/core/core/frontend/web/middleware/static-theme.js
  • ghost/core/test/unit/frontend/web/middleware/static-theme.test.js
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T15:44:52.549Z
Learnt from: 9larsons
Repo: TryGhost/Ghost PR: 21866
File: ghost/core/core/server/web/admin/middleware/serve-auth-frame-file.js:10-19
Timestamp: 2025-04-23T15:44:52.549Z
Learning: The existing implementation in `ghost/core/core/server/web/admin/middleware/serve-auth-frame-file.js` using `path.parse(req.url).base` is secure against path traversal attacks as it properly extracts only the filename component without any directory parts.

Applied to files:

  • ghost/core/core/frontend/web/middleware/static-theme.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Acceptance tests (Node 22.18.0, mysql8)
  • GitHub Check: Legacy tests (Node 22.18.0, mysql8)
  • GitHub Check: Legacy tests (Node 22.18.0, sqlite3)
  • GitHub Check: Ghost-CLI tests
  • GitHub Check: Lint
  • GitHub Check: Unit tests (Node 22.18.0)
  • GitHub Check: Acceptance tests (Node 22.18.0, sqlite3)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Build & Push Docker Image
🔇 Additional comments (3)
ghost/core/core/frontend/web/middleware/static-theme.js (2)

58-82: Well-structured helper function.

The isFallthroughFile function is well-organized with clear comments and a logical flow. The separation of hardcoded fallthrough files and pattern matching makes the code maintainable.


77-77: The regex is correct as written.

The pattern /^\/sitemap-(posts|pages|tags|authors|users)(-\d+)?\.xml$/ correctly matches all sitemap types Ghost generates. The users property is set on the manager (as an alias to authors) in site-map-manager.js, and the verifyResourceType middleware validates against existing manager properties. Pagination constraints are already enforced in the handler (rejecting page 1 and null content).

Likely an incorrect or invalid review comment.

ghost/core/test/unit/frontend/web/middleware/static-theme.test.js (1)

429-488: Good test coverage for paginated sitemaps.

The new tests effectively verify that paginated sitemap URLs correctly receive fallthrough: true. The tests cover multiple resource types (posts, tags, authors) and different page numbers (2, 99), confirming the regex pattern works as intended.

Copy link
Member

@kevinansfield kevinansfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of duplication in the test file that could be simplified but the fix itself looks fine 👌

@kevinansfield kevinansfield merged commit 45d398c into main Jan 20, 2026
39 checks passed
@kevinansfield kevinansfield deleted the claude/slack-fix-sitemap-YY389 branch January 20, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants