-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
🐛 Fixed paginated sitemaps returning 404 for large sites #25796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fixes https://linear.app/ghost/issue/BER-3152 The static-theme middleware had a hardcoded list of sitemap URLs that should fall through to Ghost's sitemap handler. This list only included the first page of each sitemap type (e.g., /sitemap-posts.xml) but not paginated sitemaps (e.g., /sitemap-posts-2.xml). For sites with more than 50k posts, the sitemap index correctly links to paginated files, but requests for those files were being blocked by the static-theme middleware because they weren't in the fallthrough list. The fix replaces the hardcoded list with a regex pattern that matches all paginated sitemap URLs (sitemap-{type}-{page}.xml).
WalkthroughThe changes introduce a new helper function Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ghost/core/test/unit/frontend/web/middleware/static-theme.test.js (1)
429-488: Consider adding tests for remaining sitemap types.While the current tests provide good coverage for the main use cases, consider adding tests for:
- The
pagestype: The regex includespages, but there's no test for/sitemap-pages-2.xml- The
userstype: Ifusersis a valid sitemap type, add a test for/sitemap-users-2.xml- The
/sitemap.xslfile: This is in the hardcoded fallthrough list but lacks a dedicated test- Negative test case: Verify that invalid patterns like
/sitemap-invalid-2.xmlcorrectly receivefallthrough: false📝 Example test cases for additional coverage
Add these tests to the "fallthrough behavior" describe block:
it('should set fallthrough to true for /sitemap.xsl', function (done) { req.path = '/sitemap.xsl'; staticTheme()(req, res, function next() { activeThemeStub.calledTwice.should.be.true(); expressStaticStub.called.should.be.true(); const options = expressStaticStub.firstCall.args[1]; options.should.have.property('fallthrough', true); done(); }); }); it('should set fallthrough to false for invalid sitemap types like /sitemap-invalid-2.xml', function (done) { req.path = '/sitemap-invalid-2.xml'; staticTheme()(req, res, function next() { activeThemeStub.calledTwice.should.be.true(); expressStaticStub.called.should.be.true(); const options = expressStaticStub.firstCall.args[1]; options.should.have.property('fallthrough', false); done(); }); });
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ghost/core/core/frontend/web/middleware/static-theme.jsghost/core/test/unit/frontend/web/middleware/static-theme.test.js
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T15:44:52.549Z
Learnt from: 9larsons
Repo: TryGhost/Ghost PR: 21866
File: ghost/core/core/server/web/admin/middleware/serve-auth-frame-file.js:10-19
Timestamp: 2025-04-23T15:44:52.549Z
Learning: The existing implementation in `ghost/core/core/server/web/admin/middleware/serve-auth-frame-file.js` using `path.parse(req.url).base` is secure against path traversal attacks as it properly extracts only the filename component without any directory parts.
Applied to files:
ghost/core/core/frontend/web/middleware/static-theme.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Acceptance tests (Node 22.18.0, mysql8)
- GitHub Check: Legacy tests (Node 22.18.0, mysql8)
- GitHub Check: Legacy tests (Node 22.18.0, sqlite3)
- GitHub Check: Ghost-CLI tests
- GitHub Check: Lint
- GitHub Check: Unit tests (Node 22.18.0)
- GitHub Check: Acceptance tests (Node 22.18.0, sqlite3)
- GitHub Check: Cursor Bugbot
- GitHub Check: Build & Push Docker Image
🔇 Additional comments (3)
ghost/core/core/frontend/web/middleware/static-theme.js (2)
58-82: Well-structured helper function.The
isFallthroughFilefunction is well-organized with clear comments and a logical flow. The separation of hardcoded fallthrough files and pattern matching makes the code maintainable.
77-77: The regex is correct as written.The pattern
/^\/sitemap-(posts|pages|tags|authors|users)(-\d+)?\.xml$/correctly matches all sitemap types Ghost generates. Theusersproperty is set on the manager (as an alias toauthors) in site-map-manager.js, and theverifyResourceTypemiddleware validates against existing manager properties. Pagination constraints are already enforced in the handler (rejecting page 1 and null content).Likely an incorrect or invalid review comment.
ghost/core/test/unit/frontend/web/middleware/static-theme.test.js (1)
429-488: Good test coverage for paginated sitemaps.The new tests effectively verify that paginated sitemap URLs correctly receive
fallthrough: true. The tests cover multiple resource types (posts, tags, authors) and different page numbers (2, 99), confirming the regex pattern works as intended.
kevinansfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of duplication in the test file that could be simplified but the fix itself looks fine 👌
fixes https://linear.app/ghost/issue/BER-3152
Summary
/sitemap-posts-2.xml) returned 404 for sites with more than 50k postsProblem
Sites with more than 50,000 posts get paginated sitemaps. The sitemap index (
/sitemap.xml) correctly contains links to paginated files like/sitemap-posts-2.xml. However, when browsers or crawlers request these paginated URLs, they receive a 404 error.Root Cause: The
static-theme.jsmiddleware had a hardcoded list of fallthrough files that only included the first page of each sitemap type (e.g.,/sitemap-posts.xml). Paginated URLs like/sitemap-posts-2.xmlwere not in this list, so the middleware tried to serve them from the theme folder, failed, and returned 404 instead of passing the request to Ghost's sitemap handler.Solution
Replaced the hardcoded sitemap URL list with a regex pattern that matches all paginated sitemap URLs:
/^\/sitemap-(posts|pages|tags|authors|users)(-\d+)?\.xml$/Test plan
Note
Improves sitemap handling in the static theme middleware to ensure Ghost-generated sitemaps are served correctly.
isFallthroughFileto centralize fallthrough logic forrobots.txt,sitemap.xml,sitemap.xsl, andsitemap-{posts|pages|tags|authors|users}(-page).xmlforwardToExpressStaticto useisFallthroughFileinstead of a hardcoded listsitemap-posts-2.xml,sitemap-posts-99.xml,sitemap-tags-3.xml,sitemap-authors-2.xml) and validates fallthrough remains false for other static filesWritten by Cursor Bugbot for commit 266777e. This will update automatically on new commits. Configure here.