Skip to content

🎨 Improved admin Comments page load time on large comment datasets#28734

Merged
9larsons merged 3 commits into
TryGhost:mainfrom
jwmarshall:comment-indexes
Jun 26, 2026
Merged

🎨 Improved admin Comments page load time on large comment datasets#28734
9larsons merged 3 commits into
TryGhost:mainfrom
jwmarshall:comment-indexes

Conversation

@jwmarshall

@jwmarshall jwmarshall commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Hello Ghost team 👋 - This is my first PR and I've tried to follow all repository guidelines. For context, I'm working on a migration from Wordpress that has almost 400k comments. Once imported I noticed performance issues for some pages and queries for comments. This PR attempts to add some tables indexes to improve performance.

Thanks in advance!

--

On sites with a large number of comments, the admin Comments moderation page ("all comments") becomes effectively unusable. It took ~90s to load on a real-world dataset of ~390k comments. The public per-post comment widget is unaffected; this is purely the admin getAdminAllComments path.

The page query orders by created_at and emits the count.replies / count.direct_replies relations as per-row correlated subqueries. With no supporting indexes, the optimizer full-scans the comments table once per returned row and filesorts the whole table for the ordering. The problem is worst when the data is skewed toward top-level comments (parent_id IS NULL), where the column is too low-cardinality for the optimizer to trust the existing single-column FK index, but it slows down any large comment table.

What does it do?

Adds four additive secondary indexes to the comments table, declared in schema.js and applied via a non-transactional migration:

Index Serves
comments(created_at) the ORDER BY created_at DESC list (removes the filesort)
comments(status) the COUNT(DISTINCT id) pagination count
comments(in_reply_to_id, status) the count.direct_replies subquery on in_reply_to_id
comments(parent_id, in_reply_to_id, status) count.replies plus the parent_id + in_reply_to_id IS NULL half of count.direct_replies

The 3-column index covers both the parent_id-only and parent_id + in_reply_to_id IS NULL subqueries, so a separate (parent_id, status) is not needed. On the test dataset this cut the page's DB time from ~90s to tens of milliseconds.

The indexes are purely additive — parent_id and in_reply_to_id keep their own foreign-key indexes — so the migration's down drops them without any FK index re-add dance.

Why is this something Ghost users or developers need?

Comment moderation on any high-volume Ghost site is currently slow to the point of timing out. This is a low-risk, backward-compatible fix (no schema/data. changes beyond indexes, no API or behaviour changes) that makes the moderation page usable at scale.

Notes / trade-offs

  • Write amplification: four extra secondary indexes add modest per-row cost on comment insert/update/delete; negligible for read-heavy comment workloads.
  • comments(status) is low-cardinality: it still beats a clustered full scan for the COUNT(DISTINCT) because the secondary index is far narrower than the row (which carries the html longtext). It's the most droppable of the four.
  • Alternative considered: a query-level refactor that batches the reply counts into a single grouped pass (instead of per-row correlated subqueries) would remove the need for the reply-count indexes entirely. Indexes were chosen here as the minimal, backward-compatible change; happy to follow up with the query refactor if preferred.

Testing

  • Updated the schema integrity hash test (integrity.test.js).
  • Verified the migration up creates all four indexes, is idempotent, and down reverses cleanly.
  • Existing comments-service unit tests pass; lint passes.

  • I've read and followed the Contributor Guide
  • I've explained my change
  • I've written an automated test to prove my change works

A note on that last checkbox: this is a pure index/migration change, covered by the schema integrity test and manual migration verification, but there's no automated test proving the performance win (that needs a large seeded dataset and EXPLAIN ANALYZE, which isn't practical as a unit test). I left it unchecked to be honest.

@github-actions github-actions Bot added the migration [pull request] Includes migration for review label Jun 18, 2026
@github-actions

Copy link
Copy Markdown
Contributor

It looks like this PR contains a migration 👀
Here's the checklist for reviewing migrations:

General requirements

  • ⚠️ Tested performance on staging database servers, as performance on local machines is not comparable to a production environment
  • Satisfies idempotency requirement (both up() and down())
  • Does not reference models
  • Filename is in the correct format (and correctly ordered)
  • Targets the next minor version
  • All code paths have appropriate log messages
  • Uses the correct utils
  • Contains a minimal changeset
  • Does not mix DDL/DML operations
  • Tested in MySQL and SQLite

Schema changes

  • Both schema change and related migration have been implemented
  • For index changes: has been performance tested for large tables
  • For new tables/columns: fields use the appropriate predefined field lengths
  • For new tables/columns: field names follow the appropriate conventions
  • Does not drop a non-alpha table outside of a major version

Data changes

  • Mass updates/inserts are batched appropriately
  • Does not loop over large tables/datasets
  • Defends against missing or invalid data
  • For settings updates: follows the appropriate guidelines

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a9a95de9-fb4f-43ec-a900-c13123b85a19

📥 Commits

Reviewing files that changed from the base of the PR and between b28b3a9 and 16661eb.

📒 Files selected for processing (6)
  • ghost/admin/package.json
  • ghost/core/core/server/data/migrations/versions/6.50/2026-06-18-18-02-46-add-comments-moderation-indexes.js
  • ghost/core/core/server/data/schema/schema.js
  • ghost/core/core/server/models/comment.js
  • ghost/core/package.json
  • ghost/core/test/unit/server/data/schema/integrity.test.js
✅ Files skipped from review due to trivial changes (1)
  • ghost/admin/package.json
🚧 Files skipped from review as they are similar to previous changes (3)
  • ghost/core/test/unit/server/data/schema/integrity.test.js
  • ghost/core/core/server/data/schema/schema.js
  • ghost/core/core/server/models/comment.js

Walkthrough

The comments table schema index list was expanded with indexes on created_at, status, in_reply_to_id plus status, and parent_id plus in_reply_to_id plus status. A new migration adds and removes those indexes. The comment replies relation now sorts by created_at and then id. The schema integrity test expected hash was updated, and the ghost-admin and ghost-core package versions were bumped to 6.50.0-rc.0.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: faster admin Comments page loading on large datasets.
Description check ✅ Passed The description is directly related to the index and migration changes in the pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@jwmarshall

Copy link
Copy Markdown
Contributor Author

I have also created a PR that contains all four new indexes combined with a query refactor under my fork: jwmarshall#1

I'm working out a test environment to show the difference in performance gains.

@9larsons 9larsons force-pushed the comment-indexes branch 2 times, most recently from 69aa198 to fdb14ea Compare June 25, 2026 17:48
@nx-cloud

nx-cloud Bot commented Jun 25, 2026

Copy link
Copy Markdown

🤖 Nx Cloud AI Fix

Ensure the fix-ci command is configured to always run in your CI pipeline to get automatic fixes in future runs. For more information, please see https://nx.dev/ci/features/self-healing-ci


View your CI Pipeline Execution ↗ for commit b28b3a9

Command Status Duration Result
nx run ghost:test:ci:integration ✅ Succeeded 2m 30s View ↗
nx run ghost:test:integration ✅ Succeeded 2m 27s View ↗
nx run ghost:test:legacy ✅ Succeeded 2m 57s View ↗
nx run ghost-admin:test ✅ Succeeded 2m 51s View ↗
nx run ghost:test:e2e ✅ Succeeded 2m 15s View ↗
nx run @tryghost/admin:build ✅ Succeeded 1m 56s View ↗
nx run-many --target=build --projects=tag:publi... ✅ Succeeded 2s View ↗
nx run-many -t lint -p ghost-admin,ghost ✅ Succeeded 39s View ↗
Additional runs (3) ✅ Succeeded ... View ↗

💡 Verify your cache is correct by running tasks in a sandbox. Read docs ↗


☁️ Nx Cloud last updated this comment at 2026-06-26 17:15:43 UTC

@9larsons 9larsons force-pushed the comment-indexes branch 2 times, most recently from 9e3c79f to b28b3a9 Compare June 26, 2026 16:55
jwmarshall and others added 3 commits June 26, 2026 12:00
no ref

- The admin "all comments" moderation page (getAdminAllComments) orders by
  created_at and emits the count.replies / count.direct_replies relations as
  per-row correlated subqueries. With no supporting indexes the optimizer
  full-scans the comments table once per returned row and filesorts the whole
  table for the ordering, so the page took ~90s to load on sites with very
  large numbers of comments (especially ones skewed toward top-level comments).
- Added four additive secondary indexes on comments — created_at, status,
  (in_reply_to_id, status) and (parent_id, in_reply_to_id, status) — covering
  the ORDER BY list, the COUNT(DISTINCT) pagination count, and the reply-count
  subqueries, cutting the page's DB time from ~90s to tens of milliseconds.
- Chose indexes over a query refactor to keep the change minimal and
  backward-compatible; they are purely additive, so the migration's down drops
  them without disturbing the existing foreign-key indexes.
- InnoDB silently consolidates the auto-created single-column FK indexes
  on parent_id and in_reply_to_id into the new composite indexes once
  they exist (because the composites have those columns as leading
  prefix), so dropping the composites in down() failed with
  ERROR 1553 'needed in a foreign key constraint'
- down() now re-adds the single-column FK indexes before dropping the
  composites, restoring the pre-migration shape cleanly
- comment block corrected to describe what each index actually does
  (status is for status-filtered pagination COUNTs, not the unfiltered
  one which picks created_at) and to call out the InnoDB consolidation
- the replies hasMany relation ordered by created_at ASC only; replies
  inserted in a tight loop (admin Comments e2e tests) share a millisecond
  on created_at, so the tie-break was implementation-defined
- the new created_at index in the comments-moderation migration exposed
  this by giving the planner a covering index for the sort, which flipped
  two reply orderings against the test's implicit assumption about
  insertion order
- pinned ordering to id ASC as a deterministic tie-break, restoring the
  insertion-order behavior the tests and any client paginating by id rely
  on
@9larsons

Copy link
Copy Markdown
Contributor

Got tripped up on a couple minor version releases + flaky acceptance tests.

@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.37%. Comparing base (b23b781) to head (16661eb).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #28734   +/-   ##
=======================================
  Coverage   74.37%   74.37%           
=======================================
  Files        1559     1559           
  Lines      135002   135006    +4     
  Branches    16406    16414    +8     
=======================================
+ Hits       100402   100414   +12     
- Misses      33573    33596   +23     
+ Partials     1027      996   -31     
Flag Coverage Δ
admin-tests 55.21% <ø> (ø)
e2e-tests 76.53% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@9larsons 9larsons merged commit 0ed0d9f into TryGhost:main Jun 26, 2026
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

migration [pull request] Includes migration for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants