
Conversation

@9larsons (Contributor) commented Nov 24, 2025

ref fcd4034

  • Fixed final aggregation to ensure all unique emailIds and memberIds are included in the results
  • Fixed delta calculations so the final counts reported in the logs are more accurate

ref fcd4034
- Introduced cumulative event counts to track overall metrics across batches.
- Added unique emailIds and memberIds tracking to prevent re-aggregation.
- Implemented delta calculations for batch processing to improve final reporting accuracy.
- Updated final aggregation to ensure all unique emailIds and memberIds are included in the results.
coderabbitai bot (Contributor) commented Nov 24, 2025

Walkthrough

Refactors EmailAnalyticsService.fetchEvents to compute and return cumulative event processing across multiple batches. The change introduces pre-batch snapshots (beforeCounts, beforeEmailIds, beforeMemberIds), per-batch delta computation (batchDelta), accumulation into cumulativeResult, mid-stream consolidation that clears processed IDs during periodic aggregations, and a final aggregation pass that combines remaining in-flight IDs with accumulated ID sets. The public return (EmailAnalyticsFetchResult.result) now contains cumulativeResult instead of the in-flight processingResult.
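
For orientation, the core of the new flow condenses to roughly the following (a simplified sketch distilled from the walkthrough and the review diff below, not verbatim source; the remaining count fields and error handling are elided):

    const cumulativeResult = new EventProcessingResult();
    const allEmailIds = new Set();
    const allMemberIds = new Set();

    const processBatch = async (events) => {
        // Snapshot the in-flight state so this batch's contribution can be isolated
        const beforeCounts = {opened: processingResult.opened /* ...other count fields */};
        const beforeEmailIds = new Set(processingResult.emailIds);
        const beforeMemberIds = new Set(processingResult.memberIds);

        await this.processEventBatch(events, processingResult, fetchData);

        // Fold only this batch's delta into the cumulative totals and tracking sets
        const batchDelta = new EventProcessingResult({
            opened: processingResult.opened - beforeCounts.opened,
            emailIds: processingResult.emailIds.filter(id => !beforeEmailIds.has(id)),
            memberIds: processingResult.memberIds.filter(id => !beforeMemberIds.has(id))
        });
        cumulativeResult.merge(batchDelta);
        batchDelta.emailIds.forEach(id => allEmailIds.add(id));
        batchDelta.memberIds.forEach(id => allMemberIds.add(id));
    };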

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Specific areas requiring extra attention:
    • Correctness of pre-batch snapshots and batchDelta computation (counts and ID set diffs)
    • Merging logic into cumulativeResult and deduplication of email/member ID sets
    • Mid-stream cleanup: timing and effects of clearing processed IDs and resetting processingResult
    • Final aggregation pass construction and guard conditions to ensure it runs only when needed
    • Public API semantics: callers expecting in-flight processingResult vs. cumulativeResult and type alignment

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'Fixed email analytics aggregation' directly describes the main change, fixing aggregation logic in email analytics to properly accumulate counts across batches.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description check: ✅ Passed. The pull request description is directly related to the changeset, describing specific fixes to email analytics aggregation including final aggregation improvements and delta calculations.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (1)

348-381: Per-batch snapshot/delta is correct but may be unnecessarily heavy

The snapshot/delta logic is functionally sound (captures per-batch counts and newly seen IDs, then folds them into cumulativeResult and the tracking sets), but it does a full copy of processingResult.emailIds/memberIds on every batch. With large/long-running jobs this can become O(n²) in the number of processed events.

You can get the same behavior more cheaply by letting processEventBatch build a per-batch EventProcessingResult and then merging that into both processingResult and cumulativeResult, avoiding the “before/after” diff entirely:

-        const processBatch = async (events) => {
-            // Even if the fetching is interrupted because of an error, we still store the last event timestamp
-            const processingStart = Date.now();
-            // Capture the state before processing to calculate delta
-            const beforeCounts = {
-                opened: processingResult.opened,
-                delivered: processingResult.delivered,
-                temporaryFailed: processingResult.temporaryFailed,
-                permanentFailed: processingResult.permanentFailed,
-                unsubscribed: processingResult.unsubscribed,
-                complained: processingResult.complained,
-                unhandled: processingResult.unhandled,
-                unprocessable: processingResult.unprocessable
-            };
-            const beforeEmailIds = new Set(processingResult.emailIds);
-            const beforeMemberIds = new Set(processingResult.memberIds);
-
-            await this.processEventBatch(events, processingResult, fetchData);
+        const processBatch = async (events) => {
+            // Even if the fetching is interrupted because of an error, we still store the last event timestamp
+            const processingStart = Date.now();
+            // Build a per-batch result and then merge it into the running totals
+            const batchResult = new EventProcessingResult();
+
+            await this.processEventBatch(events, batchResult, fetchData);
+
+            // processingResult keeps the rolling window used for intermediate aggregation
+            processingResult.merge(batchResult);
             processingTimeMs += (Date.now() - processingStart);
             eventCount += events.length;
 
-            // Calculate delta (only new counts from this batch) and accumulate for final reporting
-            const batchDelta = new EventProcessingResult({
-                opened: processingResult.opened - beforeCounts.opened,
-                delivered: processingResult.delivered - beforeCounts.delivered,
-                temporaryFailed: processingResult.temporaryFailed - beforeCounts.temporaryFailed,
-                permanentFailed: processingResult.permanentFailed - beforeCounts.permanentFailed,
-                unsubscribed: processingResult.unsubscribed - beforeCounts.unsubscribed,
-                complained: processingResult.complained - beforeCounts.complained,
-                unhandled: processingResult.unhandled - beforeCounts.unhandled,
-                unprocessable: processingResult.unprocessable - beforeCounts.unprocessable,
-                emailIds: processingResult.emailIds.filter(id => !beforeEmailIds.has(id)),
-                memberIds: processingResult.memberIds.filter(id => !beforeMemberIds.has(id))
-            });
-            cumulativeResult.merge(batchDelta);
-            batchDelta.emailIds.forEach(id => allEmailIds.add(id));
-            batchDelta.memberIds.forEach(id => allMemberIds.add(id));
+            // Accumulate per-batch deltas into the overall result and tracking sets
+            cumulativeResult.merge(batchResult);
+            batchResult.emailIds.forEach(id => allEmailIds.add(id));
+            batchResult.memberIds.forEach(id => allMemberIds.add(id));

This keeps processingResult semantics unchanged while avoiding repeated cloning of large ID arrays.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43ed23f and 9009dae.

📒 Files selected for processing (1)
  • ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: troyciesco
Repo: TryGhost/Ghost PR: 25288
File: ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js:46-64
Timestamp: 2025-11-10T23:10:17.470Z
Learning: In ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js and the outbox processing flow, entries are marked as PROCESSING before being processed. If a failure occurs after email send but before deletion, the entry remains stuck in PROCESSING state (not reprocessed). This intentional design prevents duplicate emails. Handling stuck PROCESSING entries is planned for a separate PR.
📚 Learning: 2025-10-30T17:13:26.190Z
Learnt from: sam-lord
Repo: TryGhost/Ghost PR: 25303
File: ghost/core/core/server/services/email-service/BatchSendingService.js:19-19
Timestamp: 2025-10-30T17:13:26.190Z
Learning: In ghost/core/core/server/services/email-service/BatchSendingService.js and similar files in the Ghost codebase, prefer using `{...options}` spread syntax without explicit guards like `...(options || {})` when spreading potentially undefined objects, as the maintainer prefers cleaner syntax over defensive patterns when the behavior is safe.

Applied to files:

  • ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js
📚 Learning: 2025-11-10T23:10:17.470Z
Learnt from: troyciesco
Repo: TryGhost/Ghost PR: 25288
File: ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js:46-64
Timestamp: 2025-11-10T23:10:17.470Z
Learning: In ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js and the outbox processing flow, entries are marked as PROCESSING before being processed. If a failure occurs after email send but before deletion, the entry remains stuck in PROCESSING state (not reprocessed). This intentional design prevents duplicate emails. Handling stuck PROCESSING entries is planned for a separate PR.

Applied to files:

  • ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js
🧬 Code graph analysis (1)
ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (1)
ghost/core/test/unit/server/services/email-analytics/email-analytics-service.test.js (1)
  • EventProcessingResult (6-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Ghost-CLI tests
  • GitHub Check: Legacy tests (Node 22.13.1, sqlite3)
  • GitHub Check: Acceptance tests (Node 22.13.1, mysql8)
  • GitHub Check: Legacy tests (Node 22.13.1, mysql8)
  • GitHub Check: Acceptance tests (Node 22.13.1, sqlite3)
  • GitHub Check: Unit tests (Node 22.13.1)
  • GitHub Check: Lint
  • GitHub Check: Build & Push
🔇 Additional comments (4)
ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (4)

393-395: Intermediate cleanup of tracking sets is consistent with aggregation behavior

Deleting processingResult.emailIds/memberIds from allEmailIds/allMemberIds only after a successful aggregateStats call correctly prevents those IDs from being re-aggregated in the final pass, while still leaving them in the sets if aggregation fails (since the delete runs inside the try). This matches the intent of “no double aggregation, but don’t lose work on errors”.
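
In other words, the mid-stream flush behaves approximately like this (a sketch of the control flow under discussion, not the exact source; the logger is assumed):

    try {
        await this.aggregateStats({
            emailIds: processingResult.emailIds,
            memberIds: processingResult.memberIds
        });
        // Prune only after aggregateStats succeeds, so a failure leaves the IDs
        // in the tracking sets to be retried by the final aggregation pass
        processingResult.emailIds.forEach(id => allEmailIds.delete(id));
        processingResult.memberIds.forEach(id => allMemberIds.delete(id));
        processingResult = new EventProcessingResult();
    } catch (err) {
        logging.error(err); // assumed logger; IDs intentionally stay in the sets here
    }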


426-439: Final aggregation step correctly targets remaining unaggregated IDs

Building finalEmailIds/finalMemberIds from the union of processingResult and the tracking sets, then passing a plain {emailIds, memberIds} object into aggregateStats, looks correct and keeps aggregation focused on IDs that haven’t already been handled mid-run. The guard on empty arrays avoids unnecessary DB work.

One small thing to watch: this step assumes that every ID ever passed to aggregateStats has been removed from allEmailIds/allMemberIds. If any future changes introduce extra aggregation sites, they’ll also need to maintain these sets to avoid reprocessing.

It’s worth double‑checking there are no other direct aggregateStats(...) callers that should also prune from allEmailIds/allMemberIds to preserve the “aggregate once” guarantee.
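
One way to make that invariant hard to break would be a small helper that every aggregation site goes through (hypothetical, not part of this PR):

    // Hypothetical helper: aggregate and prune in one place so the
    // "aggregate once" bookkeeping can never be forgotten at a call site
    const aggregateAndPrune = async ({emailIds, memberIds}) => {
        await this.aggregateStats({emailIds, memberIds});
        emailIds.forEach(id => allEmailIds.delete(id));
        memberIds.forEach(id => allMemberIds.delete(id));
    };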


470-475: Returned result now reflects cumulative, not windowed, processing

Switching the returned result to cumulativeResult means the fetch methods now report totals across the entire fetch run (including batches that were already aggregated and cleared from processingResult), which is a clear improvement for monitoring/reporting.

This is a behavioral change, though, so any consumers that previously interpreted result as “last window only” need to be validated.

Please ensure tests for EmailAnalyticsService.fetch* and any downstream metrics dashboards are updated to expect cumulative counts rather than just the last in-memory batch.
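
A test for the new semantics might look roughly like this (an illustrative sketch; the fixture helper and event shapes are hypothetical, and the entry point name is assumed from the fetch* family):

    const assert = require('assert/strict');

    it('returns cumulative counts across intermediate aggregations', async function () {
        // Hypothetical fixture: feeds two batches through fetchEvents with an
        // aggregation flush between them
        const service = createServiceWithBatches([
            [{type: 'opened', emailId: 'e1', memberId: 'm1'}],
            [{type: 'opened', emailId: 'e2', memberId: 'm2'}]
        ]);

        const {result} = await service.fetchLatestOpenedEvents();

        // Both batches are counted, not just the window after the last flush
        assert.equal(result.opened, 2);
    });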


333-337: No breaking changes detected; cumulative semantics are correct

Verification confirms the cumulativeResult change is sound and aligns with all usage patterns. The ServiceWrapper logs cumulative event counts after job completion, and the internal fetchResult usage at line 292 expects cumulative event counts for decision-making. Tests do not assert per-batch-only behavior, and all callers naturally expect cumulative reporting. The semantic shift from per-batch to cumulative improves correctness without breaking existing code.

9larsons changed the title from "Fixed email aggregation" to "Fixed email analytics aggregation" on Nov 24, 2025
9larsons changed the title from "Fixed email analytics aggregation" to "🐛 Fixed email analytics aggregation" on Nov 24, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (1)

426-439: Final aggregation correctly covers all remaining IDs; minor simplification is possible

The union via new Set([...processingResult.emailIds, ...allEmailIds]) (and the member analog) ensures every un-aggregated emailId/memberId is covered exactly once in the final aggregateStats call, regardless of how many intermediate flushes occurred. As a small optional simplification, you could instead fold the in-flight IDs into the existing sets and then spread once:

-const finalEmailIds = Array.from(new Set([...processingResult.emailIds, ...allEmailIds]));
-const finalMemberIds = Array.from(new Set([...processingResult.memberIds, ...allMemberIds]));
+processingResult.emailIds.forEach(id => allEmailIds.add(id));
+processingResult.memberIds.forEach(id => allMemberIds.add(id));
+const finalEmailIds = Array.from(allEmailIds);
+const finalMemberIds = Array.from(allMemberIds);

This avoids constructing an extra temporary Set and makes the “these are all remaining IDs” intent a bit more explicit.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9009dae and 1961488.

📒 Files selected for processing (1)
  • ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: troyciesco
Repo: TryGhost/Ghost PR: 25288
File: ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js:46-64
Timestamp: 2025-11-10T23:10:17.470Z
Learning: In ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js and the outbox processing flow, entries are marked as PROCESSING before being processed. If a failure occurs after email send but before deletion, the entry remains stuck in PROCESSING state (not reprocessed). This intentional design prevents duplicate emails. Handling stuck PROCESSING entries is planned for a separate PR.
📚 Learning: 2025-10-30T17:13:26.190Z
Learnt from: sam-lord
Repo: TryGhost/Ghost PR: 25303
File: ghost/core/core/server/services/email-service/BatchSendingService.js:19-19
Timestamp: 2025-10-30T17:13:26.190Z
Learning: In ghost/core/core/server/services/email-service/BatchSendingService.js and similar files in the Ghost codebase, prefer using `{...options}` spread syntax without explicit guards like `...(options || {})` when spreading potentially undefined objects, as the maintainer prefers cleaner syntax over defensive patterns when the behavior is safe.

Applied to files:

  • ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js
📚 Learning: 2025-11-10T23:10:17.470Z
Learnt from: troyciesco
Repo: TryGhost/Ghost PR: 25288
File: ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js:46-64
Timestamp: 2025-11-10T23:10:17.470Z
Learning: In ghost/core/core/server/services/member-welcome-emails/jobs/lib/process-entries.js and the outbox processing flow, entries are marked as PROCESSING before being processed. If a failure occurs after email send but before deletion, the entry remains stuck in PROCESSING state (not reprocessed). This intentional design prevents duplicate emails. Handling stuck PROCESSING entries is planned for a separate PR.

Applied to files:

  • ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Legacy tests (Node 22.13.1, mysql8)
  • GitHub Check: Ghost-CLI tests
  • GitHub Check: Acceptance tests (Node 22.13.1, mysql8)
  • GitHub Check: Acceptance tests (Node 22.13.1, sqlite3)
  • GitHub Check: Legacy tests (Node 22.13.1, sqlite3)
  • GitHub Check: Lint
  • GitHub Check: Unit tests (Node 22.13.1)
  • GitHub Check: E2E Tests (Ember 2/2)
  • GitHub Check: E2E Tests (React 2/2)
  • GitHub Check: E2E Tests (React 1/2)
  • GitHub Check: E2E Tests (Ember 1/2)
  • GitHub Check: Inspect Docker Image
🔇 Additional comments (4)
ghost/core/core/server/services/email-analytics/EmailAnalyticsService.js (4)

333-337: Cumulative result + tracking sets are a solid fix for the previous “last-chunk only” result behavior

Separating cumulativeResult from the in-flight processingResult and introducing allEmailIds/allMemberIds cleanly solves the earlier problem where intermediate aggregateStats calls wiped out counts from the final returned result, while still keeping per-run aggregation bookkeeping explicit.


349-381: Per-batch delta computation looks logically correct and matches the new cumulative semantics

Capturing beforeCounts/before*Ids and then building a batchDelta from the difference ensures cumulativeResult reflects exactly the new work done in each batch, independent of later processingResult resets. This assumes EventProcessingResult initializes these count fields to 0, which aligns with how it’s used elsewhere; if that’s ever not guaranteed, it would be worth normalizing to 0 here, but as-is the logic is sound.
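
Should that defensive normalization ever be wanted, it would be a small change at the delta construction (illustrative only; as noted above, EventProcessingResult already initializes these fields to 0 in practice):

    const batchDelta = new EventProcessingResult({
        // Coalesce potentially-missing counts to 0 before differencing
        opened: (processingResult.opened ?? 0) - (beforeCounts.opened ?? 0),
        delivered: (processingResult.delivered ?? 0) - (beforeCounts.delivered ?? 0)
        // ...and likewise for the remaining count fields
    });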


393-395: Mid-stream cleanup of tracking sets is consistent with idempotent aggregation

Removing processingResult.emailIds/memberIds from the tracking sets only after a successful aggregateStats call ensures those IDs aren’t re-aggregated in the final pass, while IDs from failed intermediate aggregations stay in the sets and get retried at the end. This keeps aggregation idempotent and reduces duplicate work.


475-475: Returning cumulativeResult aligns the public result with total work done across the job

Switching the returned result from the in-flight processingResult to cumulativeResult means callers now see stats for all events processed in the run, not just the last window after the last intermediate aggregation. This is a behavioral change but matches the PR description and should make downstream analytics more accurate, assuming tests are updated to expect cumulative counts.

9larsons merged commit 113f6b5 into main on Nov 24, 2025 (32 checks passed)
9larsons deleted the fix-email-aggregation branch on November 24, 2025 at 17:28
9larsons added a commit that referenced this pull request on Nov 24, 2025

ref fcd4034
- Introduced cumulative event counts to track overall metrics across batches.
- Added unique emailIds and memberIds tracking to prevent re-aggregation.
- Implemented delta calculations for batch processing to improve final reporting accuracy.
- Updated final aggregation to ensure all unique emailIds and memberIds are included in the results.