
fix(webapp): retain sessions-replication singleton import via globalThis assignment #3738

Merged
matt-aitken merged 1 commit into main from feature/tri-9864-fixwebapp-restore-sessions-replication-singleton-void-x-tree
May 24, 2026

Conversation

@matt-aitken
Member

Linear: TRI-9864 (Urgent)
Production incident: TRI-9863 (mitigated by image revert in cloud#910)

Bug

apps/webapp/package.json declares "sideEffects": false. PR #3333 (71d98b4e) replaced the previous real method-call retention idiom at the two sessionsReplicationInstance import sites with:

import { sessionsReplicationInstance } from "...";
void sessionsReplicationInstance;

esbuild treats void <identifier>; as a pure expression statement under sideEffects: false and tree-shakes the entire import — including the singleton(...) call inside sessionsReplicationInstance.server.ts which is the only thing that fires initializeSessionsReplicationInstance(). The sessions→ClickHouse logical replication worker never starts, the slot is unconsumed, lag grows.
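To make the failure mode concrete, here is a minimal sketch of why dropping the import is fatal. It assumes a `singleton` helper that caches a value on `globalThis` under a name; the helper below is a hypothetical stand-in, and the real one in the repo may differ. The function `evaluateSessionsModule` models the top level of `sessionsReplicationInstance.server.ts`: if the bundler removes the module, that code never runs and the initializer never fires.

```typescript
// Hypothetical stand-in for the repo's singleton helper (an assumption, not the real code).
function singleton<T>(name: string, init: () => T): T {
  const store = globalThis as Record<string, unknown>;
  const key = `__sketch_singleton_${name}`;
  if (!(key in store)) {
    // The initializer runs at most once per process.
    store[key] = init();
  }
  return store[key] as T;
}

// Illustrative record of which services have "started".
const startedServices: string[] = [];

// Models module evaluation of sessionsReplicationInstance.server.ts.
// If this is never called (module tree-shaken away), nothing starts.
function evaluateSessionsModule(): { name: string } {
  return singleton("sessionsReplication", () => {
    startedServices.push("sessions");
    return { name: "sessions" };
  });
}

console.log(startedServices.length); // 0: nothing has started before evaluation
const first = evaluateSessionsModule();
const second = evaluateSessionsModule();
console.log(first === second, startedServices); // true [ 'sessions' ]
```

The point of the sketch is that the singleton only protects against double initialization. It does nothing to guarantee that initialization happens at all, which is what the import was silently responsible for.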

How it manifested in production

cloud#907's image bump rolled the SessionReplicationService ECS task on prod at 14:32 UTC. The new container's startup log emitted 🗃️ Runs replication service enabled but not 🗃️ Sessions replication service enabled or 🗃️ Sessions replication service started. CloudWatch OldestReplicationSlotLag grew at ~220 MB/min and the High replication lag alarm fired at 14:37 UTC. Prod was reverted to the previous image (cloud#910) to stop the bleed.

Verification

grep of the built bundle apps/webapp/build/index.js (built from c0365d36):

  • 3 occurrences of Runs replication / runsReplicationInstance strings ✅
  • 0 occurrences of Sessions replication / sessionsReplicationInstance / SessionsReplicationService

The runs path survives tree-shaking because adminWorker.server.ts and admin.api.v1.runs-replication.* routes have real method calls (.start(), .teardown(), .backfill()) — observable uses the tree-shaker must preserve. The sessions singleton has no real callers, only the void no-ops, hence its complete elimination from the bundle.
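The grep-style verification above can be sketched as a small counting function. This is an illustration only: `countMarkers` and the sample bundle text are hypothetical, and the sample is a tiny stand-in rather than real build output.

```typescript
// Count non-overlapping occurrences of each marker string in the bundle text.
function countMarkers(bundleText: string, markers: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const marker of markers) {
    counts[marker] = marker.length === 0 ? 0 : bundleText.split(marker).length - 1;
  }
  return counts;
}

// Hypothetical stand-in for a bundle where the sessions module was tree-shaken away.
const sampleBundle = [
  'var runsReplicationInstance = singleton("runs", init);',
  'logger.info("Runs replication service enabled");',
  "runsReplicationInstance.start();",
].join("\n");

const markerCounts = countMarkers(sampleBundle, [
  "runsReplicationInstance",
  "sessionsReplicationInstance",
]);
console.log(markerCounts); // { runsReplicationInstance: 2, sessionsReplicationInstance: 0 }
```

A zero count for a marker that should be present is the signal that the module was eliminated from the bundle.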

Fix

Replace void sessionsReplicationInstance; with an assignment to globalThis, an unambiguous observable side effect the bundler cannot eliminate:

(globalThis as Record<string, unknown>).__sessionsReplicationInstance =
  sessionsReplicationInstance;

Applied at both call sites: apps/webapp/app/entry.server.tsx and apps/webapp/app/v3/services/adminWorker.server.ts.

Surrounding comments updated to document the bundler interaction so the next maintainer doesn't reintroduce void.

Out of scope (follow-ups)

  • Robustness improvement: change apps/webapp/package.json from "sideEffects": false to an allowlist that includes *Instance.server.ts files. Prevents the same regression shape via any future *Instance singleton.
  • Build-time check: add a grep post-build step in publish.yml requiring "Sessions replication" to appear in apps/webapp/build/index.js. Catches this exact regression at CI time.
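The build-time check in the second follow-up could look roughly like the sketch below. It is an assumption about shape, not the actual `publish.yml` step: `checkRequiredMarkers` is a hypothetical helper, and reading `apps/webapp/build/index.js` and failing the job on a non-ok result are left to the calling script.

```typescript
type BundleCheck = { ok: boolean; missing: string[] };

// Report which required marker strings are absent from the bundle text.
function checkRequiredMarkers(bundleText: string, required: string[]): BundleCheck {
  const missing = required.filter((marker) => !bundleText.includes(marker));
  return { ok: missing.length === 0, missing };
}

// Hypothetical bundle contents for illustration only.
const healthyBundle = 'logger.info("Sessions replication service enabled");';
const brokenBundle = 'logger.info("Runs replication service enabled");';

const healthy = checkRequiredMarkers(healthyBundle, ["Sessions replication"]);
const broken = checkRequiredMarkers(brokenBundle, ["Sessions replication"]);

console.log(healthy.ok, broken.ok, broken.missing);
// A CI wrapper would read the real bundle file and exit non-zero when ok is false.
```

Keeping the check as a pure function over the bundle text makes it easy to unit test, and the list of required markers can grow as more `*Instance` singletons are added.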

Test plan

  • pnpm run typecheck --filter webapp passes cleanly
  • After merge + publish: confirm new image's SessionReplicationService container logs 🗃️ Sessions replication service enabled and 🗃️ Sessions replication service started at startup
  • After re-deploying to prod: confirm OldestReplicationSlotLag stops growing and drains

…his assignment

`apps/webapp/package.json` declares `"sideEffects": false`. PR #3333 (commit
71d98b4) replaced the previous real method-call retention idiom at the two
singleton import sites with:

    void sessionsReplicationInstance;

esbuild treats `void <identifier>;` as a pure expression statement under
`sideEffects: false` and tree-shakes the entire import — including the
`singleton(...)` call inside `sessionsReplicationInstance.server.ts` which is
the only thing that fires `initializeSessionsReplicationInstance()`. The
sessions→ClickHouse logical replication worker never starts, the slot is
unconsumed, lag grows.

Verified empirically: grep of the built bundle `apps/webapp/build/index.js`
(from c0365d3) returns 0 occurrences of `Sessions replication` /
`sessionsReplicationInstance` / `SessionsReplicationService`, vs 3 occurrences
of the runs-replication strings. The runs path survives because
`adminWorker.server.ts` and `admin.api.v1.runs-replication.*` routes have
real method calls (.start(), .teardown(), .backfill()) that the tree-shaker
must preserve.

Fix: replace the `void` with an assignment to `globalThis`, which is an
unambiguous observable side effect the bundler cannot eliminate. Update the
surrounding comments at both sites to document the bundler interaction so the
next maintainer doesn't reintroduce `void`.

Incident: TRI-9864. Mitigated in production by reverting the image (cloud#910 /
TRI-9863); this PR is the forward fix that lets the original image bump land
safely.

Follow-ups worth filing separately:
- Change apps/webapp/package.json's "sideEffects": false to an allowlist that
  includes *Instance.server.ts files, so the same shape of regression in any
  future *Instance singleton is structurally prevented.
- Add a build-time grep check (e.g. require 'Sessions replication' to appear in
  apps/webapp/build/index.js) to publish.yml so this exact regression fails CI.

Refs TRI-9864, TRI-9863.
@changeset-bot

changeset-bot Bot commented May 24, 2026

⚠️ No Changeset found

Latest commit: ebd1176

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai Bot commented May 24, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e8349e65-59b1-4146-ae99-e330a0df516b

📥 Commits

Reviewing files that changed from the base of the PR and between eefb96c and ebd1176.

📒 Files selected for processing (2)
  • apps/webapp/app/entry.server.tsx
  • apps/webapp/app/v3/services/adminWorker.server.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic imports. Only use dynamic import() when circular dependencies cannot be resolved otherwise, code splitting is needed for performance, or the module must be loaded conditionally at runtime.
Import from @trigger.dev/core using subpaths only - never import from the root.
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.
Add agentcrumbs markers (`// @Crumbs` or `#region @crumbs`) as you write code, not just when debugging. They stay on the branch throughout development and are stripped by agentcrumbs strip before merge.

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.ts: Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead, which is wired directly to Express res.on('close') and fires reliably
Access environment variables via env export from app/env.server.ts. Never use process.env directly
Always use findFirst instead of findUnique in Prisma queries. findUnique has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). findFirst is never batched and avoids this entire class of issues

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting must be enforced using Prettier before committing

Files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
apps/webapp/**/*.{tsx,jsx}

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

Only use useCallback/useMemo for context provider values, expensive derived data that is a dependency elsewhere, or stable refs required by a dependency array. Don't wrap ordinary event handlers or trivial computations

Files:

  • apps/webapp/app/entry.server.tsx
🧠 Learnings (11)
📚 Learning: 2026-03-10T17:56:20.938Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3201
File: apps/webapp/app/v3/services/setSeatsAddOn.server.ts:25-29
Timestamp: 2026-03-10T17:56:20.938Z
Learning: Do not implement local userId-to-organizationId authorization checks inside org-scoped service classes (e.g., SetSeatsAddOnService, SetBranchesAddOnService) in the web app. Rely on route-layer authentication (requireUserId(request)) and org membership enforcement via the _app.orgs.$organizationSlug layout route. Any userId/organizationId that reaches these services from org-scoped routes has already been validated. Apply this pattern across all org-scoped services to avoid redundant auth checks and maintain consistency.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.

Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.

Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
  • apps/webapp/app/entry.server.tsx
📚 Learning: 2026-05-14T08:21:07.614Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3614
File: apps/webapp/app/v3/mollifier/mollifierGate.server.ts:48-52
Timestamp: 2026-05-14T08:21:07.614Z
Learning: When using Trigger.dev v3 feature flags in the webapp, prefer the existing per-org gating mechanism supported by `flag()` via the `overrides` argument. Pass `Organization.featureFlags` (from `environment.organization.featureFlags`) as the `overrides` value; overrides must take precedence over the global `featureFlag` row. Do not require schema changes or add an `orgId` field to `FlagsOptions` for per-org gating—use the overrides pattern consistently (e.g., in gate flows like `resolveOrgFlag` and any server code that threads `environment.organization.featureFlags` into the gate call).

Applied to files:

  • apps/webapp/app/v3/services/adminWorker.server.ts
📚 Learning: 2026-02-11T16:37:32.429Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3019
File: apps/webapp/app/components/primitives/charts/Card.tsx:26-30
Timestamp: 2026-02-11T16:37:32.429Z
Learning: In projects using react-grid-layout, avoid relying on drag-handle class to imply draggability. Ensure drag-handle elements only affect dragging when the parent grid item is configured draggable in the layout; conditionally apply cursor styles based on the draggable prop. This improves correctness and accessibility.

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2026-05-08T21:00:20.973Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3538
File: apps/webapp/app/components/primitives/Resizable.tsx:60-78
Timestamp: 2026-05-08T21:00:20.973Z
Learning: In the triggerdotdev/trigger.dev codebase, treat Zod as a boundary validation tool (API handlers, request/response validation, and storage/DB read/write validation), not as inline render-time validation inside React components/primitive UI code. For render-time guards, prefer small manual type-narrowing checks (e.g., a short predicate like ~10–20 lines) over importing Zod into UI primitives, to avoid per-render schema-parse overhead and unnecessary abstraction. Use the manual guard approach unless you truly need schema validation at a boundary; only then introduce Zod.

Applied to files:

  • apps/webapp/app/entry.server.tsx
🔇 Additional comments (2)
apps/webapp/app/entry.server.tsx (1)

33-43: LGTM!

apps/webapp/app/v3/services/adminWorker.server.ts (1)

9-18: LGTM!


Walkthrough

This pull request fixes bundler tree-shaking of the sessions replication singleton initializer across two server-side module entry points. The prior pattern—void sessionsReplicationInstance;—relies on the import's side effects but can be eliminated by bundlers when sideEffects is false. Both apps/webapp/app/entry.server.tsx and apps/webapp/app/v3/services/adminWorker.server.ts are updated to assign the singleton to globalThis.__sessionsReplicationInstance, making the side effect observable and preventing optimization-based removal during bundling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely describes the main fix: restoring the sessions-replication singleton via a globalThis assignment to prevent tree-shaking. |
| Description check | ✅ Passed | The description is comprehensive and detailed, covering the bug context, root cause, production impact, fix implementation, and a clear test plan. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.


@matt-aitken matt-aitken merged commit 596a9bb into main May 24, 2026
30 checks passed
@matt-aitken matt-aitken deleted the feature/tri-9864-fixwebapp-restore-sessions-replication-singleton-void-x-tree branch May 24, 2026 15:21
3 participants