feat(run-ops): webapp db topology, flags, and split-mode resolver wiring#4117
feat(run-ops): webapp db topology, flags, and split-mode resolver wiring#4117d-cs wants to merge 9 commits into
Conversation
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (10)
apps/webapp/test/v3/runOpsMigration/runEngineControlPlaneResolver.server.test.ts (1)
121-200: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winMissing test coverage for
resolveAuthenticatedEnv.This suite covers
resolveEnv,resolveWorkerVersion, andassertEnvExists, but notresolveAuthenticatedEnv— the one method in the adapter with custom (uncached) query logic. Adding a test asserting it resolves the authenticated shape (includinggit) and returnsnullfor a missing env would close the gap and would likely have caught the cache-bypass behavior flagged inrunEngineControlPlaneResolver.server.ts.apps/webapp/app/v3/taskRunHeartbeatFailed.server.ts (1)
56-64: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueOptional: defer
lockedWorkerresolution to where it's used.
envis needed for the early-return guard on every path, butlockedWorkeris only consumed in the completed-status branch (line 158). Resolving it here adds a resolver call (a DB read under split-OFF) forPENDING/DEQUEUED/EXECUTING/etc. that never use it. Moving theresolveRunLockedWorkercall into that branch avoids the unnecessary lookup.apps/webapp/app/v3/runOpsMigration/runOpsCascadeCleanup.server.ts (2)
118-275: 🩺 Stability & Availability | 🔵 TrivialNo logging/audit trail for a destructive cross-DB cleanup.
RunOpsCascadeCleanupServicedeletes rows across multiple writers with nologgercalls anywhere (compare to other.server.tsfiles in this codebase that log around risky operations). Given deletes are non-transactional and rely on "recovered by re-running" on crash, structured logging of the per-table counts returned bycleanupEnvironment/cleanupProject(or at least on entry/exit) would materially help operators diagnose partial-cleanup states and stray rows after a crash.
1-275: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNo colocated test coverage for this destructive service.
This file isn't accompanied by a test in this cohort. Given it performs unconditional
deleteManycascades across control-plane and run-ops writers (and is deliberately not gated behindisSplitEnabled()), unit/integration coverage (e.g., verifying ordering doesn't throw, per-table counts, dedup-by-reference in single-DB mode) seems warranted before this ships. Want me to draft a Testcontainers-based test using the existing@internal/testcontainershelpers?apps/webapp/app/v3/runOpsMigration/runOpsMintKind.flipLatency.test.ts (1)
13-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueTest reimplements the cache wiring instead of exercising
resolveRunIdMintKinddirectly.
makeCachedFlagduplicatesresolveRunIdMintKind's internalmintCache.get/setlogic rather than importing and calling the real function. If the production caching wiring (key derivation, TTL source, override-forwarding) changes, this suite would keep passing against its own copy without catching the drift. Given the comment explicitly frames this as a documented "current-behavior lock," this is acceptable as-is, but consider testingresolveRunIdMintKinditself (with env vars and$replicamocked) if higher-fidelity coverage is later desired.apps/webapp/app/v3/runOpsMigration/types.ts (1)
11-25: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueUse
typeinstead ofinterfacefor these data shapes.
CrossSeamGuardInputandCrossSeamGuardDecisionare plain data shapes, not behavioral contracts implemented by collaborators, so per project convention they should be type aliases.♻️ Suggested fix
-export interface CrossSeamGuardInput { +export type CrossSeamGuardInput = { waitpointId: string; routeKind: UnblockRouteKind; treeOwnerResidency?: RunOpsResidency; isCrossTreeIdempotency?: boolean; hasLegacyParent?: boolean; -} +}; -export interface CrossSeamGuardDecision { +export type CrossSeamGuardDecision = { store: StoreTarget; /** Always the waitpoint's OWN classification, even when pinned to legacy. */ residency: RunOpsResidency; routeKind: UnblockRouteKind; pinnedReason?: "non-tree-owned" | "cross-tree-idempotency" | "legacy-parent-descendant"; -} +};As per coding guidelines: "Use types over interfaces for TypeScript".
Source: Coding guidelines
apps/webapp/app/v3/runOpsMigration/unblockRouteCatalog.ts (1)
10-17: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueUse
typeinstead ofinterfaceforUnblockRoute.
UnblockRouteis a plain data shape (route metadata record), not a behavioral contract.♻️ Suggested fix
-export interface UnblockRoute { +export type UnblockRoute = { id: string; kind: UnblockRouteKind; /** The relative source path, e.g. "internal-packages/run-engine/src/engine/index.ts". */ site: string; /** Enclosing method/symbol name — NEVER a line number. */ symbol: string; -} +};As per coding guidelines: "Use types over interfaces for TypeScript".
Source: Coding guidelines
apps/webapp/app/v3/runOpsMigration/crossSeamGuard.server.ts (1)
11-17: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
KNOWN_ROUTE_KINDSduplicates theUnblockRouteKindunion.This set must be manually kept in sync with
UnblockRouteKindintypes.ts; adding a new kind there without updating this set silently makesassertKnownRouteKindreject a valid kind (or vice versa if forgotten here). Consider deriving this from the type via asatisfies-checked array, or centralizing the literal list intypes.tsand building both the union and the set from it.apps/webapp/app/v3/runOpsMigration/readThrough.server.test.ts (1)
41-153: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd coverage for the unclassifiable-id → LEGACY fallback.
No test exercises the
UnclassifiableRunIdcatch branch inreadThrough.server.ts(defaults to"LEGACY"+logger.warn), which is a materially different behavior fromcrossSeamGuard.server.ts's loud rethrow for the same error type. A regression here would go undetected.✅ Suggested additional test
heteroPostgresTest( "ambiguous-length id falls back to LEGACY residency (new-then-legacy probe)", async ({ prisma14, prisma17 }) => { const AMBIGUOUS_RUN_ID = "run_" + "a".repeat(10); // neither cuid nor ksuid length const warn = vi.fn(); const result = await readThroughRun({ runId: AMBIGUOUS_RUN_ID, environmentId: "env_1", readNew: (c) => realRead(c, false), readLegacy: (c) => realRead(c, true), deps: { splitEnabled: true, newClient: prisma17 as unknown as PrismaReplicaClient, legacyReplica: prisma14 as unknown as PrismaReplicaClient, logger: { warn }, }, }); expect(result.source).toBe("legacy-replica"); expect(warn).toHaveBeenCalled(); } );apps/webapp/app/v3/runEngineHandlers.server.ts (1)
211-211: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse a named sentinel constant instead of scattered
""literals.
readRunForEventOrThrow/readRunForEventare called with a bare""placeholder forenvironmentIdin three places (runAttemptFailed, and twice in cachedRunCompleted). As per coding guidelines: "Use named constants for sentinel/placeholder values (for example,const UNSET_VALUE = "__unset__") instead of scattering raw string literals across comparisons."♻️ Suggested fix
// near eventReadDeps definition const NO_ENVIRONMENT_ID = ""; // residency is keyed on runId; environmentId is informational onlyThen replace the three
""call-site literals withNO_ENVIRONMENT_ID.Also applies to: 299-299, 330-330
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 548c81c5-37b0-432b-b866-589a9d484ed6
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (51)
.server-changes/run-ops-split-realtime-interlock.mdapps/webapp/CLAUDE.mdapps/webapp/app/db.server.tsapps/webapp/app/entry.server.tsxapps/webapp/app/env.server.tsapps/webapp/app/models/runtimeEnvironment.server.tsapps/webapp/app/v3/engineVersion.server.tsapps/webapp/app/v3/eventRepository/index.server.tsapps/webapp/app/v3/featureFlags.tsapps/webapp/app/v3/runEngine.server.tsapps/webapp/app/v3/runEngineHandlers.server.tsapps/webapp/app/v3/runEngineHandlersShared.server.tsapps/webapp/app/v3/runOpsMigration/controlPlaneCache.server.test.tsapps/webapp/app/v3/runOpsMigration/controlPlaneCache.server.tsapps/webapp/app/v3/runOpsMigration/controlPlaneResolver.server.tsapps/webapp/app/v3/runOpsMigration/crossSeamGuard.server.tsapps/webapp/app/v3/runOpsMigration/distinctDbSentinel.server.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.test.tsapps/webapp/app/v3/runOpsMigration/mintBatchFriendlyId.server.tsapps/webapp/app/v3/runOpsMigration/readThrough.server.test.tsapps/webapp/app/v3/runOpsMigration/readThrough.server.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.test.tsapps/webapp/app/v3/runOpsMigration/resolveInheritedMintKind.server.tsapps/webapp/app/v3/runOpsMigration/runEngineControlPlaneResolver.server.tsapps/webapp/app/v3/runOpsMigration/runOpsCascadeCleanup.server.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.flipLatency.test.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.test.tsapps/webapp/app/v3/runOpsMigration/runOpsMintKind.server.tsapps/webapp/app/v3/runOpsMigration/runOpsSplitReadGate.tsapps/webapp/app/v3/runOpsMigration/splitMode.server.tsapps/webapp/app/v3/runOpsMigration/types.tsapps/webapp/app/v3/runOpsMigration/unblockRouteCatalog.tsapps/webapp/app/v3/runStore.server.test.tsapps/webapp/app/v3/runStore.server.tsapps/webapp/app/v3/taskRunHeartbeatFailed.server.tsapps/webapp/package.jsonapps/webapp/test/findEnvironmentFromRun.readthrough.test.tsapps/webapp/test/routeLoaders.controlPlane.readthrough.test.tsapps/webapp/test/runDetailLoaders.controlPlane.readthrough.test.tsapps/webapp/test/runEngineHandlers.test.tsapps/webapp/test/runOpsCrossSeamGuard.test.tsapps/webapp/test/runOpsDbTopology.test.tsapps/webapp/test/runOpsMintCutover.test.tsapps/webapp/test/runOpsSplitMode.test.tsapps/webapp/test/runOpsSplitReadGate.test.tsapps/webapp/test/services.controlPlane.readthrough.test.tsapps/webapp/test/v3/runOpsMigration/controlPlaneRepoint.server.test.tsapps/webapp/test/v3/runOpsMigration/controlPlaneResolver.server.test.tsapps/webapp/test/v3/runOpsMigration/distinctDbSentinel.server.test.tsapps/webapp/test/v3/runOpsMigration/runEngineControlPlaneResolver.server.test.tsapps/webapp/vitest.config.ts
88d1290 to
d5610a9
Compare
413a945 to
99643f8
Compare
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
460477f to
1af2bab
Compare
26871d5 to
cdc4eb9
Compare
…ngine wiring Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ape only Migration/drain is deferred, so residency is decided purely by id-shape (ownerEngine): 25-char cuid -> LEGACY, 27-char ksuid -> NEW, unclassifiable -> LEGACY. This is behavior-preserving in production, which never injected a custom isKnownMigrated and, with no migration, always saw the default false. - delete knownMigratedFilter.server.ts + its test - readThrough: drop the isKnownMigrated dep + migrated short-circuit; KEEP the unclassifiable->LEGACY new-then-legacy fallback - resolveInheritedMintKind: collapse to pure ownerEngine id-shape (no deps) - mintBatchFriendlyId: drop isKnownMigrated/isSplitEnabled from ResolveDeps - runEngineHandlersShared: drop isKnownMigrated from EventReadDeps/readRunForEvent (batch-write residency probe via newReplica.batchTaskRun.findFirst is untouched) - tests: delete injected-marker cases, keep pure id-shape assertions Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eration labels Add a pure unit test for ControlPlaneCache covering per-slot round-trips, null-vs-miss distinction, epoch-based invalidation, per-slot key isolation, bounded eviction, and TTL expiry. Add a testcontainer test for probeDistinctDatabases covering distinct clusters, same physical database (with reason), same-cluster-different-database, and fail-closed probe failure. Strip developer-enumeration labels from three existing test files (readThrough step numbers, runEngineHandlers Test-X comments) and rename the run-detail loader read-through test to drop the non-domain "shape 1" name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… deps apps/webapp/package.json declares @internal/run-ops-database (workspace) and @testcontainers/postgresql but the lockfile importer entry was never regenerated, so pnpm install --frozen-lockfile fails for the webapp. Regenerate the importer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enabling RUN_OPS_SPLIT_ENABLED without REALTIME_BACKEND_NATIVE_ENABLED silently breaks realtime: Electric replicates only from the control-plane DB, so NEW-resident (ksuid) runs on the dedicated run-ops DB are invisible and every realtime subscription hangs. Add a boot-time interlock that refuses split mode in that misconfiguration, mirroring the existing distinct-DB data-loss sentinel. The check is a pure predicate (assertSplitRealtimeInterlock) run synchronously inside assertRunOpsSplitSentinel on the same eager-boot path, failing fast before the async DB probe and before any run-ops routing is wired. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n diagnostics - gate runOpsTopology splitEnabled on RUN_OPS_SPLIT_ENABLED so provisioning both DSNs before flipping the flag cannot open a second pool or route writes ahead of the distinct-DB sentinel - rethrow the original UnclassifiableRunId in the cross-seam guard so its value/valueLength keep reflecting the real waitpoint id - log run-found-but-environment-unresolved distinctly from missing-run - correct the RUN_OPS_DATABASE_URL doc comment (Prisma datasource, not the webapp runtime pool) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cdc4eb9 to
e0b35d5
Compare
What
Wires the run-ops split into the webapp: database topology, environment flags, split-mode gating, and the control-plane resolver/cache layer that the run-store and run-engine seams from the previous PR plug into.
apps/webapp/app/db.server.ts,env.server.ts,entry.server.tsx): adds the run-ops database clients/topology and the environment variables that configure and gate the split.apps/webapp/app/v3/runOpsMigration/): the webapp-side machinery —splitMode.server.ts,controlPlaneResolver.server.ts+controlPlaneCache.server.ts,readThrough.server.ts,crossSeamGuard.server.ts,distinctDbSentinel.server.ts, id-minting helpers (mintBatchFriendlyId,runOpsMintKind,resolveInheritedMintKind),runOpsCascadeCleanup.server.ts, the split read gate, and route/unblock catalogs.app/v3/runStore.server.ts,runEngine.server.ts,runEngineHandlers.server.ts+ newrunEngineHandlersShared.server.ts): points the webapp's store/engine construction at the resolver, and factors shared handler logic out so both seams use one path.runtimeEnvironment.server.ts,eventRepository/index.server.ts,taskRunHeartbeatFailed.server.ts,engineVersion.server.tsroute their run/environment lookups read-through the resolver.413a94511— interlocks split mode against the native realtime backend so the two aren't enabled in an incompatible combination (see.server-changes/run-ops-split-realtime-interlock.md).dc74c57fd— drops the earlier "known-migrated" read layer; residency is determined by id-shape only.Why
PR5 of the run-ops split stack. This is the webapp foundation layer: it stands up the DB topology, flags, and resolver/cache the rest of the stack depends on, and repoints webapp read paths through the resolver. Additive when the split is not enabled (existing single-DB behavior preserved behind flags); behavior-changing on the read-through paths and the realtime interlock.
Tests
New vitest coverage across
apps/webapp/test/and colocated*.server.test.tsfiles: db topology, split mode, split read gate, cross-seam guard, mint cutover / flip latency, control-plane cache, control-plane resolver, distinct-db sentinel, read-through loaders (route loaders, run-detail loaders,findEnvironmentFromRun), and the run-engine handlers. Testcontainers-backed; no mocks.pnpm-lock.yamlsynced for the two new webapp deps.Notes
Draft, stacked on #4116 (
runops/pr04-store-engine). Review that first; this diff is against it.Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code