perf(webapp): cache task metadata in Redis for the trigger hotpath #3625
Walkthrough

This pull request implements a Redis-backed task metadata cache to accelerate task trigger operations by avoiding per-request database lookups for task defaults. The cache stores task metadata (TTL, trigger source/task kind, queue identifiers) in two Redis hash keyspaces: one scoped by environment (for non-locked triggers) and one scoped by worker ID (for locked-to-version triggers). Environment configuration includes Redis connection settings and separate TTL values for each keyspace. The queue resolution layer (DefaultQueueManager) now reads cached metadata with Prisma fallback and automatic backfill on cache miss. Worker creation, deployment creation, and deployment promotion flows capture task metadata and synchronously populate the cache. Comprehensive integration tests verify warm cache behavior, cache-miss fallback and backfill, queue override + cached TTL interaction, by-worker keyspace routing for locked triggers, and deployment promotion cache synchronization.
🧹 Nitpick comments (1)
apps/webapp/app/services/taskMetadataCache.server.ts (1)
**50-64**: ⚡ Quick win — Add zod validation to the `decode()` function for schema-drift protection.

`JSON.parse(raw) as EncodedEntry` silently accepts any payload. If cached data drifts (field removed, `TaskTriggerSource` value retired, or malformed entry), `decode()` will produce a `TaskMetadataEntry` with undefined fields, surfacing bugs far downstream in the trigger hotpath. A zod schema hardens this and aligns with the repo's validation conventions.

♻️ Proposed refactor

```diff
-import type { Redis, Result, Callback } from "ioredis";
-import type { TaskTriggerSource } from "@trigger.dev/database";
-import { logger } from "./logger.server";
+import type { Redis, Result, Callback } from "ioredis";
+import { TaskTriggerSource } from "@trigger.dev/database";
+import { z } from "zod";
+import { logger } from "./logger.server";
@@
-type EncodedEntry = {
-  t: string | null;
-  k: TaskTriggerSource;
-  q: string | null;
-  n: string;
-};
+const EncodedEntrySchema = z.object({
+  t: z.string().nullable(),
+  k: z.nativeEnum(TaskTriggerSource),
+  q: z.string().nullable(),
+  n: z.string(),
+});
+type EncodedEntry = z.infer<typeof EncodedEntrySchema>;
@@
 function decode(slug: string, raw: string): TaskMetadataEntry | null {
   try {
-    const parsed = JSON.parse(raw) as EncodedEntry;
+    const parsed = EncodedEntrySchema.parse(JSON.parse(raw));
     return {
       slug,
       ttl: parsed.t,
       triggerSource: parsed.k,
       queueId: parsed.q,
       queueName: parsed.n,
     };
   } catch (error) {
     logger.error("Failed to decode task metadata cache entry", { slug, error });
     return null;
   }
 }
```

Per coding guideline: "Use zod for validation in packages/core and apps/webapp".
…builds (#3626)

## Summary

`LocalsKey<T>` (the type returned by `locals.create()`) was branded with a module-level `declare const __local: unique symbol`. Each such declaration is its own nominal type, and `tshy` emits separate `.d.ts` files for the ESM and CJS outputs — each gets its own `__local` symbol. Under certain pnpm hoisting layouts a single TypeScript compilation can resolve `LocalsKey` from both the ESM source path and the CJS dist path within the same call site, producing two structurally-incompatible variants of the same type. TS surfaces this as the misleading error:

```
Argument of type 'LocalsKey<X>' is not assignable to parameter of type 'LocalsKey<X>'.
Property '[__local]' is missing in type 'LocalsKey<X>' but required in type 'BrandLocal<X>'.
```

The error has been hitting CI on PRs opened since the chat.agent stack landed (e.g. #3625 typecheck job), but doesn't reproduce on developer machines where the pnpm node_modules layout was built up incrementally.

## Fix

Replace the `unique symbol` brand with an optional phantom field that carries `T` at the type level:

```ts
// before
declare const __local: unique symbol;
type BrandLocal<T> = { [__local]: T };
export type LocalsKey<T> = BrandLocal<T> & {
  readonly id: string;
  readonly __type: unique symbol;
};

// after
export type LocalsKey<T> = {
  readonly id: string;
  readonly __type: symbol;
  /** Phantom carrier for the value type — never read at runtime. */
  readonly __valueType?: T;
};
```

The ESM and CJS `.d.ts` outputs now produce structurally identical types, so cross-output resolution no longer produces a mismatch. `T` is still carried at the type level via the optional phantom field. The runtime shape is unchanged — `manager.ts` was already casting via `as unknown`, which is no longer needed.

## Test plan

- [ ] `pnpm run typecheck --filter @trigger.dev/core --filter @trigger.dev/sdk`
- [ ] `pnpm run build --filter @trigger.dev/core --filter @trigger.dev/sdk` (clean rebuild) — confirms the ESM and CJS dist `.d.ts` outputs no longer carry distinct `unique symbol` declarations
- [ ] `pnpm --filter @trigger.dev/core test test/mockTaskContext.test.ts --run`
- [ ] `pnpm --filter @trigger.dev/sdk test test/mockChatAgent.test.ts --run`
The trigger-task hotpath used to early-return without a DB query when a
caller passed both a queue override and a per-trigger TTL — the hottest
configuration on the trigger API. Adding triggerSource to the resolver
so the runs-list Source filter could distinguish STANDARD / SCHEDULED /
AGENT runs removed those early-returns, costing +2 DB queries per
trigger on non-locked calls and +1 on locked calls.
The resolver now reads BackgroundWorkerTask metadata (ttl, triggerSource,
queueId, queueName) from a Redis HASH keyed by env or by worker, with
PG fallback on miss that back-fills the cache.
Two key spaces:
- task-meta:env:{envId} refreshed at every deploy promotion;
24h safety TTL.
- task-meta:by-worker:{workerId} used for lockToVersion triggers;
30d sliding TTL.
Cache writes use Lua scripts via defineCommand so DEL + HSET + EXPIRE
land atomically (concurrent readers never see the empty intermediate
state of a naive pipeline). Read-path back-fill uses single-field
upserts so concurrent back-fills don't wipe each other's siblings.
Configurable via a new TASK_META_CACHE_REDIS_* env-var prefix that
falls back to the default REDIS_* set, so operators can route the
cache to a dedicated Redis instance if they want.
- `TaskMetadataCache`: interface → type alias, matching the repo's "prefer type unless extending" standard.
- `syncTaskMetadataCache` calls in `createBackgroundWorker`, `createDeploymentBackgroundWorkerV3`, and `createDeploymentBackgroundWorkerV4` are now wrapped in `tryCatch` and log on failure. Matches the pattern in `changeCurrentDeployment` so a Redis blip can't strand a successful worker build or deploy promotion.
- Back-fill assertion in "cache miss falls through to PG" now polls with a bounded timeout instead of a fixed sleep, removing the CI flake surface.
- Promotion assertion in "ChangeCurrentDeploymentService promotes the env cache" now pre-clears the by-worker + env keys and asserts they're null before the call, so the test proves the service did the write.
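The bounded-timeout polling mentioned above can be sketched as a small helper (the name, signature, and defaults are illustrative; the actual test uses vitest):

```typescript
type PollOptions = { timeoutMs: number; intervalMs: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-checks the condition until it passes or the deadline expires, instead of a
// single fixed sleep followed by one assertion: slow CI machines get more
// attempts, fast ones finish early.
async function pollUntil(check: () => boolean, { timeoutMs, intervalMs }: PollOptions): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (check()) return true;
    if (Date.now() >= deadline) return false;
    await sleep(intervalMs);
  }
}
```

In a test this replaces `await sleep(500); expect(cacheEntry).toBeDefined()` with `expect(await pollUntil(() => cacheEntryExists(), { timeoutMs: 2000, intervalMs: 25 })).toBe(true)`.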
Force-pushed 1b08acb to 1a06ef6.
The cache now exposes `populateByCurrentWorker(envId, workerId, entries)` and `setByCurrentWorker(envId, workerId, entry)`, which write both `task-meta:env` and `task-meta:by-worker` keyspaces in a single Lua transaction. Drops the `syncTaskMetadataCache` helper and its `isCurrent` boolean — each caller picks the right method based on context (DEV worker create + deploy promotion go through `populateByCurrentWorker`; V4 deploy build goes through `populateByWorker`).

Two new Lua scripts via `defineCommand`:

- `taskMetaReplaceTwoHashes`: `DEL` + `HSET` + `EXPIRE` both keys atomically
- `taskMetaSetTwoFields`: `HSET` both keys + env-preserve-TTL + worker-refresh-TTL atomically

The read-path back-fill for non-locked triggers now also writes both keyspaces atomically via `setByCurrentWorker`, so concurrent back-fills can't leave one keyspace populated and the other empty. Cache methods log + swallow Redis errors internally, so the outer `tryCatch` wrappers at the four call sites are gone — a Redis blip can't break any post-cache side effect.
…worker, clear stale cache on zero-task deploys

Two CodeRabbit findings:

1. The locked + queue-override branch silently accepted a missing task slug on the locked worker version, leaving `taskKind` / `ttl` as undefined instead of erroring. Now matches the no-override branch and throws a `ServiceValidationError` if the task isn't on the locked worker.
2. The cache populate methods short-circuited on empty entries, leaving stale hashes in Redis when a worker registered or got promoted with zero tasks. The Lua scripts already handle the empty-entries case (`DEL` + no `HSET`), so just drop the gates at the cache class and at the four call sites so promotion correctly clears prior data.
Two CI failures on the prior push:

1. Typecheck TS2556 in `taskMetadataCache.server.ts`: the populate methods were spreading a mixed `string[]` into the Lua `defineCommand` rest parameter. Pass the fixed positional args (key, ttl) directly and spread only the variable field/value pairs.
2. Webapp shard 6: the cache test file pulled in `ChangeCurrentDeploymentService`, whose transitive import chain reaches `triggerTaskV1.server.ts` → `autoIncrementCounter.server.ts`, which throws at module load when `REDIS_HOST`/`REDIS_PORT` aren't in the shard's env. Drop test 5 (deploy-promotion side effect) — it doesn't belong in `triggerTask.test.ts` anyway.
…writes

Stamp the env hash with an `__owner_worker_id` field at promotion time; the back-fill Lua script reads it and CAS-skips the env write if it doesn't match the worker the back-filler resolved to.

Scenario the guard closes:

1. Resolver reads env cache, misses
2. Resolver reads PG: `findCurrentWorker` → A, BWT for A → A's data
3. [promotion to B fires, atomically replaces env hash + by-worker:B]
4. Resolver fires back-fill with workerId=A → env hash now stamped owner=B; A != B → back-fill skips env write → by-worker:A still written (harmless; key contains workerId, dead worker)

The by-worker keyspace doesn't need a CAS check — the key itself contains the workerId, so stale writes land in a dead worker's keyspace and are never read.

Devin Review flagged this as a sub-millisecond race bounded by the 24h env TTL. Cost to close it is one extra `HGET` inside the Lua call (atomic with the `HSET`, no extra round trip) and one extra `HSET` on the promotion write.
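The owner-stamp CAS can be modelled in a few lines. In the PR this logic runs inside the back-fill Lua script, atomically with the `HSET`; the in-memory hash and function names below are illustrative only:

```typescript
// Stand-in for the task-meta:env:{envId} Redis hash.
type EnvHash = Map<string, string>;

// Promotion atomically replaces the env hash and stamps which worker owns it.
function promote(envHash: EnvHash, workerId: string, entries: [string, string][]): void {
  envHash.clear();
  envHash.set("__owner_worker_id", workerId);
  for (const [field, value] of entries) envHash.set(field, value);
}

// Back-fill CAS: write only if the hash is still owned by the worker the
// back-filler resolved against in PG (or is unowned). A promotion that raced
// ahead wins, and the stale back-fill is skipped.
function backfillEnv(envHash: EnvHash, resolvedWorkerId: string, field: string, value: string): boolean {
  const owner = envHash.get("__owner_worker_id");
  if (owner !== undefined && owner !== resolvedWorkerId) return false; // stale read, skip
  envHash.set(field, value);
  return true;
}
```

The atomicity claim holds only because Redis executes the real Lua script as one unit; this model just shows the decision logic.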
Match main's pre-PR behavior — when a caller passes `lockToVersion` and a queue override and the task slug isn't registered on that worker version, fall through with `taskKind = undefined` (coalesced to `"STANDARD"` downstream) and `taskTtl = undefined` instead of throwing. This reverts the throw added earlier in this PR.

The trade-off review landed on "strict consistency check is better" twice (CodeRabbit recommended it, Devin pushed back, we kept the throw) — but the net is a behavioral change in the trigger API, and customers running `lockToVersion` + a queue override against a slug that doesn't exist on that version would start getting 4xx errors where they got silent defaults before. Default-to-silent matches main and avoids the surprise. The no-override branch keeps its throw because there's no queue to route to in that case — the failure mode there is unrecoverable.
## Summary

The trigger-task hotpath used to early-return without a DB query when a caller passed both a queue override and a per-trigger TTL — the hottest configuration on the trigger API. Adding `triggerSource` to the resolver so the runs-list "Source" filter could distinguish STANDARD / SCHEDULED / AGENT runs removed those early-returns, costing +2 DB queries per trigger on non-locked calls and +1 on locked calls.
This change caches `BackgroundWorkerTask` metadata (`ttl`, `triggerSource`, `queueId`, `queueName`) in Redis so the resolver can satisfy every caller configuration with a single `HGET` on the warm path. A PG fallback on miss back-fills the cache.

Follow-up to #3542.
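The read path can be sketched against in-memory stand-ins for the Redis hash and the Prisma lookup (the `TaskMeta` shape uses the field names from the PR; the store shapes and function name are illustrative, not the actual webapp types):

```typescript
// Illustrative cached-entry shape: ttl, triggerSource, queueId, queueName.
type TaskMeta = {
  ttl: string | null;
  triggerSource: string;
  queueId: string | null;
  queueName: string;
};

type Stores = {
  redis: Map<string, TaskMeta>; // stand-in for HGET/HSET on task-meta:env:{envId}
  db: Map<string, TaskMeta>; // stand-in for the BackgroundWorkerTask Prisma lookup
  dbReads: number; // counts fallback queries, to show the warm path skips PG
};

// Warm path: one cache read. Miss: fall through to PG, then back-fill the cache.
function resolveTaskMeta(stores: Stores, taskSlug: string): TaskMeta | undefined {
  const cached = stores.redis.get(taskSlug);
  if (cached) return cached; // single read, no DB query

  stores.dbReads++;
  const fromDb = stores.db.get(taskSlug);
  if (fromDb) stores.redis.set(taskSlug, fromDb); // back-fill on miss

  return fromDb;
}
```

In the PR the back-fill is a single-field upsert so concurrent back-fills don't clobber sibling entries; the map here collapses that detail.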
## Design

Two key spaces:

- `task-meta:env:{envId}` — the "current worker" view, refreshed at every deploy promotion. 24h safety TTL.
- `task-meta:by-worker:{workerId}` — used for `lockToVersion` triggers. Immutable post-create. 30d sliding TTL so historical workers age out.
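The two key spaces route on whether the trigger is locked to a version. A minimal sketch of the key builder (the `tr:` prefix comes from the test plan's `redis-cli` command; the helper name is made up):

```typescript
type KeyInput = {
  envId: string;
  // Set when the caller passed lockToVersion; routes reads to the
  // immutable by-worker keyspace instead of the current-worker view.
  lockedWorkerId?: string;
};

function taskMetaKey({ envId, lockedWorkerId }: KeyInput): string {
  return lockedWorkerId !== undefined
    ? `tr:task-meta:by-worker:${lockedWorkerId}`
    : `tr:task-meta:env:${envId}`;
}
```

Keeping the workerId inside the by-worker key is what makes stale writes to a superseded worker harmless — they land under a key nothing reads anymore.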
Cache writes use Lua scripts via `defineCommand` so `DEL` + `HSET` + `EXPIRE` land atomically — concurrent readers never see the empty intermediate state of a naive pipeline. Read-path back-fill uses single-field upserts so concurrent back-fills don't wipe each other's siblings.
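The semantics those scripts guarantee can be modelled against a plain map (this in-memory model is a sketch of the contract, not the actual Lua; in production the atomicity comes from Redis executing the script as one unit):

```typescript
type Hash = { fields: Map<string, string>; ttlSeconds: number | null };
type Store = Map<string, Hash>;

// Models DEL + HSET + EXPIRE as one step: callers never observe the empty
// hash a naive DEL-then-HSET pipeline would expose between commands.
function replaceHash(store: Store, key: string, ttlSeconds: number, entries: [string, string][]): void {
  if (entries.length === 0) {
    store.delete(key); // zero-task deploys clear stale data rather than keep it
    return;
  }
  store.set(key, { fields: new Map(entries), ttlSeconds });
}

// Models the read-path back-fill: a single-field upsert that leaves sibling
// fields intact, so concurrent back-fills for different tasks don't wipe
// each other out.
function upsertField(store: Store, key: string, ttlSeconds: number, field: string, value: string): void {
  const existing = store.get(key);
  if (existing) {
    existing.fields.set(field, value); // preserve siblings and existing TTL
  } else {
    store.set(key, { fields: new Map([[field, value]]), ttlSeconds });
  }
}
```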
The cache lives behind its own `TASK_META_CACHE_REDIS_*` env-var prefix that falls back to the default `REDIS_*` set, so operators can route the cache to a dedicated Redis instance if they want.
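The fallback could look like this — the `TASK_META_CACHE_REDIS_*` / `REDIS_*` variable names come from the PR, but the option set, defaults, and helper are illustrative (the webapp actually reads configuration through the `env` export of `env.server.ts`, never `process.env` directly):

```typescript
type EnvShape = Record<string, string | undefined>;

type RedisOptions = { host: string; port: number };

// Prefer the dedicated TASK_META_CACHE_REDIS_* override, fall back to the
// shared REDIS_* settings, so the cache can live on its own Redis instance
// without operators having to set anything when the default is fine.
function resolveCacheRedisOptions(env: EnvShape): RedisOptions {
  const host = env.TASK_META_CACHE_REDIS_HOST ?? env.REDIS_HOST ?? "localhost";
  const port = Number(env.TASK_META_CACHE_REDIS_PORT ?? env.REDIS_PORT ?? "6379");
  return { host, port };
}
```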
The service/instance file split (`taskMetadataCache.server.ts` for the pure class, `taskMetadataCacheInstance.server.ts` for the env-wired singleton) mirrors the existing `runsReplicationService`/`runsReplicationInstance` pattern.

## Test plan
- `pnpm run typecheck --filter webapp`
- `pnpm run test ./test/engine/triggerTask.test.ts --run` — 8 existing tests untouched + 5 new tests covering warm cache, cold miss with back-fill, queue + ttl path, by-worker vs env keyspace, and the promotion cache write
- Keys are written with the expected TTLs, and `redis-cli HGETALL "tr:task-meta:env:<envId>"` returns the cached entries
## Benchmark

Measured `DefaultQueueManager.resolveQueueProperties` against a real Postgres + Redis (vitest `containerTest`, single-host docker). 500 sequential calls and 2,000 parallel calls (concurrency=50) per scenario, request shaped as `{ taskId, queue: "bench-queue", ttl: "5m" }` — the hot path this PR restores.

Read: an `HGET` miss adds <50 µs against the two Postgres queries that follow, so the worst case is not worse than today.

Caveats: single-host docker, local Postgres + Redis, resolver-only measurement (excludes the rest of the trigger transaction). Prod adds region-local Redis RTT (~0.3–0.8 ms), which shifts warm absolute numbers up but keeps the ratio intact.