-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
perf(run-engine,webapp): look up PENDING_VERSION runs via ClickHouse #3707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
1794758
perf(run-engine,webapp): look up PENDING_VERSION runs via ClickHouse
ericallam f53d2ca
fix(run-engine): don't emit runStatusChanged when promotion was skipped
ericallam 48c2bc4
fix(run-engine): wire queueRunsWaitingForWorkerBatchSize to PendingVe…
ericallam ff6db49
fix(run-engine): reschedule batch drain on candidateIds count, not pe…
ericallam 98e8272
fix(webapp): log run-engine ClickHouse factory failures at error level
ericallam File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| --- | ||
| area: webapp | ||
| type: improvement | ||
| --- | ||
|
|
||
| PendingVersionSystem now discovers PENDING_VERSION run ids via ClickHouse and re-validates each by primary key in Postgres, reducing read load on the TaskRun status index. Uses a dedicated `RUN_ENGINE_CLICKHOUSE_*` client so it doesn't contend with the main analytics pool. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
24 changes: 24 additions & 0 deletions
24
apps/webapp/app/v3/runEnginePendingVersionLookup.server.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| import { type PendingVersionRunIdLookup } from "@internal/run-engine"; | ||
| import { clickhouseFactory } from "~/services/clickhouse/clickhouseFactoryInstance.server"; | ||
| import { logger } from "~/services/logger.server"; | ||
| import { singleton } from "~/utils/singleton"; | ||
| import { ClickhousePendingVersionLookup } from "./services/clickhousePendingVersionLookup.server"; | ||
|
|
||
| /** | ||
| * Lookup used by `@internal/run-engine`'s `PendingVersionSystem` to find | ||
| * `PENDING_VERSION` TaskRun ids via ClickHouse, removing the need for | ||
| * Postgres index #13 (`TaskRun_status_runtimeEnvironmentId_createdAt_id_idx`). | ||
| * | ||
| * Resolves the ClickHouse client per call via {@link clickhouseFactory} | ||
| * using the `"engine"` client type, configured by `RUN_ENGINE_CLICKHOUSE_*` | ||
| * env vars and routed per-organization for customers with HIPAA / data | ||
| * sovereignty data stores. | ||
| */ | ||
| export const runEnginePendingVersionLookup = singleton( | ||
| "runEnginePendingVersionLookup", | ||
| initializeRunEnginePendingVersionLookup | ||
| ); | ||
|
|
||
| function initializeRunEnginePendingVersionLookup(): PendingVersionRunIdLookup { | ||
| return new ClickhousePendingVersionLookup({ clickhouseFactory, logger }); | ||
| } |
97 changes: 97 additions & 0 deletions
97
apps/webapp/app/v3/services/clickhousePendingVersionLookup.server.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| import { | ||
| type PendingVersionRunIdLookup, | ||
| type PendingVersionRunIdLookupOptions, | ||
| type PendingVersionRunIdLookupResult, | ||
| } from "@internal/run-engine"; | ||
| import { Logger } from "@trigger.dev/core/logger"; | ||
| import type { ClickhouseFactory } from "~/services/clickhouse/clickhouseFactory.server"; | ||
|
|
||
| export type ClickhousePendingVersionLookupOptions = { | ||
| clickhouseFactory: ClickhouseFactory; | ||
| logger: Logger; | ||
| }; | ||
|
|
||
| /** | ||
| * ClickHouse-backed lookup for `PENDING_VERSION` TaskRun ids. | ||
| * | ||
| * Resolves the ClickHouse client per call via the | ||
| * {@link ClickhouseFactory}, which honors per-organization data-store | ||
| * routing (HIPAA / data-sovereignty customers get their own instance, | ||
| * everyone else lands on the shared `engine` client configured by | ||
| * `RUN_ENGINE_CLICKHOUSE_*` env vars). | ||
| * | ||
| * Best-effort by design: replication lag against `task_runs_v2` can | ||
| * produce stale candidates. The run-engine consumer re-validates every | ||
| * id against Postgres by primary key with a `status = 'PENDING_VERSION'` | ||
| * guard before any mutation, so stale ids are dropped at the source of | ||
| * truth. On ClickHouse error we log and return an empty result; the | ||
| * pending-version re-enqueue tail loop retries on the next event. | ||
| */ | ||
| export class ClickhousePendingVersionLookup implements PendingVersionRunIdLookup { | ||
| readonly name = "clickhouse"; | ||
|
|
||
| constructor(private readonly opts: ClickhousePendingVersionLookupOptions) {} | ||
|
|
||
| async lookupPendingVersionRunIds( | ||
| options: PendingVersionRunIdLookupOptions | ||
| ): Promise<PendingVersionRunIdLookupResult> { | ||
| // Empty IN-lists would be a no-op; bail before issuing the query. | ||
| if (options.taskIdentifiers.length === 0 || options.queues.length === 0) { | ||
| return { runIds: [] }; | ||
| } | ||
|
|
||
| let clickhouse; | ||
| try { | ||
| clickhouse = await this.opts.clickhouseFactory.getClickhouseForOrganization( | ||
| options.organizationId, | ||
| "engine" | ||
| ); | ||
| } catch (error) { | ||
| // Factory resolution failures usually mean a real configuration | ||
| // problem (registry misload, missing data store, ClientType mismatch). | ||
| // These are not transient — log at error so ops sees them in dashboards | ||
| // and incident hooks. Query-level errors below stay at warn because | ||
| // those are expected to be transient. | ||
| this.opts.logger.error("ClickhousePendingVersionLookup factory resolution failed", { | ||
| error, | ||
| organizationId: options.organizationId, | ||
| }); | ||
| return { runIds: [] }; | ||
| } | ||
|
|
||
| const builder = clickhouse.taskRuns | ||
| .pendingVersionIdsQueryBuilder() | ||
| // `organization_id` MUST be the leading filter — it is the leading | ||
| // sort-key column on `task_runs_v2` and the only thing that prunes | ||
| // granules cheaply on a multi-tenant table. | ||
| .where("organization_id = {organizationId: String}", { | ||
| organizationId: options.organizationId, | ||
| }) | ||
| .where("project_id = {projectId: String}", { projectId: options.projectId }) | ||
| .where("environment_id = {environmentId: String}", { | ||
| environmentId: options.environmentId, | ||
| }) | ||
| .where("status = 'PENDING_VERSION'") | ||
| .where("task_identifier IN {taskIdentifiers: Array(String)}", { | ||
| taskIdentifiers: options.taskIdentifiers, | ||
| }) | ||
| .where("queue IN {queues: Array(String)}", { queues: options.queues }) | ||
| .where("_is_deleted = 0") | ||
| .orderBy("created_at ASC") | ||
| .limit(options.limit); | ||
|
|
||
| const [queryError, rows] = await builder.execute(); | ||
|
|
||
| if (queryError) { | ||
| this.opts.logger.warn("ClickhousePendingVersionLookup query failed", { | ||
| error: queryError, | ||
| organizationId: options.organizationId, | ||
| projectId: options.projectId, | ||
| environmentId: options.environmentId, | ||
| }); | ||
| return { runIds: [] }; | ||
| } | ||
|
|
||
| return { runIds: rows.map((row) => row.run_id) }; | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
48 changes: 48 additions & 0 deletions
48
internal-packages/run-engine/src/engine/services/pendingVersionLookup.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| /** | ||
| * Lookup interface for discovering TaskRun ids that are currently in the | ||
| * `PENDING_VERSION` status for a given background-worker filter. | ||
| * | ||
| * The default Postgres-backed implementation lives in the webapp and is | ||
| * injected via {@link SystemResources}. The run-engine package only owns | ||
| * the contract; concrete implementations are provided by the consumer. | ||
| * | ||
| * Best-effort by design: implementations may return stale ids (rows that | ||
| * have since transitioned) or omit recently-inserted rows (replication | ||
| * lag against an analytical store). The caller MUST re-validate every | ||
| * returned id against the source-of-truth database before mutating it. | ||
| */ | ||
| export type PendingVersionRunIdLookupOptions = { | ||
| organizationId: string; | ||
| projectId: string; | ||
| environmentId: string; | ||
| taskIdentifiers: string[]; | ||
| queues: string[]; | ||
| /** Maximum number of ids to return. Implementations must respect this cap. */ | ||
| limit: number; | ||
| }; | ||
|
|
||
| export type PendingVersionRunIdLookupResult = { | ||
| runIds: string[]; | ||
| }; | ||
|
|
||
| export interface PendingVersionRunIdLookup { | ||
| /** Stable identifier for logs and metrics, e.g. "clickhouse", "test-noop". */ | ||
| readonly name: string; | ||
|
|
||
| lookupPendingVersionRunIds( | ||
| options: PendingVersionRunIdLookupOptions | ||
| ): Promise<PendingVersionRunIdLookupResult>; | ||
| } | ||
|
|
||
| /** | ||
| * Default lookup used when nothing is wired up (tests that don't exercise | ||
| * the lookup, or engine instances that don't run the pending-version | ||
| * resolver). Returns an empty result so the system no-ops cleanly. | ||
| */ | ||
| export class NoopPendingVersionRunIdLookup implements PendingVersionRunIdLookup { | ||
| readonly name = "noop"; | ||
|
|
||
| async lookupPendingVersionRunIds(): Promise<PendingVersionRunIdLookupResult> { | ||
| return { runIds: [] }; | ||
| } | ||
| } |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.