Skip to content

fix: FetchUrl permanent codes + SQLite v4 + error-code registry#518

Merged
sroussey merged 6 commits into
mainfrom
claude/libs-main-critical-high-fixes-Bi1rh
May 20, 2026
Merged

fix: FetchUrl permanent codes + SQLite v4 + error-code registry#518
sroussey merged 6 commits into
mainfrom
claude/libs-main-critical-high-fixes-Bi1rh

Conversation

@sroussey
Copy link
Copy Markdown
Collaborator

Summary

Bundles one CRITICAL and three HIGH fixes spanning @workglow/tasks, @workglow/job-queue, and the SQLite provider.

C1 — Guard FetchUrl outer catches against permanent codes [SECURITY]

packages/tasks/src/task/FetchUrlTask.ts:144-150 and :258-263

Security regression. The outer catches in fetchWithProgress and FetchUrlJob.execute were unconditionally rewrapping every non-Abort error via wrapFetchUrlNetworkError. safeFetch throws permanent FetchUrlJobErrors with codes PRIVATE_DENIED, SCOPE_DENIED, INVALID_URL, DNS_FAILED, REDIRECT_MISSING_LOCATION, TOO_MANY_REDIRECTS, NO_RESPONSE_BODY, CONFIGURATION — these were getting rewrapped to retryable NETWORK_ERROR. An SSRF deny would burn the full retry budget instead of failing fast, amplifying any compromised-URL exposure and pinging the denied target up to 10 times per submission. The response-parse catch at :312-315 already had the correct guard; this PR mirrors it on both outer catches.

Fix: replace if (err instanceof Error && err.name === "AbortSignalJobError") throw with if (isFetchUrlJobError(err) || err instanceof AbortSignalJobError) throw (the instanceof form matches the parse catch; the import was already present). Tests cover each permanent code plus a positive control that real TypeError("ECONNREFUSED") still wraps to retryable NETWORK_ERROR.

H2 — Error-code reconstructor registry

packages/job-queue/src/job/JobErrorRegistry.ts (new), JobQueueClient.ts:617-628, packages/tasks/src/common.ts, packages/tasks/src/task/FetchUrlJobError.ts, packages/job-queue/README.md

JobQueueClient.buildErrorFromCode had a hardcoded FETCH_* block that knew which fetch codes were retryable, coupling @workglow/job-queue to @workglow/tasks and preventing downstream packages (LLM, file IO, etc.) from extending the typed-error round-trip the same way. Replaced with a module-level prefix→reconstructor registry. @workglow/tasks registers buildFetchUrlError for the FETCH_ prefix as a common.ts import side-effect. Built-in codes (PermanentJobError, RetryableJobError, AbortSignalJobError, JobDisabledError) stay in the existing switch — registry is purely additive.

Contract documented in packages/job-queue/README.md: prefix-matched, first-registered-wins, idempotent on identical fn, warn-and-overwrite on conflict. buildFetchUrlError falls back to PermanentJobError (with console.warn) for unknown future FETCH_* codes so older clients stay forward-compatible with newer workers.

H1 — v4 SQLite migration to converge max_attempts default

providers/sqlite/src/migrations/sqliteQueueMigrations.ts

The v3 in-place rebuild that converged max_attempts DEFAULT 23 → 10 ran inside v3's up(). Deployments that already applied the pre-fix v3 have v3 marked applied in _storage_migrations and will never re-run it — their DBs stay at DEFAULT 23 indefinitely while Postgres lands at DEFAULT 10. Callers omitting maxAttempts get divergent retry behavior across backends.

Fix: factor the rebuild into a shared helper, append v4 that idempotently re-runs the same logic. Both v3 and v4 guard with tableSql.includes("max_attempts INTEGER DEFAULT 10") so fresh installs land at 10 inside v3 and v4 no-ops; pre-fix-v3 DBs are repaired by v4.

H3 — SQLite v4 rebuild safety (FK precheck, trigger awareness)

Same sqliteQueueMigrations.ts helper.

The rebuild now (1) snapshots user-defined triggers/views referencing the queue table before the DROP and replays them after the RENAME so deployments that attach automation to the queue survive the rebuild, (2) captures and restores the caller's PRAGMA foreign_keys value, and (3) runs PRAGMA foreign_key_check after the rebuild so any dangling FK references throw and roll back. See "Open questions" below re: PRAGMA-inside-txn limitation.

Test plan

Validated locally with bun scripts/test.ts bun unit task queue and bun scripts/test.ts bun integration storage queue.

  • All existing tests pass (1263 unit task+queue, 1212 integration storage+queue)
  • bun test packages/test/src/test/task/FetchTask.test.ts — 36 pass, 0 fail (9 C1 pass-through tests, 1 positive control, 2 H2 regression tests)
  • bun test packages/test/src/test/job-queue/JobErrorRegistry.test.ts — 9 pass, 0 fail
  • bun test packages/test/src/test/storage-migrations/queueMigrationsParity.integration.test.ts — 9 pass, 0 fail (5 new H1/H3 tests)
  • bun run build:types — clean
  • bun run format — clean

Open questions for reviewer

  1. PRAGMA foreign_keys inside a txn. SqliteMigrationRunner wraps every up() in BEGIN/COMMIT. PRAGMA foreign_keys = OFF is a documented no-op inside an active transaction, so the SQLite docs' canonical 12-step procedure cannot be followed literally here. The queue table has no inbound FKs in the canonical schema, but user-defined FKs to it would not be silenced by our pragma. We fail the migration via foreign_key_check post-validation instead. Is that acceptable, or should the runner expose a withForeignKeysOff() escape hatch that suspends the txn for this case?

  2. Migration ledger access. The test fabricates a pre-fix-v3 DB by hand-inserting rows into _storage_migrations. This couples tests to the runner's table-name constant — an exported MIGRATIONS_TABLE is already imported from @workglow/storage, but tests inline the string for clarity. Worth standardizing?

  3. Prefix-match semantics. Registry v1 is first-registered-wins on prefix conflicts. Longest-prefix-wins would let a FETCH_HTTP_ reconstructor preempt a more general FETCH_ one — happy to switch if the design intent is hierarchical taxonomies. Currently the conflict is documented and a warning is logged on overwrite.

  4. Registration ergonomics. Side-effect registration in packages/tasks/src/common.ts is consistent with the existing import "./task/adaptive" pattern but is invisible to tree-shakers — if a task user only imports a single task they still pull the FETCH_ reconstructor. Acceptable given the tiny code size?

  5. Signing. Commits in this PR are unsigned because the code-sign environment-runner returned HTTP 400 "missing source" for every signing attempt (see request IDs req_011CbDXEr2gNMd5usCxmPgEq, req_011CbDXJCXnsSzcy2ctLatoF in the diagnostics). This appears to be an infrastructure issue with the signing server rather than something fixable from the worktree.


Generated by Claude Code

claude added 3 commits May 20, 2026 08:43
The outer catches in fetchWithProgress and FetchUrlJob.execute were
unconditionally rewrapping non-Abort errors via wrapFetchUrlNetworkError,
converting permanent SSRF/configuration errors thrown by safeFetch
(PRIVATE_DENIED, SCOPE_DENIED, INVALID_URL, DNS_FAILED,
REDIRECT_MISSING_LOCATION, TOO_MANY_REDIRECTS, NO_RESPONSE_BODY,
CONFIGURATION) into NETWORK_ERROR — which is retryable. A permanent
SSRF deny would therefore burn the full retry budget instead of failing
fast, amplifying any compromised-URL exposure.

Mirror the guard the response-parse catch already uses
(isFetchUrlJobError || instanceof AbortSignalJobError) so FETCH_* codes
and AbortSignalJobError pass through unchanged. Genuine network
TypeErrors still rewrap to a retryable NETWORK_ERROR.

Adds one regression test per permanent code asserting the original
code/class survives, plus a positive control that ECONNREFUSED still
wraps to NETWORK_ERROR.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
JobQueueClient.buildErrorFromCode had a hardcoded FETCH_* prefix block
that knew which fetch codes are retryable. This coupled the queue
package to @workglow/tasks and made it impossible for downstream
packages (LLM, file IO, etc.) to extend the typed-error round-trip the
same way.

Replace the FETCH_* hardcoded branch with a module-level prefix
registry. Packages register their reconstructor as an import-time side
effect:

  registerErrorCodeReconstructor("FETCH_", buildFetchUrlError);

JobQueueClient now consults the registry first; built-in codes
(PermanentJobError, RetryableJobError, AbortSignalJobError,
JobDisabledError) stay in the existing switch. Unknown codes fall back
to JobError as before.

Contract (documented in packages/job-queue/README.md):
- Prefix-matched, first-registered-wins on conflicts (logs a warning
  when a *different* fn overwrites an existing prefix; identical fn is
  a silent no-op).
- Reconstructors MUST NOT throw — buildFetchUrlError falls back to
  PermanentJobError on unknown future FETCH_* codes so older clients
  stay forward-compatible with newer workers.
- clearErrorCodeReconstructors() exposed for test hygiene.

Includes JobErrorRegistry.test.ts (registry semantics + queued-job
round-trip) and two regression tests in FetchTask.test.ts asserting
FETCH_HTTP_RATE_LIMITED reconstructs as RetryableJobError and
FETCH_PRIVATE_DENIED as PermanentJobError.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
H1 — v4 max_attempts convergence
================================
The v3 in-place rebuild that converged `max_attempts DEFAULT 23 → 10`
was gated inside v3's `up()`. Deployments that had already run the
pre-fix v3 have v3 marked applied in `_storage_migrations` and would
never re-run it — their DBs stay at DEFAULT 23 indefinitely while
Postgres has DEFAULT 10. Callers omitting `maxAttempts` get divergent
retry behavior across backends.

Append v4 which idempotently re-runs the same rebuild. v3 also delegates
to the shared helper so fresh installs land at DEFAULT 10 inside v3 and
v4 then no-ops (guarded by an explicit
`tableSql.includes("max_attempts INTEGER DEFAULT 10")` check).

H3 — rebuild safety
===================
The rebuild now snapshots user-defined triggers/views referencing the
queue table before DROP and replays them after RENAME so deployments
that attach automation to the queue survive the rebuild.

Foreign-key handling: the migration runner wraps every `up()` in
BEGIN/COMMIT, and `PRAGMA foreign_keys = OFF;` is a no-op inside an
active transaction. We therefore document this limitation and rely on
the `PRAGMA foreign_key_check` post-validation — if the drop+rename
leaves dangling FK references the migration throws and rolls back.
The prior `PRAGMA foreign_keys` value is captured and restored either
way so a caller's enforcement setting is preserved.

Tests (queueMigrationsParity.integration.test.ts):
- fabricates a pre-fix-v3-shaped DB (v1+v2 + manual renames, v3 marked
  applied in the ledger) and asserts v4 repairs it to DEFAULT 10
- v4 is a no-op on a converged fresh-install DB
- existing rows preserved across v4 rebuild; new inserts get
  max_attempts=10 by default
- user-defined trigger on the queue table survives v4 and still fires
- caller's foreign_keys=ON pragma is restored after v4

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 20, 2026

Open in StackBlitz

@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@518

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@518

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@518

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@518

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@518

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@518

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@518

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@518

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@518

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@518

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@518

@workglow/util

npm i https://pkg.pr.new/@workglow/util@518

workglow

npm i https://pkg.pr.new/workglow@518

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@518

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@518

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@518

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@518

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@518

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@518

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@518

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@518

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@518

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@518

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@518

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@518

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@518

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@518

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@518

commit: 47521f2

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 62.12% 22722 / 36574
🔵 Statements 62% 23510 / 37918
🔵 Functions 63.16% 4293 / 6797
🔵 Branches 50.65% 10989 / 21695
File CoverageNo changed files found.
Generated in workflow #2320 for commit 47521f2 by the Vitest Coverage Report Action

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses error-handling correctness and cross-package decoupling for job error reconstruction, and adds a new SQLite migration to converge queue retry defaults and make the rebuild procedure safer.

Changes:

  • Prevent FetchUrlTask/FetchUrlJob outer catches from rewrapping permanent FETCH_* errors into retryable NETWORK_ERROR.
  • Introduce an error-code reconstructor registry in @workglow/job-queue and register FETCH_ reconstruction from @workglow/tasks.
  • Add SQLite queue migration v4 to converge max_attempts DEFAULT to 10 for already-applied pre-fix v3 DBs, with additional rebuild safety checks and expanded tests/docs.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
providers/sqlite/src/migrations/sqliteQueueMigrations.ts Adds rebuild helper + v4 migration to converge max_attempts default and preserve triggers / restore FK pragma / run FK check.
packages/tasks/src/task/FetchUrlTask.ts Guards outer catches so permanent FetchUrlJobErrors pass through unmodified.
packages/tasks/src/task/FetchUrlJobError.ts Adds buildFetchUrlError adapter for the new job-queue reconstructor registry (with forward-compat fallback).
packages/tasks/src/common.ts Registers FETCH_ reconstructor via side-effect on package import.
packages/job-queue/src/job/JobErrorRegistry.ts New prefix→reconstructor registry API (register/lookup/unregister/clear).
packages/job-queue/src/job/JobQueueClient.ts Uses registry lookup to reconstruct domain errors from persisted codes instead of hardcoded FETCH_ logic.
packages/job-queue/src/common.ts Re-exports the new registry API.
packages/job-queue/README.md Documents the custom error-code reconstruction mechanism and contract.
packages/test/src/test/task/FetchTask.test.ts Adds regression tests for permanent FETCH_* pass-through and client reconstruction behavior.
packages/test/src/test/job-queue/JobErrorRegistry.test.ts New unit tests covering registry semantics and an end-to-end roundtrip.
packages/test/src/test/storage-migrations/queueMigrationsParity.integration.test.ts Adds integration tests for v4 convergence, data preservation, trigger preservation, and FK pragma restoration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +618 to +622
protected buildErrorFromCode(message: string, errorCode?: string): JobError {
// FetchUrlTask codes — keep retryable set in sync with @workglow/tasks FetchUrlErrorCode
if (errorCode?.startsWith("FETCH_")) {
const retryable =
errorCode === "FETCH_HTTP_RATE_LIMITED" ||
errorCode === "FETCH_HTTP_SERVER_ERROR" ||
errorCode === "FETCH_NETWORK_ERROR";
const err = retryable ? new RetryableJobError(message) : new PermanentJobError(message);
err.code = errorCode;
applyPersistedDiagnosticsToStack(err, message);
return err;
// Domain-specific codes (e.g. FETCH_*, LLM_*, FILE_*) reconstruct via the
// {@link JobErrorRegistry}. Packages register their reconstructors on
// import so JobQueueClient stays decoupled from concrete error
// taxonomies. See `JobErrorRegistry.ts` for the contract.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3dbe032. Built-in error codes (PermanentJobError, RetryableJobError, AbortSignalJobError, JobDisabledError) now dispatch BEFORE the registry lookup, so a registered prefix can never intercept them (e.g. a "P" prefix is no longer able to shadow PermanentJobError). The reconstructor invocation is wrapped in try/catch — a throwing reconstructor logs a warning and falls through to a generic JobError with code preserved. After the reconstructor returns we also defensively re-set err.code = errorCode (warning on diverging codes, silent inject when unset) so downstream branching on err.code is reliable even with a buggy reconstructor. README contract section updated to document the precedence. New tests in JobErrorRegistry.test.ts: "registered 'P' prefix does not intercept PermanentJobError built-in", "reconstructor that throws falls through to JobError with code preserved", "reconstructor that returns error without code has code injected".


Generated by Claude Code

Comment on lines +108 to +112
.prepare<
[string],
SchemaObj
>("SELECT type, name, sql FROM sqlite_schema " + "WHERE type IN ('trigger','view') AND tbl_name = ? " + "AND name NOT LIKE 'sqlite_%'")
.all(tableName);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 47521f2. The "views are preserved" claim was indeed false: in sqlite_schema a view's tbl_name is the VIEW's own name, not the referenced table, so tbl_name = ? couldn't match views over the queue table. Rather than try to parse view SQL to find dependent views (fragile across quoting/whitespace/aliases/CTEs and no in-tree consumers rely on it), the snapshot query is narrowed to type = 'trigger', the local var is renamed userTriggers, and the surrounding comment + the v4 migration description now spell out the limitation: operators with views over the queue table must DROP them before this migration runs and re-create them after (also necessary because SQLite's drop+rename leaves any dependent view briefly invalid and breaks the txn). A negative test in queueMigrationsParity.integration.test.ts pins the documented behavior so anyone tempted to add view-preservation also updates the public docs.


Generated by Claude Code

Comment on lines +49 to +53
// Capture and restore the registry around each test so we never leak global
// state between tests in this file or to neighboring test files.
beforeEach(() => {
clearErrorCodeReconstructors();
});
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 003ad62. Added snapshotErrorCodeReconstructors() / restoreErrorCodeReconstructors(snapshot) to JobErrorRegistry. JobErrorRegistry.test.ts now snapshots in beforeEach, clear()s for a clean per-test slate, then restores in afterEach so any FETCH_* (or other) registrations made by sibling test files via ESM import side-effects survive this file's tests regardless of execution order. clearErrorCodeReconstructors is still exported but its docstring now flags the cross-file footgun and steers callers to the snapshot/restore pair. Verified by running bun test packages/test/src/test/job-queue/JobErrorRegistry.test.ts packages/test/src/test/task/FetchTask.test.ts in both orders — both files pass either way (48 tests, 0 fail).


Generated by Claude Code

claude added 3 commits May 20, 2026 16:11
…reconstructors

Registry lookup ran BEFORE the built-in error-code switch, so a
registered prefix could intercept core types (e.g. a `"P"` prefix
shadowing `PermanentJobError`). Reorder so built-ins dispatch first
and the registry is only consulted for unknown codes.

Defensive hardening of the reconstructor invocation:
  * Wrap the call in try/catch so a throwing reconstructor logs a
    warning and falls through to a generic `JobError` with `code`
    preserved instead of breaking job-result delivery.
  * After the reconstructor returns, override `err.code` to the
    persisted `errorCode` if mismatched (warning on diverging codes,
    silent inject when unset) so downstream branching on `err.code`
    is reliable even with buggy reconstructors.

README updated to document the precedence and `code`-injection
contract.

Tests (JobErrorRegistry.test.ts):
  * "registered 'P' prefix does not intercept PermanentJobError
    built-in" — asserts the reconstructor is never called for built-in
    codes.
  * "reconstructor that throws falls through to JobError with code
    preserved" — asserts the fallback path.
  * "reconstructor that returns error without code has code injected"
    — asserts the defensive code-set.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
`clearErrorCodeReconstructors()` in `JobErrorRegistry.test.ts`'s
`beforeEach`/`afterEach` permanently wiped the global registry for
the current Vitest worker. Other test files (e.g. `FetchTask.test.ts`)
register FETCH_* reconstructors via ESM module-load side effects —
because ESM caches modules, those registrations only run once per
worker, so this file's `clear()` could leave them gone for the rest
of the worker depending on file order.

Add two new exports to `JobErrorRegistry`:
  * `snapshotErrorCodeReconstructors()` — returns a defensive copy of
    the current registry entries.
  * `restoreErrorCodeReconstructors(snapshot)` — replaces the registry
    contents with the snapshot (in order).

Update `JobErrorRegistry.test.ts` to snapshot in `beforeEach`, clear
for a clean per-test slate, then restore in `afterEach` so sibling
test files' side-effect registrations survive this file's tests
regardless of execution order. `clearErrorCodeReconstructors` stays
exported but its docstring now flags the cross-file footgun and
points callers at the snapshot/restore pair.

Verified by running
  bun test packages/test/src/test/job-queue/JobErrorRegistry.test.ts \
            packages/test/src/test/task/FetchTask.test.ts
and the reverse order — both files pass either way.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
The v4 rebuild's schema snapshot ran
  SELECT type, name, sql FROM sqlite_schema
  WHERE type IN ('trigger','view') AND tbl_name = ?
to capture and replay user-defined triggers + views. But in
`sqlite_schema` a view's `tbl_name` is the VIEW's own name, not the
referenced table — so `tbl_name = ?` cannot locate views that
reference the queue table. The "views are preserved" claim never
held; views attached to the queue table were silently lost.

Parsing view SQL to find dependent tables is fragile (quoting,
whitespace, aliases, subselects, CTEs), there are zero in-tree
consumers that rely on view preservation, and SQLite's drop+rename
also leaves any dependent view briefly invalid (breaking the
migration txn). Drop the view claim and document the limitation
honestly.

Changes:
  * Narrow the snapshot query to `type = 'trigger'` (triggers DO use
    `tbl_name` correctly for their target table).
  * Rename `userObjects` → `userTriggers` for clarity.
  * Replace the comment with one that explains *why* views are not
    preserved.
  * Update the v4 migration `description` to call out the operator
    workflow: drop dependent views before this migration runs and
    re-create them afterwards.

Tests (queueMigrationsParity.integration.test.ts):
  * Existing "v4 preserves user-defined triggers on the queue table"
    test renamed to make the "trigger still fires" assertion explicit.
  * New negative test "v4 does NOT capture user-defined views over
    the queue table (documented limitation)" — pins the documented
    behavior by exercising the snapshot query directly on a DB with a
    queue-referencing view, then running the migration after the
    operator drops the view (the documented workflow).

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
@sroussey sroussey self-assigned this May 20, 2026
@sroussey sroussey merged commit 4fa829d into main May 20, 2026
16 checks passed
@sroussey sroussey deleted the claude/libs-main-critical-high-fixes-Bi1rh branch May 20, 2026 17:25
sroussey pushed a commit that referenced this pull request May 20, 2026
Main shipped a SQLite v4 migration (max_attempts default convergence) in
PR #518 while this branch was open with its own SQLite v4 (UNIQUE
fingerprint index). After rebase the two collided. This commit:

- Renames our SQLite UNIQUE fingerprint index migration to v5 and the
  pre-edit convergence migration to v6 so it lands cleanly on top of
  main's v4.
- Updates the comment in SqliteQueueStorage.add()'s race-handling branch
  to reference v5.
- Updates JobStoreFingerprintRace.test.ts and queueMigrationsParity's v6
  convergence cases to the new numbering.
- Updates queueMigrationsParity's "v4 max_attempts convergence" cases to
  expect [4, 5, 6] / [1, 2, 3, 4, 5, 6] now that the chain has six
  migrations.

Postgres numbering is unaffected — main didn't bump Postgres past v3,
so our Postgres v4/v5 stay as is.
sroussey pushed a commit that referenced this pull request May 20, 2026
…ansports

Decompose the legacy IQueueStorage into IMessageQueue (transport) +
IJobStore (persistence) + IClaim (per-message lease). This lets cloud
transports (SQS, Cloudflare Queues — separate commits) plug in without
faking an IQueueStorage.

IJobStore gains six new methods so IClaim.ack/fail can be atomic on
backends without a shared pending-buffer:

- create(body, opts): insert PENDING row from JobStorageFormat + SendOptions
- findActiveByFingerprint(fingerprint, queueName): O(1) dedup lookup
- getMany(ids): batch read aligned to input order
- completeWithResult(id, result): atomic output + status=COMPLETED
- failWithError(id, opts): atomic error fields + status=FAILED
- markEnqueueDeferred(id, opts): keep row PENDING with visible_at deferred
  for transient enqueue failures (H4)

Implementations across InMemory, IndexedDb, SQLite, Postgres, Supabase,
and WrappedJobStore. SQL backends use native indexed queries.

H2 — fingerprint dedup race handling: SQLite v5 + Postgres v4 add a
UNIQUE partial index on (queue, fingerprint) WHERE status IN
('PENDING','PROCESSING'). Concurrent inserts race to the DB; the loser
catches 23505 / SQLITE_CONSTRAINT_UNIQUE and re-resolves via
findActiveByFingerprint, eliminating the prior TOCTOU window. SQLite v6
+ Postgres v5 are defensive convergence migrations for any DB that
recorded a non-unique variant. SQLite numbering lands on top of main's
v4 (max_attempts default convergence from #518).

Worker refactor: extract public JobQueueWorker.processClaims(claims) so
external transports (CFQ's push-only queue() handler) can drive a
pre-fetched batch through the worker's existing limiter + dispatch +
lease pipeline. Preserves shutdown-mid-batch release, per-claim error
isolation, and limiter back-off. C1: awaits all dispatched per-claim
work before returning so callers can rely on settlement.

JobQueueClient.send now forwards SendOptions to messageQueue.send so
fingerprint / delaySeconds / timeoutSeconds actually round-trip to
cloud adapters.

H3 — IMessageQueue.sendBatch rejects opts.fingerprint (every body would
otherwise dedup against the first row). Enforced uniformly across
WrappedMessageQueue, SqsMessageQueue, and CloudflareMessageQueue.

Also: preserve caller-provided fingerprint across all *QueueStorage.add()
paths (was being clobbered by makeFingerprint); widen
TelemetryQueueStorage.finalize to match the interface signature evolution;
log unexpected processSingleJob rejections; extend IQueueStorage.finalize
to support visible_at (needed by markEnqueueDeferred).

Includes a parameterized JobStoreExtensions contract suite, a worker
processClaims unit test, a fingerprint race test, and Postgres /
Supabase extension integration test files (gated by env vars).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants