fix(job-queue): follow-up correctness fixes to PR #511 (#513)

Merged
sroussey merged 7 commits into main from claude/sweet-edison-6tXoV
May 19, 2026

@sroussey
Collaborator

Four follow-up correctness fixes for PR #511 (refactor(job-queue): drop legacy limiter methods, fix QueuedExecutionStrategy release, rename releaseClaim). Each fix is a separate commit so they can be reviewed and reverted independently.

Commit 1 — fix(job-queue): RateLimiter slot leak, disableJob dispatch, pre-execute abort check

Three regressions in JobQueueWorker.processSingleJob that PR #511's commit message claimed to fix but that did not land in the code:

  1. RateLimiter slot leak (CRITICAL). validateJobState() failures now call limiter.release() (not complete()) before rethrow, and the outer finally is gated on a limiterReleased flag so the slot is not double-decremented. Without this, every DEADLINE-EXCEEDED or pre-aborted job kept occupying a RateLimiter window slot: complete() is a no-op for RateLimiter, so the slot was never released and could only age out of the window on its own.
  2. disableJob dispatch (HIGH). JobDisabledError now routes through disableJob() instead of falling into the generic failJob() branch. The H5 atomic-disable code path was previously unreachable; disabling a job mid-flight clobbered the DISABLED status with FAILED.
  3. Pre-execute abort check (HIGH). createAbortController(job.id) is now created BEFORE validateJobState() so the activeJobAbortControllers.get(job.id)?.signal.aborted check at the top of validateJobState is reachable. Previously the controller was created after validation, making the branch dead code.
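The ordering these three fixes describe can be sketched as follows. This is a minimal, synchronous illustration and not the actual JobQueueWorker code: `Limiter`, `Hooks`, and `processSingleJobSketch` are hypothetical names, and the assumption that the outer finally calls complete() is mine, not something the PR text states.

```typescript
// Hypothetical stand-in for the real error class in @workglow/job-queue.
class JobDisabledError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working on older compile targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical limiter shape: release() frees the slot, complete() may be a no-op.
interface Limiter {
  release(): void;
  complete(): void;
}

// Hypothetical callbacks standing in for the worker's own methods.
interface Hooks {
  validateJobState(jobId: string): void;
  execute(jobId: string): void;
  disableJob(jobId: string): void;
  failJob(jobId: string, err: unknown): void;
}

function processSingleJobSketch(
  jobId: string,
  limiter: Limiter,
  controllers: Map<string, AbortController>,
  hooks: Hooks
): void {
  let limiterReleased = false;
  // Fix 3: register the controller BEFORE validation so validation can read it.
  controllers.set(jobId, new AbortController());
  try {
    try {
      hooks.validateJobState(jobId);
    } catch (err) {
      // Fix 1: free the slot now and record that we did, then rethrow.
      limiter.release();
      limiterReleased = true;
      throw err;
    }
    hooks.execute(jobId);
  } catch (err) {
    // Fix 2: a disabled job goes to disableJob(), everything else to failJob().
    if (err instanceof JobDisabledError) {
      hooks.disableJob(jobId);
    } else {
      hooks.failJob(jobId, err);
    }
  } finally {
    controllers.delete(jobId);
    // Gated so a slot released above is not touched a second time
    // (calling complete() here is an assumption of this sketch).
    if (!limiterReleased) {
      limiter.complete();
    }
  }
}
```

With this shape, a validation failure produces exactly one release() and no complete(), while a successful run produces exactly one complete().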

Tests: packages/test/src/test/job-queue/JobQueueWorker.test.ts (new file, three regressions, one per fix).

Commit 2 — fix(job-queue,indexeddb): abort(PENDING) attempts bump + v2 migration backfill

  1. IndexedDbQueueStorage.abort(PENDING) (CRITICAL). Replaced this.complete(job) with a direct put() that sets status=FAILED, abort_requested_at, completed_at WITHOUT bumping attempts. Matches the cross-backend contract verified in InMemoryQueueStorage/PostgresQueueStorage.
  2. IndexedDB v2 migration (CRITICAL). v2 previously copied only run_after → visible_at, leaving run_attempts, last_ran_at, max_retries, worker_id orphaned on upgrade. The storage layer reads the post-rename names, so existing browser-deployed queues silently lost retry budgets, last-attempt timestamps, and lease ownership on first migration. Extend the cursor body to migrate all five renames idempotently and move the queue_status_visible_at createIndex into the terminal cursor === null branch so the index is built off post-migration rows.
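As an illustration of what "migrate all five renames idempotently" could mean for a single row, here is a minimal sketch of the per-row transform. It deliberately leaves out the IndexedDB cursor loop and the index creation. Only run_after → visible_at is stated in the description; the other four target names in the mapping are assumptions chosen for the example, and `migrateLegacyFields` is a hypothetical helper, not the function in indexedDbQueueMigrations.ts.

```typescript
type Row = Record<string, unknown>;

// Legacy name -> post-rename name. Only the first pair is confirmed by the
// PR text; the remaining targets are ASSUMED for illustration.
const LEGACY_RENAMES: ReadonlyArray<readonly [string, string]> = [
  ["run_after", "visible_at"],
  ["run_attempts", "attempts"], // assumed target name
  ["max_retries", "max_attempts"], // assumed target name
  ["last_ran_at", "last_attempted_at"], // assumed target name
  ["worker_id", "lease_owner"], // assumed target name
];

// Copies each legacy field to its new name unless the new name already has a
// value. Running it again on its own output changes nothing.
function migrateLegacyFields(row: Row): Row {
  const next: Row = { ...row };
  for (const [from, to] of LEGACY_RENAMES) {
    if (next[from] !== undefined && next[to] === undefined) {
      next[to] = next[from];
    }
  }
  return next;
}
```

Never overwriting an existing post-rename value is what makes a second pass safe: a row that was already migrated and later updated keeps its newer data. Whether the real migration also deletes the legacy keys is not stated, so this sketch leaves them in place.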

Tests: new it("v2 migrates all five legacy field renames", ...) in packages/test/src/test/storage-migrations/IndexedDbQueueMigrations.integration.test.ts; new it("abort(PENDING) does not bump attempts (cross-backend contract)", ...) inside the existing H1+H4 describe block in packages/test/src/test/job-queue/genericJobQueueTests.ts (runs against every backend).
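The abort(PENDING) contract being tested can be pictured as a pure row transform. The sketch below assumes a simplified row shape (`QueueRow`) and a hypothetical function name; the real storage classes write through put() or SQL rather than returning a new object.

```typescript
// Simplified, hypothetical row shape; the real schema has more columns.
interface QueueRow {
  id: string;
  status: string;
  attempts: number;
  abort_requested_at: string | null;
  completed_at: string | null;
}

// Aborting a PENDING row marks it FAILED and stamps both timestamps,
// but leaves `attempts` exactly as it was.
function abortPendingRow(row: QueueRow, now: Date = new Date()): QueueRow {
  if (row.status !== "PENDING") {
    return row; // other states are out of scope for this sketch
  }
  const ts = now.toISOString();
  return { ...row, status: "FAILED", abort_requested_at: ts, completed_at: ts };
}
```

A cross-backend test then only needs to compare `attempts` before and after the abort.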

Commit 3 — fix(sqlite): v3 max_attempts default = 10 to match Postgres parity

PR #511 lowered the default retry budget from 20 (Postgres) / 23 (SQLite v1) to 10. Postgres v3 explicitly applied ALTER COLUMN max_attempts SET DEFAULT 10; SQLite v3 renamed the column but did not adjust the default. Fresh SQLite installs ended up at default 23, Postgres at 10 — callers omitting maxAttempts got divergent retry behavior across backends.

SQLite has no ALTER COLUMN ... SET DEFAULT syntax, so the fix uses the documented 12-step table-rebuild procedure: build the new CREATE TABLE statement from the post-rename PRAGMA table_info, swap max_attempts's default literal, copy rows, drop old, rename new, recreate indexes. Gated on the current default not already being '10' so re-running v3 is idempotent.
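The gate and statement order described above can be sketched as a function that only produces SQL text. `buildMaxAttemptsRebuild` and its parameters are hypothetical; how the CREATE TABLE text is produced (from PRAGMA table_info or otherwise) is left to the caller here, and the `__new_v3` suffix is simply the temporary name used in this sketch.

```typescript
// Returns the statements for a table rebuild, or an empty list when the
// current default is already '10' (PRAGMA table_info reports defaults as text).
function buildMaxAttemptsRebuild(
  table: string,
  currentDefault: string | null,
  createTableSql: (name: string) => string, // caller supplies the full DDL
  columns: string[],
  indexSql: string[]
): string[] {
  if (currentDefault === "10") {
    return []; // idempotency gate: nothing to do on a re-run
  }
  const tmp = `${table}__new_v3`;
  const cols = columns.join(", ");
  return [
    createTableSql(tmp), // 1. new table with the corrected default
    `INSERT INTO ${tmp} (${cols}) SELECT ${cols} FROM ${table}`, // 2. copy rows
    `DROP TABLE ${table}`, // 3. drop old
    `ALTER TABLE ${tmp} RENAME TO ${table}`, // 4. rename new
    ...indexSql, // 5. recreate indexes
  ];
}
```

In practice the statements would run inside one transaction so that a failure midway does not leave the queue without its table.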

Tests: extended packages/test/src/test/storage-migrations/queueMigrationsParity.integration.test.ts with a new describe("queue migrations: cross-backend default parity", ...) block that asserts max_attempts default === 10 and attempts default === 0 on both Postgres (information_schema.columns) and SQLite (PRAGMA table_info).

Commit 4 — fix(job-queue): validate leaseMs / extendLease ms inputs across all backends

PR #511 added Number.isFinite guards to Supabase only; Postgres, SQLite, InMemoryQueueStorage, and IndexedDB passed leaseMs / ms directly into new Date(Date.now() + ms).toISOString() (yields "Invalid Date" for NaN/Infinity, poisoning lease_expires_at) or into parameterized SQL fragments (runtime error for non-finite). A negative leaseMs immediately re-expired the lease a worker just claimed.

New shared helper validateLeaseMs() at packages/job-queue/src/queue-storage/validateLeaseMs.ts, exported from @workglow/job-queue. Called at the top of every next() and extendLease() across all 5 backends. Supabase's inline Error throws migrated to the shared RangeError so all backends report the same exception type. ms === 0 remains valid (instant expiry).
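A helper with the behavior described above might look like the following. This is a sketch based only on the description (RangeError for negative or non-finite input, zero accepted); the actual signature and error message in validateLeaseMs.ts are not shown in this PR text, so the `label` parameter, the message wording, and the `leaseExpiry` caller are assumptions.

```typescript
// Sketch of a shared guard: rejects NaN, +/-Infinity, and negative values.
function validateLeaseMs(ms: number, label: string = "leaseMs"): void {
  if (typeof ms !== "number" || !Number.isFinite(ms) || ms < 0) {
    throw new RangeError(
      `${label} must be a finite, non-negative number of milliseconds; got ${String(ms)}`
    );
  }
}

// Hypothetical caller: validate first, then compute the lease expiry timestamp.
function leaseExpiry(ms: number, now: number = Date.now()): string {
  validateLeaseMs(ms);
  return new Date(now + ms).toISOString();
}
```

Validating at the top of each entry point means a bad value is rejected with the same exception type before any backend-specific date math or SQL runs.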

Tests: new describe("leaseMs / extendLease input validation (PR #511 follow-up)", ...) block inside packages/test/src/test/job-queue/genericJobQueueTests.ts — runs against every backend, covers negative/NaN/Infinity rejections on next() and extendLease() plus the leaseMs: 0 accept case.

Verification

  • bun run build:types — clean (30 tasks)
  • bun scripts/test.ts queue vitest — 398 passed, 6 skipped
  • bun scripts/test.ts storage vitest — 1226 passed, 13 skipped

Test files added / modified

  • new: packages/test/src/test/job-queue/JobQueueWorker.test.ts
  • new: packages/job-queue/src/queue-storage/validateLeaseMs.ts
  • modified: packages/test/src/test/job-queue/genericJobQueueTests.ts (cross-backend regressions)
  • modified: packages/test/src/test/storage-migrations/IndexedDbQueueMigrations.integration.test.ts
  • modified: packages/test/src/test/storage-migrations/queueMigrationsParity.integration.test.ts

Reference: PR #511 (#511) — 69c1bd1.


Generated by Claude Code

claude added 4 commits May 19, 2026 08:31
…te abort check

Three follow-up fixes to PR #511 that the commit message claimed but
that did not land in the code:

1. RateLimiter slot leak (CRITICAL): validateJobState() failures
   now call limiter.release() (not complete()) before rethrow, and
   the outer finally is gated on a limiterReleased flag so the slot
   is not double-decremented. Without this, every DEADLINE-EXCEEDED
   or pre-aborted job permanently consumed a RateLimiter window slot.

2. disableJob dispatch (HIGH): JobDisabledError now routes through
   disableJob() instead of falling into failJob(), so attempting to
   disable a job no longer clobbers the DISABLED status with FAILED.
   The H5 atomic-disable code path is now reachable.

3. Pre-execute abort check (HIGH): createAbortController(job.id) is
   moved before validateJobState() so the pre-execute abort flag
   check at the top of validateJobState (which reads
   activeJobAbortControllers) is reachable. Previously the controller
   was created after validation, making the branch dead code.

Adds JobQueueWorker.test.ts with regressions for all three.
… backfill

Two follow-up fixes to PR #511:

1. IndexedDbQueueStorage.abort(PENDING) (CRITICAL): replaced
   `this.complete(job)` with a direct `put()` that sets status=FAILED,
   abort_requested_at, completed_at WITHOUT bumping attempts. Matches
   the cross-backend contract verified in
   InMemoryQueueStorage/PostgresQueueStorage and asserted in a new
   genericJobQueueTests case.

2. IndexedDB v2 migration (CRITICAL): v2 previously copied only
   run_after → visible_at, leaving run_attempts, last_ran_at,
   max_retries, worker_id orphaned on upgrade. The storage layer
   reads the post-rename names, so existing browser-deployed queues
   silently lost retry budgets, last-attempt timestamps, and lease
   ownership on first migration. Extend the cursor body to migrate
   all five renames idempotently. Move the queue_status_visible_at
   createIndex into the terminal cursor branch so the index is built
   off post-migration rows.

Adds IndexedDbQueueMigrations.integration test for the five-rename
case and a cross-backend abort(PENDING) attempts-stability assertion.
PR #511 lowered the default retry budget from 20 (Postgres) / 23
(SQLite v1) to 10. Postgres v3 explicitly applied
`ALTER COLUMN max_attempts SET DEFAULT 10`; SQLite v3 renamed the
column but did not adjust the default. Fresh SQLite installs ended
up at default 23, Postgres at 10, so callers omitting maxAttempts
got divergent retry behavior across backends.

SQLite has no `ALTER COLUMN ... SET DEFAULT` syntax, so the fix uses
the documented 12-step table-rebuild procedure
(https://www.sqlite.org/lang_altertable.html#otheralter): build the
new CREATE TABLE statement from the post-rename PRAGMA table_info,
swap max_attempts's default literal, copy rows, drop old, rename
new, recreate indexes. The rebuild is gated on the current default
not already being '10' so re-running v3 is idempotent.

Extend the migrations parity integration test to compare DEFAULTS,
not just column names, so future drift is caught.
…ackends

PR #511 added Number.isFinite guards to Supabase only; Postgres,
SQLite, InMemoryQueueStorage, and IndexedDB passed leaseMs / ms
directly into new Date(Date.now() + ms).toISOString() (yields
"Invalid Date" for NaN/Infinity, poisoning the row) or into
parameterized SQL fragments (runtime error for non-finite). A
negative leaseMs immediately re-expires the lease a worker just
claimed.

Extract validation into a shared validateLeaseMs() helper in
@workglow/job-queue; call it at the top of every next() and
extendLease() across all 5 backends. Migrate Supabase from its
inline Error throw to the shared RangeError so all backends report
the same exception type. ms === 0 remains valid (instant expiry).

Adds a cross-backend contract test that runs against every backend
via the shared genericJobQueueTests harness.
@pkg-pr-new

pkg-pr-new Bot commented May 19, 2026

@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@513

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@513

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@513

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@513

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@513

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@513

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@513

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@513

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@513

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@513

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@513

@workglow/util

npm i https://pkg.pr.new/@workglow/util@513

workglow

npm i https://pkg.pr.new/workglow@513

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@513

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@513

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@513

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@513

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@513

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@513

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@513

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@513

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@513

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@513

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@513

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@513

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@513

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@513

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@513

commit: 7eebbda

@github-actions

github-actions Bot commented May 19, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 61.99% | 22600 / 36454 |
| 🔵 | Statements | 61.87% | 23384 / 37795 |
| 🔵 | Functions | 62.98% | 4266 / 6773 |
| 🔵 | Branches | 50.53% | 10928 / 21625 |

File Coverage: No changed files found.
Generated in workflow #2298 for commit 7eebbda by the Vitest Coverage Report Action

Contributor

Copilot AI left a comment

Pull request overview

Four follow-up correctness fixes for PR #511's job-queue refactor: a RateLimiter slot leak plus two unreachable dispatch branches in JobQueueWorker.processSingleJob, an IndexedDB v2 backfill that previously dropped four of five renamed fields, an IndexedDB abort(PENDING) that incorrectly bumped attempts, a SQLite v3 max_attempts default mismatch with Postgres, and missing leaseMs/extendLease input validation across non-Supabase backends consolidated into a shared validateLeaseMs helper.

Changes:

  • JobQueueWorker.processSingleJob: release limiter slot on validation failure (gated by limiterReleased), route JobDisabledError through disableJob, register abort controller before validateJobState.
  • IndexedDB v2 migration: backfill all five legacy field renames in one cursor pass and defer queue_status_visible_at index creation; IndexedDbQueueStorage.abort(PENDING) writes directly via put() instead of complete() to avoid bumping attempts.
  • SQLite v3 migration: rebuild table via 12-step procedure to lower max_attempts default 23→10 for Postgres parity; new shared validateLeaseMs (throws RangeError) wired into Postgres/SQLite/InMemory/IndexedDB/Supabase next() and extendLease().

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/job-queue/src/job/JobQueueWorker.ts | Three regressions fixed in processSingleJob: limiter release on validation failure, JobDisabledError dispatch, pre-execute abort controller registration. |
| packages/job-queue/src/queue-storage/validateLeaseMs.ts | New shared helper throwing RangeError for non-finite/negative ms. |
| packages/job-queue/src/common.ts | Re-export the new helper. |
| packages/job-queue/src/queue-storage/InMemoryQueueStorage.ts | Call validateLeaseMs in next / extendLease. |
| packages/indexeddb/src/job-queue/IndexedDbQueueStorage.ts | Validate lease ms inputs; abort(PENDING) uses put() to avoid bumping attempts. |
| packages/indexeddb/src/migrations/indexedDbQueueMigrations.ts | v2 migrates all five legacy renames; index creation moved into terminal cursor branch. |
| providers/postgres/src/job-queue/PostgresQueueStorage.ts | Validate lease ms inputs. |
| providers/sqlite/src/job-queue/SqliteQueueStorage.ts | Validate lease ms inputs. |
| providers/sqlite/src/migrations/sqliteQueueMigrations.ts | v3 rebuilds table to set max_attempts DEFAULT 10 for Postgres parity. |
| providers/supabase/src/job-queue/SupabaseQueueStorage.ts | Replace inline Error throws with shared validateLeaseMs. |
| packages/test/src/test/job-queue/JobQueueWorker.test.ts | New regression tests for the three processSingleJob fixes. |
| packages/test/src/test/job-queue/genericJobQueueTests.ts | Cross-backend lease-ms validation tests and abort(PENDING) no-bump contract test. |
| packages/test/src/test/storage-migrations/IndexedDbQueueMigrations.integration.test.ts | Verifies all five legacy renames are migrated. |
| packages/test/src/test/storage-migrations/queueMigrationsParity.integration.test.ts | Cross-backend defaults parity test (max_attempts=10, attempts=0). |

Comment on lines +195 to +233
it("pre-execute abort flag is observed during validateJobState", async () => {
  // Before the fix, createAbortController() ran AFTER validateJobState,
  // so the activeJobAbortControllers.get(...).signal.aborted branch in
  // validateJobState was dead code. With the controller registered
  // first, an abort fired before start() / right at start time is
  // observable and the job fails with AbortSignalJobError +
  // abort_requested_at set.
  const server = new JobQueueServer<TI, TO, TJob>(TJob, {
    storage: storage as any,
    queueName,
    pollIntervalMs: 5,
    stopTimeoutMs: 0,
  });
  const client = new JobQueueClient<TI, TO>({ storage: storage as any, queueName });
  client.attach(server);

  const handle = await client.send({ taskType: "long_running", data: "pre-abort" });
  // Abort before any worker has claimed the row — this PENDING-abort
  // path sets the row to FAILED with abort_requested_at set immediately.
  await storage.abort(handle.id);

  await server.start();

  const reached = await waitUntil(async () => {
    const j = await storage.get(handle.id);
    return (
      j?.status === JobStatus.FAILED ||
      j?.status === JobStatus.COMPLETED ||
      j?.status === JobStatus.DISABLED
    );
  });
  expect(reached).toBe(true);

  await server.stop();

  const final = await storage.get(handle.id);
  expect(final?.status).toBe(JobStatus.FAILED);
  expect(final?.abort_requested_at).toBeTruthy();
});
Comment on lines +133 to +173
const postRenameInfos: ColInfo[] = db
  .prepare<[], ColInfo>(`PRAGMA table_info(${tableName})`)
  .all();
const maxAttemptsCol = postRenameInfos.find((c) => c.name === "max_attempts");
if (maxAttemptsCol && maxAttemptsCol.dflt_value !== "10") {
  // Build a new CREATE TABLE statement from the post-rename
  // table_info, swapping max_attempts's default. Preserving the
  // existing types / NOT NULL / PK / other defaults keeps the
  // rebuild a true no-op for every other column.
  const columnDefs = postRenameInfos
    .map((c) => {
      const parts: string[] = [c.name, c.type || ""];
      if (c.pk) {
        parts.push("PRIMARY KEY");
      }
      if (c.notnull) {
        parts.push("NOT NULL");
      }
      const dflt =
        c.name === "max_attempts" ? "10" : c.dflt_value !== null ? c.dflt_value : null;
      if (dflt !== null) {
        parts.push(`DEFAULT ${dflt}`);
      }
      return parts.filter((p) => p.length > 0).join(" ");
    })
    .join(",\n ");
  const colList = postRenameInfos.map((c) => c.name).join(", ");
  const newTable = `${tableName}__new_v3`;
  db.exec(`
    CREATE TABLE ${newTable} (
      ${columnDefs}
    );
    INSERT INTO ${newTable} (${colList}) SELECT ${colList} FROM ${tableName};
    DROP TABLE ${tableName};
    ALTER TABLE ${newTable} RENAME TO ${tableName};

    CREATE INDEX IF NOT EXISTS job_queue_fetcher${indexSuffix}_idx ON ${tableName} (${prefixIndexPrefix}queue, status, visible_at);
    CREATE INDEX IF NOT EXISTS job_queue_fingerprint${indexSuffix}_idx ON ${tableName} (${prefixIndexPrefix}queue, fingerprint, status);
    CREATE INDEX IF NOT EXISTS job_queue_job_run_id${indexSuffix}_idx ON ${tableName} (${prefixIndexPrefix}queue, job_run_id);
  `);
}
@sroussey
Collaborator Author

@copilot apply changes based on the comments in this thread

Copilot AI and others added 3 commits May 19, 2026 15:45
Contributor

Copilot AI commented May 19, 2026

@copilot apply changes based on the comments in this thread

Applied the thread follow-ups in 7eebbda. The worker regression test now aborts the active controller before validation so it actually proves the pre-execute path, and the SQLite v3 rebuild now uses a shared canonical CREATE TABLE helper instead of reconstructing DDL from PRAGMA metadata.

@sroussey sroussey merged commit d452cd6 into main May 19, 2026
16 checks passed
4 participants