fix(util/worker, ai/task): TTL-based pendingAborts eviction; clarify runWithIterable bond by sroussey · Pull Request #500 · workglow-dev/libs

sroussey · 2026-05-14T08:15:29Z

Why

WorkerServerBase.pendingAborts previously used FIFO eviction at size 1000 (dropping the oldest 500 in bulk once the cap was crossed). A misbehaving client spamming aborts for ids that never landed could silently evict a legitimate pending abort before its matching call arrived — the run-fn would then execute un-aborted, defeating the whole abort-before-call race fix.

Separately, runWithIterable's top JSDoc misleadingly suggested a two-way abort bond ("leaves the parent alone"). The implementation is actually parent → child only, and the wording invited future contributors to reintroduce reverse propagation by accident.

What

TTL-based pendingAborts (packages/util/src/worker/WorkerServerBase.ts):

pendingAborts is now a Map<string, number> keyed by request id, with the insertion timestamp as the value, instead of a Set<string>.
30s TTL via PENDING_ABORT_TTL_MS. Each insertion schedules a one-shot setTimeout cleanup (mirrors the existing scheduleCompletedRequestCleanup pattern); timer handles are tracked in pendingAbortTimers so dispose-style hooks can clear them.
Inline TTL sweep on every recordPendingAbort — drops any entries older than the TTL.
10k hard cap (PENDING_ABORT_HARD_CAP) as a memory safety-net; when exceeded, evict the half with the lowest timestamps (oldest-by-timestamp, not insertion order).
consumePendingAbort rechecks the TTL on consumption: present + fresh → abort + delete + return true; present + expired → delete + return false; absent → return false.
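The record/consume contract described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in WorkerServerBase.ts: the class name `PendingAbortSketch` and the injectable `now` clock are assumptions made so the TTL check can be exercised without real timers, and the per-id cleanup timer and hard cap are deliberately left out.

```typescript
// Hypothetical stand-in for the pending-abort bookkeeping; not the real WorkerServerBase.
const PENDING_ABORT_TTL_MS = 30_000;

class PendingAbortSketch {
  // request id -> insertion timestamp in milliseconds
  private readonly pendingAborts = new Map<string, number>();
  // Injectable clock (an assumption of this sketch) so tests need no real timers.
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Remember that an abort arrived before its matching call.
  recordPendingAbort(id: string): void {
    this.pendingAborts.set(id, this.now());
  }

  // present + fresh   -> abort, delete, return true
  // present + expired -> delete, return false
  // absent            -> return false
  consumePendingAbort(id: string, controller: AbortController): boolean {
    const ts = this.pendingAborts.get(id);
    if (ts === undefined) return false;
    this.pendingAborts.delete(id);
    if (this.now() - ts > PENDING_ABORT_TTL_MS) return false;
    controller.abort();
    return true;
  }
}
```

Rechecking the timestamp at consume time means a stale marker cannot abort a fresh call even if its cleanup timer has not fired yet.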

Comment fix (packages/ai/src/task/base/runWithIterable.ts):

Replace the misleading paragraph with explicit one-way-bond description plus two @example blocks showing parent-cancel vs consumer-break semantics.
Add an inline // One-way bond: parent -> child only. marker above the addEventListener("abort", ...) line.
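For readers unfamiliar with the pattern, a one-way abort bond can be sketched as below. The helper name `linkChildToParent` is hypothetical and this is not the body of runWithIterable; it only assumes the standard AbortController / AbortSignal API. Note that a signal is aborted through its owning controller, not directly.

```typescript
// Hypothetical helper illustrating a parent -> child abort bond; not the real runWithIterable.
function linkChildToParent(parentSignal: AbortSignal): AbortController {
  const localAbort = new AbortController();
  if (parentSignal.aborted) {
    // Parent already cancelled: the child starts out aborted.
    localAbort.abort(parentSignal.reason);
  } else {
    // One-way bond: parent -> child only. Nothing here ever aborts the parent.
    parentSignal.addEventListener(
      "abort",
      () => localAbort.abort(parentSignal.reason),
      { once: true },
    );
  }
  return localAbort;
}

// Consumer break: aborting the child leaves the parent untouched.
const parentA = new AbortController();
const childA = linkChildToParent(parentA.signal);
childA.abort();
console.log(parentA.signal.aborted); // false

// Parent cancel: aborting the parent reaches the child.
const parentB = new AbortController();
const childB = linkChildToParent(parentB.signal);
parentB.abort();
console.log(childB.signal.aborted); // true
```

Because the listener is registered only on the parent, sibling consumers of the parent signal keep running when one child stops early.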

Tests (packages/test/src/test/util/WorkerServerBase.race.test.ts):

New describe("WorkerServerBase pendingAborts eviction policy", ...) block with three cases under vi.useFakeTimers():
- Eviction-correctness: 1500 distinct aborts in a single tick + matching call for a-500 → signal observed aborted (would have failed under FIFO).
- TTL-expiry: single abort + vi.advanceTimersByTime(31_000) + matching call → signal observed un-aborted.
- Hard-cap safety: HARD_CAP + 10 aborts with 1ms steps → pendingAborts.size <= HARD_CAP, newest id survives (consume succeeds), oldest id evicted (consume fails). Also asserts the behavioural path via the same run-fn registration shape as the existing tests.

Verification

bun test packages/test/src/test/util/WorkerServerBase.race.test.ts — runs the three new tests plus the three pre-existing abort-before-call race tests (regression coverage).
Type-check + lint on the two source files.

Risks

The 30s TTL must comfortably exceed worst-case message-port roundtrip latency between the main thread and the worker. In practice this is microseconds to a few milliseconds, so 30s leaves a safety margin of roughly 10,000x or more; not a concern.
consumePendingAbort now rechecks Date.now() - ts > TTL on every call (one extra subtraction). Negligible.
The per-id setTimeout adds one timer per abort. With the 10k hard cap that's at most 10k pending timers — well within Node/Bun limits, and they GC as they fire.

Rollback

Two independent commits, each rollback-able on its own:

fix(util/worker): TTL-based pendingAborts eviction — revert restores the FIFO Set policy and removes the new tests.
docs(ai/task): clarify runWithIterable abort bond is one-way — comment-only; revert restores the prior JSDoc verbatim.

Note: this PR is branched off main as claude/loving-mendel-Hcznj-abort rather than the default session branch claude/loving-mendel-Hcznj because a separate PR re-enabling CI is being authored on that branch in parallel. The two changes have independent review concerns and should be reviewable in isolation.

Generated by Claude Code

pkg-pr-new · 2026-05-14T08:17:14Z

@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@500

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@500

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@500

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@500

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@500

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@500

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@500

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@500

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@500

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@500

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@500

@workglow/util

npm i https://pkg.pr.new/@workglow/util@500

workglow

npm i https://pkg.pr.new/workglow@500

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@500

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@500

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@500

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@500

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@500

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@500

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@500

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@500

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@500

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@500

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@500

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@500

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@500

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@500

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@500

commit: 1fa4277

Replace the FIFO eviction policy on `pendingAborts` in `WorkerServerBase` with a TTL-based `Map<id, timestamp>`. Under the old policy a misbehaving client spamming aborts for ids that never landed could silently evict a legitimate pending abort (the first 500 entries were dropped once the set crossed 1000) before its matching `call` arrived; the run-fn would then execute un-aborted.

Changes:
- `pendingAborts` is now `Map<string, number>`, keyed by id with the insertion timestamp as the value.
- 30s TTL; per-id one-shot cleanup timer (mirrors the `scheduleCompletedRequestCleanup` pattern).
- Inline TTL sweep on every record.
- 10k hard cap as a memory safety-net; when exceeded, evict the half with the lowest timestamps (oldest-by-timestamp, not insertion order).
- `consumePendingAbort` rechecks the TTL on consumption so a stale marker that slipped past the sweep cannot wrongly abort a fresh call.

Tests cover eviction correctness under a 1500-id flood, TTL expiry past 31s, and hard-cap behaviour preserving the newest ids while evicting the oldest.

The top-of-function JSDoc on `runWithIterable` had misleading language about "leaving the parent alone" / two-way bond. The implementation is actually parent->child only:

- context.signal.abort() mirrors into localAbort -> strategy stops.
- localAbort.abort() (from the finally) does NOT touch parentSignal; sibling consumers of the parent signal keep running.

Rewrite the comment to spell out the one-way bond explicitly and add an inline marker above the `addEventListener("abort", ...)` line so future readers don't reintroduce a reverse propagation.

The new "hard-cap safety" test inserts 10,010 pendingAborts entries to exercise the >10k cap eviction path. Each insertion runs the O(n) inline TTL sweep in recordPendingAbort and schedules 2 fake setTimeouts, so the test does ~50M Map iterations plus ~20k fake-timer registrations. This exceeds vitest's default 15s timeout on slower CI runners, even though the assertions themselves are correct. Give just this one test a 60s budget; the other two new tests are cheap and keep the default timeout.

The inline `for (const [key, ts] of this.pendingAborts)` sweep ran on every abort and made `recordPendingAbort` O(n) per call, which is O(n^2) under the hard-cap test's 10,010-entry burst and exceeded the 60s test timeout (~100M iterations). Cleanup is now amortised O(1) per call via three complementary mechanisms:

1. Per-id `setTimeout` (already present) drops each entry after TTL.
2. `consumePendingAbort` re-checks the timestamp at consume time (belt-and-braces for the TTL/timer-firing window).
3. The hard-cap overflow branch (already present) evicts the oldest half in one pass when size exceeds PENDING_ABORT_HARD_CAP; this is the memory safety-net for pathological bursts that outrun the timers.

Behaviour is unchanged: the TTL-expiry test relies on the per-id setTimeout (advancing the test's timers fires it); the hard-cap test relies on the overflow eviction; the eviction-correctness test relies on a fresh marker not being evicted (now guaranteed since nothing else touches the map during a synchronous burst).

The hard-cap regression test was inserting 10_010 entries through recordPendingAbort under vi.useFakeTimers(), which empirically takes >60s on CI runners due to vitest's fake-timer heap costs (each setTimeout registration is O(log n), and with 10k+ entries queued the constant factors stack up). Make the hard cap injectable via a constructor option (default 10_000) so the test can use cap=100 and insert 110 entries — same assertions, ~100x less work. The eviction-correctness and TTL-expiry tests are unchanged since they only use 1500 / 1 entries respectively.
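A rough sketch of an injectable hard cap with oldest-by-timestamp eviction follows. The option name `pendingAbortHardCap` and its 10_000 default come from this PR; the class name `HardCapSketch`, the `now` option, and the exact eviction code are assumptions for illustration only, and TTL timers are omitted.

```typescript
// Hypothetical illustration of a configurable hard cap; not the real WorkerServerBase.
interface HardCapSketchOptions {
  pendingAbortHardCap?: number; // defaults to 10_000, as described in the PR
  now?: () => number; // injectable clock, an assumption of this sketch
}

class HardCapSketch {
  // request id -> insertion timestamp in milliseconds
  readonly pendingAborts = new Map<string, number>();
  private readonly hardCap: number;
  private readonly now: () => number;

  constructor(options: HardCapSketchOptions = {}) {
    this.hardCap = options.pendingAbortHardCap ?? 10_000;
    this.now = options.now ?? (() => Date.now());
  }

  recordPendingAbort(id: string): void {
    this.pendingAborts.set(id, this.now());
    if (this.pendingAborts.size <= this.hardCap) return;

    // Over the cap: sort by timestamp (not Map insertion order) and drop the oldest half.
    const byAge = Array.from(this.pendingAborts.entries()).sort((a, b) => a[1] - b[1]);
    const evictCount = Math.floor(byAge.length / 2);
    for (const [oldId] of byAge.slice(0, evictCount)) {
      this.pendingAborts.delete(oldId);
    }
  }
}
```

With this shape a test can construct the sketch with `pendingAbortHardCap: 100`, insert 110 ids, and make the same assertions as the 10,010-entry version: the size stays at or under the cap, the newest id survives, and the oldest id is gone.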

github-actions · 2026-05-15T00:30:35Z

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 62.23% | 21880 / 35155 |
| 🔵 | Statements | 62.1% | 22650 / 36470 |
| 🔵 | Functions | 64.03% | 4136 / 6459 |
| 🔵 | Branches | 50.8% | 10484 / 20634 |

File Coverage

No changed files found.

Generated in workflow #2253 for commit 1fa4277 by the Vitest Coverage Report Action

Copilot

Pull request overview

This PR updates worker abort-race handling to use TTL-based pending abort retention instead of FIFO eviction, adds regression coverage for pending abort eviction behavior, and clarifies runWithIterable abort propagation documentation.

Changes:

Adds WorkerServerBaseOptions and replaces pendingAborts Set with timestamped TTL/cap-based Map handling.
Adds tests for abort flood survival, TTL expiry, and hard-cap eviction behavior.
Updates runWithIterable JSDoc to describe parent-to-child-only abort propagation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `packages/util/src/worker/WorkerServerBase.ts` | Implements timestamped pending-abort tracking with TTL timers and hard-cap eviction. |
| `packages/test/src/test/util/WorkerServerBase.race.test.ts` | Adds coverage for the new pending-abort eviction policy. |
| `packages/ai/src/task/base/runWithIterable.ts` | Clarifies abort propagation documentation and inline comments. |

Comments suppressed due to low confidence (1)

packages/util/src/worker/WorkerServerBase.ts:356

The hard cap only bounds pendingAborts; every abort still adds a timeout handle to pendingAbortTimers, and evicting entries at the cap does not clear those handles. A burst above the cap (or repeated aborts for the same id) can therefore create far more than 10k live timers for up to 30s, so the intended memory safety-net does not actually bound timer memory/work.

    const timer = setTimeout(() => {
      this.pendingAborts.delete(id);
      this.pendingAbortTimers.delete(timer);
    }, WorkerServerBase.PENDING_ABORT_TTL_MS);
    this.pendingAbortTimers.add(timer);

+    // to drop this id after its TTL elapses. Stored so dispose-style hooks
+    // can clear pending timers; otherwise they GC naturally.
+    const timer = setTimeout(() => {
+      this.pendingAborts.delete(id);


- * to the provider, releasing the connection promptly rather than letting
- * the run keep streaming events into a closed queue.
+ * Abort bond is strictly one-way: parent → child.
+ *  • context.signal.abort() → localAbort aborts → strategy sees it.


+ * hard-coded values. Subclasses (platform-specific `WorkerServer` classes)
+ * accept the same shape and forward it via `super(options)`.


Address Copilot review comments on #500:

- recordPendingAbort: re-recording an abort for an id that was already pending used to leave the previous setTimeout active. That old timer would fire at its original +TTL and delete the renewed entry early, even though the most recent record had its own full TTL. Switch pendingAbortTimers from Set<timer> to Map<id, timer>, clearTimeout the previous timer on re-record, and remove timer entries from the map on consume / TTL-fire / hard-cap eviction. This also bounds the timer-handle memory (previously the hard cap only bounded the data map, leaving timers to accumulate until TTL).
- runWithIterable JSDoc: the example demonstrated an AbortSignal method that doesn't exist (signals can't be aborted directly; only their owning AbortController can). Rewrite the example to use a controller.
- WorkerServerBase JSDoc: the pendingAbortHardCap option is only reachable via `new WorkerServerBase(...)` directly. Platform subclasses use the default. Clarify in-line so consumers don't think `new WorkerServer({ pendingAbortHardCap: ... })` works.
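The timer fix in this commit message can be pictured with the sketch below. It assumes only the standard setTimeout / clearTimeout API; the class name `PendingAbortTimersSketch` and the `forget` and `dispose` methods are hypothetical names chosen for the example, not members of WorkerServerBase.

```typescript
// Hypothetical illustration of per-id timer tracking; not the real WorkerServerBase.
class PendingAbortTimersSketch {
  readonly pendingAborts = new Map<string, number>();
  // Keyed by id (not a Set of handles) so a re-record can find and cancel the old timer.
  readonly pendingAbortTimers = new Map<string, ReturnType<typeof setTimeout>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 30_000) {
    this.ttlMs = ttlMs;
  }

  recordPendingAbort(id: string): void {
    // Cancel the timer from an earlier record so it cannot delete the renewed entry early.
    const previous = this.pendingAbortTimers.get(id);
    if (previous !== undefined) clearTimeout(previous);

    this.pendingAborts.set(id, Date.now());
    const timer = setTimeout(() => {
      this.pendingAborts.delete(id);
      this.pendingAbortTimers.delete(id);
    }, this.ttlMs);
    this.pendingAbortTimers.set(id, timer);
  }

  // Drop an entry and its timer together, as on consume or hard-cap eviction.
  forget(id: string): boolean {
    const timer = this.pendingAbortTimers.get(id);
    if (timer !== undefined) clearTimeout(timer);
    this.pendingAbortTimers.delete(id);
    return this.pendingAborts.delete(id);
  }

  // Clear every outstanding timer, as a dispose-style hook might.
  dispose(): void {
    this.pendingAbortTimers.forEach((timer) => clearTimeout(timer));
    this.pendingAbortTimers.clear();
    this.pendingAborts.clear();
  }
}
```

Keeping one timer per id means the number of live timers can never exceed the number of pending entries, so whatever bounds the data map also bounds the timer handles.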

sroussey · 2026-05-15T17:16:17Z

@copilot resolve the merge conflicts in this pull request

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

Agent-Logs-Url: https://github.com/workglow-dev/libs/sessions/b1cbba42-d4c5-496f-b2d4-05d7dae090a4 Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

Copilot · 2026-05-15T17:25:43Z

@copilot resolve the merge conflicts in this pull request

Resolved the merge conflicts by merging origin/main into this branch and fixing the WorkerServerBase conflict while preserving the TTL-based pending-abort behavior. I also fixed follow-up merge fallout (completed-request cap constants + aligned eviction test expectations) in 8fb963b.

sroussey force-pushed the claude/loving-mendel-Hcznj-abort branch from 9d7d9ff to 3251e0c Compare May 15, 2026 00:20

sroussey added 5 commits May 15, 2026 00:26

sroussey force-pushed the claude/loving-mendel-Hcznj-abort branch from 3251e0c to 94d2497 Compare May 15, 2026 00:26

sroussey self-assigned this May 15, 2026

sroussey requested a review from Copilot May 15, 2026 00:45

Copilot started reviewing on behalf of sroussey May 15, 2026 00:45 View session

Copilot AI reviewed May 15, 2026

sroussey mentioned this pull request May 15, 2026

security/correctness fixes from automated review (High) #506

Merged

5 tasks

Copilot started work on behalf of sroussey May 15, 2026 17:16 View session

Copilot AI and others added 2 commits May 15, 2026 17:21

Merge origin/main and resolve WorkerServerBase conflict

6a164d0

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

fix merge resolution for WorkerServerBase caps and eviction tests

8fb963b

Agent-Logs-Url: https://github.com/workglow-dev/libs/sessions/b1cbba42-d4c5-496f-b2d4-05d7dae090a4 Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

Copilot finished work on behalf of sroussey May 15, 2026 17:26

sroussey merged commit 1b9cc10 into main May 15, 2026
4 checks passed

sroussey deleted the claude/loving-mendel-Hcznj-abort branch May 15, 2026 17:35

sroussey merged 8 commits into main from claude/loving-mendel-Hcznj-abort
