Skip to content

Capability-based dispatch + Promise+emit run-fn migration#494

Merged
sroussey merged 24 commits into
mainfrom
capabilities-squash
May 14, 2026
Merged

Capability-based dispatch + Promise+emit run-fn migration#494
sroussey merged 24 commits into
mainfrom
capabilities-squash

Conversation

@sroussey
Copy link
Copy Markdown
Collaborator

Summary

This PR squashes 80 commits from capabilities-event into 9 logical commits against main. The work introduces capability-based provider dispatch (replacing the per-task-type registry) and migrates the AI execution path to a new Promise+emit run-fn shape.

The final tree is identical to capabilities-event; this branch only restructures history for review.

Commits in order

  1. feat(ai): introduce capability-based dispatch (Phases 0-4) — capability registry foundation, rename ModelConfig.taskscapabilities, static requires on every AI task class, replace per-task-type registry with capability-set dispatch.
  2. refactor(providers): migrate all providers to AiProviderRunFnRegistration[] (Phase 5) — every vendor provider (OpenAI, Anthropic, Gemini, Ollama, HFT, HFI, node-llama-cpp, tf-mediapipe, chrome-ai) migrated to the new registration list shape; pkg-pr-new wired for all 9 vendor packages.
  3. fix(ai,providers,test): Phase 5 review feedback and CI/test fixes — addresses Copilot review on PR Add capability system and collectStream utility for AI tasks #479, declares requires on remaining tasks, unblocks bun test discovery and conformance suite.
  4. feat(ai,test,ci): bridgeProgress utility and large-model integration test harness — streaming progress utility + integration-test scaffolding + CI workflow updates.
  5. fix(ai,hft,test,ci): resolve RAG WASM/ONNX memory leaks — cascade of fixes for ONNX/WASM runtime instances leaking across RAG tests (pipeline disposal, provider re-registration, tensor refs, CI parallelism + timeouts).
  6. feat(ai,util/worker): Promise+emit run-fn shape foundationAiEmit, createEmitQueue, StreamEventAccumulator, accumulatingEmit, AiProviderRunFn + legacy adapter, and worker-side registerRunFunction/handleRunCall/callWorkerRunFunction.
  7. refactor(ai): migrate execution path to Promise+emit shapeAiJob.execute, both execution strategies, AiTask.execute, StreamingAiTask/AiChatTask/AiChatWithKbTask, and AiProvider base all collapse to Promise+emit; abort leaks fixed.
  8. test(ai,timing): align fixtures and add memory tooling for Promise+emit — fixtures updated; RSS-bounded stress test; reportMem/baseline snap utility; obsolete AiProviderRegistry tests removed.
  9. refactor(ai): finalize Promise+emit migration and cleanup — drops legacy stream shape, removes bridgeProgress utility, renames model download/dispose tasks, streamlines streaming text handling, drops redundant type annotations.

Test plan

  • bun run build:packages and bun run build:types succeed
  • bun scripts/test.ts vitest passes for the AI + providers + task-graph suites
  • RAG integration suite passes within the per-test timeout (memory leak fixes from commit 5)
  • CI Build & Test workflow runs green end-to-end
  • Each squashed commit is reviewable independently (no commit assumes work that isn't in it or in main)

@sroussey sroussey requested a review from Copilot May 13, 2026 07:45
@sroussey sroussey self-assigned this May 13, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 13, 2026

Open in StackBlitz

@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@494

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@494

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@494

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@494

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@494

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@494

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@494

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@494

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@494

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@494

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@494

@workglow/util

npm i https://pkg.pr.new/@workglow/util@494

workglow

npm i https://pkg.pr.new/workglow@494

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@494

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@494

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@494

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@494

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@494

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@494

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@494

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@494

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@494

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@494

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@494

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@494

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@494

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@494

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@494

commit: f4ead3b

claude and others added 20 commits May 13, 2026 16:48
Replaces the per-task-type provider registry with capability-based dispatch.
Each provider registers `{ serves: Capability[], runFn }` entries; the registry
picks the run-fn whose `serves` is the smallest superset of the task's
`requires`, with strict gating (`model.capabilities ⊇ task.requires`).

  - Capability registry foundation with strict gating + most-specific-superset
    selection (ties broken by registration order)
  - Rename ModelConfig.tasks → ModelConfig.capabilities
  - Static readonly `requires` field on AiTask base classes and all 46 concrete
    task subclasses; intermediate bases drop redundant overrides
  - Extract registerAiTasks to its own file; hoist strict gating to streaming
    path; fix AiChatTask getJobInput
  - Correct collectStream object-delta and multi-port text semantics
…tion[] (Phase 5)

Migrates every vendor provider to the capability-set registration list shape:
each provider's constructor now declares `AiProviderRunFnRegistration[]` instead
of per-task-type registrations. Dispatch happens via the capability registry
introduced in Phase 3.

  - OpenAI (5a), Anthropic (5b), Google Gemini (5c), Ollama (5d)
  - HuggingFace Transformers (5e), HuggingFace Inference (5f)
  - node-llama-cpp (5g), TF MediaPipe (5h), Chrome AI (5i)
  - Skip legacy contract test type-checking pending Phase 9 rewrite (5j)
  - Tighten Anthropic 3.5/3.7 family regex to cover claude-3-5-haiku
  - Include all 9 vendor provider packages in pkg-pr-new preview publish so
    downstream consumers can pin to per-PR previews
Addresses Copilot review on PR #479 and unblocks CI for the post-Phase-5 state.

  - Declare `requires` on AiChatWithKbTask and KbSearchTask
  - Skip model-capability gate for lifecycle tasks (download/dispose)
  - Unblock bun test discovery for legacy contract tests
  - Skip whole AiProvider conformance suite pending Phase 9 rewrite
  - Phase 9 publish-preview workflow + drop dead Anthropic_Chat_Stream alias
  - Update todo and dependabot config
…test harness

Adds infrastructure for streaming progress through tasks and for running the
large-model integration tests reliably under CI memory constraints.

  - bridgeProgress utility + unit tests
  - Integration-test scaffolding for large-model scenarios
  - Restore RAG test parallelism with 15-minute job timeouts
  - GitHub Actions workflow updates
Cascade of fixes for OOMs in the RAG integration suite caused by ONNX/WASM
runtime instances leaking across tests.

  - hft: dispose ONNX pipelines on clearPipelineCache and after RAG tests
  - ai: unregister existing provider before re-registering to prevent run-fn
    accumulation across test setup
  - ai: null bridgeProgress captures in finally to release tensor refs
  - ai/hft: release WASM via awaited dispose + macrotask yield per AiJob
  - ci: force vitest single-file-at-a-time for RAG suite; bump per-test timeout
    to 600s; restore RAG parallel=1 with 25-min job timeout
  - Includes diagnostic instrumentation (now removed) used to locate the leak,
    plus a revert of one speculative dispose path that proved unnecessary
Introduces the building blocks for the new run-fn shape: a single
`Promise<void>` that emits events through an injected `AiEmit` callback,
replacing the previous AsyncIterable<AiStreamEvent> return shape. The
consumer (StreamingAiTask / TaskRunner) is now solely responsible for
accumulating deltas; providers become stateless.

  - AiEmit type + noopEmit helper
  - createEmitQueue: single-consumer push-queue utility
  - StreamEventAccumulator: factored from collectStream
  - accumulatingEmit factory for terminal-consumer materialization
  - AiProviderRunFn shape + registerLegacyStreamFn adapter for old run-fns
  - util/worker: registerRunFunction + handleRunCall + callWorkerRunFunction
    + worker proxy threading AiEmit across the worker boundary
Migrates the AI execution path from AsyncIterable returns to the new
Promise+emit shape end-to-end. Also fixes abort leaks uncovered during the
rewrite.

  - AiJob.execute collapses to a single Promise<void> + emit
  - Strategy interface collapses to single execute(emit) Promise<void>;
    DirectExecutionStrategy rewritten
  - QueuedExecutionStrategy uses the rate limiter only; storage-queue path
    dropped
  - AiTask.execute uses accumulatingEmit at the terminal-consumer boundary
  - StreamingAiTask + AiChatTask + AiChatWithKbTask bridge to the new shape
  - AiProvider base routes legacy run-fns through registerLegacyStreamFn so
    not-yet-migrated providers keep working
Brings tests in line with the new run-fn shape and adds RSS-aware reporting
to make memory regressions visible.

  - Align ai-provider + contract test fixtures with Promise+emit shape
  - RSS-bounded stress test for AiJob run-fn dispatch
  - Remove obsolete AiProviderRegistry tests
  - timing: memory usage reporting utility (reportMem + baseline snap)
  - Use the new memory utility across multiple test suites
  - Prettier format pass
Final cleanup once all callers and providers are on the Promise+emit shape.

  - Drop type annotations from static `requires` declarations (TS infers)
  - Transition every remaining caller from legacy stream function to
    Promise+emit run-fn
  - Remove the bridgeProgress utility (its responsibilities are now covered
    by accumulatingEmit / the consumer-side accumulator) and update affected
    components
  - Rename model download/unload tasks to align with the dispose vocabulary
  - Streamline streaming text handling across AiProvider implementations
Removes the postmortem markdown from the working tree and the test
comment that referenced it. The leak investigation is captured in the
commit history; the .md file doesn't need to live in main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Matches the capability key (`model.dispose`) that the lookup queries.
The legacy `unload` vocabulary was renamed to `dispose` earlier in this
branch; this catches a stray local.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WorkerServerBase.handleAbort previously dropped abort messages whose ids
had no AbortController yet, so an `abort` racing ahead of its `call` over
the message port would be silently lost and the run-fn would execute with
an un-aborted signal. Track pending aborts in a bounded set and consume
them when handleCall/handleStreamCall/handleRunCall constructs its
AbortController, posting the error response immediately when the abort
arrives first. Bounded to 1000 entries with LRU-style eviction to match
the existing completed-requests cap.
Shared bridge used by StreamingAiTask, AiChatTask, and AiChatWithKbTask:
dispatches a strategy.execute(...) into an emit-queue and yields events
through an `AsyncIterable`, while ensuring that an early `break` /
`return` from the consumer (or an abort on the parent context) cancels
the underlying run-fn.

Mechanism:
  - A `localAbort` is linked from `context.signal` (and propagates back to
    the parent: if `localAbort` aborts first we leave the parent alone).
  - The strategy is invoked with a context whose `signal` is the
    `localAbort.signal`. Providers therefore see the abort whenever the
    consumer stops iterating.
  - A `finally` around the consumer for-await calls `localAbort.abort()`
    and `queue.fail(...)`, then awaits `runPromise` (swallowing the
    expected abort error) so we never leak a dangling Promise.
…nWithIterable

Before: consumer `break` / `return` only closed the local queue; the
strategy kept running with the parent `context.signal` (which never fires
on its own) and emitted into a closed queue until it eventually finished.
A caller-side abort was not visible to the provider stream.

Use the new `runWithIterable` helper which wraps a `localAbort`
controller into the context's `signal` before handing it to the strategy.
On any consumer exit the local signal aborts, the provider tears down,
and the run promise is awaited so we never leak it.
Same fix as StreamingAiTask: a consumer that breaks out of the
`for await` over a chat turn previously left the provider stream
running and the underlying run-promise dangling. Use `runWithIterable`
so the local abort fires and the inner-turn run-fn is cancelled when
the consumer exits.
Mirror the AiChatTask fix so KB-grounded chat also propagates consumer
abort to the inner provider stream. Replaces the inline `createEmitQueue`
+ `runPromise` pattern with a `runWithIterable` invocation that wires a
local AbortController into the strategy's context.
Drives runWithIterable directly via a fake strategy whose execute() never
resolves until aborted. The consumer breaks out of the iterator after the
first event; the test asserts:

  - the strategy's signal flips to `aborted`
  - the run promise settles cleanly
  - no further events are yielded to the consumer

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… lookup

The resourceScope disposer in AiTask was renamed from looking up
`model.unload` to `model.dispose` during the rename-and-cleanup pass,
intentionally distinguishing in-memory eviction from on-disk removal
(`model.download-remove`). Two places were left out of sync:

  - The disposer's stale doc comment still referred to
    `model.download-remove`; clarified to call out the dispose / remove
    distinction.
  - `AiChatWithKbTask.test.ts` still registered a fake run-fn under
    `model.download-remove` for the disposer hook, so the disposer
    never resolved and `unloadCalls` stayed 0. Register under
    `model.dispose` instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…main

Rebased onto main (#488 added the prettier-organize-imports plugin +
husky hook). With -X theirs winning conflicts in our favor, three
`export * from` barrels lost main's `// organize-imports-ignore`
header. Re-add the comment so the plugin doesn't alphabetize the barrel
export order (which would break runtime init).

  - packages/ai/src/common.ts (existed pre-PR, comment lost in rebase)
  - packages/ai/src/task/index.ts (existed pre-PR, comment lost in rebase)
  - packages/ai/src/capability/index.ts (new barrel introduced by this PR)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sroussey sroussey force-pushed the capabilities-squash branch from 6fb6f56 to 20d801a Compare May 13, 2026 16:58
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 62.45% 21909 / 35079
🔵 Statements 62.29% 22687 / 36417
🔵 Functions 64.11% 4139 / 6456
🔵 Branches 51.05% 10525 / 20616
File Coverage
File Stmts Branches Functions Lines Uncovered Lines
Changed Files
providers/huggingface-transformers/src/ai/HuggingFaceTransformersProvider.ts 0% 100% 0% 0% 28-63
providers/huggingface-transformers/src/ai/HuggingFaceTransformersQueuedProvider.ts 7.4% 0% 0% 7.4% 29-119
providers/huggingface-transformers/src/ai/registerHuggingFaceTransformersInline.ts 0% 100% 0% 0% 24-32
providers/huggingface-transformers/src/ai/registerHuggingFaceTransformersWorker.ts 0% 100% 0% 0% 19-27
providers/huggingface-transformers/src/ai/common/HFT_BackgroundRemoval.ts 9.09% 0% 0% 9.09% 19-25, 36-48
providers/huggingface-transformers/src/ai/common/HFT_Capabilities.ts 5.55% 0% 33.33% 2.94% 13-130
providers/huggingface-transformers/src/ai/common/HFT_CapabilitySets.ts 100% 100% 100% 100%
providers/huggingface-transformers/src/ai/common/HFT_Chat.ts 2.17% 0% 0% 2.22% 41-142, 154-157
providers/huggingface-transformers/src/ai/common/HFT_CountTokens.ts 22.22% 100% 0% 22.22% 20-25, 33-34, 42
providers/huggingface-transformers/src/ai/common/HFT_Download.ts 14.28% 100% 0% 14.28% 30-39
providers/huggingface-transformers/src/ai/common/HFT_ImageClassification.ts 5.55% 0% 0% 5.55% 30-72
providers/huggingface-transformers/src/ai/common/HFT_ImageEmbedding.ts 5.26% 0% 0% 5.26% 27-56
providers/huggingface-transformers/src/ai/common/HFT_ImageSegmentation.ts 12.5% 0% 0% 12.5% 26-48
providers/huggingface-transformers/src/ai/common/HFT_ImageToText.ts 16.66% 0% 0% 16.66% 22-35
providers/huggingface-transformers/src/ai/common/HFT_InlineLifecycle.ts 0% 100% 0% 0% 8-9
providers/huggingface-transformers/src/ai/common/HFT_JobRunFns.ts 42.85% 0% 0% 42.85% 78-82
providers/huggingface-transformers/src/ai/common/HFT_ModelInfo.ts 2.04% 0% 0% 2.12% 20-137
providers/huggingface-transformers/src/ai/common/HFT_ModelSearch.ts 5.88% 0% 0% 6.25% 18-45
providers/huggingface-transformers/src/ai/common/HFT_ObjectDetection.ts 6.66% 0% 0% 6.66% 30-71
providers/huggingface-transformers/src/ai/common/HFT_Pipeline.ts 3.87% 0.67% 3.03% 4% 25-46, 56-183, 218-251, 300-413, 425-598
providers/huggingface-transformers/src/ai/common/HFT_Streaming.ts 0% 100% 0% 0% 20-43
providers/huggingface-transformers/src/ai/common/HFT_StructuredGeneration.ts 1.85% 0% 0% 1.88% 19-58, 67-145
providers/huggingface-transformers/src/ai/common/HFT_TextClassification.ts 7.69% 0% 0% 7.69% 24-69
providers/huggingface-transformers/src/ai/common/HFT_TextEmbedding.ts 3.7% 0% 0% 3.7% 30-103
providers/huggingface-transformers/src/ai/common/HFT_TextFillMask.ts 20% 100% 0% 20% 17-27
providers/huggingface-transformers/src/ai/common/HFT_TextGeneration.ts 4.34% 0% 0% 4.54% 23-69
providers/huggingface-transformers/src/ai/common/HFT_TextLanguageDetection.ts 20% 0% 0% 20% 21-37
providers/huggingface-transformers/src/ai/common/HFT_TextNamedEntityRecognition.ts 20% 100% 0% 20% 21-38
providers/huggingface-transformers/src/ai/common/HFT_TextQuestionAnswer.ts 8.33% 0% 0% 9.09% 25-47
providers/huggingface-transformers/src/ai/common/HFT_TextRewriter.ts 8.33% 0% 0% 9.09% 18-37
providers/huggingface-transformers/src/ai/common/HFT_TextSummary.ts 9.09% 0% 0% 10% 18-35
providers/huggingface-transformers/src/ai/common/HFT_TextTranslation.ts 9.09% 0% 0% 10% 22-41
providers/huggingface-transformers/src/ai/common/HFT_ToolCalling.ts 0.73% 0% 0% 0.79% 38-289, 301-399
Generated in workflow #2233 for commit f4ead3b by the Vitest Coverage Report Action

@sroussey sroussey force-pushed the capabilities-squash branch from d240225 to ecb45ba Compare May 13, 2026 23:23
- Introduced new test files for AiChatTask and AiChatWithKbTask, covering schema validation, input/output handling, and streaming behavior.
- Updated runAiProviderConformance to use describe.skipIf for conditional skipping of tests based on options.
- Enhanced ImageGenerationPreviewChain and SessionCaching tests to ensure proper functionality and error handling.
- Added structured tests for session management in AiProviderRunFn, ensuring sessionId is correctly passed and handled.

These changes improve test coverage and reliability for AI task implementations.
… bun tests from CI

- Updated various dependencies in package.json and bun.lock, including:
  - Incremented versions for @types/bun, @types/node, @typescript-eslint/eslint-plugin, @typescript-eslint/parser, @vitest/coverage-v8, @vitest/ui, pkg-pr-new, vitest, ink, react-resizable-panels, vite, better-sqlite3, playwright, and @anthropic-ai/sdk.
- Adjusted devDependencies to ensure compatibility with the latest versions.
- Commented out test jobs in GitHub Actions workflow for future reference.
@sroussey sroussey force-pushed the capabilities-squash branch from ecb45ba to f4ead3b Compare May 14, 2026 00:59
@sroussey sroussey requested a review from Copilot May 14, 2026 01:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

@sroussey sroussey merged commit 1c627a0 into main May 14, 2026
5 of 14 checks passed
@sroussey sroussey deleted the capabilities-squash branch May 14, 2026 01:10
sroussey pushed a commit that referenced this pull request May 15, 2026
CreateStandardKbStrategyFirstStage.test was cherry-picked from PR #496,
where it was authored against the pre-capabilities API. Since then
PR #494 (capabilities-squash) landed in main and changed:
  - registerRunFn(provider, taskType, fn) → registerRunFn(provider, { serves, runFn })
  - run-fn return value → emit({ type: "finish", data })
  - ModelRecord.tasks → ModelRecord.capabilities

Update the test to the post-capabilities shape, mirroring the pattern
already used by KnowledgeBaseStandardStrategy.test. Also drop the
`as never` casts on the spies — those defeated the type system and
caused mockResolvedValue to fail typecheck.

https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants