fix(kb,hft): document strategy trust model; runtime shape guard on HFT_TextReranker by sroussey · Pull Request #501 · workglow-dev/libs

sroussey · 2026-05-14T08:19:48Z

Why

Two High-severity findings from a security review of PR #484 (the KB strategy refactor).

H1 — `IKbStrategyTarget` is an undocumented trusted-code boundary

An installed IKbAiStrategy receives an IKbStrategyTarget that exposes the KB's full low-level storage surface (upsertDocument, deleteChunksForDocument, upsertChunksBulk, similaritySearch, hybridSearch). These operations bypass any application-level access control because scoping is implemented via virtual dispatch on the target instance — a malicious or buggy strategy can violate the scoping contract by routing data through a different KB. This trusted-code boundary needs to be documented so operators do not load strategies from untrusted sources, user input, or remote registries.

H2 — `HFT_TextReranker` cast was loose and accessed `.score` without validation

const rawResults = (await (reranker as unknown as (
  inputs: Array<{ text: string; text_pair: string }>,
  options?: Record<string, unknown>
) => Promise<Array<...>>)(pairs, { top_k: 1 })) as Array<...>;

const scores: number[] = rawResults.map((r) => {
  if (Array.isArray(r)) return r[0]?.score ?? 0; // silent fallback hides real bugs
  return r.score;                                 // unchecked .score access
});

A transformers.js pipeline producing a different shape (mismatched task, wrong field names, null entries) would silently return garbage scores rather than fail loudly.

What

TSDoc trust-model paragraphs on IKbAiStrategy and IKbStrategyTarget spelling out that strategies are TRUSTED CODE and must not be loaded from untrusted sources.
Pointer TSDoc on KnowledgeBase and setAiStrategy linking back to the trust model.
Replaced the as unknown as cast with a runtime isScored shape guard.
Added KbRerankerOutputError (co-located with TextRerankerTask in @workglow/ai, exported from the package barrel so callers instanceof-test).
Extracted the validation as validateAndExtractRerankerScores(rawResults, modelPath): number[] — pure helper, no side effects, fully unit-testable. The run-fn delegates to it.
Removed the silent ?? 0 fallback so a missing score now throws KbRerankerOutputError with the model path and a truncated shape snippet.
Added shape-validation tests in packages/test/src/test/rag/HFT_TextReranker.shape.test.ts.
Re-exported HFT_TextReranker (and the new helper) from @workglow/huggingface-transformers/ai-runtime so tests can reach them without deep-path imports.

Stacking

This PR is stacked: its base is claude/loving-mendel-bS9js-pr484 (the head of PR #496) so #496's fixes (rerank firstStageMinimum floor + KbSearchTask scoreThreshold forwarding) are preserved. Merge order: #484 → #496 → this PR.

Verification

bun run build:types — confirm the new error class + helper typecheck cleanly across @workglow/ai and @workglow/huggingface-transformers.
cd packages/test && bun test src/test/rag/HFT_TextReranker.shape.test.ts — exercise the shape guard (3 negative cases + valid mixed-form case + diagnostic-payload case).
cd packages/test && bun test src/test/rag/ — regression check on the rest of the RAG suite.

Defense-in-depth (deferred)

A Proxy wrapper that enforces the KB's scope on every strategy-target call was considered (intercept upsertDocument / similaritySearch / etc. on the proxy and refuse calls that try to escape the installed ScopedKnowledgeBase). Deferred to a follow-up issue:

adds per-call overhead on every low-level storage method;
ScopedKnowledgeBase virtual-dispatch already provides the primary boundary;
the trust model documented here is the higher-leverage fix for the immediate finding.

Risks

Other hand-rolled getPipeline stubs elsewhere in the codebase may currently return shapes the old code silently absorbed via ?? 0. They will now throw KbRerankerOutputError instead. Grep for text-classification pipeline stubs before merging.
Adding HFT_TextReranker to the ai-runtime barrel grows the runtime surface by one named export; the symbol is small and benign.

Rollback

Two independent commit groups so each H can be reverted on its own:

docs(kb): … — H1 (TSDoc-only on KB types; no runtime change).
fix(hft): … / test(hft): … — H2 (runtime guard + test).

https://claude.ai/code/session_01LkKCHC8q6b5e7Nwvc8L3M6

Generated by Claude Code

The KB now owns its `docEmbeddingModel`, `queryEmbeddingModel`, and `rerankerModel` IDs directly. Callers go through `kb.search()` / `kb.searchWithRerank()` / `kb.upsertDocumentWithIndex()` without threading models — the KB delegates to an installed `IKbAiStrategy` (typically `createAiKbStrategy(kb)` from `@workglow/ai`). This replaces the `onSearch` / `onDocumentUpsert` / `onDocumentDelete` callback soup with a single typed strategy interface. RAG callers no longer have to wire model IDs through every retrieval call site, and the KB exposes the search "kind" (similarity / hybrid / rerank) as a single dispatcher. - knowledge-base: add `IKbAiStrategy`, store model IDs as readonly fields, add `search`/`searchWithRerank`/`upsertDocumentWithIndex`/ `reindex`; remove the three callback hooks - ai: add `TextRerankerTask` (AiTask) and `KbReindexTask`; simplify `KbSearchTask` (no model input — KB picks the kind); add `createAiKbStrategy(kb)` factory wiring HierarchicalChunker + TextEmbedding + TextReranker - huggingface-transformers: add `HFT_TextReranker` run-fn that loads cross-encoder text-classification pipelines (bge-reranker, ms-marco-MiniLM) and scores `[query, doc]` pairs - tests: replace KB callback suite with strategy suite https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

…riven Follow-up to the model-fields refactor. The KB's public RAG surface now has exactly three methods — `upsert(doc)`, `delete(doc_id)`, `search(query, opts)` — and they all delegate to an installed `IKbAiStrategy`. The lower-level storage methods (upsertDocument, upsertChunksBulk, similaritySearch, hybridSearch, etc.) stay on the class as strategy-facing building blocks; subclasses can still intercept them via virtual dispatch. `IKbAiStrategy` itself is now high-level: - `ingest(kb, doc)` — chunk + embed + write in one shot - `delete(kb, doc_id)` — cascading delete by default - `search(kb, query, opts)` — strategy picks the retrieval mode The "search kind" parameter is gone. Mode is part of the KB's stored config (new `searchMode` field, alongside `chunkStrategy`) and the standard strategy reads it on every call. - `createStandardKbStrategy({ chunker?, chunkStrategy?, searchMode? })` replaces `createAiKbStrategy`. It reads model IDs + chunkStrategy + searchMode from the KB at op time, so config changes take effect on the next call without rewiring. - `KbSearchTask` drops `kind`; just `{ knowledgeBase, query, topK?, filter? }`. - KB no longer exposes `searchWithRerank` / `upsertDocumentWithIndex` publicly; that orchestration lives in the strategy. - Custom strategies (e.g. for per-tenant scoping in the builder) keep working — they implement the same three methods and call the building-block methods however they want. Builder uses `ScopedKnowledgeBase` (which overrides the low-level methods) + `createStandardKbStrategy()` on top — scope injection rides through virtual dispatch without the strategy knowing. - Tests updated to exercise the new strategy shape. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

Reconcile this branch's KB redesign (collapsed upsert/delete/search API backed by IKbAiStrategy) with main's PR #478 (ITextIndex + BM25F + RRF hybrid search). Both designs land — they're complementary: - The strategy seam stays the only public RAG entry point. Callbacks on KnowledgeBase (onDocumentUpsert / onDocumentDelete / onSearch) introduced by #478 are dropped; IKbAiStrategy replaces them. - The text-index infrastructure from #478 lands intact: installTextIndex, reindexText, getTextIndex on KnowledgeBase; auto-sync to the index on chunk upsert; auto-remove on document / chunk delete; BM25Index + Tokenizer in @workglow/storage. - kb.hybridSearch is rewritten per #478: KB-layer RRF fusion over similaritySearch + the installed text index. New options (vectorWeight, rrfK, candidatePoolMultiplier); no scoreThreshold since RRF scores aren't comparable to cosine. - kb.textSearch (new) is the pure-FTS path. searchMode="text" in the standard strategy now calls it directly instead of a zero-vector hybrid hack. - kb.supportsHybridSearch reports whether a text index is installed (not whether the storage backend has its own hybrid implementation). - IKbStrategyTarget grows a textSearch method and the new hybridSearch signature so createStandardKbStrategy can call them. - createStandardKbStrategy: searchMode="hybrid" uses the new KB-layer RRF; "text" uses textSearch; "rerank" first-stage uses hybrid when a text index is installed, falls back to similarity otherwise; "similarity" unchanged. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

- createStandardKbStrategy.ingest: drop the duplicate upsertDocument when doc.doc_id is initially missing. upsertDocument already returns the stored doc with the assigned id, so a second write was both wasted I/O and a hidden trap for ScopedKnowledgeBase overrides that do scope-injection work on every write. - createStandardKbStrategy: rewrite the firstStageMultiplier docstring to match the implementation (no 20-floor — just `topK * multiplier` with a `topK` floor so we never return fewer than the caller asked for). - IKbStrategyTarget.upsertChunksBulk: tighten the return type from Promise<unknown> to Promise<ChunkVectorEntity[]> so strategies don't have to cast. - ONNXModelSamples: add "TextRerankerTask" to bge-reranker-base's `tasks` array. The new standard strategy invokes TextRerankerTask (not the legacy heuristic RerankerTask), and AiTask.narrowInput resolves models by task tag — without this, rerank mode silently fails model resolution. - RerankerTask + TextRerankerTask: rewrite the JSDoc comments that referenced the now-removed `createAiKbStrategy` and `kb.searchWithRerank()`. Both now point readers at `createStandardKbStrategy` and the `searchMode: "rerank"` integration point. All 191 rag tests pass. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

…eType Address libs PR #484 review plan in one sweep, ordered by blast radius. Data integrity — ingest now delete-then-upsert-then-insert: - createStandardKbStrategy.ingest deletes old chunks BEFORE rewriting the document when doc.doc_id is set, so a partial failure (e.g. upsertChunksBulk rejecting) leaves "doc row preserved, chunks removed" rather than "new doc row pointing at stale old chunks." When doc.doc_id is unset, upserts first to mint the id, then runs deleteChunksForDocument as a defensive no-op so the post-condition ("after ingest, doc owns exactly the new chunks") holds even on storage backends that recycle ids. Concurrency — strategy snapshot per public op: - KnowledgeBase.upsert/delete/search/reindex now make the snapshot explicit (const strategy = this.requireStrategy(...)) and document the semantics on setAiStrategy: replacing the strategy does NOT affect ops already in flight; each public op resolves its strategy at entry. reindex captures once and uses the same strategy for the whole loop. Search correctness — surface canonical text + drop JSON.stringify fallback: - New chunkText(c) helper on @workglow/knowledge-base reads metadata.text and throws (with the chunk_id) when missing. Replaces the inline `meta?.text … JSON.stringify(meta ?? {})` map in createStandardKbStrategy and KbSearchTask. Documents metadata.text as a load-bearing contract on InsertChunkVectorEntity. Score semantics — tag rerank with scoreType: "rerank": - ScoreType union extended with "rerank". Both cross-encoder and heuristic-fallback rerank paths now set scoreType: "rerank" as const, overriding the first-stage cosine/RRF tag. Docstrings on createStandardKbStrategy, IKbAiStrategy.search, and the ScoreType union itself flag that cross-encoder logits are NOT comparable to cosine/BM25/RRF scores; callers must inspect scoreType before applying a score threshold. scoreThreshold is intentionally not honored in the rerank branch (commented in-code). ChunkRetrievalTask output schema enum extended to keep parity with the canonical union; the task itself only emits cosine/rrf. Tests: - mid-op setAiStrategy(B) during search/upsert/reindex: assert the in-flight op completes via the original strategy (DocumentRepository "strategy contract" block, 3 tests). - chunkText helper: present → returns text; missing → throws with chunk_id (DocumentRepository, 2 tests). - KbSearchTask: result with metadata lacking text throws with chunk_id rather than emitting JSON.stringify (KbSearchTask.test, 1 test). - New KnowledgeBaseStandardStrategy.test.ts exercises the actual createStandardKbStrategy: ingest-order partial-failure leaves no orphan chunks; rerank heuristic-fallback tags results "rerank". Setup registers a stub TextEmbeddingTask runFn + model record so the strategy's embedTexts call resolves without real providers. All new tests + KbSearchTask + DocumentRepository tests pass (191 rag tests pass, the 7 remaining failures are pre-existing HuggingFace-503 flakes in EndToEnd / RagWorkflow integration tests). https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

…/* + examples/cli The previous publish list was a hand-maintained subset (util, storage, job-queue, task-graph, knowledge-base, tasks, ai, ai-provider, workglow, examples/cli) and referenced a defunct `ai-provider` directory. As a result, the PR-preview install URLs for every vendor package (anthropic, openai, ollama, huggingface-*, sqlite, postgres, supabase, mcp, indexeddb, javascript, browser-control, etc.) 404'd, blocking downstream repos that want to consume a libs PR via overrides. Replace with `./packages/* ./providers/* ./examples/cli` so the shell expands to every workspace at exec time. `pkg-pr-new` honors the `"private": true` flag on `@workglow/test`, so internal-only workspaces are skipped automatically — no need to enumerate. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

…nfig threading Brings in four main commits: - fix(storage-migrations): serialize concurrent runs + SQLite rollback (#485) - fix(knowledge-base,storage,postgres): cross-KB getBulk isolation + PostgresFTS index (#486) - perf(build): turbo task graph + TS project references (#489) - feat(ai,task-graph): thread runConfig through CreateWorkflow and AI wrappers (#490) Conflict resolutions: - KnowledgeBase.ts: keep reindexText hooks (beginRebuild/commitRebuild/abortRebuild) and fire-and-forget Promise handling in syncTextIndexForChunk from #486; keep our strategy-driven public API (upsert/delete/search) and chunkStrategy/searchMode fields - ChunkRetrievalTask.ts: keep explicit scoreType annotation "cosine"|"bm25"|"rrf"|"rerank" https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex

The rerank-mode first-stage size was computed as `Math.max(topK * firstStageMultiplier, topK)`, which is a no-op since `topK * mult >= topK` whenever `mult >= 1`. The intended floor was a fixed minimum (commented as such), so very small `topK` (e.g. topK=1, mult=5 -> 5 candidates) silently collapsed the reranker's input down to a handful of candidates with no real choice to make. Add a new `firstStageMinimum` option to `CreateStandardKbStrategyOptions` (default 20) and use it as the actual floor: `Math.max(topK * firstStageMultiplier, firstStageMinimum)`. Update JSDoc on `firstStageMultiplier` and the new `firstStageMinimum` to describe how they interact. Adds a vitest suite that spies on `hybridSearch` / `similaritySearch` and asserts the first-stage `topK` value forwarded to them across representative inputs.

The input schema accepted no `scoreThreshold`, and `execute` did not destructure or forward one either, so any threshold supplied via the task surface was silently dropped before reaching `kb.search`. Callers wiring a threshold through a workflow would get unfiltered results with no warning. Add `scoreThreshold` to the input schema (number, minimum 0) and forward it in the call to `kb.search(query, { topK, filter, scoreThreshold })`. Note that the standard strategy still ignores the threshold in rerank mode by design (cross-encoder logits aren't on the same scale as cosine/RRF scores) — that contract is documented in `createStandardKbStrategy` and the new schema description. Adds a vitest suite that spies on `kb.search` and asserts the threshold is forwarded when provided and absent (undefined) when omitted.

…egyTarget Strategies receive the KB's full low-level storage surface and can bypass ScopedKnowledgeBase virtual-dispatch scoping. Documents this explicitly so operators do not load strategies from untrusted sources, user input, or remote registries. Also adds pointer TSDoc on KnowledgeBase and setAiStrategy referencing the new trust model paragraph.

One-sentence pointer added on the class TSDoc and on setAiStrategy so callers installing a strategy are reminded to read the trust model documented on IKbAiStrategy / IKbStrategyTarget.

…on mismatch HFT_TextReranker used `as unknown as (...) => Promise<...>` to coerce the transformers.js pipeline, then read `.score` without runtime validation. A pipeline that returned a different shape (e.g. missing scores, wrong field names) would silently produce garbage scores instead of failing loudly. - Added KbRerankerOutputError in @workglow/ai's TextRerankerTask module so callers can `instanceof`-test from the public barrel. - Replaced the cast with a narrow unknown-typed local and an `isScored` guard. On mismatch we throw KbRerankerOutputError including the model path and a truncated shape snippet for diagnostics. - Removed the silent `?? 0` fallback so missing scores fail loudly instead of returning 0. - Added a shape-validation test that mocks getPipeline and exercises (a) total shape mismatch, (b) the valid mixed object/array-of-objects case from `top_k > 1`, (c) a partial mismatch within an otherwise valid batch.

Refactors the runtime shape guard out of `HFT_TextReranker` into an exported pure helper `validateAndExtractRerankerScores`, then tests that helper directly. Exporting the helper also lets it be barreled from `@workglow/huggingface-transformers/ai-runtime` so tests can exercise the shape validation without spinning up a real transformers.js pipeline or mocking deep-path internals (avoids ESM module-mock portability between bun:test and vitest). The run-fn delegates to the helper; behavior is unchanged.

sroussey · 2026-05-15T16:36:12Z

Cherry-picked all 4 commits (ad6553b, a1ca800, 68ca1b1, 0a693f9) into claude/search-with-rerank-Rl4IX (PR #484) as bbbb76e, 839a257, 77391c0, efe16cf. The setAiStrategy trust-model pointer was merged into the existing method (its location had moved on the target branch since this PR was authored, so the rebased version added it next to the already-present method rather than reintroducing a duplicate block). Closing as integrated.

Generated by Claude Code

claude and others added 13 commits May 11, 2026 07:12

docs(kb): point KnowledgeBase and setAiStrategy to strategy trust model

a1ca800

One-sentence pointer added on the class TSDoc and on setAiStrategy so callers installing a strategy are reminded to read the trust model documented on IKbAiStrategy / IKbStrategyTarget.

sroussey force-pushed the claude/loving-mendel-bS9js-pr484 branch from 505100d to d0b2f18 Compare May 15, 2026 01:03

sroussey mentioned this pull request May 15, 2026

security/correctness fixes from automated review (High) #506

Merged

5 tasks

sroussey closed this May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(kb,hft): document strategy trust model; runtime shape guard on HFT_TextReranker#501

fix(kb,hft): document strategy trust model; runtime shape guard on HFT_TextReranker#501
sroussey wants to merge 13 commits into
claude/loving-mendel-bS9js-pr484from
claude/loving-mendel-Hcznj-pr484

sroussey commented May 14, 2026

Uh oh!

sroussey commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sroussey commented May 14, 2026

Why

H1 — IKbStrategyTarget is an undocumented trusted-code boundary

H2 — HFT_TextReranker cast was loose and accessed .score without validation

What

Stacking

Verification

Defense-in-depth (deferred)

Risks

Rollback

Uh oh!

sroussey commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

H1 — `IKbStrategyTarget` is an undocumented trusted-code boundary

H2 — `HFT_TextReranker` cast was loose and accessed `.score` without validation