fix(kb,hft): document strategy trust model; runtime shape guard on HFT_TextReranker#501
Closed
sroussey wants to merge 13 commits into
Closed
fix(kb,hft): document strategy trust model; runtime shape guard on HFT_TextReranker#501sroussey wants to merge 13 commits into
sroussey wants to merge 13 commits into
Conversation
The KB now owns its `docEmbeddingModel`, `queryEmbeddingModel`, and `rerankerModel` IDs directly. Callers go through `kb.search()` / `kb.searchWithRerank()` / `kb.upsertDocumentWithIndex()` without threading models — the KB delegates to an installed `IKbAiStrategy` (typically `createAiKbStrategy(kb)` from `@workglow/ai`). This replaces the `onSearch` / `onDocumentUpsert` / `onDocumentDelete` callback soup with a single typed strategy interface. RAG callers no longer have to wire model IDs through every retrieval call site, and the KB exposes the search "kind" (similarity / hybrid / rerank) as a single dispatcher. - knowledge-base: add `IKbAiStrategy`, store model IDs as readonly fields, add `search`/`searchWithRerank`/`upsertDocumentWithIndex`/ `reindex`; remove the three callback hooks - ai: add `TextRerankerTask` (AiTask) and `KbReindexTask`; simplify `KbSearchTask` (no model input — KB picks the kind); add `createAiKbStrategy(kb)` factory wiring HierarchicalChunker + TextEmbedding + TextReranker - huggingface-transformers: add `HFT_TextReranker` run-fn that loads cross-encoder text-classification pipelines (bge-reranker, ms-marco-MiniLM) and scores `[query, doc]` pairs - tests: replace KB callback suite with strategy suite https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
…riven
Follow-up to the model-fields refactor. The KB's public RAG surface now
has exactly three methods — `upsert(doc)`, `delete(doc_id)`,
`search(query, opts)` — and they all delegate to an installed
`IKbAiStrategy`. The lower-level storage methods (upsertDocument,
upsertChunksBulk, similaritySearch, hybridSearch, etc.) stay on the
class as strategy-facing building blocks; subclasses can still
intercept them via virtual dispatch.
`IKbAiStrategy` itself is now high-level:
- `ingest(kb, doc)` — chunk + embed + write in one shot
- `delete(kb, doc_id)` — cascading delete by default
- `search(kb, query, opts)` — strategy picks the retrieval mode
The "search kind" parameter is gone. Mode is part of the KB's stored
config (new `searchMode` field, alongside `chunkStrategy`) and the
standard strategy reads it on every call.
- `createStandardKbStrategy({ chunker?, chunkStrategy?, searchMode? })`
replaces `createAiKbStrategy`. It reads model IDs + chunkStrategy +
searchMode from the KB at op time, so config changes take effect on
the next call without rewiring.
- `KbSearchTask` drops `kind`; just `{ knowledgeBase, query, topK?, filter? }`.
- KB no longer exposes `searchWithRerank` / `upsertDocumentWithIndex`
publicly; that orchestration lives in the strategy.
- Custom strategies (e.g. for per-tenant scoping in the builder) keep
working — they implement the same three methods and call the
building-block methods however they want. Builder uses
`ScopedKnowledgeBase` (which overrides the low-level methods) +
`createStandardKbStrategy()` on top — scope injection rides through
virtual dispatch without the strategy knowing.
- Tests updated to exercise the new strategy shape.
https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
Reconcile this branch's KB redesign (collapsed upsert/delete/search API backed by IKbAiStrategy) with main's PR #478 (ITextIndex + BM25F + RRF hybrid search). Both designs land — they're complementary: - The strategy seam stays the only public RAG entry point. Callbacks on KnowledgeBase (onDocumentUpsert / onDocumentDelete / onSearch) introduced by #478 are dropped; IKbAiStrategy replaces them. - The text-index infrastructure from #478 lands intact: installTextIndex, reindexText, getTextIndex on KnowledgeBase; auto-sync to the index on chunk upsert; auto-remove on document / chunk delete; BM25Index + Tokenizer in @workglow/storage. - kb.hybridSearch is rewritten per #478: KB-layer RRF fusion over similaritySearch + the installed text index. New options (vectorWeight, rrfK, candidatePoolMultiplier); no scoreThreshold since RRF scores aren't comparable to cosine. - kb.textSearch (new) is the pure-FTS path. searchMode="text" in the standard strategy now calls it directly instead of a zero-vector hybrid hack. - kb.supportsHybridSearch reports whether a text index is installed (not whether the storage backend has its own hybrid implementation). - IKbStrategyTarget grows a textSearch method and the new hybridSearch signature so createStandardKbStrategy can call them. - createStandardKbStrategy: searchMode="hybrid" uses the new KB-layer RRF; "text" uses textSearch; "rerank" first-stage uses hybrid when a text index is installed, falls back to similarity otherwise; "similarity" unchanged. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
- createStandardKbStrategy.ingest: drop the duplicate upsertDocument when doc.doc_id is initially missing. upsertDocument already returns the stored doc with the assigned id, so a second write was both wasted I/O and a hidden trap for ScopedKnowledgeBase overrides that do scope-injection work on every write. - createStandardKbStrategy: rewrite the firstStageMultiplier docstring to match the implementation (no 20-floor — just `topK * multiplier` with a `topK` floor so we never return fewer than the caller asked for). - IKbStrategyTarget.upsertChunksBulk: tighten the return type from Promise<unknown> to Promise<ChunkVectorEntity[]> so strategies don't have to cast. - ONNXModelSamples: add "TextRerankerTask" to bge-reranker-base's `tasks` array. The new standard strategy invokes TextRerankerTask (not the legacy heuristic RerankerTask), and AiTask.narrowInput resolves models by task tag — without this, rerank mode silently fails model resolution. - RerankerTask + TextRerankerTask: rewrite the JSDoc comments that referenced the now-removed `createAiKbStrategy` and `kb.searchWithRerank()`. Both now point readers at `createStandardKbStrategy` and the `searchMode: "rerank"` integration point. All 191 rag tests pass. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
…eType Address libs PR #484 review plan in one sweep, ordered by blast radius. Data integrity — ingest now delete-then-upsert-then-insert: - createStandardKbStrategy.ingest deletes old chunks BEFORE rewriting the document when doc.doc_id is set, so a partial failure (e.g. upsertChunksBulk rejecting) leaves "doc row preserved, chunks removed" rather than "new doc row pointing at stale old chunks." When doc.doc_id is unset, upserts first to mint the id, then runs deleteChunksForDocument as a defensive no-op so the post-condition ("after ingest, doc owns exactly the new chunks") holds even on storage backends that recycle ids. Concurrency — strategy snapshot per public op: - KnowledgeBase.upsert/delete/search/reindex now make the snapshot explicit (const strategy = this.requireStrategy(...)) and document the semantics on setAiStrategy: replacing the strategy does NOT affect ops already in flight; each public op resolves its strategy at entry. reindex captures once and uses the same strategy for the whole loop. Search correctness — surface canonical text + drop JSON.stringify fallback: - New chunkText(c) helper on @workglow/knowledge-base reads metadata.text and throws (with the chunk_id) when missing. Replaces the inline `meta?.text … JSON.stringify(meta ?? {})` map in createStandardKbStrategy and KbSearchTask. Documents metadata.text as a load-bearing contract on InsertChunkVectorEntity. Score semantics — tag rerank with scoreType: "rerank": - ScoreType union extended with "rerank". Both cross-encoder and heuristic-fallback rerank paths now set scoreType: "rerank" as const, overriding the first-stage cosine/RRF tag. Docstrings on createStandardKbStrategy, IKbAiStrategy.search, and the ScoreType union itself flag that cross-encoder logits are NOT comparable to cosine/BM25/RRF scores; callers must inspect scoreType before applying a score threshold. scoreThreshold is intentionally not honored in the rerank branch (commented in-code). ChunkRetrievalTask output schema enum extended to keep parity with the canonical union; the task itself only emits cosine/rrf. Tests: - mid-op setAiStrategy(B) during search/upsert/reindex: assert the in-flight op completes via the original strategy (DocumentRepository "strategy contract" block, 3 tests). - chunkText helper: present → returns text; missing → throws with chunk_id (DocumentRepository, 2 tests). - KbSearchTask: result with metadata lacking text throws with chunk_id rather than emitting JSON.stringify (KbSearchTask.test, 1 test). - New KnowledgeBaseStandardStrategy.test.ts exercises the actual createStandardKbStrategy: ingest-order partial-failure leaves no orphan chunks; rerank heuristic-fallback tags results "rerank". Setup registers a stub TextEmbeddingTask runFn + model record so the strategy's embedTexts call resolves without real providers. All new tests + KbSearchTask + DocumentRepository tests pass (191 rag tests pass, the 7 remaining failures are pre-existing HuggingFace-503 flakes in EndToEnd / RagWorkflow integration tests). https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
…/* + examples/cli The previous publish list was a hand-maintained subset (util, storage, job-queue, task-graph, knowledge-base, tasks, ai, ai-provider, workglow, examples/cli) and referenced a defunct `ai-provider` directory. As a result, the PR-preview install URLs for every vendor package (anthropic, openai, ollama, huggingface-*, sqlite, postgres, supabase, mcp, indexeddb, javascript, browser-control, etc.) 404'd, blocking downstream repos that want to consume a libs PR via overrides. Replace with `./packages/* ./providers/* ./examples/cli` so the shell expands to every workspace at exec time. `pkg-pr-new` honors the `"private": true` flag on `@workglow/test`, so internal-only workspaces are skipped automatically — no need to enumerate. https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
…nfig threading Brings in four main commits: - fix(storage-migrations): serialize concurrent runs + SQLite rollback (#485) - fix(knowledge-base,storage,postgres): cross-KB getBulk isolation + PostgresFTS index (#486) - perf(build): turbo task graph + TS project references (#489) - feat(ai,task-graph): thread runConfig through CreateWorkflow and AI wrappers (#490) Conflict resolutions: - KnowledgeBase.ts: keep reindexText hooks (beginRebuild/commitRebuild/abortRebuild) and fire-and-forget Promise handling in syncTextIndexForChunk from #486; keep our strategy-driven public API (upsert/delete/search) and chunkStrategy/searchMode fields - ChunkRetrievalTask.ts: keep explicit scoreType annotation "cosine"|"bm25"|"rrf"|"rerank" https://claude.ai/code/session_01Ya54WFZhpDFzAqRh1qG8Ex
The rerank-mode first-stage size was computed as `Math.max(topK * firstStageMultiplier, topK)`, which is a no-op since `topK * mult >= topK` whenever `mult >= 1`. The intended floor was a fixed minimum (commented as such), so very small `topK` (e.g. topK=1, mult=5 -> 5 candidates) silently collapsed the reranker's input down to a handful of candidates with no real choice to make. Add a new `firstStageMinimum` option to `CreateStandardKbStrategyOptions` (default 20) and use it as the actual floor: `Math.max(topK * firstStageMultiplier, firstStageMinimum)`. Update JSDoc on `firstStageMultiplier` and the new `firstStageMinimum` to describe how they interact. Adds a vitest suite that spies on `hybridSearch` / `similaritySearch` and asserts the first-stage `topK` value forwarded to them across representative inputs.
The input schema accepted no `scoreThreshold`, and `execute` did not
destructure or forward one either, so any threshold supplied via the
task surface was silently dropped before reaching `kb.search`. Callers
wiring a threshold through a workflow would get unfiltered results
with no warning.
Add `scoreThreshold` to the input schema (number, minimum 0) and
forward it in the call to `kb.search(query, { topK, filter,
scoreThreshold })`. Note that the standard strategy still ignores
the threshold in rerank mode by design (cross-encoder logits aren't
on the same scale as cosine/RRF scores) — that contract is documented
in `createStandardKbStrategy` and the new schema description.
Adds a vitest suite that spies on `kb.search` and asserts the
threshold is forwarded when provided and absent (undefined) when
omitted.
…egyTarget Strategies receive the KB's full low-level storage surface and can bypass ScopedKnowledgeBase virtual-dispatch scoping. Documents this explicitly so operators do not load strategies from untrusted sources, user input, or remote registries. Also adds pointer TSDoc on KnowledgeBase and setAiStrategy referencing the new trust model paragraph.
One-sentence pointer added on the class TSDoc and on setAiStrategy so callers installing a strategy are reminded to read the trust model documented on IKbAiStrategy / IKbStrategyTarget.
…on mismatch HFT_TextReranker used `as unknown as (...) => Promise<...>` to coerce the transformers.js pipeline, then read `.score` without runtime validation. A pipeline that returned a different shape (e.g. missing scores, wrong field names) would silently produce garbage scores instead of failing loudly. - Added KbRerankerOutputError in @workglow/ai's TextRerankerTask module so callers can `instanceof`-test from the public barrel. - Replaced the cast with a narrow unknown-typed local and an `isScored` guard. On mismatch we throw KbRerankerOutputError including the model path and a truncated shape snippet for diagnostics. - Removed the silent `?? 0` fallback so missing scores fail loudly instead of returning 0. - Added a shape-validation test that mocks getPipeline and exercises (a) total shape mismatch, (b) the valid mixed object/array-of-objects case from `top_k > 1`, (c) a partial mismatch within an otherwise valid batch.
Refactors the runtime shape guard out of `HFT_TextReranker` into an exported pure helper `validateAndExtractRerankerScores`, then tests that helper directly. Exporting the helper also lets it be barreled from `@workglow/huggingface-transformers/ai-runtime` so tests can exercise the shape validation without spinning up a real transformers.js pipeline or mocking deep-path internals (avoids ESM module-mock portability between bun:test and vitest). The run-fn delegates to the helper; behavior is unchanged.
505100d to
d0b2f18
Compare
5 tasks
Collaborator
Author
|
Cherry-picked all 4 commits ( Generated by Claude Code |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Two High-severity findings from a security review of PR #484 (the KB strategy refactor).
H1 —
IKbStrategyTargetis an undocumented trusted-code boundaryAn installed
IKbAiStrategyreceives anIKbStrategyTargetthat exposes the KB's full low-level storage surface (upsertDocument,deleteChunksForDocument,upsertChunksBulk,similaritySearch,hybridSearch). These operations bypass any application-level access control because scoping is implemented via virtual dispatch on the target instance — a malicious or buggy strategy can violate the scoping contract by routing data through a different KB. This trusted-code boundary needs to be documented so operators do not load strategies from untrusted sources, user input, or remote registries.H2 —
HFT_TextRerankercast was loose and accessed.scorewithout validationA transformers.js pipeline producing a different shape (mismatched task, wrong field names, null entries) would silently return garbage scores rather than fail loudly.
What
IKbAiStrategyandIKbStrategyTargetspelling out that strategies are TRUSTED CODE and must not be loaded from untrusted sources.KnowledgeBaseandsetAiStrategylinking back to the trust model.as unknown ascast with a runtimeisScoredshape guard.KbRerankerOutputError(co-located withTextRerankerTaskin@workglow/ai, exported from the package barrel so callersinstanceof-test).validateAndExtractRerankerScores(rawResults, modelPath): number[]— pure helper, no side effects, fully unit-testable. The run-fn delegates to it.?? 0fallback so a missing score now throwsKbRerankerOutputErrorwith the model path and a truncated shape snippet.packages/test/src/test/rag/HFT_TextReranker.shape.test.ts.HFT_TextReranker(and the new helper) from@workglow/huggingface-transformers/ai-runtimeso tests can reach them without deep-path imports.Stacking
This PR is stacked: its base is
claude/loving-mendel-bS9js-pr484(the head of PR #496) so #496's fixes (rerank firstStageMinimum floor +KbSearchTaskscoreThresholdforwarding) are preserved. Merge order: #484 → #496 → this PR.Verification
bun run build:types— confirm the new error class + helper typecheck cleanly across@workglow/aiand@workglow/huggingface-transformers.cd packages/test && bun test src/test/rag/HFT_TextReranker.shape.test.ts— exercise the shape guard (3 negative cases + valid mixed-form case + diagnostic-payload case).cd packages/test && bun test src/test/rag/— regression check on the rest of the RAG suite.Defense-in-depth (deferred)
A Proxy wrapper that enforces the KB's scope on every strategy-target call was considered (intercept
upsertDocument/similaritySearch/ etc. on the proxy and refuse calls that try to escape the installedScopedKnowledgeBase). Deferred to a follow-up issue:ScopedKnowledgeBasevirtual-dispatch already provides the primary boundary;Risks
getPipelinestubs elsewhere in the codebase may currently return shapes the old code silently absorbed via?? 0. They will now throwKbRerankerOutputErrorinstead. Grep fortext-classificationpipeline stubs before merging.HFT_TextRerankerto theai-runtimebarrel grows the runtime surface by one named export; the symbol is small and benign.Rollback
Two independent commit groups so each H can be reverted on its own:
docs(kb): …— H1 (TSDoc-only on KB types; no runtime change).fix(hft): …/test(hft): …— H2 (runtime guard + test).https://claude.ai/code/session_01LkKCHC8q6b5e7Nwvc8L3M6
Generated by Claude Code