fix(chrome-ai): three HIGH-priority bugs in PR #514 — chat cache, prototype pollution, snapshot reset#528
Merged
Conversation
The previous fingerprint-based cache key recomputed the fingerprint from the *prior* history on every turn, so turn 2's cache lookup always missed and rebuilt the session from scratch. Switch to a messageCount high-water mark: cache hits when cached.messageCount === lastUserIdx (i.e., the session has already heard everything before the trailing user message). After a successful turn the session has heard messages.length + 1 messages (history + new assistant reply), which we record for the next call.
…lution (HIGH)
Many tool input schemas don't set `additionalProperties: false`, so a
hallucinated `{__proto__: {polluted: true}, ok: true}` payload would
pass validation and propagate through to consumers. Add a
`sanitizeToolArgs` helper that recursively rebuilds the value with a
plain Object.prototype, dropping `__proto__`, `constructor`, and
`prototype` keys at every depth. Sanitize BEFORE validation so the
validator sees the cleaned object.
`snapshotStreamToTextDeltas` was concatenating instead of resetting when a snapshot was not a prefix-extension of the previously accumulated text. For self-correction snapshots (Chrome replacing, not extending, prior text) this corrupted consumer state with duplicated content like `"hello worldhello sailor"`. Reset the accumulator to the new snapshot and emit it as the delta so consumers treat the non-prefix boundary as a replace, matching the documented streaming-convention exception. Also add `snapshotStreamToTextDeltas` to `_testOnly` so the helper is testable from the test package, and add coverage for: - HIGH-1: chat cache reuse and rebuild-on-divergence - HIGH-2: __proto__/constructor/prototype scrubbing (top-level + recursive) - HIGH-3: prefix-extend, non-prefix-reset, identical-snapshot semantics Also fix a stale comment in the existing tool-calling lifecycle test that claimed cache reuse — tool-calling intentionally rebuilds per turn.
@workglow/cli
@workglow/ai
@workglow/browser-control
@workglow/indexeddb
@workglow/javascript
@workglow/job-queue
@workglow/knowledge-base
@workglow/mcp
@workglow/storage
@workglow/task-graph
@workglow/tasks
@workglow/util
workglow
@workglow/anthropic
@workglow/bun-webview
@workglow/chrome-ai
@workglow/electron
@workglow/google-gemini
@workglow/huggingface-inference
@workglow/huggingface-transformers
@workglow/node-llama-cpp
@workglow/ollama
@workglow/openai
@workglow/playwright
@workglow/postgres
@workglow/sqlite
@workglow/supabase
@workglow/tf-mediapipe
commit: |
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes three high-priority issues in the Chrome AI provider: (1) chat session cache reuse, (2) prototype-pollution hardening for tool-call arguments, and (3) corrected handling of non-prefix streaming snapshots.
Changes:
- Reworked
WebBrowser_Chatcache reuse to use amessageCountwatermark instead of a history fingerprint. - Added recursive sanitization of tool-call args to drop
__proto__/constructor/prototypekeys before JSON-schema validation. - Adjusted
snapshotStreamToTextDeltasto reset internal state on non-prefix snapshots and added targeted tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| providers/chrome-ai/src/ai/index.ts | Exposes snapshotStreamToTextDeltas under _testOnly for unit tests. |
| providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts | Adds sanitizeToolArgs and runs it before schema validation/capture. |
| providers/chrome-ai/src/ai/common/WebBrowser_Sessions.ts | Documents/adjusts cached session state to emphasize messageCount reuse semantics. |
| providers/chrome-ai/src/ai/common/WebBrowser_ChromeHelpers.ts | Updates snapshot→delta conversion logic and documentation for non-prefix snapshots. |
| providers/chrome-ai/src/ai/common/WebBrowser_Chat.ts | Fixes chat cache reuse check and updates cache writes to store messageCount. |
| packages/test/src/test/ai-provider/WebBrowserProvider.test.ts | Adds tests for chat cache reuse, tool-arg sanitization, and snapshot reset behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+167
to
171
| // delta. Consumers reconstructing full text by concatenation should | ||
| // treat any subsequent non-prefix delta as a reset boundary; use | ||
| // `snapshotStreamToSnapshots` if you need replace-mode semantics. | ||
| accumulatedText = value; | ||
| yield { type: "text-delta", port, textDelta: value }; |
Comment on lines
+866
to
+870
| // Now simulate a retroactive history mutation: a NEW user message at | ||
| // index 0 (so lastUserIdx becomes 2, not 1). Cache's messageCount=2 | ||
| // != expectedPriorCount=2? Actually equal here, so try shrinking. | ||
| // Shrink: a single user message again. lastUserIdx=0, but cache | ||
| // says messageCount=2 → mismatch → rebuild. |
Coverage Report
File CoverageNo changed files found. |
sroussey
added a commit
that referenced
this pull request
May 22, 2026
…apability probe Integrates the chrome-ai branch (7 commits — PR #514/#520/#528) with main's parallel chrome-ai work (model.download, model.dispose, ApiBinding): - Chat-session cache keyed by AiChatTask sessionId, with messageCount high-water mark for reuse (replaces fingerprint-based invalidation) - StructuredGeneration + ToolCalling run-fns gated by an async capability probe; pre-probe state advertises a conservative subset (no json-mode, no tool-use) so the provider never claims a capability it can't fulfil - ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher (text.generation shared by AiChatTask + TextGenerationTask) - ChromeHelpers ships both assertAvailability and ensureAvailable; both session APIs (chrome-chat cache + idle-evict store) coexist - Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test already covers chat behavior under the new cache semantics)
sroussey
added a commit
that referenced
this pull request
May 22, 2026
…viders Addresses review of #514/#520/#528 rebase: CRITICAL fix — `model.dispose` now reaches chat-cached sessions. The post-rebase chrome-ai branch had two parallel session maps (`chromeSessions` for chat reuse, `sessions` for idle-evict + ModelDispose lookup) but only the chat map was populated by runtime code, making `model.dispose` a functional no-op in production. Unified into a single Map<sessionId, WebBrowserSessionEntry> with both chat-cache fields (messageCount, fingerprints) and lifecycle fields (modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires `modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the unified store, so model.dispose destroys chat-cached sessions. Chat sessions become subject to idle eviction (free bonus). IMPORTANT — sanitizeToolArgs applied across the codebase per intent of the prior refactor: - OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream) → covers OpenAI + HFI - ToolCallParsers (adaptParserResult + parseToolCallsFromText) → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT - Anthropic_ToolCalling (input_json_delta + content_block_stop) - Gemini_ToolCalling (functionCall.args) - Ollama_ToolCalling (parsed function.arguments) - LlamaCpp_ToolCalling (extractNativeFunctionCalls) - Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths) Every model-supplied tool-arg payload now passes through sanitizeToolArgs before reaching downstream consumers, closing the prototype-pollution vector across the provider matrix. Also: - Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit tests for sanitizeToolArgs, compileToolValidators, validateToolCallArgs, plus a sanitize→validate→name-check integration test). - Added WebBrowser_Sessions.test regression for the unified-store behavior (disposeWebBrowserSessionsForModel sees chat-cached entries). - Documented WebBrowser_Chat's rebuild-on-next-turn recovery model (vs the in-fn retry that main's now-deleted test exercised).
sroussey
added a commit
that referenced
this pull request
May 22, 2026
* feat(chrome-ai): chat history, tool calling, structured generation, capability probe Integrates the chrome-ai branch (7 commits — PR #514/#520/#528) with main's parallel chrome-ai work (model.download, model.dispose, ApiBinding): - Chat-session cache keyed by AiChatTask sessionId, with messageCount high-water mark for reuse (replaces fingerprint-based invalidation) - StructuredGeneration + ToolCalling run-fns gated by an async capability probe; pre-probe state advertises a conservative subset (no json-mode, no tool-use) so the provider never claims a capability it can't fulfil - ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher (text.generation shared by AiChatTask + TextGenerationTask) - ChromeHelpers ships both assertAvailability and ensureAvailable; both session APIs (chrome-chat cache + idle-evict store) coexist - Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test already covers chat behavior under the new cache semantics) * refactor(ai,chrome-ai,openai,hfi): shared tool sanitation; emit-pattern streams Tool calling utilities (packages/ai/src/task/ToolCallingUtils.ts): - sanitizeToolArgs: recursive __proto__/constructor/prototype scrubbing for model-supplied tool args (prototype-pollution defence) - compileToolValidators + validateToolCallArgs: per-tool inputSchema validation with graceful fallback for tools whose schema fails to compile Stream helpers converted from generators to emit-callback so run-fns no longer need a for-await/yield pump: - snapshotStreamToTextDeltas / snapshotStreamToSnapshots (chrome-ai) - accumulateOpenAIStream (@workglow/ai provider-utils, used by OpenAI + HFI) Run-fns updated to call helpers with emit directly and emit their own final 'finish' event. chrome-ai's WebBrowser_ToolCalling drops its private sanitization + validation copy and reuses the shared utils. * fix(chrome-ai): wire model.dispose; apply sanitizeToolArgs across providers Addresses review of #514/#520/#528 rebase: CRITICAL fix — `model.dispose` now reaches chat-cached sessions. The post-rebase chrome-ai branch had two parallel session maps (`chromeSessions` for chat reuse, `sessions` for idle-evict + ModelDispose lookup) but only the chat map was populated by runtime code, making `model.dispose` a functional no-op in production. Unified into a single Map<sessionId, WebBrowserSessionEntry> with both chat-cache fields (messageCount, fingerprints) and lifecycle fields (modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires `modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the unified store, so model.dispose destroys chat-cached sessions. Chat sessions become subject to idle eviction (free bonus). IMPORTANT — sanitizeToolArgs applied across the codebase per intent of the prior refactor: - OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream) → covers OpenAI + HFI - ToolCallParsers (adaptParserResult + parseToolCallsFromText) → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT - Anthropic_ToolCalling (input_json_delta + content_block_stop) - Gemini_ToolCalling (functionCall.args) - Ollama_ToolCalling (parsed function.arguments) - LlamaCpp_ToolCalling (extractNativeFunctionCalls) - Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths) Every model-supplied tool-arg payload now passes through sanitizeToolArgs before reaching downstream consumers, closing the prototype-pollution vector across the provider matrix. Also: - Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit tests for sanitizeToolArgs, compileToolValidators, validateToolCallArgs, plus a sanitize→validate→name-check integration test). - Added WebBrowser_Sessions.test regression for the unified-store behavior (disposeWebBrowserSessionsForModel sees chat-cached entries). - Documented WebBrowser_Chat's rebuild-on-next-turn recovery model (vs the in-fn retry that main's now-deleted test exercised). * feat(chrome-ai): retry once on InvalidStateError when a cached session is destroyed Chrome can destroy a `LanguageModel` session out from under us (tab backgrounding, GPU process restart, memory pressure). When a cached session's `promptStreaming` throws DOMException("...destroyed...", "InvalidStateError") we now rebuild the session from full history via `initialPrompts` and retry the prompt once. Retry is gated on three conditions, all required: - We were using a CACHED session (a fresh-session failure means the model is broken; retrying won't help). - No text-delta has reached the consumer yet (we can't unsend deltas). - The error name is `InvalidStateError` (matches Chrome's InvalidStateError DOMException; tolerant of message-text changes). Tests: - "retries once with a fresh session when a cached session is destroyed" seeds the cache on turn 1, has the cached session's promptStreaming throw on turn 2's reuse, asserts rebuild + retry + cache replacement. - "does not retry when a fresh (non-cached) session fails" guards the first gate.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes 3 HIGH-priority findings in #514.
Summary
WebBrowser_Chatsession cache never reuses. StoredhistoryFingerprintwas computed over the prior-prior state on turn 1, so turn 2's lookup always missed and rebuilt the session from scratch. Switched to amessageCount-based watermark — cache hits whencached.messageCount === lastUserIdx. After each turn,messageCount = messages.length + 1.__proto__keys past validation. Many tool input schemas don't setadditionalProperties: false, so{__proto__: {polluted: true}, ok: true}passes validation. Added asanitizeToolArgshelper that recursively drops__proto__/constructor/prototypekeys before validation.snapshotStreamToTextDeltascorrupts text on non-prefix snapshots. Non-prefix branch was concatenating instead of resetting. Reset-and-emit now correctly handles self-correction snapshots.Test plan
WebBrowserProvider.test.ts— new tests for cache reuse, sanitization (incl. recursive), and snapshot resetGenerated by Claude Code