Skip to content

fix(chrome-ai): three HIGH-priority bugs in PR #514 — chat cache, prototype pollution, snapshot reset#528

Merged
sroussey merged 4 commits into
chrome-aifrom
claude/sweet-edison-NSph2-chrome-ai
May 22, 2026
Merged

fix(chrome-ai): three HIGH-priority bugs in PR #514 — chat cache, prototype pollution, snapshot reset#528
sroussey merged 4 commits into
chrome-aifrom
claude/sweet-edison-NSph2-chrome-ai

Conversation

@sroussey
Copy link
Copy Markdown
Collaborator

Fixes 3 HIGH-priority findings in #514.

Summary

  • HIGH-1: WebBrowser_Chat session cache never reuses. Stored historyFingerprint was computed over the prior-prior state on turn 1, so turn 2's lookup always missed and rebuilt the session from scratch. Switched to a messageCount-based watermark — cache hits when cached.messageCount === lastUserIdx. After each turn, messageCount = messages.length + 1.
  • HIGH-2: Tool-call arguments allow __proto__ keys past validation. Many tool input schemas don't set additionalProperties: false, so {__proto__: {polluted: true}, ok: true} passes validation. Added a sanitizeToolArgs helper that recursively drops __proto__ / constructor / prototype keys before validation.
  • HIGH-3: snapshotStreamToTextDeltas corrupts text on non-prefix snapshots. Non-prefix branch was concatenating instead of resetting. Reset-and-emit now correctly handles self-correction snapshots.

Test plan

  • WebBrowserProvider.test.ts — new tests for cache reuse, sanitization (incl. recursive), and snapshot reset
  • All existing tests in the chrome-ai suite still pass

Generated by Claude Code

sroussey added 3 commits May 22, 2026 01:27
The previous fingerprint-based cache key recomputed the fingerprint
from the *prior* history on every turn, so turn 2's cache lookup always
missed and rebuilt the session from scratch. Switch to a messageCount
high-water mark: cache hits when cached.messageCount === lastUserIdx
(i.e., the session has already heard everything before the trailing
user message). After a successful turn the session has heard
messages.length + 1 messages (history + new assistant reply), which we
record for the next call.
…lution (HIGH)

Many tool input schemas don't set `additionalProperties: false`, so a
hallucinated `{__proto__: {polluted: true}, ok: true}` payload would
pass validation and propagate through to consumers. Add a
`sanitizeToolArgs` helper that recursively rebuilds the value with a
plain Object.prototype, dropping `__proto__`, `constructor`, and
`prototype` keys at every depth. Sanitize BEFORE validation so the
validator sees the cleaned object.
`snapshotStreamToTextDeltas` was concatenating instead of resetting
when a snapshot was not a prefix-extension of the previously
accumulated text. For self-correction snapshots (Chrome replacing,
not extending, prior text) this corrupted consumer state with
duplicated content like `"hello worldhello sailor"`. Reset the
accumulator to the new snapshot and emit it as the delta so consumers
treat the non-prefix boundary as a replace, matching the documented
streaming-convention exception.

Also add `snapshotStreamToTextDeltas` to `_testOnly` so the helper
is testable from the test package, and add coverage for:
  - HIGH-1: chat cache reuse and rebuild-on-divergence
  - HIGH-2: __proto__/constructor/prototype scrubbing (top-level + recursive)
  - HIGH-3: prefix-extend, non-prefix-reset, identical-snapshot semantics

Also fix a stale comment in the existing tool-calling lifecycle test
that claimed cache reuse — tool-calling intentionally rebuilds per
turn.
@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented May 22, 2026

Open in StackBlitz

@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@528

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@528

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@528

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@528

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@528

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@528

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@528

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@528

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@528

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@528

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@528

@workglow/util

npm i https://pkg.pr.new/@workglow/util@528

workglow

npm i https://pkg.pr.new/workglow@528

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@528

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@528

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@528

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@528

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@528

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@528

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@528

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@528

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@528

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@528

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@528

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@528

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@528

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@528

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@528

commit: fb1bfd0

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes three high-priority issues in the Chrome AI provider: (1) chat session cache reuse, (2) prototype-pollution hardening for tool-call arguments, and (3) corrected handling of non-prefix streaming snapshots.

Changes:

  • Reworked WebBrowser_Chat cache reuse to use a messageCount watermark instead of a history fingerprint.
  • Added recursive sanitization of tool-call args to drop __proto__ / constructor / prototype keys before JSON-schema validation.
  • Adjusted snapshotStreamToTextDeltas to reset internal state on non-prefix snapshots and added targeted tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
providers/chrome-ai/src/ai/index.ts Exposes snapshotStreamToTextDeltas under _testOnly for unit tests.
providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts Adds sanitizeToolArgs and runs it before schema validation/capture.
providers/chrome-ai/src/ai/common/WebBrowser_Sessions.ts Documents/adjusts cached session state to emphasize messageCount reuse semantics.
providers/chrome-ai/src/ai/common/WebBrowser_ChromeHelpers.ts Updates snapshot→delta conversion logic and documentation for non-prefix snapshots.
providers/chrome-ai/src/ai/common/WebBrowser_Chat.ts Fixes chat cache reuse check and updates cache writes to store messageCount.
packages/test/src/test/ai-provider/WebBrowserProvider.test.ts Adds tests for chat cache reuse, tool-arg sanitization, and snapshot reset behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +167 to 171
// delta. Consumers reconstructing full text by concatenation should
// treat any subsequent non-prefix delta as a reset boundary; use
// `snapshotStreamToSnapshots` if you need replace-mode semantics.
accumulatedText = value;
yield { type: "text-delta", port, textDelta: value };
Comment on lines +866 to +870
// Now simulate a retroactive history mutation: a NEW user message at
// index 0 (so lastUserIdx becomes 2, not 1). Cache's messageCount=2
// != expectedPriorCount=2? Actually equal here, so try shrinking.
// Shrink: a single user message again. lastUserIdx=0, but cache
// says messageCount=2 → mismatch → rebuild.
@github-actions
Copy link
Copy Markdown

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 62.25% 22893 / 36773
🔵 Statements 62.13% 23683 / 38117
🔵 Functions 63.19% 4309 / 6819
🔵 Branches 50.77% 11078 / 21817
File CoverageNo changed files found.
Generated in workflow #2382 for commit fb1bfd0 by the Vitest Coverage Report Action

@sroussey sroussey merged commit 31d1abc into chrome-ai May 22, 2026
12 checks passed
@sroussey sroussey deleted the claude/sweet-edison-NSph2-chrome-ai branch May 22, 2026 15:44
sroussey added a commit that referenced this pull request May 22, 2026
…apability probe

Integrates the chrome-ai branch (7 commits — PR #514/#520/#528) with main's
parallel chrome-ai work (model.download, model.dispose, ApiBinding):

- Chat-session cache keyed by AiChatTask sessionId, with messageCount
  high-water mark for reuse (replaces fingerprint-based invalidation)
- StructuredGeneration + ToolCalling run-fns gated by an async capability
  probe; pre-probe state advertises a conservative subset (no json-mode,
  no tool-use) so the provider never claims a capability it can't fulfil
- ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher
  (text.generation shared by AiChatTask + TextGenerationTask)
- ChromeHelpers ships both assertAvailability and ensureAvailable; both
  session APIs (chrome-chat cache + idle-evict store) coexist
- Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test
  already covers chat behavior under the new cache semantics)
sroussey added a commit that referenced this pull request May 22, 2026
…viders

Addresses review of #514/#520/#528 rebase:

CRITICAL fix — `model.dispose` now reaches chat-cached sessions. The
post-rebase chrome-ai branch had two parallel session maps
(`chromeSessions` for chat reuse, `sessions` for idle-evict +
ModelDispose lookup) but only the chat map was populated by runtime
code, making `model.dispose` a functional no-op in production.

Unified into a single Map<sessionId, WebBrowserSessionEntry> with both
chat-cache fields (messageCount, fingerprints) and lifecycle fields
(modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires
`modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the
unified store, so model.dispose destroys chat-cached sessions. Chat
sessions become subject to idle eviction (free bonus).

IMPORTANT — sanitizeToolArgs applied across the codebase per intent of
the prior refactor:

  - OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream)
    → covers OpenAI + HFI
  - ToolCallParsers (adaptParserResult + parseToolCallsFromText)
    → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT
  - Anthropic_ToolCalling (input_json_delta + content_block_stop)
  - Gemini_ToolCalling (functionCall.args)
  - Ollama_ToolCalling (parsed function.arguments)
  - LlamaCpp_ToolCalling (extractNativeFunctionCalls)
  - Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths)

Every model-supplied tool-arg payload now passes through
sanitizeToolArgs before reaching downstream consumers, closing the
prototype-pollution vector across the provider matrix.

Also:
  - Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit
    tests for sanitizeToolArgs, compileToolValidators,
    validateToolCallArgs, plus a sanitize→validate→name-check
    integration test).
  - Added WebBrowser_Sessions.test regression for the unified-store
    behavior (disposeWebBrowserSessionsForModel sees chat-cached
    entries).
  - Documented WebBrowser_Chat's rebuild-on-next-turn recovery model
    (vs the in-fn retry that main's now-deleted test exercised).
sroussey added a commit that referenced this pull request May 22, 2026
* feat(chrome-ai): chat history, tool calling, structured generation, capability probe

Integrates the chrome-ai branch (7 commits — PR #514/#520/#528) with main's
parallel chrome-ai work (model.download, model.dispose, ApiBinding):

- Chat-session cache keyed by AiChatTask sessionId, with messageCount
  high-water mark for reuse (replaces fingerprint-based invalidation)
- StructuredGeneration + ToolCalling run-fns gated by an async capability
  probe; pre-probe state advertises a conservative subset (no json-mode,
  no tool-use) so the provider never claims a capability it can't fulfil
- ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher
  (text.generation shared by AiChatTask + TextGenerationTask)
- ChromeHelpers ships both assertAvailability and ensureAvailable; both
  session APIs (chrome-chat cache + idle-evict store) coexist
- Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test
  already covers chat behavior under the new cache semantics)

* refactor(ai,chrome-ai,openai,hfi): shared tool sanitation; emit-pattern streams

Tool calling utilities (packages/ai/src/task/ToolCallingUtils.ts):
- sanitizeToolArgs: recursive __proto__/constructor/prototype scrubbing
  for model-supplied tool args (prototype-pollution defence)
- compileToolValidators + validateToolCallArgs: per-tool inputSchema
  validation with graceful fallback for tools whose schema fails to compile

Stream helpers converted from generators to emit-callback so run-fns no
longer need a for-await/yield pump:
- snapshotStreamToTextDeltas / snapshotStreamToSnapshots (chrome-ai)
- accumulateOpenAIStream (@workglow/ai provider-utils, used by OpenAI + HFI)

Run-fns updated to call helpers with emit directly and emit their own
final 'finish' event. chrome-ai's WebBrowser_ToolCalling drops its
private sanitization + validation copy and reuses the shared utils.

* fix(chrome-ai): wire model.dispose; apply sanitizeToolArgs across providers

Addresses review of #514/#520/#528 rebase:

CRITICAL fix — `model.dispose` now reaches chat-cached sessions. The
post-rebase chrome-ai branch had two parallel session maps
(`chromeSessions` for chat reuse, `sessions` for idle-evict +
ModelDispose lookup) but only the chat map was populated by runtime
code, making `model.dispose` a functional no-op in production.

Unified into a single Map<sessionId, WebBrowserSessionEntry> with both
chat-cache fields (messageCount, fingerprints) and lifecycle fields
(modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires
`modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the
unified store, so model.dispose destroys chat-cached sessions. Chat
sessions become subject to idle eviction (free bonus).

IMPORTANT — sanitizeToolArgs applied across the codebase per intent of
the prior refactor:

  - OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream)
    → covers OpenAI + HFI
  - ToolCallParsers (adaptParserResult + parseToolCallsFromText)
    → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT
  - Anthropic_ToolCalling (input_json_delta + content_block_stop)
  - Gemini_ToolCalling (functionCall.args)
  - Ollama_ToolCalling (parsed function.arguments)
  - LlamaCpp_ToolCalling (extractNativeFunctionCalls)
  - Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths)

Every model-supplied tool-arg payload now passes through
sanitizeToolArgs before reaching downstream consumers, closing the
prototype-pollution vector across the provider matrix.

Also:
  - Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit
    tests for sanitizeToolArgs, compileToolValidators,
    validateToolCallArgs, plus a sanitize→validate→name-check
    integration test).
  - Added WebBrowser_Sessions.test regression for the unified-store
    behavior (disposeWebBrowserSessionsForModel sees chat-cached
    entries).
  - Documented WebBrowser_Chat's rebuild-on-next-turn recovery model
    (vs the in-fn retry that main's now-deleted test exercised).

* feat(chrome-ai): retry once on InvalidStateError when a cached session is destroyed

Chrome can destroy a `LanguageModel` session out from under us (tab
backgrounding, GPU process restart, memory pressure). When a cached
session's `promptStreaming` throws DOMException("...destroyed...",
"InvalidStateError") we now rebuild the session from full history via
`initialPrompts` and retry the prompt once.

Retry is gated on three conditions, all required:
  - We were using a CACHED session (a fresh-session failure means the
    model is broken; retrying won't help).
  - No text-delta has reached the consumer yet (we can't unsend deltas).
  - The error name is `InvalidStateError` (matches Chrome's
    InvalidStateError DOMException; tolerant of message-text changes).

Tests:
  - "retries once with a fresh session when a cached session is destroyed"
    seeds the cache on turn 1, has the cached session's promptStreaming
    throw on turn 2's reuse, asserts rebuild + retry + cache replacement.
  - "does not retry when a fresh (non-cached) session fails" guards the
    first gate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants