
fix(controller): credential-extract — IIFE auto-wrap for bare arrow fns + log eval result#42

Draft
caffeinum wants to merge 2 commits into webllm:main from caffeinum:fix/credential-extract-iife-wrap

@caffeinum
Contributor

Summary

The credential_extract action passes the LLM-generated JS string to page.evaluate(). Models commonly emit bare arrow/async-function expressions:

async () => {
  const u = document.querySelector('input[type=email]')?.value;
  return { username: u };
}

page.evaluate(<string>) evaluates this as an expression: it produces a function value that is never called and is immediately discarded. The action returns undefined, and the agent concludes that the page has no credential fields.
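The failure mode can be reproduced without a browser. The sketch below is illustrative only: it assumes Node's indirect `eval` is a fair stand-in for how a string payload is evaluated (as a script whose completion value is returned), and the payload is made up for the example.

```typescript
// Illustration of the bug class, not Playwright code: a string payload is
// evaluated as a script and its completion value is returned.
const payload = `async () => {
  return { username: "demo@example.com" };
}`;

// Calling eval through another name makes it an *indirect* eval:
// the string runs in global scope and the value of its last expression comes back.
const indirectEval = eval;

// Bare arrow: the expression's value is the function itself; its body never runs.
const bare: unknown = indirectEval(payload);
console.log(typeof bare); // function

// Wrapped as an IIFE: the function is invoked, so its body runs and a Promise comes back.
const wrapped: unknown = indirectEval(`(${payload})()`);
console.log(wrapped instanceof Promise); // true
```

Once a function value crosses the page boundary it cannot be serialized as a useful result, which is why the caller sees `undefined`.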

Fix

  1. IIFE auto-wrap: detect whether the script string begins with an arrow-function or async function () {…} expression. If so, wrap it as (<expr>)() before passing it to page.evaluate(). Scripts that already self-invoke or are bare statements pass through unchanged.
  2. Log the evaluate return value (truncated) so we can see what came back when debugging.
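A minimal sketch of the wrap step, assuming only what the PR page shows. The two regexes are the ones quoted in the follow-up comment (so, like the real code, this wraps async forms only); the helper name `autoWrapIife` is hypothetical, since the PR places this logic inside `validateAndFixJavaScript`, whose full body is not reproduced here. The sketch also omits the "no trailing call paren" guard mentioned in the commit message, because its exact form is not shown.

```typescript
// Regexes as quoted in the follow-up comment (src/controller/service.ts:299-302).
const startsAsyncArrowRe = /^async\s*(?:\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>/;
const startsAsyncFnRe = /^async\s+function\b/;

// Hypothetical helper name; the real logic lives in validateAndFixJavaScript.
function autoWrapIife(script: string): string {
  const trimmed = script.trim();
  const startsAsyncArrow = startsAsyncArrowRe.test(trimmed);
  const startsAsyncFn = startsAsyncFnRe.test(trimmed);
  // A leading async arrow / async function becomes an immediately invoked call;
  // everything else (statements, existing IIFEs, sync arrows) is returned untouched.
  return startsAsyncArrow || startsAsyncFn ? `(${trimmed})()` : script;
}

console.log(autoWrapIife("async () => 42"));    // (async () => 42)()
console.log(autoWrapIife("(async () => 1)()")); // (async () => 1)()
console.log(autoWrapIife("document.title"));    // document.title
console.log(autoWrapIife('() => "x"'));         // () => "x"
```

An existing IIFE is safe by construction here: it starts with `(`, so neither anchored regex can match it and it is never double-wrapped.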

Reference: Python upstream

Python upstream relies on prompt engineering instead: the evaluate action description at browser_use/tools/service.py:1774 explicitly says "Best practice: wrap in IIFE" and gives an example. In practice the model often forgets, hence the code-side safety net here.

Python also uses CDP's Runtime.evaluate rather than Playwright's page.evaluate, but both have the same expression-vs-statement semantics: evaluating an arrow-function definition yields a function value that is then dropped. So the bug class is identical; upstream's approach is documentation, ours is defense in depth. The wrap regex is conservative: it only triggers on a clear leading async () => / () => / async function ( pattern, so legitimate self-invoking IIFEs and non-function scripts pass through unchanged.

For evaluate logging, upstream logs len(result_text) at debug level (service.py:1857). This PR additionally logs the truncated content at the same level to make production debugging more useful.
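A sketch of rendering the evaluate result for a debug log, assuming the 500-character budget stated in the first commit message. `renderForLog` is a hypothetical name, and `console.debug` stands in for whatever logger bu-ts actually uses.

```typescript
// Budget taken from the commit message ("truncated to 500 chars").
const MAX_LOG_CHARS = 500;

// Hypothetical helper: turn an arbitrary evaluate() result into a bounded log string.
function renderForLog(value: unknown, max: number = MAX_LOG_CHARS): string {
  let text: string;
  if (typeof value === "string") {
    text = value;
  } else {
    try {
      // JSON.stringify returns undefined for undefined, functions and symbols,
      // so fall back to String() to still log something readable.
      text = JSON.stringify(value) ?? String(value);
    } catch {
      text = String(value); // e.g. circular structures or BigInt values
    }
  }
  // Keep the head of the value and record the full length, like upstream's len() log.
  return text.length > max ? `${text.slice(0, max)}... [${text.length} chars total]` : text;
}

const result: unknown = { username: "demo@example.com" };
console.debug(`evaluate result: ${renderForLog(result)}`);
```

Logging the literal string `undefined` is deliberate: it makes the "code ran but did nothing" case visible instead of producing an empty log line.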

Out of scope (mentioned for context)

Our internal version of this commit also stripped action-overlay bbox highlights before screenshots in this action. After research, that's the wrong place: upstream solves it centrally in browser_use/browser/watchdogs/screenshot_watchdog.py:55-62 by calling remove_highlights() inside the screenshot pipeline. The right bu-ts fix is to port that centralized pattern rather than strip highlights per action. Happy to follow up with a separate PR if maintainers agree.
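To illustrate the difference between per-action stripping and the centralized pattern, here is a small sketch. Every name in it (`ScreenshotTarget`, `removeHighlights`, `captureCleanScreenshot`) is hypothetical and does not come from bu-ts or browser-use; it only shows the shape of a single choke point that all screenshot requests pass through.

```typescript
// Hypothetical interface; bu-ts's real page/session types are not shown in this PR.
interface ScreenshotTarget {
  removeHighlights(): Promise<void>;
  screenshot(): Promise<Uint8Array>;
}

// Single choke point: every screenshot, whichever action asked for it,
// strips the overlay first, so individual actions never have to remember.
async function captureCleanScreenshot(target: ScreenshotTarget): Promise<Uint8Array> {
  await target.removeHighlights();
  return target.screenshot();
}

// Fake target that records call order, for illustration only.
const calls: string[] = [];
const fake: ScreenshotTarget = {
  async removeHighlights() {
    calls.push("removeHighlights");
  },
  async screenshot() {
    calls.push("screenshot");
    return new Uint8Array();
  },
};

captureCleanScreenshot(fake).then(() => {
  console.log(calls.join(" -> ")); // removeHighlights -> screenshot
});
```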

Test plan

  • Manually trigger credential_extract with a bare async-arrow JS payload and verify the extracted values are returned
  • Trigger with an already-self-invoking IIFE and verify no double-wrapping
  • Trigger with a bare statement (document.title) and verify it still works
  • Check that the log output contains a truncated form of the evaluate return value

🤖 Generated with Claude Code

caffeinum and others added 2 commits May 19, 2026 12:27
…E + log eval result

The `credential_extract` action passes the LLM-generated JS string to
`page.evaluate()`. Models commonly emit bare arrow/async-function
expressions like `async () => { ... }`, which evaluate as expressions
that just produce a function value and get discarded — the action
returns undefined and the agent thinks the page has no credential
fields.

1. validateAndFixJavaScript now auto-wraps a leading bare
   `async () => {...}` / `() => {...}` / `async function () {...}`
   expression in `(<expr>)()` before passing to page.evaluate(). The
   regex is conservative: only fires on a clear leading pattern with
   no trailing call paren, so self-invoking IIFEs and bare statements
   pass through unchanged.

2. The evaluate handler logs the rendered return value (truncated to
   500 chars) so operators can see what came back when debugging the
   "code ran but did nothing" failure mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re arrow/async-fn payloads

Covers the regression behind the credential-extract failure mode where
the LLM emits a bare `async () => { ... }` or `async function () {...}`
expression. Without the wrap, `page.evaluate(<string>)` produces a
function value that is silently discarded.

Tests assert the actual code handed to `page.evaluate` after
`validateAndFixJavaScript` runs:
- bare async arrow → wrapped in (...)()
- bare async function → wrapped in (...)()
- non-function expression (document.title) → passes through unchanged
- already-IIFE (`(async () => 1)()`) → not double-wrapped

Verified to fail on upstream/main src/controller/service.ts and pass on
the fix commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
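The second commit's tests assert on the exact string handed to page.evaluate. A minimal sketch of that style, assuming a hand-rolled fake page (`RecordingPage`) and a hypothetical `runEvaluateAction` wrapper; neither is taken from the PR's test file, and the wrap helper simply mirrors the two regexes quoted in the follow-up comment. The four cases are the ones listed in the commit message.

```typescript
import { strict as assert } from "node:assert";

// Mirrors the regexes quoted in the follow-up comment; helper name is hypothetical.
const startsAsyncArrowRe = /^async\s*(?:\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>/;
const startsAsyncFnRe = /^async\s+function\b/;
function autoWrapIife(script: string): string {
  const trimmed = script.trim();
  return startsAsyncArrowRe.test(trimmed) || startsAsyncFnRe.test(trimmed) ? `(${trimmed})()` : script;
}

// Fake page that records exactly what would have been passed to page.evaluate.
class RecordingPage {
  readonly seen: string[] = [];
  async evaluate(script: string): Promise<unknown> {
    this.seen.push(script);
    return undefined;
  }
}

// Hypothetical stand-in for the action handler: fix the script, then evaluate it.
async function runEvaluateAction(page: RecordingPage, script: string): Promise<unknown> {
  return page.evaluate(autoWrapIife(script));
}

const cases: Array<[input: string, expected: string]> = [
  ["async () => { return 1; }", "(async () => { return 1; })()"], // bare async arrow
  ["async function () { return 1; }", "(async function () { return 1; })()"], // bare async function
  ["document.title", "document.title"], // non-function expression
  ["(async () => 1)()", "(async () => 1)()"], // already an IIFE
];

const page = new RecordingPage();
for (const [input] of cases) void runEvaluateAction(page, input);
// evaluate() pushes before its first await, so `seen` is already populated here.
cases.forEach(([, expected], i) => assert.equal(page.seen[i], expected));
```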
@caffeinum
Contributor Author

Follow-up notes after adding the test (commit 64423dc):

Scope correction: the wrap is async-only, not "any arrow function" as the PR body implies. The actual regex (at src/controller/service.ts:299-302) is:

const startsAsyncArrow = /^async\s*(?:\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>/.test(trimmed);
const startsAsyncFn    = /^async\s+function\b/.test(trimmed);

So async () => {…} and async function () {…} get wrapped; () => "x" (sync arrow) does not. In production this is fine, since the LLM almost always emits async for this action, but it is worth flagging as a known scope limit rather than a complete fix.

Cross-link: the "Out of scope" section mentions porting upstream's centralized bbox strip from screenshot_watchdog.py. That follow-up is now #43.
