fix(controller): credential-extract — IIFE auto-wrap for bare arrow fns + log eval result#42
Draft
caffeinum wants to merge 2 commits into
Draft
fix(controller): credential-extract — IIFE auto-wrap for bare arrow fns + log eval result#42caffeinum wants to merge 2 commits into
caffeinum wants to merge 2 commits into
Conversation
…E + log eval result
The `credential_extract` action passes the LLM-generated JS string to
`page.evaluate()`. Models commonly emit bare arrow/async-function
expressions like `async () => { ... }`, which evaluate as expressions
that just produce a function value and get discarded — the action
returns undefined and the agent thinks the page has no credential
fields.
1. validateAndFixJavaScript now auto-wraps a leading bare
`async () => {...}` / `() => {...}` / `async function () {...}`
expression in `(<expr>)()` before passing to page.evaluate(). The
regex is conservative: only fires on a clear leading pattern with
no trailing call paren, so self-invoking IIFEs and bare statements
pass through unchanged.
2. The evaluate handler logs the rendered return value (truncated to
500 chars) so operators can see what came back when debugging the
"code ran but did nothing" failure mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re arrow/async-fn payloads
Covers the regression behind the credential-extract failure mode where
the LLM emits a bare `async () => { ... }` or `async function () {...}`
expression. Without the wrap, `page.evaluate(<string>)` produces a
function value that is silently discarded.
Tests assert the actual code handed to `page.evaluate` after
`validateAndFixJavaScript` runs:
- bare async arrow → wrapped in (...)()
- bare async function → wrapped in (...)()
- non-function expression (document.title) → passes through unchanged
- already-IIFE (`(async () => 1)()`) → not double-wrapped
Verified to fail on upstream/main src/controller/service.ts and pass on
the fix commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Author
|
Follow-up notes after adding the test (commit Scope correction: the wrap is const startsAsyncArrow = /^async\s*(?:\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>/.test(trimmed);
const startsAsyncFn = /^async\s+function\b/.test(trimmed);So Cross-link: the "Out of scope" section mentions porting upstream's centralized bbox strip from |
Draft
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
credential_extractaction passes the LLM-generated JS string topage.evaluate(). Models commonly emit bare arrow/async-function expressions:page.evaluate(<string>)evaluates this as an expression — it produces a function value that's immediately discarded. The action returnsundefinedand the agent thinks the page has no credential fields.Fix
async function () {…}expression. If so, wrap with(<expr>)()before passing topage.evaluate(). Falls through for scripts that already self-invoke or are bare statements.Reference: Python upstream
Python upstream relies on prompt-engineering instead — the
evaluateaction description atbrowser_use/tools/service.py:1774literally says "Best practice: wrap in IIFE" and gives an example. In practice the model often forgets, hence the codeside safety net here.Python also uses CDP's
Runtime.evaluaterather than Playwright'spage.evaluate, but both have the same expression-vs-statement semantics — defining an arrow yields a function value that's dropped. So the bug class is identical; upstream's approach is documentation, ours is defense-in-depth. The wrap regex is conservative: it only triggers on a clear leadingasync () =>/() =>/async function (pattern, so legitimate self-invoking IIFEs and non-function scripts pass through unchanged.For evaluate logging, upstream logs
len(result_text)at debug (service.py:1857). This PR additionally logs truncated content at the same level to make production debugging more useful.Out of scope (mentioned for context)
Our internal version of this commit also stripped action-overlay bbox highlights before screenshots in this action. After research, that's the wrong place — upstream solves it centrally in
browser_use/browser/watchdogs/screenshot_watchdog.py:55-62by callingremove_highlights()inside the screenshot pipeline. The right bu-ts fix is to port that centralized pattern, not strip per-action. Happy to follow up with a separate PR if maintainers agree.Test plan
credential_extractwith a bare async-arrow JS payload and verify the extracted values are returneddocument.title) and verify it still works🤖 Generated with Claude Code