-
Notifications
You must be signed in to change notification settings - Fork 12
Compact vs Full ACT Prompt β Analysis
- Full ACT prompt: ~33,000 chars (~8k tokens)
- Compact ACT prompt: ~3,400 chars (~850 tokens)
- Compact is ~10% the size of full β a 7k+ token savings on every turn
The compact column β everything the model sees end-to-end. The full column lists section names only.
| Topic | Compact | Full |
|---|---|---|
| Opening | One line ("You are WebBrain, an AI browser agent. You interact with web pages through tools.") | Full paragraph about operating env, browser session, "you ARE the user from the website's POV" |
| OPERATING ENVIRONMENT (refusals, scheduling, auth) | Rule #1 only: "Never refuse⦠Just do it through the UI" | 6 detailed bullets including "You CANNOT schedule, sleep, set timers", refusal patterns to avoid, legitimate decline cases |
| Tool list | Tools NEVER listed by name. Mentions inline: read_page, click, type_text, scratchpad_write, done, download_social_media, get_accessibility_tree, download_file, upload_file β that's it |
Full tool list with one-line descriptions for ~25 tools including click_ax, type_ax, set_field, hover, drag_drop, wait_for_element, wait_for_stable, verify_form, record_tab, clarify, iframe_*, etc. |
| ACCESSIBILITY TREE (the preferred page-read path) |
Section absent. Never mentioned. No click_ax/type_ax/set_field/ref_id guidance. |
30+ lines: format example, parameters, ref_id stability rules, "DEFAULT ACT LOOP" step-by-step, fallback to legacy click()
|
| Current Page Priority | Rule #2: one line | Full section: "user is on this page for a reason", when to navigate away |
| Verify after action | Rule #3: one line | "CRITICAL β do NOT rush" section (4 bullets: don't chain calls, verify creation success, fill one field at a time, parse request first) |
| TYPING patterns | One line at the bottom: "Click field, then type_text with NO selector" |
Full TYPING section with branches for: text input vs <select> vs ARIA combobox / Stripe-style picker, type-ahead, ArrowDown+Enter for portaled listboxes, rich-text editors (CodeMirror, Gmail compose), hard rule "no click_ax loops" |
| CLICKING preference order | One line: "text > index > selector > x,y" | Full CLICKING section with the order PLUS rules about jQuery pseudo-classes, ambiguity errors, data-testid guessing, coord interpretation on HiDPI displays |
| Index instability | Last line in compact, one sentence | Full INDEX INSTABILITY section: "indices are NOT stable", per-turn invalidation, "don't guess from training data" |
| MODALS & DIALOGS | Section absent. | 5 bullets: modal-scoped queries return "no match" for outer page, finish modal first, post-submit verify dialog closed, occlusion-blocked error pattern |
FORMS / verify_form |
Section absent. | 6 bullets: call verify_form before submit, check creation timestamp matches "now", CAPTCHA handling |
| IFRAMES (Stripe, embedded widgets) | Section absent. | 6 bullets: iframe_read/click/type tools, urlFilter param, cross-origin works because extension bypass |
| UI vs API | Rule #10: one line "Interact through the visible UI" | Full section: rationale, two exceptions (user override, /allow-api flag), concrete examples per site |
| SCROLLING | Section absent. | 4 bullets: scroll for below-fold buttons, check all fields before submit |
| SCRATCHPAD usage patterns | Rule #11: one sentence | Full section: 5 use cases, "after download_file IMMEDIATELY scratchpad the path" |
| DON'T REDO WORK | Rule #12: ~3 lines | Full section: download-already-done loop, fetch already returned, ref_ids stable across calls, the "doubt β re-navigate" antipattern |
| LISTINGS & PAGINATION | Short version (~3 lines) | Detailed version (~6 lines with ?page=/?sd= URL patterns, maxChars overflow, terminal-list tasks) |
| Scheduling / wait-for-event | Rule #13: 2 lines | Long section in operating-env explaining no cron, no timers |
| Social media downloads | Rule #14: one line | Full SOCIAL MEDIA DOWNLOADS section with download_social_media mode/scroll options, recommendation field handling |
In rough order of how often it bites real tasks:
-
The whole AX-tree path. No mention of
get_accessibility_tree,click_ax,type_ax,set_field, orref_id. A model on compact has no idea these tools exist from the prompt. (They exist in the tool list β see "Tool access" below β but the model has no guidance about preferring them.) - Modal handling. No occlusion / modal-scoped click guidance.
- iframes. No Stripe / payment-widget tooling.
-
Form verify. No
verify_formmention. - Combobox handling β the Stripe currency picker / portaled-listbox sequence.
-
CAPTCHA +
solve_captcha. -
All the "new in this PR" tools β
hover,drag_drop,wait_for_stable.
Yes β same tools, no filtering. Looking at getToolsForMode(mode, opts):
export function getToolsForMode(mode, opts = {}) {
const base = (mode === 'ask')
? AGENT_TOOLS.filter(t => ASK_ONLY_TOOLS.includes(t.function.name))
: AGENT_TOOLS;
if (!opts.strictSecretMode) return base;
return base.map(t => (t.function.name === 'done' ? DONE_TOOL_STRICT : t));
}Only branches on ask vs everything-else, and strictSecretMode. The compact prompt is not even an input to tool filtering. So useCompactPrompt: true β same 44 tool schemas sent to the model.
But. Tool schemas alone don't teach the model when to use which tool. The full ACT prompt has 200+ lines of "prefer X over Y in case Z" guidance. The compact prompt has none of that. So in practice, a small model on compact mode:
-
Can call
click_ax({ref_id: "ref_42"})β the schema is there, the description is there -
Won't think to β because nothing in the prompt tells it
click_axexists or that ref_ids come fromget_accessibility_tree - Will default to the patterns the prompt does mention:
click({text: "..."}),type_text,screenshot,done
This is actually the deeper reason compact is risky and why the default was flipped. The agent loses access to the AX-tree-driven workflow β the single biggest reliability win on modern React/portal-heavy sites β without any explicit filtering. The model just doesn't know to use those tools.
Summary: same tools, missing guidance, materially weaker behavior. Not a "feature toggle" so much as a "drop the playbook" toggle.