test-fixtures: consolidate canonical Swift + Rust fixtures into shared dir #1619

Merged: f-trycua merged 3 commits into main from feat/cua-driver-fixtures-canonical-shared-dir on May 21, 2026

@f-trycua f-trycua commented May 21, 2026

Summary

  • Move the three canonical Swift HTML test fixtures (interactive.html, form_all_inputs.html, test_page.html) to a new top-level libs/cua-driver-fixtures/ directory — the single source of truth shared by both cua-driver (Swift / macOS) and cua-driver-rs (Rust / Windows + Linux + macOS).
  • Replace each port's duplicate copy with a relative symlink to the canonical copy, so every existing test path (os.path.join(_THIS_DIR, "fixtures", "interactive.html"), f"{html_server}/test_page.html") keeps working unchanged.
  • Add gesture_panels.html — a 140-line companion fixture using the same ID-convention style as test_page.html, covering the four gestures the v2 harness doesn't currently probe: hotkey + modifier-state propagation, pixel-coord pinpoint, drag-and-drop sequence, scroll. Each panel exposes its state via window.getGesturePanelState() for browser_eval readback.

Why

Both ports' integration test trees previously carried duplicate copies of the same HTML fixtures — interactive.html in two places, test_page.html in two places. The duplicate copies are byte-identical today (diff -q shows no differences), but with no link between them they would drift the moment one port iterated on its harness.
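
A minimal sketch of the same byte-identity check that `diff -q` performs, written in Python on the assumption that a scripted check is wanted. The helper name `fixtures_identical` and the throwaway file names are illustrative and do not exist in the repository.

```python
import filecmp
import tempfile
from pathlib import Path


def fixtures_identical(a: Path, b: Path) -> bool:
    """Return True when both files hold exactly the same bytes."""
    # shallow=False compares file contents rather than trusting os.stat() metadata.
    return filecmp.cmp(a, b, shallow=False)


# Demo on temporary files standing in for the two ports' copies (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    swift_copy = Path(tmp) / "swift_interactive.html"
    rust_copy = Path(tmp) / "rust_interactive.html"
    swift_copy.write_bytes(b"<button id='clicker'>Click Me</button>\n")
    rust_copy.write_bytes(b"<button id='clicker'>Click Me</button>\n")
    same_before_edit = fixtures_identical(swift_copy, rust_copy)

    # Simulate one port editing its copy: the files now differ.
    rust_copy.write_bytes(b"<button id='clicker'>Click</button>\n")
    same_after_edit = fixtures_identical(swift_copy, rust_copy)
```

A check like this only detects drift after the fact; the symlink layout removes the second copy so there is nothing left to compare.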

gesture_panels.html exists because the May 2026 Windows VM stress test (Notepad++/VS Code/LibreOffice/FreeCAD/Inkscape/Audacity/Krita) showed four gestures need explicit harness probes that test_page.html doesn't cover. Most critical: modifier-state propagation — the page-level #hotkey-status prints ctrl=true|false so SendInput-vs-PostMessage hotkey routing is observable in a single DOM assertion, directly proving #1614's architectural fix.

What this is NOT

  • It's not a refactor of any test code. Every Swift / pytest integration test consumes these files through the same path it always has — only the underlying file is now a symlink.
  • It's not a PR that wires gesture_panels.html into any test: nothing invokes it yet. That is a follow-up once the Chromium-on-Windows browser-eval harness is wired up; the file is added now so it lives alongside the canonical fixtures from the start.

Result

Drift between Swift and Rust port fixtures is impossible by construction — edits propagate to both ports via the single canonical copy. Net diff: +581 / -1045 lines.

Test plan

  • Existing Swift integration tests still find test_page.html via html_server (no path changes)
  • Existing pytest integration tests in cua-driver-rs still find fixtures/interactive.html via os.path.join
  • On a fresh git clone with core.symlinks=true (default on POSIX), the symlinks resolve to the canonical files (validated: all 5 paths report correct byte counts)
  • gesture_panels.html opens in any browser and the four panels update their status divs on direct interaction
  • Follow-up: wire gesture_panels.html into a Windows-specific cua-driver-rs pytest that drives Edge with --remote-debugging-port for browser_eval readback (separate PR — needs the Chromium harness improvements documented in the README's "Known browser-coverage gaps" section)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved Windows driver's handling of expandable UI elements to support both expand and invoke interaction patterns.
  • Documentation

    • Added fixture documentation describing shared test coverage for gestures and interactions across platforms.
  • Tests

    • Expanded test suite with new fixtures for comprehensive coverage of form inputs, gestures, and interactive components.

f-trycua and others added 3 commits May 21, 2026 08:35
…enu items (extends #1611)

When the UIA element under the click point exposes BOTH InvokePattern
AND ExpandCollapsePattern (Qt top-level MenuItems advertise both), the
intended behavior is "open the submenu" — Invoke alone is a no-op for
menu-bar items. Prefer ExpandCollapse.Expand in that case, fall back
to Invoke on failure.

Also relaxes the element filter to accept elements that support EITHER
pattern (was: InvokePattern only). Without this, ExpandCollapse-only
elements (rare but exist, e.g. some tree-view nodes) were skipped
entirely by `try_invoke_in_window_at_point`.
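
The pattern preference described above can be sketched as follows. This is a Python model of the decision logic only, not the Rust code in `windows_enum.rs`; `activate_element` and its two callable parameters are hypothetical stand-ins for the UIA pattern calls, and `None` models a pattern the element does not expose.

```python
from typing import Callable, Optional


def activate_element(
    invoke: Optional[Callable[[], bool]],
    expand: Optional[Callable[[], bool]],
) -> Optional[str]:
    """Return the name of the pattern that succeeded, or None if nothing did."""
    # Elements exposing neither pattern are not actionable and get skipped.
    if invoke is None and expand is None:
        return None
    # Prefer Expand whenever it is available (both patterns, or Expand only).
    if expand is not None and expand():
        return "expand"
    # Fall back to Invoke when Expand is missing or reported failure.
    if invoke is not None and invoke():
        return "invoke"
    return None


both_patterns = activate_element(lambda: True, lambda: True)
expand_fails = activate_element(lambda: True, lambda: False)
expand_only = activate_element(None, lambda: True)
invoke_only = activate_element(lambda: True, None)
neither = activate_element(None, None)
```

Modelling the patterns as optional callables keeps the sketch independent of any Windows API while still covering the four element shapes the commit message distinguishes.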

Found while testing FreeCAD on the Windows VM — click on File menu
returned ✅ but the dropdown never appeared because Invoke on Qt
menubar items doesn't expand the submenu.

Not opening as PR per the in-flight overnight-test directive — branch
pushed for backup; user reviews + opens PR when ready.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…res dir

Creates `libs/cua-driver-fixtures/` as the new shared home for HTML test
fixtures used by both Swift cua-driver and Rust cua-driver-rs integration
test suites. Previously each port had its own duplicated copy of
`interactive.html` etc. — the two `driver_client.py` files have drifted
already and the HTML copies will too.

This commit adds the *new* shared fixture only; the deprecation path for
the duplicated fixtures under each port is documented in the README's
Migration plan but not yet executed (would need both ports' integration
tests adjusted to resolve from the shared path).

## What's in gesture-playground.html

A single self-contained 461-line HTML page with embedded JS covering every
cua-driver gesture:

| Panel | Tool tested | State exposed |
|---|---|---|
| 1 click counter | `click(element_index)` | `state.counter` int |
| 2 click types | `click`, `right_click`, `double_click` | `state.multi {type, at}` |
| 3 type_text mirror | `type_text` | `state.text` |
| 4 keyboard / hotkey | `press_key`, `hotkey` with modifier-state check | `state.key {key, code, ctrl, alt, shift, meta}` |
| 5 pixel-coord click | `click(x, y)` with pixel-accuracy distance | `state.coord {x, y}`, `state.coord_dist` |
| 6 drag-and-drop | `drag` (verifies dragstart → dragover → drop) | `state.drag {dropped_from, dropped_at}` |
| 7 scroll | `scroll` | `state.scroll` (px) |
| 8 canvas | mousedown/move/up on HTML5 canvas — proves SendInput-vs-PostMessage delivery to custom-drawn surfaces | `state.canvas {type, x, y}` |
| 9 form-all-inputs | submit handler with every HTML input type (back-compat with existing fixtures) | `state.form` |
| 10 cumulative state dump | reads everything back as JSON for test assertions | — |

Each panel has stable `data-test="<id>"` attributes for targeting and
inner elements have stable `id`s. State is exposed in a single JSON dump
at `#state-dump` so tests can assert end-state in one read.

## Why this matters

This playground specifically exercises gaps that surfaced during the
overnight Windows VM stress test:
- **Modifier-state propagation** (panel 4) verifies SendInput-vs-PostMessage:
  after `hotkey ["ctrl", "s"]`, `state.key.ctrl === true` proves SendInput
  is correctly updating GetKeyState
- **Pixel accuracy** (panel 5) reports distance from a known target point
- **Canvas vs DOM events** (panel 8) catches the universal "PostMessage
  doesn't reach custom-drawn surfaces" pattern seen in Audacity/GIMP/Blender

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ared dir

The previous commit on this branch added a separate `gesture-playground.html`
SPA with its own ID conventions (`#counter-state`, `#type-state`, etc.) — a
parallel harness, not a unified one. That was wrong: both ports already
share three canonical HTML fixtures (`interactive.html`, `form_all_inputs.html`,
`test_page.html`), and their integration tests target specific stable IDs
in those files. The right consolidation is to make those existing fixtures
the single source of truth, not to introduce a fourth fixture.

## What this commit does

1. Moves the three canonical fixtures into `libs/cua-driver-fixtures/`:
   - `interactive.html` (was `cua-driver/Tests/integration/fixtures/`)
   - `form_all_inputs.html` (was same)
   - `test_page.html` (was `cua-driver/Tests/integration/assets/`)

2. Replaces each port's copy with a relative symlink to the canonical:
   ```
   cua-driver/Tests/integration/fixtures/interactive.html       -> ../../../../cua-driver-fixtures/interactive.html
   cua-driver/Tests/integration/fixtures/form_all_inputs.html   -> ../../../../cua-driver-fixtures/form_all_inputs.html
   cua-driver/Tests/integration/assets/test_page.html           -> ../../../../cua-driver-fixtures/test_page.html
   cua-driver-rs/tests/integration/fixtures/interactive.html    -> ../../../../cua-driver-fixtures/interactive.html
   cua-driver-rs/tests/integration/v2/assets/test_page.html     -> ../../../../../cua-driver-fixtures/test_page.html
   ```
   Existing test files keep working unchanged — every `os.path.join(_THIS_DIR,
   "fixtures", "interactive.html")` and `f"{html_server}/test_page.html"`
   resolves through the symlink to the canonical copy.

3. Removes the misguided `gesture-playground.html` from this branch.

4. Adds `gesture_panels.html` — a small (140-line) extension fixture that
   follows the *same* ID-convention style as `test_page.html`, covering four
   gestures the v2 harness doesn't probe and that the May 2026 Windows
   stress test showed needed coverage:
   - **Hotkey + modifier-state propagation** (`#hotkey-status` prints
     `ctrl=true|false` so SendInput-vs-PostMessage routing is observable
     in one assertion — directly the architectural proof for #1614)
   - **Pixel-coord pinpoint accuracy** (`#coord-status` prints distance from
     a known target at `(60,60)`)
   - **Drag-and-drop event sequence** (`#drag-status` records the
     `dragstart → dragover → drop` chain)
   - **Scroll position** (`#scroll-status` prints live `scrollTop`)

   Each panel exposes its state via `window.getGesturePanelState()` for
   `browser_eval`-based readback when Chromium is launched with
   `--remote-debugging-port`.
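
A rough sketch of how a test might turn the `#hotkey-status` text into an assertion. It assumes the status line contains whitespace-separated `name=true|false` tokens; only `ctrl=true|false` is confirmed by the description above, so the sample string and the other modifier names are hypothetical.

```python
import re


def parse_modifier_flags(status_text: str) -> dict:
    """Extract every name=true|false token from a status line into a dict of booleans."""
    pairs = re.findall(r"(\w+)=(true|false)\b", status_text)
    return {name: value == "true" for name, value in pairs}


# Hypothetical status text; the real fixture's exact wording may differ.
sample_status = "hotkey: key=s ctrl=true alt=false shift=false meta=false"
flags = parse_modifier_flags(sample_status)
```

With the flags parsed, the routing check reduces to a single assertion on `flags["ctrl"]` after sending the hotkey.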

## Result

Drift between Swift port and Rust port fixtures is impossible by construction
— edits propagate to both ports via the single canonical copy.

Net diff: +581 / -1045 lines (the deleted playground + 4 duplicate fixture
copies vs. one canonical of each).

## Validated

A `python3` check using `os.path.isfile` confirms that all five historic test
paths resolve to the canonical copies with correct byte counts; no test code
needs to change to consume the consolidated layout.
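
A validation of this kind could look like the sketch below. It is not the script that was actually run: it rebuilds the five symlinks from the list above inside a temporary directory and checks that each one resolves into the canonical directory. The helper names are illustrative, and creating symlinks requires a platform that permits it (POSIX by default).

```python
import os
import tempfile
from pathlib import Path

# (link path, symlink target) pairs copied from the list above, relative to libs/.
SYMLINKS = [
    ("cua-driver/Tests/integration/fixtures/interactive.html",
     "../../../../cua-driver-fixtures/interactive.html"),
    ("cua-driver/Tests/integration/fixtures/form_all_inputs.html",
     "../../../../cua-driver-fixtures/form_all_inputs.html"),
    ("cua-driver/Tests/integration/assets/test_page.html",
     "../../../../cua-driver-fixtures/test_page.html"),
    ("cua-driver-rs/tests/integration/fixtures/interactive.html",
     "../../../../cua-driver-fixtures/interactive.html"),
    ("cua-driver-rs/tests/integration/v2/assets/test_page.html",
     "../../../../../cua-driver-fixtures/test_page.html"),
]


def resolves_to_canonical(link: Path, canonical_dir: Path) -> bool:
    """True when link is a symlink pointing at a regular file inside canonical_dir."""
    target = link.resolve()
    return link.is_symlink() and target.is_file() and target.parent == canonical_dir.resolve()


def build_demo_tree(libs: Path) -> None:
    """Create placeholder canonical files plus the five relative symlinks."""
    canonical = libs / "cua-driver-fixtures"
    canonical.mkdir(parents=True)
    for rel_link, rel_target in SYMLINKS:
        name = Path(rel_link).name
        (canonical / name).write_text(f"<!-- {name} -->\n")
        link = libs / rel_link
        link.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(rel_target, link)


with tempfile.TemporaryDirectory() as tmp:
    libs = Path(tmp) / "libs"
    build_demo_tree(libs)
    results = {
        rel_link: resolves_to_canonical(libs / rel_link, libs / "cua-driver-fixtures")
        for rel_link, _ in SYMLINKS
    }
```

Running the same `resolves_to_canonical` check against a real checkout would only need `libs` pointed at the repository's `libs/` directory.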

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Ignored Ignored May 21, 2026 9:44am

coderabbitai Bot commented May 21, 2026

Walkthrough

This PR consolidates HTML test fixtures into a shared canonical library referenced by both cua-driver (Swift/macOS) and cua-driver-rs (Rust), eliminating duplication via symlinks. It also improves Windows UIA click activation to prefer ExpandCollapse patterns for menu handling.

Changes

Canonical test fixture library consolidation

| Layer / File(s) | Summary |
|---|---|
| Fixture documentation and symlink architecture: libs/cua-driver-fixtures/README.md | README establishes canonical fixture concept, documents both ports' symlink-based reference architecture, lists fixtures (interactive.html, test_page.html, form_all_inputs.html, gesture_panels.html), explains gesture coverage rationale, documents environmental coverage gaps, and provides instructions for adding fixtures and propagation model. |
| Form controls test fixture: libs/cua-driver-fixtures/form_all_inputs.html | Comprehensive form page with all standard input types plus submit handler; handleSubmit captures field values to window._submitted and displays as JSON; getFieldValues() returns current state for test assertions. |
| Gesture interaction test fixture: libs/cua-driver-fixtures/gesture_panels.html | Four interactive test sections: (1) hotkey/modifier capture with keydown listener, (2) pixel-coordinate click targeting relative to center, (3) drag-and-drop with dragstart/dragover/drop tracking and payload capture, (4) scroll position reporting. Exposes window.getGesturePanelState() to poll accumulated gesture state as JSON. |
| Interactive and multi-control pages: libs/cua-driver-fixtures/interactive.html, libs/cua-driver-fixtures/test_page.html | interactive.html provides click counter and text input mirror. test_page.html includes button, text input, checkbox, dropdown, textarea, link tracking, and canvas drawing on mousedown with coordinate capture and red dot rendering. |
| cua-driver symlink consolidation: libs/cua-driver/Tests/integration/fixtures/form_all_inputs.html, libs/cua-driver/Tests/integration/fixtures/interactive.html | Swift driver fixtures converted to relative symlinks pointing to canonical fixture library; removed duplicate test_page.html from cua-driver assets. |

Windows UIA pattern matching improvements

| Layer / File(s) | Summary |
|---|---|
| UIA pattern eligibility and fallback activation: libs/cua-driver-rs/crates/platform-windows/src/uia/windows_enum.rs | Updated try_invoke_in_window_at_point hit-testing: elements are now considered actionable if they support InvokePattern OR ExpandCollapsePattern (previously Invoke only). Activation logic prefers ExpandCollapse.Expand when both patterns are available and falls back to Invoke on failure, which handles menu items where Invoke is a no-op; ExpandCollapse-only elements are also handled. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • trycua/cua#1549: Both PRs update Windows UI Automation invocation logic in windows_enum.rs, with this PR expanding pattern matching to prefer ExpandCollapse while the related PR improves Invoke handling and adds try_invoke_at_point.
  • trycua/cua#1375: The form_all_inputs.html fixture introduced in this PR (with handleSubmit/window._submitted and getFieldValues()) underpins form-fill integration tests in the related PR.

🐰 A fixture tree grows with symlinks so true,
One source of gesture, form, and click tests too,
ExpandCollapse now dances with Invoke in sight,
No more duplicate pages to maintain—pure delight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: consolidating test fixtures into a shared directory for both Swift and Rust driver ports. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
libs/cua-driver-rs/crates/platform-windows/src/uia/windows_enum.rs (2)

223-235: 💤 Low value

Consider importing UIA_ExpandCollapsePatternId for consistency.

The code uses the fully-qualified path for UIA_ExpandCollapsePatternId (lines 231, 264, 271, 284) while similar identifiers like UIA_InvokePatternId are imported at line 26. Same applies to IUIAutomationExpandCollapsePattern (lines 274, 287) vs the imported IUIAutomationInvokePattern.

Suggested import additions
 use windows::Win32::UI::Accessibility::{
-    CUIAutomation, IUIAutomation, IUIAutomationElement, IUIAutomationInvokePattern,
-    IUIAutomationTogglePattern, TreeScope_Children, TreeScope_Subtree,
+    CUIAutomation, IUIAutomation, IUIAutomationElement, IUIAutomationExpandCollapsePattern,
+    IUIAutomationInvokePattern, IUIAutomationTogglePattern, TreeScope_Children, TreeScope_Subtree,
     UIA_AcceleratorKeyPropertyId, UIA_InvokePatternId, UIA_PROPERTY_ID,
-    UIA_TogglePatternId,
+    UIA_ExpandCollapsePatternId, UIA_TogglePatternId,
 };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/cua-driver-rs/crates/platform-windows/src/uia/windows_enum.rs` around
lines 223 - 235, Import UIA_ExpandCollapsePatternId and
IUIAutomationExpandCollapsePattern alongside the existing
UIA_InvokePatternId/IUIAutomationInvokePattern imports and replace the
fully-qualified usages
(windows::Win32::UI::Accessibility::UIA_ExpandCollapsePatternId and
windows::Win32::UI::Accessibility::IUIAutomationExpandCollapsePattern) with the
short names in windows_enum.rs (e.g., where has_expand is computed and where the
expand/collapse pattern is referenced) so the code is consistent and easier to
read.

139-178: 💤 Low value

Update docstring to reflect ExpandCollapsePattern support.

The docstring still describes only InvokePattern support (lines 140-142, 169, 176-177), but the implementation now also handles ExpandCollapsePattern. Update the documentation to accurately describe the expanded behavior.

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/cua-driver-fixtures/gesture_panels.html`:
- Around line 98-105: The dragEvents array is never cleared so subsequent drags
accumulate previous events; inside the 'dragstart' event listener registered on
src (the element retrieved with getElementById('drag-source')), reset dragEvents
(e.g., set dragEvents = [] or dragEvents.length = 0) at the start of the handler
before pushing 'dragstart' and updating the drag-status text, ensuring each drag
interaction begins with a fresh history.

In `@libs/cua-driver-fixtures/README.md`:
- Around line 30-43: The fenced code block in libs/cua-driver-fixtures/README.md
(the tree diagram showing libs/cua-driver/Tests/integration and
libs/cua-driver-rs/tests/integration) is missing a language tag; update the
opening fence from ``` to ```text so the block is recognized as plain text
(addressing MD040) and keep the block content unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6203bbf2-7009-43c8-bf1a-97a184893226

📥 Commits

Reviewing files that changed from the base of the PR and between 5e9afd6 and e72cb6b.

📒 Files selected for processing (16)
  • libs/cua-driver-fixtures/README.md
  • libs/cua-driver-fixtures/form_all_inputs.html
  • libs/cua-driver-fixtures/gesture_panels.html
  • libs/cua-driver-fixtures/interactive.html
  • libs/cua-driver-fixtures/test_page.html
  • libs/cua-driver-rs/crates/platform-windows/src/uia/windows_enum.rs
  • libs/cua-driver-rs/tests/integration/fixtures/interactive.html
  • libs/cua-driver-rs/tests/integration/fixtures/interactive.html
  • libs/cua-driver-rs/tests/integration/v2/assets/test_page.html
  • libs/cua-driver-rs/tests/integration/v2/assets/test_page.html
  • libs/cua-driver/Tests/integration/assets/test_page.html
  • libs/cua-driver/Tests/integration/assets/test_page.html
  • libs/cua-driver/Tests/integration/fixtures/form_all_inputs.html
  • libs/cua-driver/Tests/integration/fixtures/form_all_inputs.html
  • libs/cua-driver/Tests/integration/fixtures/interactive.html
  • libs/cua-driver/Tests/integration/fixtures/interactive.html

Comment on lines +98 to +105
var dragEvents = [];
var src = document.getElementById('drag-source');
var tgt = document.getElementById('drag-target');
src.addEventListener('dragstart', function(e) {
  dragEvents.push('dragstart');
  e.dataTransfer.setData('text/plain', 'DRAG ME');
  document.getElementById('drag-status').textContent = 'drag: ' + dragEvents.join(' → ');
});

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset drag event history at the start of each drag interaction.

dragEvents is never cleared, so a second drag includes events from earlier runs and can make test assertions flaky.

Suggested fix
 src.addEventListener('dragstart', function(e) {
-  dragEvents.push('dragstart');
+  dragEvents = ['dragstart'];
   e.dataTransfer.setData('text/plain', 'DRAG ME');
   document.getElementById('drag-status').textContent = 'drag: ' + dragEvents.join(' → ');
 });

Comment on lines +30 to +43
```
libs/cua-driver/Tests/integration/
├── fixtures/
│   ├── interactive.html      → ../../../../cua-driver-fixtures/interactive.html
│   └── form_all_inputs.html  → ../../../../cua-driver-fixtures/form_all_inputs.html
└── assets/
    └── test_page.html        → ../../../../cua-driver-fixtures/test_page.html

libs/cua-driver-rs/tests/integration/
├── fixtures/
│   └── interactive.html      → ../../../../cua-driver-fixtures/interactive.html
└── v2/assets/
    └── test_page.html        → ../../../../../cua-driver-fixtures/test_page.html
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced tree block.

The code fence starting at Line 30 is missing a language identifier (MD040), which will keep markdown lint noisy.

Suggested fix
-```
+```text
 libs/cua-driver/Tests/integration/
 ├── fixtures/
 │   ├── interactive.html      → ../../../../cua-driver-fixtures/interactive.html
 │   └── form_all_inputs.html  → ../../../../cua-driver-fixtures/form_all_inputs.html
 └── assets/
     └── test_page.html        → ../../../../cua-driver-fixtures/test_page.html
 
 libs/cua-driver-rs/tests/integration/
 ├── fixtures/
 │   └── interactive.html      → ../../../../cua-driver-fixtures/interactive.html
 └── v2/assets/
     └── test_page.html        → ../../../../../cua-driver-fixtures/test_page.html

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

@f-trycua f-trycua merged commit 707d143 into main May 21, 2026
7 checks passed
@f-trycua f-trycua deleted the feat/cua-driver-fixtures-canonical-shared-dir branch May 21, 2026 12:06
f-trycua added a commit that referenced this pull request May 21, 2026
…ling flags in launch_app

`launch_app` uses `SW_SHOWNOACTIVATE` so launched windows don't steal
focus. For Chromium-based browsers (Edge / Chrome / Brave / Vivaldi /
Opera / Chromium / Arc / Thorium / Iridium / Yandex) this triggers
occlusion-based renderer throttling: the renderer process is suspended
for the *entire* tab lifetime, the UIA tree exposes only browser chrome,
and `PrintWindow` returns a blank body. Downstream tools
(`get_window_state`, `screenshot`, `click`, `type_text`) all fail
silently against the page content.

## Fix

When `launch_app` resolves a target naming a Chromium-based browser,
auto-prepend three flags to `additional_arguments`:

```
--disable-features=CalculateNativeWinOcclusion   ← root cause
--disable-backgrounding-occluded-windows         ← backstop 1 (process priority)
--disable-renderer-backgrounding                  ← backstop 2 (renderer throttle)
```

`CalculateNativeWinOcclusion` is the root cause; the two
`--disable-backgrounding-*` flags backstop the same effect through the
process-priority and renderer-throttling layers because Chromium
suspends renderers on multiple signals. Injecting all three matches the
flag set documented at Chromium's `chrome://flags`.

Two helpers, both pure logic:

- **`is_chromium_browser_target(target)`** — matches the executable
  basename (case-insensitive, with/without `.exe`) against the known
  Chromium browser names. Handles bare names (`"msedge"`), full paths
  (`r"C:\...\msedge.exe"`), forward-slash paths, and round-tripped
  launch paths with trailing arguments (`r#""C:\...\chrome.exe"
  --profile-directory=..."#`). Uses `split_launchable_target` to peel
  args off launch_path-style targets.

- **`inject_chromium_anti_throttling_flags(extra_args)`** — prepends the
  three flags. Idempotent: if `--disable-features=` already exists in
  the caller's args, merges `CalculateNativeWinOcclusion` into it
  (Chromium has subtle merging rules across duplicate `--disable-features`
  entries — collapsing into one entry avoids ambiguity). The boolean
  flags are only inserted when absent.
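
The merge behaviour of the second helper can be sketched as below. This is a Python model written for illustration, not the Rust implementation; it merges into the first `--disable-features=` entry only, and how the real helper treats several such entries or where exactly it places the boolean flags is not stated here, so that ordering is an assumption.

```python
DISABLE_FEATURES_PREFIX = "--disable-features="
OCCLUSION_FEATURE = "CalculateNativeWinOcclusion"
BOOLEAN_FLAGS = (
    "--disable-backgrounding-occluded-windows",
    "--disable-renderer-backgrounding",
)


def inject_chromium_anti_throttling_flags(extra_args):
    """Return a new argument list that carries all three anti-throttling flags."""
    args = list(extra_args)
    prepended = []
    for index, arg in enumerate(args):
        if arg.startswith(DISABLE_FEATURES_PREFIX):
            # Merge into the caller's existing feature list instead of adding a duplicate flag.
            features = [f for f in arg[len(DISABLE_FEATURES_PREFIX):].split(",") if f]
            if OCCLUSION_FEATURE not in features:
                features.append(OCCLUSION_FEATURE)
            args[index] = DISABLE_FEATURES_PREFIX + ",".join(features)
            break
    else:
        # No --disable-features= entry was found, so a fresh one goes first.
        prepended.append(DISABLE_FEATURES_PREFIX + OCCLUSION_FEATURE)
    # Boolean flags are added only when the caller has not already supplied them.
    prepended.extend(flag for flag in BOOLEAN_FLAGS if flag not in args)
    return prepended + args


from_empty = inject_chromium_anti_throttling_flags([])
merged = inject_chromium_anti_throttling_flags(
    ["--disable-features=Foo,Bar", "https://example.com"]
)
second_pass = inject_chromium_anti_throttling_flags(merged)
```

Returning a new list rather than mutating the input keeps the function pure, which is what makes the idempotence property easy to test.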

## Where the injection runs

After target resolution in `LaunchAppTool::run`, gated on the target
having been resolved from `launch_path` / `path` / `name` (i.e. the
ShellExecuteExW path). UWP/AUMID routing is skipped because the
packaged Edge channel routes differently and the modern Edge ships as a
desktop install that hits the ShellExecuteExW path here.

## Tests

8 new unit tests under `chromium_flag_injection_tests`:
- `detects_bare_browser_names` — all 10 known names match (case-insensitive)
- `detects_full_paths` — both `C:\...` and `C:/...` separators
- `detects_launch_path_with_trailing_args` — `"<exe>" <args>` round-trip
- `does_not_match_non_chromium_apps` — firefox, notepad, explorer, code, soffice
- `injects_three_flags_into_empty_args` — base case
- `merges_into_existing_disable_features_list` — `--disable-features=Foo,Bar`
  + injection = single `--disable-features=Foo,Bar,CalculateNativeWinOcclusion`
- `idempotent_when_all_flags_already_present` — second call is a no-op
- `preserves_user_url_argument_after_flags` — URL stays in args after injection

All 8 pass on the VM (13.77s test compile, 0.00s test execution).
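
The target-matching cases covered by the first four tests can be modelled roughly as follows. This Python sketch is not the Rust helper: the basename set is an assumption derived from the browser list above (only `msedge` and `chrome` are confirmed as executable names), `executable_part` is a hypothetical stand-in for `split_launchable_target`, and the example paths are made up.

```python
import re

# Assumed basenames; the real list of ten names lives in the Rust source.
ASSUMED_CHROMIUM_BASENAMES = frozenset({
    "msedge", "chrome", "brave", "vivaldi", "opera",
    "chromium", "arc", "thorium", "iridium", "yandex",
})


def executable_part(target: str) -> str:
    """Peel trailing arguments off a quoted launch path of the form "<exe>" <args>."""
    target = target.strip()
    if target.startswith('"'):
        closing = target.find('"', 1)
        return target[1:closing] if closing != -1 else target[1:]
    return target


def is_chromium_browser_target(target: str) -> bool:
    """Match the executable basename case-insensitively, with or without .exe."""
    # Split on both separators so C:\... and C:/... paths behave the same.
    basename = re.split(r"[\\/]", executable_part(target))[-1].lower()
    if basename.endswith(".exe"):
        basename = basename[: -len(".exe")]
    return basename in ASSUMED_CHROMIUM_BASENAMES


bare_name = is_chromium_browser_target("msedge")
mixed_case = is_chromium_browser_target("MsEdge.EXE")
backslash_path = is_chromium_browser_target(r"C:\apps\msedge.exe")
forward_slash_path = is_chromium_browser_target("C:/apps/chrome.exe")
quoted_with_args = is_chromium_browser_target(
    r'"C:\apps\chrome.exe" --profile-directory=Default'
)
non_chromium = [is_chromium_browser_target(name) for name in ("firefox", "notepad", "code")]
```

Unquoted paths followed by arguments are deliberately not handled in this sketch, since spaces in Windows paths make that form ambiguous.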

## E2E verification (against #1619's canonical `test_page.html`)

```
launch_app(path='msedge', additional_arguments=['file:///C:/...test_page.html'])
  → pid 6708, returned without page DOM
get_window_state (after Chromium lazy-builds the tree on first AT probe)
  → 33 elements, includes:
     Document "CUA Driver Test Page v2"
     Button "Click Me" id=clicker actions=[invoke]
screenshot
  → fully painted page (Button + Text Input + Checkbox + Dropdown all visible),
    not a blanked body
```

Edge launched non-foreground via `SW_SHOWNOACTIVATE`; renderer was NOT
occlusion-throttled; DOM constructed, exposed via UIA, and painted —
exactly the regression-prevention case the fix targets.

Closes #1620.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
f-trycua added a commit that referenced this pull request May 21, 2026
…ling flags in launch_app (#1624)

`launch_app` uses `SW_SHOWNOACTIVATE` so launched windows don't steal
focus. For Chromium-based browsers (Edge / Chrome / Brave / Vivaldi /
Opera / Chromium / Arc / Thorium / Iridium / Yandex) this triggers
occlusion-based renderer throttling: the renderer process is suspended
for the *entire* tab lifetime, the UIA tree exposes only browser chrome,
and `PrintWindow` returns a blank body. Downstream tools
(`get_window_state`, `screenshot`, `click`, `type_text`) all fail
silently against the page content.

## Fix

When `launch_app` resolves a target naming a Chromium-based browser,
auto-prepend three flags to `additional_arguments`:

```
--disable-features=CalculateNativeWinOcclusion   ← root cause
--disable-backgrounding-occluded-windows         ← backstop 1 (process priority)
--disable-renderer-backgrounding                  ← backstop 2 (renderer throttle)
```

`CalculateNativeWinOcclusion` is the root cause; the two
`--disable-backgrounding-*` flags backstop the same effect through the
process-priority and renderer-throttling layers because Chromium
suspends renderers on multiple signals. Injecting all three matches the
flag set documented at Chromium's `chrome://flags`.

Two helpers, both pure logic:

- **`is_chromium_browser_target(target)`** — matches the executable
  basename (case-insensitive, with/without `.exe`) against the known
  Chromium browser names. Handles bare names (`"msedge"`), full paths
  (`r"C:\...\msedge.exe"`), forward-slash paths, and round-tripped
  launch paths with trailing arguments (`r#""C:\...\chrome.exe"
  --profile-directory=..."#`). Uses `split_launchable_target` to peel
  args off launch_path-style targets.

- **`inject_chromium_anti_throttling_flags(extra_args)`** — prepends the
  three flags. Idempotent: if `--disable-features=` already exists in
  the caller's args, merges `CalculateNativeWinOcclusion` into it
  (Chromium has subtle merging rules across duplicate `--disable-features`
  entries — collapsing into one entry avoids ambiguity). The boolean
  flags are only inserted when absent.
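
A minimal sketch of the detection helper described above, not the shipped implementation. It assumes a hypothetical `executable_part` helper in place of `split_launchable_target` (whose signature is not shown here) and only peels arguments off quoted targets. Apart from `msedge` and `chrome`, the basenames in the list are illustrative guesses.

```rust
/// Lowercase basenames (without `.exe`) treated as Chromium-based browsers.
/// Assumption: only `msedge` and `chrome` appear in the PR text; the
/// remaining entries are illustrative guesses, and the list is not complete.
const CHROMIUM_BASENAMES: &[&str] = &[
    "msedge", "chrome", "brave", "vivaldi", "opera", "chromium",
];

/// Hypothetical stand-in for `split_launchable_target`: for a target of the
/// form `"<exe>" <args>`, return the text between the first pair of quotes;
/// otherwise treat the whole trimmed string as the executable.
fn executable_part(target: &str) -> &str {
    let trimmed = target.trim();
    match trimmed.strip_prefix('"') {
        Some(rest) => rest.split('"').next().unwrap_or(rest),
        None => trimmed,
    }
}

/// Match the executable basename, case-insensitively and with or without
/// `.exe`, against the known browser names.
fn is_chromium_browser_target(target: &str) -> bool {
    let exe = executable_part(target);
    // Accept both `\` and `/` as path separators.
    let basename = exe
        .rsplit(|c: char| c == '\\' || c == '/')
        .next()
        .unwrap_or(exe);
    let lower = basename.to_ascii_lowercase();
    let stem = lower.strip_suffix(".exe").unwrap_or(lower.as_str());
    CHROMIUM_BASENAMES.iter().any(|name| *name == stem)
}
```

Under these assumptions, `is_chromium_browser_target("MSEDGE.EXE")` and a quoted launch path with trailing arguments both match, while `"firefox"` does not.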

## Where the injection runs

After target resolution in `LaunchAppTool::run`, gated on the target
having been resolved from `launch_path` / `path` / `name` (i.e. the
ShellExecuteExW path). UWP/AUMID routing is skipped because the
packaged Edge channel routes differently and the modern Edge ships as a
desktop install that hits the ShellExecuteExW path here.
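
The gating described above could look roughly like the sketch below. `TargetSource` and `should_inject_flags` are hypothetical names introduced for illustration; the real `LaunchAppTool::run` may track the resolution route differently.

```rust
/// Hypothetical record of how `launch_app` resolved its target.
enum TargetSource {
    LaunchPath,
    Path,
    Name,
    /// UWP / AUMID activation, which does not go through ShellExecuteExW.
    Aumid,
}

/// Inject only when the target took the ShellExecuteExW route and was
/// detected as a Chromium-based browser.
fn should_inject_flags(source: &TargetSource, is_chromium: bool) -> bool {
    let shell_execute_route = matches!(
        source,
        TargetSource::LaunchPath | TargetSource::Path | TargetSource::Name
    );
    shell_execute_route && is_chromium
}
```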

## Tests

8 new unit tests under `chromium_flag_injection_tests`:
- `detects_bare_browser_names` — all 10 known names match (case-insensitive)
- `detects_full_paths` — both `C:\...` and `C:/...` separators
- `detects_launch_path_with_trailing_args` — `"<exe>" <args>` round-trip
- `does_not_match_non_chromium_apps` — firefox, notepad, explorer, code, soffice
- `injects_three_flags_into_empty_args` — base case
- `merges_into_existing_disable_features_list` — `--disable-features=Foo,Bar`
  + injection = single `--disable-features=Foo,Bar,CalculateNativeWinOcclusion`
- `idempotent_when_all_flags_already_present` — second call is a no-op
- `preserves_user_url_argument_after_flags` — URL stays in args after injection

All 8 pass on the VM (13.77s test compile, 0.00s test execution).
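
The merge and idempotency cases in the list above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the real helper's signature is not shown, so the slice-in, `Vec`-out shape is a guess, and the sketch collapses every `--disable-features=` entry into a single leading one.

```rust
const OCCLUSION_FEATURE: &str = "CalculateNativeWinOcclusion";
const DISABLE_FEATURES_PREFIX: &str = "--disable-features=";
const BOOLEAN_FLAGS: [&str; 2] = [
    "--disable-backgrounding-occluded-windows",
    "--disable-renderer-backgrounding",
];

/// Return `extra_args` with the three anti-throttling flags ensured.
/// Assumption: signature and argument ordering are illustrative only.
fn inject_chromium_anti_throttling_flags(extra_args: &[String]) -> Vec<String> {
    let mut features: Vec<String> = Vec::new();
    let mut rest: Vec<String> = Vec::new();

    // Collect features from every existing `--disable-features=` entry so
    // the output carries exactly one such entry.
    for arg in extra_args {
        match arg.strip_prefix(DISABLE_FEATURES_PREFIX) {
            Some(list) => {
                for feature in list.split(',').filter(|f| !f.is_empty()) {
                    if !features.iter().any(|f| f.as_str() == feature) {
                        features.push(feature.to_string());
                    }
                }
            }
            None => rest.push(arg.clone()),
        }
    }
    if !features.iter().any(|f| f.as_str() == OCCLUSION_FEATURE) {
        features.push(OCCLUSION_FEATURE.to_string());
    }

    let mut out = vec![format!("{}{}", DISABLE_FEATURES_PREFIX, features.join(","))];
    // Boolean flags are added only when the caller has not passed them.
    for &flag in BOOLEAN_FLAGS.iter() {
        if !rest.iter().any(|a| a.as_str() == flag) {
            out.push(flag.to_string());
        }
    }
    // Remaining caller arguments (for example a URL) follow the flags.
    out.extend(rest);
    out
}
```

Because the function rebuilds the argument list from scratch, calling it on its own output returns the same list, which is the property `idempotent_when_all_flags_already_present` checks.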

## E2E verification (against #1619's canonical `test_page.html`)

```
launch_app(path='msedge', additional_arguments=['file:///C:/...test_page.html'])
  → pid 6708, returned without page DOM
get_window_state (after Chromium lazy-builds the tree on first AT probe)
  → 33 elements, includes:
     Document "CUA Driver Test Page v2"
     Button "Click Me" id=clicker actions=[invoke]
screenshot
  → fully painted page (Button + Text Input + Checkbox + Dropdown all visible),
    not a blanked body
```

Edge launched non-foreground via `SW_SHOWNOACTIVATE`; renderer was NOT
occlusion-throttled; DOM constructed, exposed via UIA, and painted —
exactly the regression-prevention case the fix targets.

Closes #1620.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
f-trycua added a commit that referenced this pull request May 21, 2026
…nstall.ps1 PS 5.1 workaround (#1627)

* docs(cua-driver): Windows behavior notes for the v0.2.9 fix chain + install.ps1 PS 5.1 workaround

Tonight's three cua-driver-rs Windows fixes (#1620 Chromium anti-throttling
flag auto-inject in `launch_app`, #1621 control-type whitelist for the
`click(x, y)` UIA Invoke pre-check, #1623 SendInput routing for Chromium
coord clicks) shipped in v0.2.9 without docs updates. This PR closes that
gap and documents the install.ps1 PS 5.1 parse bug as a known issue.

## mcp-tools.mdx

- New top-level section `## Windows behavior notes` at the end of the
  reference, gathering the cross-cutting changes:
  - `launch_app` Chromium flag list + the 10 detected browser executables
  - `click(x, y)` control-type whitelist (Button / MenuItem / Hyperlink /
    TabItem / ListItem / CheckBox / RadioButton / SplitButton / TreeItem) +
    why canvases / Panes / Customs fall through
  - SendInput on Chromium with brief foreground swap + cursor jump, the
    UIAccess requirement, and the `cua-driver-uia.exe` proxy default
  - `hotkey`'s SendInput-routed delivery + matching UIAccess constraint
- Inline cross-references from `click`, `launch_app`, and `hotkey`
  pointing to the Windows behavior section so callers reading any of
  those tool entries see the platform-specific notes.

## installation.mdx

- Callout under the Windows install one-liner documenting #1626 (PS 5.1
  parse error on `install.ps1`) with the manual-zip workaround verbatim
  from the issue, scoped to PS 5.1 only (PS 7+ parses fine).

Closes the standing /docs update obligation for #1619, #1620, #1621, #1623.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(cua-driver): make manual-zip PATH update idempotent (CodeRabbit on #1627)

Re-running the manual-install workaround duplicated `$dest` in the User
PATH because the snippet unconditionally prepended. Guards with a
`-notcontains` check before `SetEnvironmentVariable` so the entry is
added at most once.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>