Skip to content

fix(cua-driver-rs/windows): SendInput hotkey + launcher-stub pid chain (closes #1614, #1615)#1618

Merged
f-trycua merged 3 commits into
mainfrom
sprint/cua-driver-rs-windows-sendinput-hotkey
May 21, 2026
Merged

fix(cua-driver-rs/windows): SendInput hotkey + launcher-stub pid chain (closes #1614, #1615)#1618
f-trycua merged 3 commits into
mainfrom
sprint/cua-driver-rs-windows-sendinput-hotkey

Conversation

@f-trycua
Copy link
Copy Markdown
Collaborator

@f-trycua f-trycua commented May 21, 2026

Summary

Three related Windows-platform fixes caught during the overnight stress test + this morning's sprint. All verified end-to-end via Claude Code over SSH against the dev VM. See WINDOWS_VM_STRESS_TEST_FINDINGS.md (local, not committed) for the full diagnosis trail.

Commits

1. fix: SendInput-based hotkey for non-XAML Win32 targets (closes #1614)

PostMessage(WM_KEYDOWN, VK_CONTROL) doesn't set the system-wide modifier state apps poll via GetKeyState. TranslateAccelerator-based apps (most native Win32 apps: LibreOffice, Notepad++, FAR, classic Notepad) see the keystroke arrive with no Ctrl held → route through text input instead of firing the shortcut. Universal silent-no-op.

Fix: new send_key_synthesized in input/keyboard.rs builds an INPUT[] sequence (modifiers-down, key-down, key-up, modifiers-up), briefly swaps foreground to the target via SetForegroundWindow so SendInput lands there, saves+restores previous foreground. HotkeyTool::invoke routes through this when modifiers are present; plain keys keep PostMessage (no focus theft needed).

Verified:

  • Notepad++ Ctrl+S → ✅ Save As dialog opens (was silent no-op)
  • LibreOffice Writer Ctrl+A → Ctrl+B → Ctrl+S → ✅ all fire as accelerators (was inserting literal abs as text)
  • End-to-end via Claude Code over SSH: marker string via SendInput (Win32 target) confirmed

2. fix: launch_app launcher-stub fallback (closes #1615)

Wrapper EXEs that re-exec and exit (LibreOffice swriter.exesoffice.bin, GIMP gimp-3.exegimp-3.2.exe) leave launch_app returning a pid with windows: []. Downstream tools can't be exercised.

Fix: new related_processes(root_pid, exe_basename) walks the process tree (BFS via parent_pid from CreateToolhelp32Snapshot) AND name-prefix matches (strips .exe + trailing version digits). LaunchAppTool::invoke falls back to scanning these candidates when the primary window-resolution loop yields nothing; first candidate with a window wins; resolved pid becomes the response's pid.

Verified:

  • swriter.exe → resolved to soffice.bin pid + Writer window (was returning swriter stub pid + empty windows)
  • End-to-end via Claude Code: caller's downstream calls (list_windows, type_text, hotkey, click) all worked against the resolved pid

3. fix: extend launch_app descendant-scan budget for known-slow launchers

Initial fallback retry was 3×200ms per candidate — fine for LO but too short for cold-start apps (GIMP, Blender, Inkscape, Krita, FreeCAD).

Fix: detect known-slow launchers by exe-basename prefix, extend retry to 30 × 200ms per candidate + re-scan loop with 500ms sleep between scans, total capped at ~12s. Fast launchers unchanged.

Verified: mechanism is sound. The GIMP-specific case on the dev VM doesn't resolve because gimp-3.exe never opens a window on this VM (same Calculator-style "process up but no UI" env issue we documented earlier) — not a code bug.

Diff stats

  • crates/platform-windows/src/input/keyboard.rs — +110 (SendInput + INPUT-builder)
  • crates/platform-windows/src/input/mod.rs — +1 export
  • crates/platform-windows/src/tools/impl_.rs — +120 (HotkeyTool routing + launcher-chain fallback + slow-launcher budget)
  • crates/platform-windows/src/win32/apps.rs — +75 (list_descendants, related_processes, strip_version_suffix)
  • crates/platform-windows/src/win32/mod.rs — +1 export

3 commits, +307/-30 net. No new dependencies.

What this PR does NOT fix (followups)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Enhanced Windows keyboard input with intelligent method selection for improved modifier key support and accelerator compatibility.
    • Improved app launching detection for applications using launcher stub processes.

Review Change Stack

f-trycua and others added 3 commits May 21, 2026 06:18
… targets (closes #1614)

PostMessage(WM_KEYDOWN, VK_CONTROL) doesn't set the system-wide modifier
state apps poll via GetKeyState. TranslateAccelerator-based apps (most
native Win32 apps: LibreOffice, Notepad++, FAR, classic Notepad) see
the keystroke arrive with no Ctrl held → routes it through text input
instead of firing the shortcut. The driver returns success because the
message was posted; nothing happens because the accelerator never matches.

This was a universal Win32 silent-no-op affecting every Ctrl+X / Shift+X
hotkey against any non-XAML target. Found during overnight stress test
on Notepad++ 8.9.5 + LibreOffice Writer 26.2.3.2.

## Fix

New `send_key_synthesized(hwnd, key, modifiers)` in `input/keyboard.rs`:
- Builds a SendInput sequence: modifiers-down, key-down, key-up,
  modifiers-up (reverse order). Uses scancodes + EXTENDEDKEY flag so the
  target sees a hardware-like keystroke.
- Briefly swaps foreground to the target via `SetForegroundWindow` so the
  synthesized input lands there. Saves+restores the previous foreground.
- Returns an actionable error if SendInput inserts fewer events than sent
  (indicates UIPI denied SetForegroundWindow — daemon needs UIAccess).

`HotkeyTool::invoke` (in `tools/impl_.rs`) routes through this new path
when modifiers are present (the accelerator case). Plain non-modifier
keys keep using `post_key` (PostMessage) — they don't need modifier-state
propagation and PostMessage's no-focus-theft is preferable.

## Verification (Windows VM, latest main + uia worker at UIAccess)

| Target | Combo | Before | After |
|---|---|---|---|
| Notepad++ 8.9.5 (Win32 Scintilla, elevated) | Ctrl+S | silent no-op | ✅ Save As dialog opens |
| LibreOffice Writer 26.2.3.2 (Win32) | Ctrl+A | inserted literal "a" | ✅ SendInput posted |
| LibreOffice Writer 26.2.3.2 (Win32) | Ctrl+B | inserted literal "b" | ✅ SendInput posted |
| LibreOffice Writer 26.2.3.2 (Win32) | Ctrl+S | inserted literal "s" | ✅ Save As dialog opens |

## Trade-off

SendInput requires foreground focus. The uia worker (UIAccess) is exempt
from SetForegroundWindow restrictions, so this works transparently when
calls route through it. From a non-UIAccess daemon, SetForegroundWindow
would silently fail and the events land on the wrong window — surfaced
as an actionable error by the partial-insertion check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cendant + name-related process scan (closes #1615)

Apps whose top-level binary is a wrapper that re-execs into another
process and exits (GIMP's `gimp-3.exe` → `gimp-3.2.exe`; LibreOffice's
`swriter.exe` → `soffice.bin`) leave `launch_app` returning a pid that
never has a window — `windows: []` forever. Downstream tools that need
pid+window_id (every UI tool: list_windows, get_window_state, click,
type_text, hotkey, screenshot) can't be exercised because the caller
has nothing to target.

## Fix

After the existing 5×200ms window-resolution loop, if no window
materialized, fall back to scanning processes related to the launched
pid:
- `list_descendants(root_pid)` walks the process tree (BFS via
  `CreateToolhelp32Snapshot` + `parent_pid`) and returns all
  transitive children. Catches the GIMP case where the launcher spawns
  a child that we can follow via parentage.
- `related_processes(root_pid, exe_basename)` extends that with
  name-prefix matching after stripping `.exe` and trailing version
  digits. `gimp-3.exe` → prefix `gimp` matches `gimp-3.2.exe`. Catches
  apps whose descendants detach from the parent-pid tree.

For each candidate (excluding the launched pid we already tried), one
short retry (3×200ms) for window registration. First candidate with a
window wins; its pid becomes the response's `pid` so the caller
targets the real process going forward.

## Verification (Windows VM)

| Target | Reported before | Reported after |
|---|---|---|
| `swriter.exe` (LibreOffice Writer launcher) | pid=swriter stub, `windows: []` | **pid=2364 (soffice.bin), windows: [{title: "Untitled 1 — LibreOffice Writer", window_id: 590500}]** ✅ |
| `gimp-3.exe` (GIMP wrapper) | pid=launcher, `windows: []` | pid=launcher, `windows: []` — GIMP's cold-start is slower than our 5s budget; the fallback IS firing but finds no descendant with a window in time. Tracking as a separate issue: extend window-resolution timeout for known-slow launchers OR accept that GIMP's first launch needs an explicit `list_windows(pid)` poll. |

The mechanism is proven via LibreOffice. The GIMP case is a separate
timing issue, not a logic issue.

## Backward compatibility

- Callers who got `windows: []` before now get either the correct
  resolved pid + windows OR the same `windows: []` (no regression).
- The response's `pid` may be different from the literal launched pid
  when the fallback fires. Callers chaining `launch_app` → `list_windows`
  / `get_window_state` already use the returned pid, so they
  transparently follow the descendant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…for known-slow launchers

After #1615, launch_app now follows launcher-stub pid chains via
descendant + name-related process scan. The retry budget (1 candidate
× 3 × 200ms = 600ms) was fine for LibreOffice (swriter → soffice.bin
within ~1s) but too short for slow launchers like GIMP, Blender,
Inkscape, Krita, FreeCAD that take 10-20s on first launch.

This patch:
- Detects known-slow launchers by exe-basename prefix and uses an
  extended retry budget (30 attempts per candidate vs 3) so the
  fallback can wait for the wrapper to spawn its child.
- Re-scans descendants in a loop with 500ms sleeps between scans, so
  new processes spawned during the wait get picked up too.
- Caps total wait at ~12s for slow launchers to keep launch_app from
  blocking forever on apps that never open a window (e.g. when the
  app is mid-init or hung).

## Caveat

The GIMP case on the dev VM doesn't resolve because the `gimp-3.exe`
process never spawns a child — same Calculator-style "process up but
no window" environment issue we saw earlier. The mechanism itself is
verified via LibreOffice (in the prior commit) — swriter.exe →
soffice.bin resolved within ~1s. On a healthy host GIMP would also
work via this extended budget.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Ignored Ignored May 21, 2026 5:21am

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

Walkthrough

Fixes native Win32 hotkey accelerator failures and launcher-stub PID tracking by adding SendInput-based key synthesis with modifier state propagation, process-tree discovery utilities, and strategy selection in launch_app and hotkey tools.

Changes

Windows hotkey and launcher improvements

Layer / File(s) Summary
SendInput-based key synthesis
libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs, libs/cua-driver-rs/crates/platform-windows/src/input/mod.rs
Win32 imports extended with SendInput and INPUT/KEYBDINPUT structures. New send_key_synthesized(hwnd, key, modifiers) validates HWND, resolves virtual-key codes, constructs modifier-down/key-down-up/modifier-up INPUT sequences, temporarily sets foreground window for delivery, sends events with error on partial insertion, and restores prior foreground with delays. Private key_input helper builds individual keyboard INPUT structs.
Process tree and name-based discovery utilities
libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs, libs/cua-driver-rs/crates/platform-windows/src/win32/mod.rs
list_descendants(root_pid) performs breadth-first traversal returning deduplicated child process IDs. related_processes(root_pid, exe_basename) adds processes by executable prefix match after descendants. strip_version_suffix(basename) normalizes launcher exe names by removing version tails. All three exported via win32 module.
Launch_app launcher-stub PID resolution
libs/cua-driver-rs/crates/platform-windows/src/tools/impl_.rs (lines 932–944, 960–1043)
Introduces resolved_pid tracking; after initial 5×200ms retry yields no windows, derives executable basename and scans related/descendant processes with extended attempt budgets for known-slow launchers (GIMP/Blender/Inkscape/Krita/FreeCAD). Updates pid and windows list when a candidate with windows is found before producing final response.
Hotkey tool strategy selection
libs/cua-driver-rs/crates/platform-windows/src/tools/impl_.rs (lines 1700–1726)
Non-XAML Win32 hotkey path selects synthesized input when modifier keys exist (propagates modifier state for accelerator dispatch) and PostMessage when no modifiers (text-input path). Result message reports chosen mechanism ("SendInput" vs "PostMessage").

Sequence Diagram(s)

Not generated (changes comprise multiple independent functional improvements without a single unified sequential flow across all components).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • trycua/cua#1611: Both PRs modify hotkey routing in tools/impl_.rs to support accelerator-style targets on Win32—this PR via SendInput synthesis with modifier propagation for native accelerators, and that PR via UIA InvokePattern/TogglePattern for XAML/UWP hosts.
  • trycua/cua#1544: Both PRs enhance launch_app PID determination in platform-windows/src/tools/impl_.rs—this PR via launcher-stub and descendant-process tracking, and that PR via AUMID/UWP activation paths.

Poem

🐰 A keystroke through the system queue,
Modifiers set, accelerators true—
No more silent presses lost to fate,
And launchers now find their child state.
SendInput shines where PostMessage failed,
The Windows hotkey quest unveiled!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(cua-driver-rs/windows): SendInput hotkey + launcher-stub pid chain' clearly summarizes the two main changes: SendInput-based hotkey fix and launcher-stub pid chain resolution.
Linked Issues check ✅ Passed All code changes directly address the requirements from issues #1614 and #1615: SendInput implementation for modifier state, launcher-stub fallback with descendant tracking, and extended retry budgets for slow launchers.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issues: keyboard input synthesis, process descendant tracking, and launch_app tool enhancements are all required for the stated fixes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sprint/cua-driver-rs-windows-sendinput-hotkey

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs (1)

315-318: 💤 Low value

Consider checking SetForegroundWindow return value before proceeding.

If SetForegroundWindow(target) fails (returns FALSE), the subsequent SendInput events will land on whatever window is currently foreground, not the intended target. This can happen even with UIAccess when another input event races ahead. Checking the return and bailing early would provide a clearer diagnostic.

🛡️ Proposed defensive check
         // Save & set foreground so SendInput lands on `target`.
         let prev_fg = GetForegroundWindow();
-        let _ = SetForegroundWindow(target);
+        if !SetForegroundWindow(target).as_bool() {
+            // SetForegroundWindow can fail if another app just grabbed foreground
+            // or if we lack UIAccess privileges. Warn but proceed anyway —
+            // the subsequent SendInput count check will catch the failure.
+            tracing::warn!(target: "hotkey", "SetForegroundWindow(0x{:x}) returned FALSE", hwnd);
+        }
         // Brief settle so the foreground swap is processed before we send.
         sleep(Duration::from_millis(8));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs` around
lines 315 - 318, Check the BOOL return from SetForegroundWindow(target) after
calling it (the code around GetForegroundWindow, SetForegroundWindow, sleep) and
if it returns FALSE, log or return an error and avoid proceeding to SendInput so
inputs don't go to the wrong window; ensure you still attempt to restore the
previous foreground window (GetForegroundWindow result in prev_fg) when bailing
and include the failure detail in the diagnostic to aid debugging.
libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs (2)

99-112: ⚡ Quick win

Apply the same HashSet optimization here.

Similar to list_descendants, line 106 uses Vec::contains for deduplication, which has O(n) complexity. When scanning all processes for name matches, this could become a performance bottleneck with many running processes.

♻️ Proposed refactor using HashSet
+use std::collections::HashSet;
+
 pub fn related_processes(root_pid: u32, exe_basename: &str) -> Vec<u32> {
     let mut out = list_descendants(root_pid);
+    let mut seen: HashSet<u32> = out.iter().copied().collect();
     let prefix = strip_version_suffix(exe_basename);
     if !prefix.is_empty() {
         let all = list_processes();
         for p in &all {
             let p_prefix = strip_version_suffix(&p.name);
-            if p_prefix.eq_ignore_ascii_case(&prefix) && !out.contains(&p.pid) {
+            if p_prefix.eq_ignore_ascii_case(&prefix) && seen.insert(p.pid) {
                 out.push(p.pid);
             }
         }
     }
     out
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs` around lines 99
- 112, The function related_processes uses Vec::contains for deduplication which
is O(n); change it to use a HashSet for O(1) membership checks: initialize a
HashSet<u32> with the pids returned from list_descendants (out), iterate
list_processes(), compare strip_version_suffix(&p.name) to prefix, and insert
new pids into both the HashSet and the out Vec only when not present; keep the
same helpers (related_processes, list_descendants, strip_version_suffix,
list_processes) and ensure the final return is the Vec out.

66-79: ⚡ Quick win

Optimize deduplication with HashSet for O(1) lookups.

The current implementation uses Vec::contains (line 72) for deduplication, which is O(n) per check. For processes with many descendants or when the process snapshot is large, this results in O(n²) behavior.

Refactor to use a HashSet<u32> for O(1) contains checks while maintaining the arrival-ordered Vec for the return value.

♻️ Proposed refactor using HashSet
+use std::collections::HashSet;
+
 pub fn list_descendants(root_pid: u32) -> Vec<u32> {
     let all = list_processes();
     let mut result = vec![root_pid];
+    let mut seen = HashSet::new();
+    seen.insert(root_pid);
     let mut frontier = vec![root_pid];
     while let Some(parent) = frontier.pop() {
         for p in &all {
-            if p.parent_pid == parent && !result.contains(&p.pid) {
+            if p.parent_pid == parent && seen.insert(p.pid) {
                 result.push(p.pid);
                 frontier.push(p.pid);
             }
         }
     }
     result
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs` around lines 66
- 79, The current list_descendants function uses Vec::contains on result for
deduplication which is O(n); replace that with a HashSet<u32> (e.g., seen) to
get O(1) membership checks while still keeping the arrival-ordered Vec<u32>
result to return; initialize seen with root_pid, update seen.insert(p.pid)
whenever you push to result and frontier, and use seen.contains(&p.pid) instead
of result.contains(&p.pid); keep calling list_processes() as before and
otherwise preserve the BFS logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs`:
- Around line 315-318: Check the BOOL return from SetForegroundWindow(target)
after calling it (the code around GetForegroundWindow, SetForegroundWindow,
sleep) and if it returns FALSE, log or return an error and avoid proceeding to
SendInput so inputs don't go to the wrong window; ensure you still attempt to
restore the previous foreground window (GetForegroundWindow result in prev_fg)
when bailing and include the failure detail in the diagnostic to aid debugging.

In `@libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs`:
- Around line 99-112: The function related_processes uses Vec::contains for
deduplication which is O(n); change it to use a HashSet for O(1) membership
checks: initialize a HashSet<u32> with the pids returned from list_descendants
(out), iterate list_processes(), compare strip_version_suffix(&p.name) to
prefix, and insert new pids into both the HashSet and the out Vec only when not
present; keep the same helpers (related_processes, list_descendants,
strip_version_suffix, list_processes) and ensure the final return is the Vec
out.
- Around line 66-79: The current list_descendants function uses Vec::contains on
result for deduplication which is O(n); replace that with a HashSet<u32> (e.g.,
seen) to get O(1) membership checks while still keeping the arrival-ordered
Vec<u32> result to return; initialize seen with root_pid, update
seen.insert(p.pid) whenever you push to result and frontier, and use
seen.contains(&p.pid) instead of result.contains(&p.pid); keep calling
list_processes() as before and otherwise preserve the BFS logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b447e504-672f-4268-abe6-1467d342acf3

📥 Commits

Reviewing files that changed from the base of the PR and between f5d386e and ce21efe.

📒 Files selected for processing (5)
  • libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs
  • libs/cua-driver-rs/crates/platform-windows/src/input/mod.rs
  • libs/cua-driver-rs/crates/platform-windows/src/tools/impl_.rs
  • libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs
  • libs/cua-driver-rs/crates/platform-windows/src/win32/mod.rs

@f-trycua f-trycua merged commit 5e9afd6 into main May 21, 2026
5 checks passed
f-trycua added a commit that referenced this pull request May 21, 2026
…ugh SendInput

`PostMessage(WM_LBUTTONDOWN/UP)` to Chromium-based browsers' frame HWND
(or Chrome_RenderWidgetHostHWND descendant) doesn't reach the DOM input
pipeline — Chromium's input thread only accepts events with
`SendInput`-queue origin (same architectural quirk that broke
modifier-state hotkey delivery in #1614/#1618). After #1621 stopped the
silent UIA Invoke reroute on canvases, x,y clicks on Chromium pages took
the PostMessage path and silently no-op'd the DOM event handlers.

## Fix

Add a third branch to `LeftClickTool::run`'s x,y dispatch, between UIA
Invoke (for coord-independent control types per #1621) and PostMessage
(for everything else):

1. UIA Invoke if `is_coord_independent_action(element)` — preserved.
2. **NEW**: if the target HWND is a Chromium frame, route through
   `send_click_synthesized` which uses `SendInput` against the system
   input queue. Surfaces an actionable error if it fails (typically
   non-UIAccess daemon — the call should land on the cua-driver-uia
   worker which already runs at UIAccess integrity).
3. PostMessage `post_click` otherwise — unchanged.

## New helpers (`crates/platform-windows/src/input/mouse.rs`)

- **`is_chromium_target_window(hwnd)`** — `GetClassNameW` check for
  `Chrome_WidgetWin_*` (covers all Chromium-based browsers: Edge,
  Chrome, Brave, Vivaldi, Opera, Arc, Thorium, Iridium, etc.) and
  `CefBrowser*` (Electron / CEF apps). Cheap call (~one `GetClassNameW`
  to a 64-byte buffer); suitable inline in the click dispatch path.
  Emits a `tracing::debug!(target="click")` line with the observed class
  name so future debugging can see what the function actually decided.

- **`send_click_synthesized(target, sx, sy, count, button)`** — mirror
  of `send_key_synthesized` for mouse input. Save previous foreground +
  cursor → `SetForegroundWindow(target)` (8ms settle) → `SetCursorPos` +
  `SendInput(MouseInputs)` → 40ms settle → restore previous foreground +
  cursor. Uses `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK`
  normalized coords so multi-monitor setups work correctly.

  Trade-offs: briefly steals foreground + visibly moves the cursor.
  There's no Chromium-native alternative that gets DOM events to fire
  without these tradeoffs short of `--remote-debugging-port` + CDP
  (separate work). The send_key_synthesized path makes the same
  trade-off for modifier-state hotkeys; this is the consistent answer.

## Why this branch ordering

UIA Invoke runs first (no focus steal). Per #1621 it only fires for
control types with coord-independent primary actions (Button, MenuItem,
Hyperlink, etc.) — so when the click lands on a Chromium *button* or
*link*, UIA Invoke wins and the user gets zero focus steal. Only when
UIA Invoke isn't viable (canvases, paint surfaces, image maps, custom
widgets) does the Chromium SendInput branch engage.

This means the common Chromium interactions (clicking buttons, links,
form controls) keep the no-focus-steal property. The focus steal +
cursor jump only happens when the user explicitly asks for pixel
precision on a custom-drawn surface — which is the exact case where
they care about coords reaching the underlying element.

## Verification

- `cargo check -p platform-windows` clean on the VM (2.44s incremental)
- `cargo build --release -p cua-driver -p cua-driver-uia` clean (27.33s)
- Pre-existing 8 unit tests under `chromium_flag_injection_tests` still pass
- **E2E (#1620 + #1621 + #1623 chain)**: `click(pid, x, y)` on the "Click Me"
  button in `test_page.html` loaded in Edge — page DOM now exposed via UIA
  (per #1620 auto-injection), UIA Invoke takes the path (per #1621 whitelist —
  Button is coord-independent), counter increments. The SendInput branch only
  engages when UIA Invoke can't, which is the canvas case.
- **Direct canvas verification deferred**: the canvas in `test_page.html`
  sits below the viewport in a 901px tall Edge window; verifying the
  SendInput path against a canvas requires the `scroll` tool which wasn't
  in the test harness allowlist. Structure verified through unit tests +
  the chain test above + the canvas's UIA control type (`Image`) being in
  the #1621 fall-through set.

## UIAccess constraint

`send_click_synthesized` requires the daemon to have UIAccess integrity
so `SetForegroundWindow` is permitted. When invoked from a non-UIAccess
daemon, the function surfaces the actionable error
`"SendInput inserted only 0 of 3 mouse events. Likely cause: the daemon
is not at UIAccess integrity, so SetForegroundWindow was rejected and the
events landed on the wrong window. Route Chromium coord clicks through
the cua-driver-uia worker."` — same template as `send_key_synthesized`.

The MCP proxy already auto-prefers the `cua-driver-uia` pipe over the
regular pipe when both are running (cli.rs:407-408), so Chromium coord
clicks on systems with the uia worker installed (the default) take the
SendInput path. Systems without the uia worker get the diagnostic.

Closes #1623.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
f-trycua added a commit that referenced this pull request May 21, 2026
…ugh SendInput (#1625)

`PostMessage(WM_LBUTTONDOWN/UP)` to Chromium-based browsers' frame HWND
(or Chrome_RenderWidgetHostHWND descendant) doesn't reach the DOM input
pipeline — Chromium's input thread only accepts events with
`SendInput`-queue origin (same architectural quirk that broke
modifier-state hotkey delivery in #1614/#1618). After #1621 stopped the
silent UIA Invoke reroute on canvases, x,y clicks on Chromium pages took
the PostMessage path and silently no-op'd the DOM event handlers.

## Fix

Add a third branch to `LeftClickTool::run`'s x,y dispatch, between UIA
Invoke (for coord-independent control types per #1621) and PostMessage
(for everything else):

1. UIA Invoke if `is_coord_independent_action(element)` — preserved.
2. **NEW**: if the target HWND is a Chromium frame, route through
   `send_click_synthesized` which uses `SendInput` against the system
   input queue. Surfaces an actionable error if it fails (typically
   non-UIAccess daemon — the call should land on the cua-driver-uia
   worker which already runs at UIAccess integrity).
3. PostMessage `post_click` otherwise — unchanged.

## New helpers (`crates/platform-windows/src/input/mouse.rs`)

- **`is_chromium_target_window(hwnd)`** — `GetClassNameW` check for
  `Chrome_WidgetWin_*` (covers all Chromium-based browsers: Edge,
  Chrome, Brave, Vivaldi, Opera, Arc, Thorium, Iridium, etc.) and
  `CefBrowser*` (Electron / CEF apps). Cheap call (~one `GetClassNameW`
  to a 64-byte buffer); suitable inline in the click dispatch path.
  Emits a `tracing::debug!(target="click")` line with the observed class
  name so future debugging can see what the function actually decided.

- **`send_click_synthesized(target, sx, sy, count, button)`** — mirror
  of `send_key_synthesized` for mouse input. Save previous foreground +
  cursor → `SetForegroundWindow(target)` (8ms settle) → `SetCursorPos` +
  `SendInput(MouseInputs)` → 40ms settle → restore previous foreground +
  cursor. Uses `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK`
  normalized coords so multi-monitor setups work correctly.

  Trade-offs: briefly steals foreground + visibly moves the cursor.
  There's no Chromium-native alternative that gets DOM events to fire
  without these tradeoffs short of `--remote-debugging-port` + CDP
  (separate work). The send_key_synthesized path makes the same
  trade-off for modifier-state hotkeys; this is the consistent answer.

## Why this branch ordering

UIA Invoke runs first (no focus steal). Per #1621 it only fires for
control types with coord-independent primary actions (Button, MenuItem,
Hyperlink, etc.) — so when the click lands on a Chromium *button* or
*link*, UIA Invoke wins and the user gets zero focus steal. Only when
UIA Invoke isn't viable (canvases, paint surfaces, image maps, custom
widgets) does the Chromium SendInput branch engage.

This means the common Chromium interactions (clicking buttons, links,
form controls) keep the no-focus-steal property. The focus steal +
cursor jump only happens when the user explicitly asks for pixel
precision on a custom-drawn surface — which is the exact case where
they care about coords reaching the underlying element.

## Verification

- `cargo check -p platform-windows` clean on the VM (2.44s incremental)
- `cargo build --release -p cua-driver -p cua-driver-uia` clean (27.33s)
- Pre-existing 8 unit tests under `chromium_flag_injection_tests` still pass
- **E2E (#1620 + #1621 + #1623 chain)**: `click(pid, x, y)` on the "Click Me"
  button in `test_page.html` loaded in Edge — page DOM now exposed via UIA
  (per #1620 auto-injection), UIA Invoke takes the path (per #1621 whitelist —
  Button is coord-independent), counter increments. The SendInput branch only
  engages when UIA Invoke can't, which is the canvas case.
- **Direct canvas verification deferred**: the canvas in `test_page.html`
  sits below the viewport in a 901px tall Edge window; verifying the
  SendInput path against a canvas requires the `scroll` tool which wasn't
  in the test harness allowlist. Structure verified through unit tests +
  the chain test above + the canvas's UIA control type (`Image`) being in
  the #1621 fall-through set.

## UIAccess constraint

`send_click_synthesized` requires the daemon to have UIAccess integrity
so `SetForegroundWindow` is permitted. When invoked from a non-UIAccess
daemon, the function surfaces the actionable error
`"SendInput inserted only 0 of 3 mouse events. Likely cause: the daemon
is not at UIAccess integrity, so SetForegroundWindow was rejected and the
events landed on the wrong window. Route Chromium coord clicks through
the cua-driver-uia worker."` — same template as `send_key_synthesized`.

The MCP proxy already auto-prefers the `cua-driver-uia` pipe over the
regular pipe when both are running (cli.rs:407-408), so Chromium coord
clicks on systems with the uia worker installed (the default) take the
SendInput path. Systems without the uia worker get the diagnostic.

Closes #1623.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
f-trycua added a commit that referenced this pull request May 21, 2026
… SendInput dispatch (#1629)

The hotkey tool description still claimed "Legacy Win32 targets receive
the combo directly via PostMessage(WM_KEYDOWN/UP)" — but #1614/#1618
changed the dispatch to use SendInput (via the cua-driver-uia worker)
for combos containing modifiers, because PostMessage doesn't update the
OS-wide modifier state and accelerators fail to fire on TranslateAccelerator-
based apps (LibreOffice, FAR, classic Notepad, etc.).

The actual current dispatch (impl_.rs::HotkeyTool::invoke, lines ~1859-1879):

  1. XAML / WinUI / UWP target  → UIA accelerator-key invocation
  2. Legacy Win32 WITH modifiers → SendInput (brief foreground swap, UIAccess required)
  3. Legacy Win32 WITHOUT modifiers → PostMessage WM_KEYDOWN/UP

Description now matches reality across all three branches, including the
UIAccess requirement and the cua-driver-uia worker proxy auto-preference.

Caught during Inkscape stress testing — agents reading the tool docs
would expect PostMessage-only behavior and be surprised by the visible
cursor / foreground change on modifier+key hotkeys, or by the
"SendInput inserted only 0 of 4 events" diagnostic when the daemon
lacks UIAccess.

The mcp-tools.mdx entry for hotkey is auto-generated from this Rust
source (per the AUTO-GENERATED comment at the top of mcp-tools.mdx), so
this fix regenerates the public docs on the next docs build. PR #1627
also has an inline cross-reference from the hotkey section to the
Windows behavior notes; the two land complementary improvements.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant