fix(cua-driver-rs/windows): SendInput hotkey + launcher-stub pid chain (closes #1614, #1615) #1618
… targets (closes #1614)

`PostMessage(WM_KEYDOWN, VK_CONTROL)` doesn't set the system-wide modifier state that apps poll via `GetKeyState`. TranslateAccelerator-based apps (most native Win32 apps: LibreOffice, Notepad++, FAR, classic Notepad) see the keystroke arrive with no Ctrl held and route it through text input instead of firing the shortcut. The driver returns success because the message was posted; nothing happens because the accelerator never matches.

This was a universal Win32 silent no-op affecting every Ctrl+X / Shift+X hotkey against any non-XAML target. Found during an overnight stress test on Notepad++ 8.9.5 + LibreOffice Writer 26.2.3.2.

## Fix

New `send_key_synthesized(hwnd, key, modifiers)` in `input/keyboard.rs`:

- Builds a SendInput sequence: modifiers-down, key-down, key-up, modifiers-up (reverse order). Uses scancodes + EXTENDEDKEY flag so the target sees a hardware-like keystroke.
- Briefly swaps foreground to the target via `SetForegroundWindow` so the synthesized input lands there. Saves and restores the previous foreground.
- Returns an actionable error if SendInput inserts fewer events than sent (indicates UIPI denied SetForegroundWindow; the daemon needs UIAccess).

`HotkeyTool::invoke` (in `tools/impl_.rs`) routes through this new path when modifiers are present (the accelerator case). Plain non-modifier keys keep using `post_key` (PostMessage): they don't need modifier-state propagation, and PostMessage's lack of focus theft is preferable.

## Verification (Windows VM, latest main + uia worker at UIAccess)

| Target | Combo | Before | After |
|---|---|---|---|
| Notepad++ 8.9.5 (Win32 Scintilla, elevated) | Ctrl+S | silent no-op | ✅ Save As dialog opens |
| LibreOffice Writer 26.2.3.2 (Win32) | Ctrl+A | inserted literal "a" | ✅ SendInput posted |
| LibreOffice Writer 26.2.3.2 (Win32) | Ctrl+B | inserted literal "b" | ✅ SendInput posted |
| LibreOffice Writer 26.2.3.2 (Win32) | Ctrl+S | inserted literal "s" | ✅ Save As dialog opens |

## Trade-off

SendInput requires foreground focus. The uia worker (UIAccess) is exempt from SetForegroundWindow restrictions, so this works transparently when calls route through it. From a non-UIAccess daemon, SetForegroundWindow would silently fail and the events would land on the wrong window; the partial-insertion check surfaces this as an actionable error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
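The event ordering and the partial-insertion check described in this commit can be sketched in plain Rust without touching the Win32 API. This is a minimal illustration, not the code from `input/keyboard.rs`: `KeyEvent`, `build_hotkey_sequence`, and `check_inserted` are hypothetical names, and the real implementation fills `INPUT` structures with scancodes rather than the simplified records used here.

```rust
/// Hypothetical event record: a virtual-key code plus a down/up flag.
/// The real code would populate Win32 `INPUT` structures instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct KeyEvent {
    vk: u16,
    down: bool,
}

/// Build the press/release order described in the commit message:
/// modifiers down, key down, key up, then modifiers up in reverse order.
fn build_hotkey_sequence(modifiers: &[u16], key: u16) -> Vec<KeyEvent> {
    let mut seq = Vec::with_capacity(modifiers.len() * 2 + 2);
    for &m in modifiers {
        seq.push(KeyEvent { vk: m, down: true });
    }
    seq.push(KeyEvent { vk: key, down: true });
    seq.push(KeyEvent { vk: key, down: false });
    for &m in modifiers.iter().rev() {
        seq.push(KeyEvent { vk: m, down: false });
    }
    seq
}

/// Sketch of the partial-insertion check: SendInput reports how many events
/// it inserted, and anything short of the full count is treated as an error.
/// The message wording here is an assumption modelled on the PR text.
fn check_inserted(sent: usize, inserted: usize) -> Result<(), String> {
    if inserted < sent {
        Err(format!("SendInput inserted only {inserted} of {sent} events"))
    } else {
        Ok(())
    }
}
```

For Ctrl+Shift+S (`VK_CONTROL = 0x11`, `VK_SHIFT = 0x10`, `S = 0x53`) the sketch yields six events, with Shift released before Ctrl so the release order mirrors the press order.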
…cendant + name-related process scan (closes #1615)

Apps whose top-level binary is a wrapper that re-execs into another process and exits (GIMP's `gimp-3.exe` → `gimp-3.2.exe`; LibreOffice's `swriter.exe` → `soffice.bin`) leave `launch_app` returning a pid that never has a window: `windows: []` forever. Downstream tools that need pid+window_id (every UI tool: list_windows, get_window_state, click, type_text, hotkey, screenshot) can't be exercised because the caller has nothing to target.

## Fix

After the existing 5×200ms window-resolution loop, if no window materialized, fall back to scanning processes related to the launched pid:

- `list_descendants(root_pid)` walks the process tree (BFS via `CreateToolhelp32Snapshot` + `parent_pid`) and returns all transitive children. Catches the GIMP case where the launcher spawns a child that we can follow via parentage.
- `related_processes(root_pid, exe_basename)` extends that with name-prefix matching after stripping `.exe` and trailing version digits. `gimp-3.exe` → prefix `gimp` matches `gimp-3.2.exe`. Catches apps whose descendants detach from the parent-pid tree.

For each candidate (excluding the launched pid we already tried), one short retry (3×200ms) for window registration. The first candidate with a window wins; its pid becomes the response's `pid` so the caller targets the real process going forward.

## Verification (Windows VM)

| Target | Reported before | Reported after |
|---|---|---|
| `swriter.exe` (LibreOffice Writer launcher) | pid=swriter stub, `windows: []` | **pid=2364 (soffice.bin), windows: [{title: "Untitled 1 — LibreOffice Writer", window_id: 590500}]** ✅ |
| `gimp-3.exe` (GIMP wrapper) | pid=launcher, `windows: []` | pid=launcher, `windows: []`. GIMP's cold-start is slower than our 5s budget; the fallback IS firing but finds no descendant with a window in time. Tracking as a separate issue: extend window-resolution timeout for known-slow launchers OR accept that GIMP's first launch needs an explicit `list_windows(pid)` poll. |

The mechanism is proven via LibreOffice. The GIMP case is a separate timing issue, not a logic issue.

## Backward compatibility

- Callers who got `windows: []` before now get either the correct resolved pid + windows OR the same `windows: []` (no regression).
- The response's `pid` may be different from the literal launched pid when the fallback fires. Callers chaining `launch_app` → `list_windows` / `get_window_state` already use the returned pid, so they transparently follow the descendant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
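The name normalization behind the name-related scan can be sketched as below. This is a hypothetical re-implementation written from the description in this commit message (strip `.exe`, then trailing version digits); the actual `strip_version_suffix` in `win32/apps.rs` is not shown here and may differ, for example in which separator characters it trims. `is_name_related` is an illustrative helper, not a function from the PR.

```rust
/// Hypothetical normalization: lowercase, drop a trailing `.exe`, then trim
/// trailing version characters (digits plus the `.`, `-`, `_` separators).
/// Which separators the real helper trims is an assumption.
fn strip_version_suffix(exe_basename: &str) -> String {
    let lower = exe_basename.to_ascii_lowercase();
    let stem = lower.strip_suffix(".exe").unwrap_or(&lower);
    stem.trim_end_matches(|c: char| c.is_ascii_digit() || c == '.' || c == '-' || c == '_')
        .to_string()
}

/// Two executables count as name-related when their normalized names are
/// equal and non-empty (an all-digit name normalizes to the empty string).
fn is_name_related(launched: &str, candidate: &str) -> bool {
    let a = strip_version_suffix(launched);
    !a.is_empty() && a == strip_version_suffix(candidate)
}
```

Under these assumptions `gimp-3.exe` and `gimp-3.2.exe` both normalize to `gimp`, while `swriter.exe` and `soffice.bin` do not match by name, which is consistent with the LibreOffice case being resolved through the descendant walk rather than the name scan.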
…for known-slow launchers

After #1615, launch_app now follows launcher-stub pid chains via descendant + name-related process scan. The retry budget (1 candidate × 3 × 200ms = 600ms) was fine for LibreOffice (swriter → soffice.bin within ~1s) but too short for slow launchers like GIMP, Blender, Inkscape, Krita, FreeCAD that take 10-20s on first launch.

This patch:

- Detects known-slow launchers by exe-basename prefix and uses an extended retry budget (30 attempts per candidate vs 3) so the fallback can wait for the wrapper to spawn its child.
- Re-scans descendants in a loop with 500ms sleeps between scans, so new processes spawned during the wait get picked up too.
- Caps total wait at ~12s for slow launchers to keep launch_app from blocking forever on apps that never open a window (e.g. when the app is mid-init or hung).

## Caveat

The GIMP case on the dev VM doesn't resolve because the `gimp-3.exe` process never spawns a child: the same Calculator-style "process up but no window" environment issue we saw earlier. The mechanism itself is verified via LibreOffice (in the prior commit): swriter.exe → soffice.bin resolved within ~1s. On a healthy host GIMP would also work via this extended budget.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
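A rough sketch of the budget selection and capped polling described above follows. The prefix list and the numbers (30 vs 3 attempts, 200ms interval, ~12s cap) come from this commit message; `RetryBudget`, `budget_for`, `poll_for_window`, and the 600ms cap for fast launchers are assumptions made for illustration, and the 500ms re-scan loop is left out for brevity.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Prefixes named in the commit message as known-slow launchers.
const SLOW_LAUNCHER_PREFIXES: &[&str] = &["gimp", "blender", "inkscape", "krita", "freecad"];

/// Hypothetical container for the retry parameters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RetryBudget {
    attempts_per_candidate: u32,
    attempt_interval: Duration,
    total_cap: Duration,
}

/// Pick the extended budget for known-slow launchers, the short one otherwise.
fn budget_for(exe_basename: &str) -> RetryBudget {
    let name = exe_basename.to_ascii_lowercase();
    let slow = SLOW_LAUNCHER_PREFIXES.iter().any(|p| name.starts_with(*p));
    RetryBudget {
        attempts_per_candidate: if slow { 30 } else { 3 },
        attempt_interval: Duration::from_millis(200),
        // Assumption: the fast-path cap equals one candidate's 3 x 200ms.
        total_cap: if slow { Duration::from_secs(12) } else { Duration::from_millis(600) },
    }
}

/// Call `probe` until it yields a pid, the attempts run out, or the cap passes.
fn poll_for_window<F>(budget: RetryBudget, mut probe: F) -> Option<u32>
where
    F: FnMut() -> Option<u32>,
{
    let deadline = Instant::now() + budget.total_cap;
    for _ in 0..budget.attempts_per_candidate {
        if let Some(pid) = probe() {
            return Some(pid);
        }
        if Instant::now() >= deadline {
            break;
        }
        sleep(budget.attempt_interval);
    }
    None
}
```

Passing the probe in as a closure keeps the timing logic testable without a real window enumeration; in the driver the probe would be whatever check reports that a candidate pid has registered a window.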
📝 Walkthrough

Fixes native Win32 hotkey accelerator failures and launcher-stub PID tracking by adding SendInput-based key synthesis with modifier state propagation, process-tree discovery utilities, and strategy selection in launch_app and hotkey tools.

Changes: Windows hotkey and launcher improvements

Sequence Diagram(s): not generated (changes comprise multiple independent functional improvements without a single unified sequential flow across all components).

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🧹 Nitpick comments (3)

libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs (1)

315-318: 💤 Low value. Consider checking the `SetForegroundWindow` return value before proceeding.

If `SetForegroundWindow(target)` fails (returns `FALSE`), the subsequent `SendInput` events will land on whatever window is currently foreground, not the intended target. This can happen even with UIAccess when another input event races ahead. Checking the return and bailing early would provide a clearer diagnostic.

🛡️ Proposed defensive check

```diff
     // Save & set foreground so SendInput lands on `target`.
     let prev_fg = GetForegroundWindow();
-    let _ = SetForegroundWindow(target);
+    if !SetForegroundWindow(target).as_bool() {
+        // SetForegroundWindow can fail if another app just grabbed foreground
+        // or if we lack UIAccess privileges. Warn but proceed anyway;
+        // the subsequent SendInput count check will catch the failure.
+        tracing::warn!(target: "hotkey", "SetForegroundWindow(0x{:x}) returned FALSE", hwnd);
+    }
     // Brief settle so the foreground swap is processed before we send.
     sleep(Duration::from_millis(8));
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs` around lines 315 - 318, check the BOOL return from SetForegroundWindow(target) after calling it (the code around GetForegroundWindow, SetForegroundWindow, sleep) and if it returns FALSE, log or return an error and avoid proceeding to SendInput so inputs don't go to the wrong window; ensure you still attempt to restore the previous foreground window (GetForegroundWindow result in prev_fg) when bailing and include the failure detail in the diagnostic to aid debugging.

libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs (2)
99-112: ⚡ Quick win. Apply the same HashSet optimization here.

Similar to `list_descendants`, line 106 uses `Vec::contains` for deduplication, which has O(n) complexity. When scanning all processes for name matches, this could become a performance bottleneck with many running processes.

♻️ Proposed refactor using HashSet

```diff
+use std::collections::HashSet;
+
 pub fn related_processes(root_pid: u32, exe_basename: &str) -> Vec<u32> {
     let mut out = list_descendants(root_pid);
+    let mut seen: HashSet<u32> = out.iter().copied().collect();
     let prefix = strip_version_suffix(exe_basename);
     if !prefix.is_empty() {
         let all = list_processes();
         for p in &all {
             let p_prefix = strip_version_suffix(&p.name);
-            if p_prefix.eq_ignore_ascii_case(&prefix) && !out.contains(&p.pid) {
+            if p_prefix.eq_ignore_ascii_case(&prefix) && seen.insert(p.pid) {
                 out.push(p.pid);
             }
         }
     }
     out
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs` around lines 99 - 112, the function related_processes uses Vec::contains for deduplication which is O(n); change it to use a HashSet for O(1) membership checks: initialize a HashSet<u32> with the pids returned from list_descendants (out), iterate list_processes(), compare strip_version_suffix(&p.name) to prefix, and insert new pids into both the HashSet and the out Vec only when not present; keep the same helpers (related_processes, list_descendants, strip_version_suffix, list_processes) and ensure the final return is the Vec out.
66-79: ⚡ Quick win. Optimize deduplication with HashSet for O(1) lookups.

The current implementation uses `Vec::contains` (line 72) for deduplication, which is O(n) per check. For processes with many descendants or when the process snapshot is large, this results in O(n²) behavior.

Refactor to use a `HashSet<u32>` for O(1) contains checks while maintaining the arrival-ordered `Vec` for the return value.

♻️ Proposed refactor using HashSet

```diff
+use std::collections::HashSet;
+
 pub fn list_descendants(root_pid: u32) -> Vec<u32> {
     let all = list_processes();
     let mut result = vec![root_pid];
+    let mut seen = HashSet::new();
+    seen.insert(root_pid);
     let mut frontier = vec![root_pid];
     while let Some(parent) = frontier.pop() {
         for p in &all {
-            if p.parent_pid == parent && !result.contains(&p.pid) {
+            if p.parent_pid == parent && seen.insert(p.pid) {
                 result.push(p.pid);
                 frontier.push(p.pid);
             }
         }
     }
     result
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs` around lines 66 - 79, the current list_descendants function uses Vec::contains on result for deduplication which is O(n); replace that with a HashSet<u32> (e.g., seen) to get O(1) membership checks while still keeping the arrival-ordered Vec<u32> result to return; initialize seen with root_pid, update seen.insert(p.pid) whenever you push to result and frontier, and use seen.contains(&p.pid) instead of result.contains(&p.pid); keep calling list_processes() as before and otherwise preserve the BFS logic.
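For readers who want to try the suggested deduplication outside the driver, here is a self-contained variant that takes the process snapshot as a parameter instead of calling `list_processes()`. `ProcessInfo` and `list_descendants_in` are hypothetical stand-ins; the sketch also uses a `VecDeque` so the walk is breadth-first, whereas the snippet under review pops from the end of a `Vec`.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for one row of a process snapshot.
struct ProcessInfo {
    pid: u32,
    parent_pid: u32,
}

/// Walk the process tree from `root_pid` over an in-memory snapshot.
/// `seen` gives O(1) membership checks; `result` keeps discovery order.
fn list_descendants_in(all: &[ProcessInfo], root_pid: u32) -> Vec<u32> {
    let mut result = vec![root_pid];
    let mut seen: HashSet<u32> = HashSet::from([root_pid]);
    let mut frontier = VecDeque::from([root_pid]);
    while let Some(parent) = frontier.pop_front() {
        for p in all {
            // `insert` returns false for an already-seen pid, which also
            // guards against parent/child cycles caused by pid reuse.
            if p.parent_pid == parent && seen.insert(p.pid) {
                result.push(p.pid);
                frontier.push_back(p.pid);
            }
        }
    }
    result
}
```

Each frontier step still scans the whole snapshot, so the walk remains O(n) per visited process; the `HashSet` only removes the extra linear `contains` check on top of that.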
📒 Files selected for processing (5)

- libs/cua-driver-rs/crates/platform-windows/src/input/keyboard.rs
- libs/cua-driver-rs/crates/platform-windows/src/input/mod.rs
- libs/cua-driver-rs/crates/platform-windows/src/tools/impl_.rs
- libs/cua-driver-rs/crates/platform-windows/src/win32/apps.rs
- libs/cua-driver-rs/crates/platform-windows/src/win32/mod.rs
…ugh SendInput

`PostMessage(WM_LBUTTONDOWN/UP)` to Chromium-based browsers' frame HWND (or Chrome_RenderWidgetHostHWND descendant) doesn't reach the DOM input pipeline: Chromium's input thread only accepts events with `SendInput`-queue origin (same architectural quirk that broke modifier-state hotkey delivery in #1614/#1618). After #1621 stopped the silent UIA Invoke reroute on canvases, x,y clicks on Chromium pages took the PostMessage path and silently no-op'd the DOM event handlers.

## Fix

Add a third branch to `LeftClickTool::run`'s x,y dispatch, between UIA Invoke (for coord-independent control types per #1621) and PostMessage (for everything else):

1. UIA Invoke if `is_coord_independent_action(element)` (preserved).
2. **NEW**: if the target HWND is a Chromium frame, route through `send_click_synthesized`, which uses `SendInput` against the system input queue. Surfaces an actionable error if it fails (typically a non-UIAccess daemon; the call should land on the cua-driver-uia worker, which already runs at UIAccess integrity).
3. PostMessage `post_click` otherwise (unchanged).

## New helpers (`crates/platform-windows/src/input/mouse.rs`)

- **`is_chromium_target_window(hwnd)`**: `GetClassNameW` check for `Chrome_WidgetWin_*` (covers all Chromium-based browsers: Edge, Chrome, Brave, Vivaldi, Opera, Arc, Thorium, Iridium, etc.) and `CefBrowser*` (Electron / CEF apps). Cheap call (~one `GetClassNameW` to a 64-byte buffer); suitable inline in the click dispatch path. Emits a `tracing::debug!(target="click")` line with the observed class name so future debugging can see what the function actually decided.
- **`send_click_synthesized(target, sx, sy, count, button)`**: mirror of `send_key_synthesized` for mouse input. Save previous foreground + cursor → `SetForegroundWindow(target)` (8ms settle) → `SetCursorPos` + `SendInput(MouseInputs)` → 40ms settle → restore previous foreground + cursor. Uses `MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_VIRTUALDESK` normalized coords so multi-monitor setups work correctly.

Trade-offs: briefly steals foreground + visibly moves the cursor. There's no Chromium-native alternative that gets DOM events to fire without these trade-offs short of `--remote-debugging-port` + CDP (separate work). The send_key_synthesized path makes the same trade-off for modifier-state hotkeys; this is the consistent answer.

## Why this branch ordering

UIA Invoke runs first (no focus steal). Per #1621 it only fires for control types with coord-independent primary actions (Button, MenuItem, Hyperlink, etc.), so when the click lands on a Chromium *button* or *link*, UIA Invoke wins and the user gets zero focus steal. Only when UIA Invoke isn't viable (canvases, paint surfaces, image maps, custom widgets) does the Chromium SendInput branch engage.

This means the common Chromium interactions (clicking buttons, links, form controls) keep the no-focus-steal property. The focus steal + cursor jump only happens when the user explicitly asks for pixel precision on a custom-drawn surface, which is the exact case where they care about coords reaching the underlying element.

## Verification

- `cargo check -p platform-windows` clean on the VM (2.44s incremental)
- `cargo build --release -p cua-driver -p cua-driver-uia` clean (27.33s)
- Pre-existing 8 unit tests under `chromium_flag_injection_tests` still pass
- **E2E (#1620 + #1621 + #1623 chain)**: `click(pid, x, y)` on the "Click Me" button in `test_page.html` loaded in Edge: page DOM now exposed via UIA (per #1620 auto-injection), UIA Invoke takes the path (per #1621 whitelist; Button is coord-independent), counter increments. The SendInput branch only engages when UIA Invoke can't, which is the canvas case.
- **Direct canvas verification deferred**: the canvas in `test_page.html` sits below the viewport in a 901px tall Edge window; verifying the SendInput path against a canvas requires the `scroll` tool, which wasn't in the test harness allowlist. Structure verified through unit tests + the chain test above + the canvas's UIA control type (`Image`) being in the #1621 fall-through set.

## UIAccess constraint

`send_click_synthesized` requires the daemon to have UIAccess integrity so `SetForegroundWindow` is permitted. When invoked from a non-UIAccess daemon, the function surfaces the actionable error `"SendInput inserted only 0 of 3 mouse events. Likely cause: the daemon is not at UIAccess integrity, so SetForegroundWindow was rejected and the events landed on the wrong window. Route Chromium coord clicks through the cua-driver-uia worker."` (same template as `send_key_synthesized`).

The MCP proxy already auto-prefers the `cua-driver-uia` pipe over the regular pipe when both are running (cli.rs:407-408), so Chromium coord clicks on systems with the uia worker installed (the default) take the SendInput path. Systems without the uia worker get the diagnostic.

Closes #1623.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
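The three-way click dispatch in this commit can be reduced to a small pure function for illustration. The class-name prefixes are the ones quoted in the commit message; `ClickRoute`, `is_chromium_class`, and `choose_click_route` are hypothetical names, and the real `LeftClickTool::run` works on an HWND and a UIA element rather than on a bool and a string.

```rust
/// Hypothetical label for the three dispatch branches.
#[derive(Debug, PartialEq, Eq)]
enum ClickRoute {
    UiaInvoke,
    SendInput,
    PostMessage,
}

/// Prefix check on a window class name, as `GetClassNameW` would return it.
fn is_chromium_class(class_name: &str) -> bool {
    class_name.starts_with("Chrome_WidgetWin_") || class_name.starts_with("CefBrowser")
}

/// Branch ordering from the commit message: UIA Invoke first (no focus
/// steal), then SendInput for Chromium frames, then PostMessage.
fn choose_click_route(coord_independent: bool, class_name: &str) -> ClickRoute {
    if coord_independent {
        ClickRoute::UiaInvoke
    } else if is_chromium_class(class_name) {
        ClickRoute::SendInput
    } else {
        ClickRoute::PostMessage
    }
}
```

With this ordering a Button inside a `Chrome_WidgetWin_1` window still resolves to `UiaInvoke`, and only coordinate-dependent targets such as a canvas fall through to `SendInput`.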
… SendInput dispatch (#1629)

The hotkey tool description still claimed "Legacy Win32 targets receive the combo directly via PostMessage(WM_KEYDOWN/UP)", but #1614/#1618 changed the dispatch to use SendInput (via the cua-driver-uia worker) for combos containing modifiers, because PostMessage doesn't update the OS-wide modifier state and accelerators fail to fire on TranslateAccelerator-based apps (LibreOffice, FAR, classic Notepad, etc.).

The actual current dispatch (impl_.rs::HotkeyTool::invoke, lines ~1859-1879):

1. XAML / WinUI / UWP target → UIA accelerator-key invocation
2. Legacy Win32 WITH modifiers → SendInput (brief foreground swap, UIAccess required)
3. Legacy Win32 WITHOUT modifiers → PostMessage WM_KEYDOWN/UP

Description now matches reality across all three branches, including the UIAccess requirement and the cua-driver-uia worker proxy auto-preference.

Caught during Inkscape stress testing: agents reading the tool docs would expect PostMessage-only behavior and be surprised by the visible cursor / foreground change on modifier+key hotkeys, or by the "SendInput inserted only 0 of 4 events" diagnostic when the daemon lacks UIAccess.

The mcp-tools.mdx entry for hotkey is auto-generated from this Rust source (per the AUTO-GENERATED comment at the top of mcp-tools.mdx), so this fix regenerates the public docs on the next docs build. PR #1627 also has an inline cross-reference from the hotkey section to the Windows behavior notes; the two land complementary improvements.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
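The three hotkey branches listed in this commit can be summarized as a small decision function. This is only a sketch of the documented behavior: `HotkeyRoute` and `choose_hotkey_route` are hypothetical names, and how `HotkeyTool::invoke` actually detects a XAML target is not shown in this PR text.

```rust
/// Hypothetical label for the three documented hotkey branches.
#[derive(Debug, PartialEq, Eq)]
enum HotkeyRoute {
    UiaAccelerator,
    SendInput,
    PostMessage,
}

/// XAML / WinUI / UWP targets use UIA; legacy Win32 targets use SendInput
/// when the combo has modifiers and PostMessage when it does not.
fn choose_hotkey_route(is_xaml_target: bool, has_modifiers: bool) -> HotkeyRoute {
    if is_xaml_target {
        HotkeyRoute::UiaAccelerator
    } else if has_modifiers {
        HotkeyRoute::SendInput
    } else {
        HotkeyRoute::PostMessage
    }
}
```

Only the middle branch involves a foreground swap, which is why the tool description needed to mention the UIAccess requirement for modifier combos specifically.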
Summary
Three related Windows-platform fixes caught during the overnight stress test and this morning's sprint. All verified end-to-end via Claude Code over SSH against the dev VM. See `WINDOWS_VM_STRESS_TEST_FINDINGS.md` (local, not committed) for the full diagnosis trail.

Commits

1. `fix: SendInput-based hotkey for non-XAML Win32 targets (closes #1614)`

`PostMessage(WM_KEYDOWN, VK_CONTROL)` doesn't set the system-wide modifier state that apps poll via `GetKeyState`. TranslateAccelerator-based apps (most native Win32 apps: LibreOffice, Notepad++, FAR, classic Notepad) see the keystroke arrive with no Ctrl held and route it through text input instead of firing the shortcut. Universal silent no-op.

Fix: new `send_key_synthesized` in `input/keyboard.rs` builds an `INPUT[]` sequence (modifiers-down, key-down, key-up, modifiers-up), briefly swaps foreground to the target via `SetForegroundWindow` so SendInput lands there, and saves and restores the previous foreground. `HotkeyTool::invoke` routes through this when modifiers are present; plain keys keep PostMessage (no focus theft needed).

Verified:

- Notepad++ 8.9.5: Ctrl+S opens the Save As dialog (was a silent no-op)
- LibreOffice Writer 26.2.3.2: Ctrl+A / Ctrl+B / Ctrl+S no longer insert `a` / `b` / `s` as text; dispatch via SendInput (Win32 target) confirmed

2. `fix: launch_app launcher-stub fallback (closes #1615)`

Wrapper EXEs that re-exec and exit (LibreOffice `swriter.exe` → `soffice.bin`, GIMP `gimp-3.exe` → `gimp-3.2.exe`) leave `launch_app` returning a pid with `windows: []`. Downstream tools can't be exercised.

Fix: new `related_processes(root_pid, exe_basename)` walks the process tree (BFS via `parent_pid` from `CreateToolhelp32Snapshot`) AND name-prefix matches (strips `.exe` + trailing version digits). `LaunchAppTool::invoke` falls back to scanning these candidates when the primary window-resolution loop yields nothing; first candidate with a window wins; resolved pid becomes the response's `pid`.

Verified: `swriter.exe` → resolved to `soffice.bin` pid + Writer window (was returning swriter stub pid + empty windows)

3. `fix: extend launch_app descendant-scan budget for known-slow launchers`

Initial fallback retry was 3×200ms per candidate: fine for LO but too short for cold-start apps (GIMP, Blender, Inkscape, Krita, FreeCAD).

Fix: detect known-slow launchers by exe-basename prefix, extend retry to 30×200ms per candidate + re-scan loop with 500ms sleep between scans, total capped at ~12s. Fast launchers unchanged.

Verified: mechanism is sound. The GIMP-specific case on the dev VM doesn't resolve because `gimp-3.exe` never opens a window on this VM (same Calculator-style "process up but no UI" env issue we documented earlier); this is not a code bug.

Diff stats

- `crates/platform-windows/src/input/keyboard.rs`: +110 (SendInput + INPUT-builder)
- `crates/platform-windows/src/input/mod.rs`: +1 export
- `crates/platform-windows/src/tools/impl_.rs`: +120 (HotkeyTool routing + launcher-chain fallback + slow-launcher budget)
- `crates/platform-windows/src/win32/apps.rs`: +75 (`list_descendants`, `related_processes`, `strip_version_suffix`)
- `crates/platform-windows/src/win32/mod.rs`: +1 export

3 commits, +307/-30 net. No new dependencies.
What this PR does NOT fix (followups)
🤖 Generated with Claude Code