Skip to content

feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307)#3340

Merged
M3gA-Mind merged 3 commits into
tinyhumansai:mainfrom
M3gA-Mind:feat/voice-split-1-input
Jun 4, 2026
Merged

feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307)#3340
M3gA-Mind merged 3 commits into
tinyhumansai:mainfrom
M3gA-Mind:feat/voice-split-1-input

Conversation

@M3gA-Mind
Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind commented Jun 4, 2026

Summary

Slice 1/8 of #3307 — computer-control input primitives + the CEF crash fix.

  • Runs enigo keyboard/mouse on the app main thread via a native-registry executor. enigo's macOS TSMGetInputSourceProperty traps off-thread (SIGTRAP) and crashes the CEF host; dispatching through run_on_main_thread fixes it.
  • Adds the mouse / keyboard tools and the main_thread bridge (INPUT_ON_MAIN_THREAD_METHOD).
  • Downscales screenshot output so the model can actually see it.

Files (6)

tools/impl/computer/{mouse,keyboard,main_thread,mod}.rs, tools/impl/browser/screenshot.rs, app/src-tauri/src/lib.rs (main-thread executor registration only).

Part of the #3307 split. PR 3307 (72 files) is being replaced by 7 small, dependency-ordered PRs (merge-train). Branches are stacked; each PR's true slice is shown once its predecessors merge and it is rebased onto main.
Foundation slice — no dependency on later slices.

Summary by CodeRabbit

  • New Features

    • Screenshots that exceed size limits are now automatically downscaled and embedded as inline JPEGs, with reported shown dimensions.
  • Bug Fixes

    • Oversized screenshots no longer get truncated or skipped.
    • Input action failures now return clearer, consistent error messages.
  • Refactor

    • Keyboard and mouse input now run on the app main thread for improved reliability and stability.
  • Documentation

    • Added docs describing the synthetic-input main-thread executor and its availability.

Run enigo keyboard/mouse on the app main thread via a native-registry
executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and
crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge,
and downscaled screenshots so the model can see them.

Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a main-thread executor for synthetic input (registered at Tauri startup) and routes keyboard/mouse operations through it; replaces large-screenshot base64 truncation with blocking downscaling to inline JPEG data-URLs and updates tests.

Changes

Main Thread Input Execution

Layer / File(s) Summary
Main thread bridge contract and Tauri registration
src/openhuman/tools/impl/computer/main_thread.rs, src/openhuman/tools/impl/computer/mod.rs, app/src-tauri/src/lib.rs
Introduces MainThreadInputOp, INPUT_ON_MAIN_THREAD_METHOD, and run_input_on_main; re-exports them and registers a Tauri native handler that runs boxed FnOnce closures on the app main thread and returns their Result via a tokio::oneshot channel.
Keyboard tool main-thread refactoring
src/openhuman/tools/impl/computer/keyboard.rs
Refactors type, press, and hotkey to use run_input_on_main instead of spawn_blocking, converts errors to Result<String, String>, and centralizes logging/result mapping in a new private into_result helper.
Mouse tool main-thread refactoring
src/openhuman/tools/impl/computer/mouse.rs
Routes move, click, double_click, drag, and scroll through run_input_on_main, standardizes error mapping to String and logging via into_result, and preserves drag's best-effort release cleanup.

Screenshot Image Optimization

Layer / File(s) Summary
Screenshot downscaling and JPEG encoding
src/openhuman/tools/impl/browser/screenshot.rs
Removes MAX_BASE64_BYTES truncation and replaces read_and_encode with logic that either returns a direct data-URL for on-budget screenshots or spawns a blocking downscale_to_jpeg job to produce an inline JPEG that fits the byte budget. Tests updated to verify downscaling and "Downscaled to ..." messaging.

Sequence Diagram(s)

sequenceDiagram
  participant Core
  participant TauriShell
  participant AppMainThread
  participant Enigo_macOS
  Core->>TauriShell: request `computer.input_on_main_thread`(MainThreadInputOp)
  TauriShell->>AppMainThread: run_on_main_thread(op.run)
  AppMainThread->>Enigo_macOS: perform synthetic input operations
  Enigo_macOS-->>AppMainThread: Result<String, String>
  AppMainThread-->>TauriShell: Result<String, String>
  TauriShell-->>Core: Result<String, String>
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • senamakel
  • graycyrus

Poem

🐰 I hop where threads must safely run,
Keys and clicks beneath the sun,
Big screenshots shrink to JPEG light,
Main-thread dances smooth and tight,
Hooray — inputs hum, the tasks are done!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: adding a main-thread synthetic-input executor for computer control (keyboard/mouse on main thread) and a CEF crash fix (screenshot downscaling). It is concise, specific, and clearly summarizes the primary contributions.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jun 4, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/openhuman/tools/impl/browser/screenshot.rs (1)

170-183: 💤 Low value

Consider adding debug logging for the downscale path.

The new downscale logic handles multiple fallback dimensions and has several decision points, but no diagnostic logging. Adding debug! or trace! statements would help troubleshoot issues in production (e.g., why a particular dimension was selected, original vs final sizes).

+        log::debug!(
+            "screenshot: oversized capture ({} bytes), downscaling...",
+            bytes.len()
+        );
         match tokio::task::spawn_blocking(move || downscale_to_jpeg(&bytes, MAX_RAW_BYTES)).await {
             Ok(Ok((jpeg, w, h))) => {
+                log::debug!("screenshot: downscaled to {}x{} ({} bytes)", w, h, jpeg.len());
                 Ok(Self::data_url_result(

As per coding guidelines: "Add substantial debug-level logs while implementing features or fixes in Rust using log / tracing crate with stable prefixes and correlation fields."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/browser/screenshot.rs` around lines 170 - 183, The
downscale path invoked via tokio::task::spawn_blocking calling
downscale_to_jpeg(&bytes, MAX_RAW_BYTES) lacks diagnostic logs; add debug-level
tracing (e.g., tracing::debug! or debug!) around the downscale flow in
screenshot.rs: log the original bytes length, MAX_RAW_BYTES, chosen output
dimension(s) (w,h) returned by downscale_to_jpeg, which fallback branch was
taken, and any error messages (include the error variable e) with a stable
prefix and correlation field (e.g., "screenshot::downscale" and a request_id or
path) so that the Ok(Ok((jpeg,w,h))) branch logs final dimensions and Ok(Err(e))
/ Err(e) branches log the failure details before returning ToolResult via
data_url_result or ToolResult::success/error.
src/openhuman/tools/impl/computer/main_thread.rs (1)

33-48: ⚡ Quick win

Add debug logs around main-thread dispatch and failure branches.

This bridge is a critical cross-module dispatch path, but it currently has no trace/debug visibility for dispatch attempts or failures.

Proposed patch
 pub async fn run_input_on_main<F>(op: F) -> Result<String, String>
 where
     F: FnOnce() -> Result<String, String> + Send + 'static,
 {
+    log::debug!("[computer-main-thread] dispatching synthetic input op");
     let req = MainThreadInputOp { run: Box::new(op) };
     match request_native_global::<MainThreadInputOp, Result<String, String>>(
         INPUT_ON_MAIN_THREAD_METHOD,
         req,
@@
     .await
     {
-        Ok(inner) => inner,
-        Err(e) => Err(format!(
-            "synthetic input requires the desktop app's main-thread executor (unavailable: {e})"
-        )),
+        Ok(inner) => {
+            if let Err(err) = &inner {
+                log::debug!("[computer-main-thread] main-thread input op failed: {err}");
+            }
+            inner
+        }
+        Err(e) => {
+            log::debug!("[computer-main-thread] dispatch unavailable: {e}");
+            Err(format!(
+                "synthetic input requires the desktop app's main-thread executor (unavailable: {e})"
+            ))
+        }
     }
 }

As per coding guidelines: "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/computer/main_thread.rs` around lines 33 - 48, The
run_input_on_main bridge currently has no logging; add trace/debug logs around
the dispatch and failure branches: log before calling request_native_global
(include INPUT_ON_MAIN_THREAD_METHOD and that a MainThreadInputOp is being
dispatched), log on successful Ok(inner) return (including a stable prefix and
the returned value or summary), and log on Err(e) with the full error message
(use same grep-friendly prefix). Update the function run_input_on_main to emit
these logs using the existing logging facility, referencing MainThreadInputOp,
request_native_global, and INPUT_ON_MAIN_THREAD_METHOD so dispatch attempts,
successes, and failures are visible.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src-tauri/src/lib.rs`:
- Around line 2816-2818: Document the newly registered native IPC method
register_native_global::<MainThreadInputOp, Result<String, String>, _,
_>(INPUT_ON_MAIN_THREAD_METHOD) in the Tauri shell architecture doc
(tauri-shell.md): add a short purpose/summary for INPUT_ON_MAIN_THREAD_METHOD,
specify the exact request shape (fields/types) and the response shape
(Result<String, String> success/error format), mark that this is a
desktop-only/native IPC surface, and include a minimal example request and
expected response to illustrate usage and error semantics.

In `@src/openhuman/tools/impl/computer/keyboard.rs`:
- Around line 190-201: The current closure passed to run_input_on_main (invoked
in the type path calling Enigo::new and enigo.text within into_result) holds the
Tauri main thread for the entire text/Hotkey sequence; refactor so only the
minimal Enigo call runs on the main thread: split long text or hotkey combos
into small chunks (e.g., per-character or small-buffer), perform all sequencing,
delays (HOTKEY_INTER_KEY_DELAY) and async sleeps off the main thread, and invoke
run_input_on_main/run_input_on_main-like helper repeatedly for each chunk to
call Enigo::new() once per dispatch and execute only the short
enigo.text(&chunk) action on the main thread. Apply the same chunked dispatch
pattern to the similar block referenced around lines 286-321.

In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 230-239: The closure passed to run_input_on_main currently
constructs Enigo and calls humanized_move (and similar humanized_* functions) on
the app main thread, which blocks UI; instead, compute the waypoint sequence and
pacing off-thread and only perform the minimal Enigo calls on the main thread.
Concretely: move creation of the waypoint list and sleep/timing logic out of the
run_input_on_main closure (call humanized_move or a new helper that returns
Vec<(x,y,delay)> from the worker thread), then inside the run_input_on_main
closure create Enigo (Enigo::new) and iterate the precomputed waypoints applying
enigo.mouse_move_to / button events without sleeping; for pacing use a
background timer that schedules main-thread enigo calls or marshal each enigo
call back to the main thread with short, non-blocking hops. Apply the same
approach to the other usages referenced (humanized_click,
humanized_double_click, humanized_drag) so only the actual Enigo operations run
on the main thread.

---

Nitpick comments:
In `@src/openhuman/tools/impl/browser/screenshot.rs`:
- Around line 170-183: The downscale path invoked via
tokio::task::spawn_blocking calling downscale_to_jpeg(&bytes, MAX_RAW_BYTES)
lacks diagnostic logs; add debug-level tracing (e.g., tracing::debug! or debug!)
around the downscale flow in screenshot.rs: log the original bytes length,
MAX_RAW_BYTES, chosen output dimension(s) (w,h) returned by downscale_to_jpeg,
which fallback branch was taken, and any error messages (include the error
variable e) with a stable prefix and correlation field (e.g.,
"screenshot::downscale" and a request_id or path) so that the Ok(Ok((jpeg,w,h)))
branch logs final dimensions and Ok(Err(e)) / Err(e) branches log the failure
details before returning ToolResult via data_url_result or
ToolResult::success/error.

In `@src/openhuman/tools/impl/computer/main_thread.rs`:
- Around line 33-48: The run_input_on_main bridge currently has no logging; add
trace/debug logs around the dispatch and failure branches: log before calling
request_native_global (include INPUT_ON_MAIN_THREAD_METHOD and that a
MainThreadInputOp is being dispatched), log on successful Ok(inner) return
(including a stable prefix and the returned value or summary), and log on Err(e)
with the full error message (use same grep-friendly prefix). Update the function
run_input_on_main to emit these logs using the existing logging facility,
referencing MainThreadInputOp, request_native_global, and
INPUT_ON_MAIN_THREAD_METHOD so dispatch attempts, successes, and failures are
visible.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3849692f-1e2e-448e-ba07-c42b9f1b4443

📥 Commits

Reviewing files that changed from the base of the PR and between 87a91ae and bf8bc2d.

📒 Files selected for processing (6)
  • app/src-tauri/src/lib.rs
  • src/openhuman/tools/impl/browser/screenshot.rs
  • src/openhuman/tools/impl/computer/keyboard.rs
  • src/openhuman/tools/impl/computer/main_thread.rs
  • src/openhuman/tools/impl/computer/mod.rs
  • src/openhuman/tools/impl/computer/mouse.rs

Comment thread app/src-tauri/src/lib.rs
Comment thread src/openhuman/tools/impl/computer/keyboard.rs
Comment thread src/openhuman/tools/impl/computer/mouse.rs
- Document the computer.input_on_main_thread native-registry method in
  gitbooks/developing/architecture/tauri-shell.md (CodeRabbit).
- Unit-test run_input_on_main Ok/Err passthrough via a registered executor.
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 4, 2026
JPEG has no alpha channel, so an RGBA capture (PNG screenshots often carry
one) would fail to encode and return no inline image, leaving vision-driven
control blind. Convert the thumbnail to RGB first; add an RGBA regression test.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/tools/impl/browser/screenshot.rs (1)

209-229: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add debug-level logging for downscale attempts and outcome.

The downscale_to_jpeg function implements iterative downscaling but lacks instrumentation. As per coding guidelines, new Rust features should include debug/trace logs for entry/exit, branch decisions, and outcomes with stable grep-friendly prefixes.

Consider logging:

  • Entry: input byte size
  • Each iteration: attempted max_dim, resulting JPEG byte count
  • Outcome: which dimension succeeded or fallback return

This will make it easy to trace why a particular downscale dimension was chosen and diagnose inline-budget issues.

📊 Example logging additions
 fn downscale_to_jpeg(bytes: &[u8], max_bytes: usize) -> Result<(Vec<u8>, u32, u32), String> {
+    log::debug!("downscale_to_jpeg: input {} bytes, budget {} bytes", bytes.len(), max_bytes);
     let img = image::load_from_memory(bytes).map_err(|e| format!("decode: {e}"))?;
     let mut last: Option<(Vec<u8>, u32, u32)> = None;
     for max_dim in [1568u32, 1280, 1024, 768, 600] {
         let thumb = img.thumbnail(max_dim, max_dim).to_rgb8();
         let (w, h) = (thumb.width(), thumb.height());
         let mut buf = std::io::Cursor::new(Vec::new());
         image::codecs::jpeg::JpegEncoder::new_with_quality(&mut buf, 72)
             .encode_image(&thumb)
             .map_err(|e| format!("jpeg encode: {e}"))?;
         let out = buf.into_inner();
+        log::debug!("downscale_to_jpeg: tried {}×{} → {} bytes", w, h, out.len());
         if out.len() <= max_bytes {
+            log::debug!("downscale_to_jpeg: success at {}×{}, {} bytes", w, h, out.len());
             return Ok((out, w, h));
         }
         last = Some((out, w, h));
     }
+    if let Some((_, w, h)) = &last {
+        log::debug!("downscale_to_jpeg: fallback to smallest attempt {}×{}", w, h);
+    }
     last.ok_or_else(|| "could not produce a fitting JPEG".to_string())
 }

As per coding guidelines: "Add substantial debug-level logs while implementing features or fixes in Rust using log / tracing crate with stable prefixes and correlation fields."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/browser/screenshot.rs` around lines 209 - 229, Add
debug-level instrumentation to downscale_to_jpeg: at function entry log the
input bytes length (e.g., "screenshot:downscale entry bytes=%d"); inside the for
loop log the attempted max_dim before thumbnailing and log the resulting JPEG
size and dimensions after encoding (e.g., "screenshot:downscale attempt
max_dim=%d out_bytes=%d w=%d h=%d"); when returning early on success log the
chosen max_dim/dimensions with a stable prefix (e.g., "screenshot:downscale
success ..."); and when falling back to last or error log the final decision
("screenshot:downscale fallback out_bytes=%d w=%d h=%d" or "screenshot:downscale
error ..."). Use the project's tracing/log crate, debug level, and keep the
prefix "screenshot:downscale" so grep can find all entries; add logs at the
start, after each image::codecs::jpeg::JpegEncoder encode_image call, on the Ok
return, and on the err/fallback path (referencing function name
downscale_to_jpeg, variable max_dim, out, w, h, and last).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/openhuman/tools/impl/browser/screenshot.rs`:
- Around line 209-229: Add debug-level instrumentation to downscale_to_jpeg: at
function entry log the input bytes length (e.g., "screenshot:downscale entry
bytes=%d"); inside the for loop log the attempted max_dim before thumbnailing
and log the resulting JPEG size and dimensions after encoding (e.g.,
"screenshot:downscale attempt max_dim=%d out_bytes=%d w=%d h=%d"); when
returning early on success log the chosen max_dim/dimensions with a stable
prefix (e.g., "screenshot:downscale success ..."); and when falling back to last
or error log the final decision ("screenshot:downscale fallback out_bytes=%d
w=%d h=%d" or "screenshot:downscale error ..."). Use the project's tracing/log
crate, debug level, and keep the prefix "screenshot:downscale" so grep can find
all entries; add logs at the start, after each image::codecs::jpeg::JpegEncoder
encode_image call, on the Ok return, and on the err/fallback path (referencing
function name downscale_to_jpeg, variable max_dim, out, w, h, and last).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 83d09e2c-27e7-4d10-afcf-834bd3c76ee0

📥 Commits

Reviewing files that changed from the base of the PR and between b47ef0c and 47cadbc.

📒 Files selected for processing (1)
  • src/openhuman/tools/impl/browser/screenshot.rs

Copy link
Copy Markdown
Contributor

@oxoxDev oxoxDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foundation slice of the computer-control EPIC — the main-thread SIGTRAP fix and screenshot downscale are solid, but the synthetic-input tools aren't approval-gated, which blocks this.

Blocker — mouse/keyboard execute without the ApprovalGate

They're default-off (registered only under computer_control.enabled, good), but neither tool overrides external_effect_with_args, so it inherits the trait default false. The ApprovalGate is triggered exclusively by external_effect_with_args (agent/harness/engine/tools.rs:162) — PermissionLevel::Dangerous does not trigger it (it's only a static channel-capability filter). So once computer_control.enabled=true, raw coordinate clicks and arbitrary keystrokes run with no approval prompt, including on background/auto-approved turns, with no sensitive-app denylist (strictly more dangerous than ax_interact, which at least has SENSITIVE_APPS). Keystrokes can land in a focused sudo/password field or Terminal unattended. This misses the bar #3168 set (opt-in and approval-gated).

Fix: add fn external_effect(&self) -> bool { true } to both MouseTool and KeyboardTool (every action mutates the host — no per-arg discrimination needed). Coordinate input can't denylist by app, so the gate is the only real boundary here.

Note: the user_filter.rs comment "PermissionLevel::Dangerous so the approval gate fires per-action" is factually wrong (Dangerous doesn't fire the gate), and ax_interact.rs's comment claims it "mirrors the opt-in posture of the mouse/keyboard tools" — i.e. ax_interact's author believed these were already gated. Worth correcting both while here.

Major — main-thread executor blocks the UI thread

The enigo-on-main-thread dispatch correctly fixes the TSMGetInputSourceProperty SIGTRAP, but run_on_main_thread runs the entire op on-main, including its std::thread::sleeps (hotkey 20ms x N keys; humanized-move per-waypoint dwells — hundreds of ms on a full-screen drag). That blocks the CEF/UI main thread for the full duration of every synthetic-input op → UI jank on an always-on path. Keep only the macOS layout lookup on-main and run the movement/sleeps off-thread; and wrap the closure in catch_unwind so an enigo FFI panic can't unwind on the main thread.

Clean

Screenshot downscale is correct (aspect-ratio preserved, RGBA handled, no panic on tiny/huge, no new info leak). Headless path fails closed. No-receiver case returns a clean cancel error.

Nit: mouse coordinates are bounded to 0..=32768, not actual screen/monitor bounds — fine as a sanity guard, but underscores that the ApprovalGate is the real safety boundary.

This is the foundation slice the rest of the stack builds on (slice 3/#3342 wires these to the agent), so the gating should be fixed here before the train rolls.

.await?
into_result(
"type",
run_input_on_main(move || {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker (gating): this dispatches arbitrary keystrokes, but KeyboardTool never overrides external_effect_with_args, so it inherits false and never routes through the ApprovalGate (the gate fires only on external_effect_with_args at engine/tools.rs:162 — PermissionLevel::Dangerous does NOT trigger it). With computer_control.enabled=true on a background/auto-approved turn, type/press/hotkey can type into a focused sudo/password field or Terminal unattended, and there's no sensitive-app denylist here (unlike ax_interact's SENSITIVE_APPS). Add fn external_effect(&self) -> bool { true } to KeyboardTool.

.await?
into_result(
"move",
run_input_on_main(move || {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker (gating): same issue as keyboard — MouseTool doesn't override external_effect_with_args, so blind coordinate clicks execute without the ApprovalGate. Raw mouse input has no app/element scoping (can't denylist), so the gate is the only real boundary. Add fn external_effect(&self) -> bool { true } to MouseTool. (Also: the user_filter.rs comment claiming Dangerous fires the gate is incorrect — worth fixing.)

Comment thread app/src-tauri/src/lib.rs
let (tx, rx) = tokio::sync::oneshot::channel();
let run = req.run;
input_app
.run_on_main_thread(move || {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major: this runs the whole enigo op on the main thread, including its std::thread::sleeps (hotkey inter-key delays; humanized-move per-waypoint dwells — hundreds of ms on a big drag), blocking the CEF/UI main thread for the op duration → input jank on an always-on path. Consider keeping only the macOS layout lookup on-main and running movement/sleeps off-thread; and wrap (run)() in std::panic::catch_unwind so an enigo FFI panic can't unwind on the main thread (currently it'd drop tx → caller sees a clean cancel, but the main-thread panic itself is the risk).

@M3gA-Mind M3gA-Mind merged commit e3ebaca into tinyhumansai:main Jun 4, 2026
22 checks passed
@M3gA-Mind M3gA-Mind changed the title feat(computer): main-thread synthetic-input executor + CEF crash fix (1/7 of #3307) feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307) Jun 4, 2026
@M3gA-Mind
Copy link
Copy Markdown
Contributor Author

📚 Stacked PR series (8 total) — split from #3307

Merge bottom-up; each builds on the one above it:

  1. feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307) #3340 — main-thread synthetic-input executor + CEF crash fix
  2. feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) #3341 — AX/UIA perception + automate engine
  3. feat(agent): wire automate/ax_interact computer tools (3/8 of #3307) #3342 — wire automate/ax_interact computer tools
  4. feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307) #3343 — Phase 2 always-on listening engine + RPC
  5. feat(voice): always-on Settings toggle + debug panel + i18n (5/8 of #3307) #3344 — always-on Settings toggle + debug panel + i18n
  6. feat(notch): always-visible macOS notch status pill (6/8 of #3307) #3345 — always-visible macOS notch status pill
  7. feat(voice): Phase 3 fast command router (7/8 of #3307) #3346 — Phase 3 fast command router
  8. feat(accessibility): vision-click fallback for Electron/partial-AX apps (8/8 of #3307) #3362 — vision-click fallback for Electron/partial-AX apps (Phase 1.5 complete)

Tracker: docs/voice-system-actions.md.

M3gA-Mind added a commit to M3gA-Mind/openhuman that referenced this pull request Jun 4, 2026
…ansai#3340 review)

Per @oxoxDev: MouseTool/KeyboardTool inherited external_effect=false, so
neither hit the ApprovalGate — PermissionLevel::Dangerous alone does NOT
trigger it (the gate keys off external_effect_with_args). With
computer_control.enabled, blind clicks / arbitrary keystrokes could run
unattended on an auto-approved turn, with no sensitive-app denylist.

- Override external_effect → true on both tools (gate every action).
- Wrap the main-thread input executor in catch_unwind so an enigo FFI
  panic can't unwind across the app main thread.
- Correct the user_filter.rs / ax_interact.rs comments that wrongly
  claimed Dangerous fires the gate.
- Tests: assert both tools route through the gate.
M3gA-Mind added a commit to M3gA-Mind/openhuman that referenced this pull request Jun 4, 2026
…ps (Phase 1.5)

Adds a model-chosen `vision_click { description }` action to the `automate`
loop for apps that expose no usable accessibility tree (Slack, Discord,
VS Code). Flow: screenshot the app window -> ask the main vision model for the
target's pixel coordinates (via the existing `[IMAGE:]` marker path) -> map
image pixels to absolute screen points -> guarded left-click.

- New `accessibility/vision_click.rs`: pure `image_to_screen` coordinate
  transform (folds in the deferred F2 mapping -- the px->pt ratio absorbs the
  capture downscale + Retina backing scale), tolerant locate-response parser,
  capture geometry, and the main-thread guarded click (`run_input_on_main`,
  Change 1.15).
- Section 1.8 safety guard: only clicks when the target app is frontmost;
  refuses on positive evidence another app is focused, so synthetic input never
  lands on OpenHuman's own CEF window.
- Reuses the main `chat` vision provider -- no new inference API, no new tool,
  no new approval surface (inherits `automate`'s Dangerous + mutations gate).
- 19 new unit tests (pure transform/parse + scripted-backend loop integration,
  incl. the frontmost-refusal guard). All 25 automate + vision_click tests green.

Closes the last open Phase 1.5 item (tinyhumansai#3148). Stacks on tinyhumansai#3340-tinyhumansai#3346.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants