feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) by M3gA-Mind · Pull Request #3341 · tinyhumansai/openhuman

M3gA-Mind · 2026-06-04T09:14:09Z

Summary

Slice 2/8 of #3307 — the accessibility perception + automate engine.

- Rust-internal perceive → act → settle → verify loop (accessibility/automate.rs) with poll-until-stable settle and playback verification.
- ax_interact gains the AXEnabled diagnostics field + counts_settled/ax_wait_settled settle primitives; Music fast-path; Windows UIA superset.
- Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow.
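
The settle primitives named above (`counts_settled`, `ax_wait_settled`) are only described at a high level here. The following is a minimal sketch of the poll-until-stable idea, assuming element counts are sampled at known timestamps. `CountSample`, `tail_is_stable`, `wait_settled`, and the virtual clock are hypothetical names chosen for illustration, not the PR's actual signatures.

```rust
/// One observation of the UI tree: when it was taken and how many elements it held.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CountSample {
    pub at_ms: u64,
    pub count: usize,
}

/// Returns true when the most recent run of identical counts spans at least `stable_ms`.
pub fn tail_is_stable(samples: &[CountSample], stable_ms: u64) -> bool {
    let Some(last) = samples.last() else {
        return false;
    };
    // Walk backwards to the oldest sample that still matches the newest count.
    let mut run_start = last.at_ms;
    for s in samples.iter().rev() {
        if s.count != last.count {
            break;
        }
        run_start = s.at_ms;
    }
    last.at_ms.saturating_sub(run_start) >= stable_ms
}

/// Polls `probe` until the tail is stable or `timeout_ms` is reached.
/// Uses a virtual clock (assumption) so the sketch stays deterministic;
/// a real implementation would sleep between polls.
pub fn wait_settled<F: FnMut() -> usize>(
    mut probe: F,
    stable_ms: u64,
    timeout_ms: u64,
    poll_ms: u64,
) -> bool {
    let mut samples = Vec::new();
    let mut now = 0u64;
    loop {
        samples.push(CountSample { at_ms: now, count: probe() });
        if tail_is_stable(&samples, stable_ms) {
            return true;
        }
        if now >= timeout_ms {
            return false;
        }
        now += poll_ms.max(1);
    }
}
```

Keeping the stability decision in a pure function over samples is what makes it unit-testable without a live accessibility tree; only the polling wrapper needs a clock.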

Builds on the merged Phase 1 (#3168) — these files evolve the already-merged ax_interact/launch_app.

Files (13)

accessibility/{automate,automate_tests,ax_interact,ax_interact_tests,helper,mod,uia_interact}.rs, accessibility/app_fastpaths/{mod,music,fastpaths_tests}.rs, tools/impl/system/{launch_app,mod}.rs, docs/voice-automate-plan.md.

Part of the #3307 split. PR 3307 (72 files) is being replaced by 7 small, dependency-ordered PRs (merge-train). Branches are stacked; each PR's true slice is shown once its predecessors merge and it is rebased onto main.
Stacked on slice 1 (#3340).

Summary by CodeRabbit

New Features
- Rust-driven UI automation loop with deterministic fast-paths and a Music “play ” fast-path to launch, search, and start playback.
- New UI-settling helper to reduce timing races and improved element enabled reporting for accessibility captures.
Documentation
- Added a detailed plan covering automation flow, events, testing strategy, and milestones.
Tests
- Expanded unit and integration tests for fast-paths, automate loop behaviors, settling logic and AX probes.

Run enigo keyboard/mouse on the app main thread via a native-registry executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge, and downscaled screenshots so the model can see them. Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).

… loop Adds the Rust-internal automate engine (poll-until-stable settle, playback verification), the AXEnabled diagnostics field + settle primitives on ax_interact, the Music fast-path, and the Windows UIA superset. Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow. Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).

coderabbitai · 2026-06-04T09:14:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b2c0000-7d57-4d2a-b1a1-bb761dc5d5ed

📥 Commits

Reviewing files that changed from the base of the PR and between 2be9825 and 4f24626.

📒 Files selected for processing (1)

src/openhuman/accessibility/automate.rs

📝 Walkthrough

This PR adds a Rust-driven perceive→decide→act→settle→verify automation loop (automate::run) with an injectable AutomateBackend, AX settling primitives, a Music app fast-path (play ), comprehensive unit/integration test coverage, and wiring + docs for Phase 1.5.
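
To make the loop shape concrete, here is a simplified, synchronous sketch of a perceive→decide→act→settle→verify loop driven through an injectable backend. It is an illustration under stated assumptions: the PR's `AutomateBackend` is async and has more methods, and `Step`, `Backend`, `Outcome`, `run_loop`, and `Scripted` are hypothetical names. This sketch aborts on an action error, which may differ from how the real loop records press errors.

```rust
use std::collections::VecDeque;

/// Hypothetical action vocabulary; the PR's `Action` type is richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    Press(String),
    Done,
    Fail(String),
}

/// Synchronous stand-in for the injectable backend (assumption: the real trait is async).
pub trait Backend {
    fn perceive(&mut self) -> Vec<String>;
    fn decide(&mut self, goal: &str, snapshot: &[String]) -> Step;
    fn act(&mut self, step: &Step) -> Result<(), String>;
    fn settle(&mut self);
    fn verify(&mut self) -> bool;
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success { steps: usize },
    Failed { reason: String },
}

/// Runs at most `step_budget` iterations of perceive -> decide -> act -> settle,
/// verifying only once the decider reports `Done`.
pub fn run_loop(backend: &mut dyn Backend, goal: &str, step_budget: usize) -> Outcome {
    for step in 1..=step_budget {
        let snapshot = backend.perceive();
        match backend.decide(goal, &snapshot) {
            Step::Done => {
                return if backend.verify() {
                    Outcome::Success { steps: step }
                } else {
                    Outcome::Failed { reason: "verification failed".to_string() }
                };
            }
            Step::Fail(reason) => return Outcome::Failed { reason },
            action => {
                if let Err(e) = backend.act(&action) {
                    return Outcome::Failed { reason: e };
                }
                backend.settle();
            }
        }
    }
    Outcome::Failed { reason: "step budget exhausted".to_string() }
}

/// Test double that replays queued decisions and records presses.
pub struct Scripted {
    script: VecDeque<Step>,
    pub pressed: Vec<String>,
}

impl Scripted {
    pub fn new(script: Vec<Step>) -> Self {
        Self { script: script.into(), pressed: Vec::new() }
    }
}

impl Backend for Scripted {
    fn perceive(&mut self) -> Vec<String> {
        vec!["Search".to_string(), "Play".to_string()]
    }
    fn decide(&mut self, _goal: &str, _snapshot: &[String]) -> Step {
        self.script
            .pop_front()
            .unwrap_or_else(|| Step::Fail("script exhausted".to_string()))
    }
    fn act(&mut self, step: &Step) -> Result<(), String> {
        if let Step::Press(label) = step {
            self.pressed.push(label.clone());
        }
        Ok(())
    }
    fn settle(&mut self) {}
    fn verify(&mut self) -> bool {
        true
    }
}
```

Because every side effect goes through the trait, a scripted backend can exercise budget exhaustion and failure paths without touching a real UI.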

Changes

Automate Rust UI Automation System

| Layer / File(s) | Summary |
| --- | --- |
| AX Element Model & Settle Infrastructure `src/openhuman/accessibility/ax_interact.rs`, `src/openhuman/accessibility/ax_interact_tests.rs`, `src/openhuman/accessibility/helper.rs`, `src/openhuman/accessibility/uia_interact.rs` | `AXElement` adds optional `enabled` with Serde default and `Default` derive; `AXElement::new()` added. `counts_settled()` decides tail stability. `ax_wait_settled(app_name, stable_ms, timeout_ms)` polls element counts until stable or timeout. Swift helper returns `enabled` via `axEnabled(_:)`. Manual AX probe test added; Windows list() notes UIA `IsEnabled` TODO. |
| Core Automate Loop & Backend Trait `src/openhuman/accessibility/automate.rs` | Adds `AutomateBackend` trait (perceive, decide, act_launch/press/set_value, open_url, verify_playing, settle, wait), constants (`DEFAULT_STEP_BUDGET`, `MAX_SNAPSHOT`), data contracts (`Action`, `AutomateOutcome`, `AutomateOptions`), `progress()`, `render_snapshot()`, `parse_action()` (first JSON block + one repair retry), and `run()` async loop with fast-path dispatch, no-progress guard, and `RealBackend` wiring (AX calls, memory-model routing, platform open_url, macOS verify_playing, blocking settle). |
| Automate Loop Test Coverage `src/openhuman/accessibility/automate_tests.rs` | `ScriptedBackend` test double with queued model outputs and fixed AX elements; tests for happy paths (launch→list→press→done, navigation sequences, set_value), failure guards (budget exhaustion, stuck repeats), parse-repair behavior, explicit `fail` propagation, press-error recording, plus `parse_action` and `render_snapshot` unit tests. |
| Music Fast-Path Implementation `src/openhuman/accessibility/app_fastpaths/music.rs` | `matches()` and `extract_play_query()` (quoted titles, trailing `by`, fallback after last "play", filler/pronoun rejection). Utilities: `first_token()`, `percent_encode()`, `search_url()`, `pick_row()` (exact label preferred). `run()` launches Music, opens search URL, perceives rows, selects row, navigates, waits for Play control via baseline counts, presses Play with retry, and verifies playback via `verify_playing()`. Unit tests and ignored macOS live/manual probe tests included. |
| Fast-Path Test Coverage `src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs`, `src/openhuman/accessibility/app_fastpaths/mod.rs` | Adds `try_fastpath(app, goal, backend)` dispatcher to registered fast-paths (Music). Tests for `matches()`/`extract_play_query()` including non-play boundary cases and unicode regression; scripted Backend tests exercise full fast-path sequence, disabled-row regression, dispatch selectivity, and `AutomateOutcome` shape sanity. |
| System Integration & Wiring `src/openhuman/accessibility/mod.rs`, `src/openhuman/tools/impl/system/launch_app.rs`, `src/openhuman/tools/impl/system/mod.rs` | Exports `pub mod app_fastpaths;` and `pub mod automate;`. `launch_platform` made `pub(crate)` and re-exported for internal reuse by automate loop. |
| Implementation Plan Documentation `docs/voice-automate-plan.md` | New Phase 1.5 plan describing orchestrator→AutomateTool→automate::run architecture, perceive→decide→act→settle→verify loop, strict fast-model JSON action schema and repair retry, Music fast-path proof steps, per-step progress streaming, testing strategy, milestones (M1–M6), and open risks. |
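
The table mentions `pick_row()` preferring an exact label and a disabled-row regression test, together with the new optional `enabled` field. A minimal sketch of how such a selection could work, assuming a missing `enabled` value means "usable"; the `Row` type, the matching rules, and the sample titles are hypothetical and not taken from the PR:

```rust
/// Hypothetical row model: a label plus the optional `enabled` flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub label: String,
    pub enabled: Option<bool>,
}

/// Picks the row to press for `query`:
/// 1. skip rows explicitly reported as disabled,
/// 2. prefer a case-insensitive exact label match,
/// 3. otherwise fall back to the first label containing the query.
pub fn pick_row<'a>(rows: &'a [Row], query: &str) -> Option<&'a Row> {
    let q = query.trim().to_lowercase();
    if q.is_empty() {
        return None;
    }
    // Only an explicit `Some(false)` excludes a row (assumption for this sketch).
    let usable: Vec<&Row> = rows.iter().filter(|r| r.enabled != Some(false)).collect();
    usable
        .iter()
        .copied()
        .find(|r| r.label.trim().to_lowercase() == q)
        .or_else(|| {
            usable
                .iter()
                .copied()
                .find(|r| r.label.to_lowercase().contains(&q))
        })
}
```

Filtering disabled rows before matching means an exact-but-disabled row cannot shadow a usable one, which is one plausible reading of the disabled-row regression.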

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

agent

Suggested reviewers

senamakel
graycyrus

Poem

🐰 I hopped through AX trees and counted tails that rest,
I listened for "Play" and matched the song request,
A loop that peeks, decides, then carefully taps,
Settles the screen, retries the clumsy gaps,
A rabbit’s tiny script that helps the music press.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(accessibility): AX/UIA perception + automate engine (2/8 of `#3307`)' accurately and specifically describes the main change: implementing the accessibility perception and automate engine as part of a larger multi-PR feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

M3gA-Mind · 2026-06-04T11:34:36Z

📚 Stacked PR series (8 total) — split from #3307

Merge bottom-up; each builds on the one above it:

- #3340: feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307)
- #3341: feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307)
- #3342: feat(agent): wire automate/ax_interact computer tools (3/8 of #3307)
- #3343: feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307)
- #3344: feat(voice): always-on Settings toggle + debug panel + i18n (5/8 of #3307)
- #3345: feat(notch): always-visible macOS notch status pill (6/8 of #3307)
- #3346: feat(voice): Phase 3 fast command router (7/8 of #3307)
- #3362: feat(accessibility): vision-click fallback for Electron/partial-AX apps (8/8 of #3307) (Phase 1.5 complete)

Tracker: docs/voice-system-actions.md.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

src/openhuman/accessibility/automate.rs (1)
313-318: 💤 Low value

Consider downgrading per-step logging to debug level.

Line 313 logs every iteration's action, app, label, and filter at info! level. Since the loop can run up to 12 steps by default, this produces substantial log volume. As per coding guidelines, step-by-step trace logging for development diagnostics should use debug! or trace! levels.
♻️ Suggested adjustment
-        log::info!(
+        log::debug!(
             "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?} filter={:?}",
             action.action,
             action.label,
             action.filter
         );
Based on learnings: "Use log / tracing at debug / trace levels for diagnostic logging in Rust" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/automate.rs` around lines 313 - 318, The per-step
logging currently using log::info! with the LOG_PREFIX and fields action.action,
action.label, and action.filter should be downgraded to debug-level diagnostics;
replace the log::info! invocation that prints "{LOG_PREFIX} step={step}
action={:?} app={target_app:?} label={:?} filter={:?}" (and passes
action.action, action.label, action.filter) with log::debug! (or
tracing::debug!) so step-by-step iterations are only emitted in debug mode while
keeping the same message format and fields for grep-friendly tracing.
src/openhuman/accessibility/app_fastpaths/mod.rs (1)
21-29: ⚡ Quick win

Add branch-level debug tracing in fast-path dispatch.

try_fastpath currently has no debug/trace logs for match vs fallthrough, which makes dispatch diagnosis harder in automation runs.
♻️ Proposed change
 pub async fn try_fastpath(
     app: &str,
     goal: &str,
     backend: &dyn AutomateBackend,
 ) -> Option<AutomateOutcome> {
+    log::debug!("[automate::fastpath] dispatch app={app:?}");
     if music::matches(app, goal) {
+        log::debug!("[automate::fastpath] matched=music app={app:?}");
         return Some(music::run(goal, backend).await);
     }
+    log::debug!("[automate::fastpath] no-match app={app:?}");
     None
 }
As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes" and "use log / tracing at debug / trace levels for diagnostic logging in Rust".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/app_fastpaths/mod.rs` around lines 21 - 29, Add
branch-level diagnostic logging to try_fastpath: emit a trace/debug entry when
the function is entered (including app and goal), log a branch decision when
music::matches(app, goal) is true or false with a stable grep-friendly prefix
(e.g., "fastpath:match=true"/"fastpath:match=false"), and log before/after the
external call to music::run(goal, backend) (including success/returned outcome).
Update try_fastpath to use the tracing/log macros (trace! or debug!)
consistently and include the function name and relevant variables to aid
filtering.
src/openhuman/accessibility/app_fastpaths/music.rs (1)
327-327: ⚡ Quick win

Use backend wait abstraction for retry delay consistency.

This direct tokio::time::sleep bypasses backend-injected timing behavior; using backend.wait(...) keeps timing deterministic in tests and alternate backends.
♻️ Proposed fix
-                tokio::time::sleep(std::time::Duration::from_millis(700)).await;
+                backend.wait(700).await;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/app_fastpaths/music.rs` at line 327, Replace the
direct tokio::time::sleep call with the backend wait abstraction to preserve
backend-injected timing behavior: find the
tokio::time::sleep(std::time::Duration::from_millis(700)).await usage and change
it to call backend.wait(std::time::Duration::from_millis(700)).await (preserving
the same duration and await semantics) so tests and alternate backends see
deterministic timing.
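
The point of routing the retry delay through the backend can be shown with a small synchronous sketch. It assumes a backend trait that owns both the press and the wait; `WaitBackend`, `press_with_retry`, and `FlakyBackend` are hypothetical names, and the PR's real code is async.

```rust
/// Hypothetical slice of a backend: pressing a control and waiting.
pub trait WaitBackend {
    fn press(&mut self, label: &str) -> Result<(), String>;
    /// A production backend would sleep here; a test backend can just record the request.
    fn wait(&mut self, ms: u64);
}

/// Presses `label` up to `attempts` times, waiting `delay_ms` between failed attempts.
/// Returns the attempt number that succeeded, or the last error.
pub fn press_with_retry(
    backend: &mut dyn WaitBackend,
    label: &str,
    attempts: usize,
    delay_ms: u64,
) -> Result<usize, String> {
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=attempts {
        match backend.press(label) {
            Ok(()) => return Ok(attempt),
            Err(e) => last_err = e,
        }
        // No need to wait after the final failed attempt.
        if attempt < attempts {
            backend.wait(delay_ms);
        }
    }
    Err(last_err)
}

/// Test double: fails a fixed number of presses and records waits instead of sleeping.
pub struct FlakyBackend {
    pub failures_left: usize,
    pub waits: Vec<u64>,
}

impl WaitBackend for FlakyBackend {
    fn press(&mut self, label: &str) -> Result<(), String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            Err(format!("{label} not ready"))
        } else {
            Ok(())
        }
    }
    fn wait(&mut self, ms: u64) {
        self.waits.push(ms);
    }
}
```

With the delay injected this way, a retry test finishes instantly and can still assert exactly which waits were requested.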

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/voice-automate-plan.md`:
- Line 20: Add missing language identifiers to the fenced code blocks in the
docs/voice-automate-plan.md file: locate the fenced block containing the diagram
that begins with "orchestrator (chat LLM)" (the AutomateTool/automate{ app, goal
} diagram) and the other fenced block noted in the review, and change the
opening backticks from ``` to ```text so both code fences include a language
identifier (e.g., ```text).

In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Around line 56-69: The current logic finds the last "play" via lower.rfind and
only verifies the left boundary (before_ok), which lets tokens like "playback"
slip through; update the parsing in this block (variables lower, idx, before_ok,
after, q) to also verify the right boundary by checking the character
immediately after the matched "play" (i.e., the first char of after) is either
absent or not alphabetic, and only proceed to trim/assign q when both left and
right boundaries confirm "play" is a full word.
- Around line 150-163: The panic comes from slicing the Unicode-folded string
`hl = haystack.to_lowercase()` with byte offsets `i` derived from the original
`haystack`; switch to ASCII-only lowercasing to keep byte lengths stable: change
`hl` and `nl` to use `to_ascii_lowercase()` (for the ASCII needle `" by "` this
preserves byte counts), continue advancing `i` by `needle.len()` and by
`ch.len_utf8()` as before, and perform the `hl[i..].starts_with(&nl)` check
against the ASCII-lowercased `hl`/`nl` within the existing `replace_ci` function
so slicing no longer panics.
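
As an illustration of the two-sided boundary check requested for the "play" match, here is a minimal sketch. It assumes ASCII-only lowercasing, so byte offsets in the folded copy stay valid for the original string. `query_after_play` is a hypothetical helper, and scanning backwards past rejected occurrences is a choice made for this sketch rather than the PR's confirmed behavior.

```rust
/// Returns the text after the last whole-word "play" in `goal`, if any.
/// "playback" and "display" are rejected because "play" is not a full word there.
pub fn query_after_play(goal: &str) -> Option<String> {
    const WORD: &str = "play";
    // ASCII lowercasing never changes byte lengths, so indices map 1:1 onto `goal`.
    let lower = goal.to_ascii_lowercase();
    let mut end = lower.len();
    while let Some(idx) = lower[..end].rfind(WORD) {
        let before_ok = lower[..idx]
            .chars()
            .next_back()
            .map_or(true, |c| !c.is_alphanumeric());
        let after_ok = lower[idx + WORD.len()..]
            .chars()
            .next()
            .map_or(true, |c| !c.is_alphanumeric());
        if before_ok && after_ok {
            let q = goal[idx + WORD.len()..].trim();
            return if q.is_empty() { None } else { Some(q.to_string()) };
        }
        // Not a whole word: keep looking at earlier occurrences.
        end = idx;
    }
    None
}
```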

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29c1b63f-deed-491e-8d93-2a45a1b0b493

📥 Commits

Reviewing files that changed from the base of the PR and between e3ebaca and 6fda92d.

📒 Files selected for processing (13)

docs/voice-automate-plan.md
src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs
src/openhuman/accessibility/app_fastpaths/mod.rs
src/openhuman/accessibility/app_fastpaths/music.rs
src/openhuman/accessibility/automate.rs
src/openhuman/accessibility/automate_tests.rs
src/openhuman/accessibility/ax_interact.rs
src/openhuman/accessibility/ax_interact_tests.rs
src/openhuman/accessibility/helper.rs
src/openhuman/accessibility/mod.rs
src/openhuman/accessibility/uia_interact.rs
src/openhuman/tools/impl/system/launch_app.rs
src/openhuman/tools/impl/system/mod.rs

…e replace - Music fast-path now requires 'play' as a whole word on BOTH sides so 'playback …' no longer parses as a play intent. - replace_ci no longer indexes the lowercased copy with original byte offsets (to_lowercase can change byte lengths → mid-codepoint slice panic on Unicode); compares on the original slice instead. - Tests for both; tag the architecture doc's fenced diagram + drop a stray trailing code fence (MD040).

M3gA-Mind · 2026-06-04T12:05:38Z

Independent review (beyond the CodeRabbit pass)

Reviewed the full accessibility slice — the automate perceive→decide→act→settle→verify loop (automate.rs), the ax_interact press/settle primitives, the Music fast-path, and the Windows UIA backend.

Findings & resolutions

✅ (fixed, 2be9825) Music fast-path matched play as a prefix → playback … mis-triggered it. Now requires a whole-word match on both sides. +test.
✅ (fixed, 2be9825) replace_ci indexed the lowercased string with the original's byte offsets → mid-codepoint slice panic on Unicode. Now compares on the original slice with an is_char_boundary guard. +Unicode test.
✅ (fixed) Cosmetic clippy::useless_format for a static step message in automate.rs.
✅ (fixed) Architecture-doc fenced-code language + stray orphan fence (MD040).
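
The `replace_ci` fix described above (compare on the original slice, guarded by `is_char_boundary`) can be sketched as follows. This is a hedged illustration, not the PR's code: the signature is assumed, and case-insensitivity is ASCII-only via `str::eq_ignore_ascii_case`.

```rust
/// Replaces every ASCII-case-insensitive occurrence of `needle` in `haystack`.
/// All slicing happens on the original string, and the end offset is checked
/// with `is_char_boundary` first, so multi-byte characters cannot cause a panic.
pub fn replace_ci(haystack: &str, needle: &str, replacement: &str) -> String {
    if needle.is_empty() {
        return haystack.to_string();
    }
    let mut out = String::with_capacity(haystack.len());
    let mut i = 0;
    while i < haystack.len() {
        let end = i + needle.len();
        // `i` is always a char boundary; `end` may land mid-codepoint, so guard it.
        if end <= haystack.len()
            && haystack.is_char_boundary(end)
            && haystack[i..end].eq_ignore_ascii_case(needle)
        {
            out.push_str(replacement);
            i = end;
        } else {
            let ch = haystack[i..]
                .chars()
                .next()
                .expect("i is a char boundary inside haystack");
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    out
}
```

The original bug class comes from `to_lowercase` being allowed to change byte lengths, so offsets computed on one string are not safe to use on the other; comparing in place avoids keeping two strings in sync.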

Reviewed clean

parse_action never acts on unparseable output (one repair retry, then fail) — the §1.13 hallucination guard holds.
No-progress guard (3× identical action) + step budget bound latency/cost.
verify_playing / open_url / settle are macOS/Windows-gated with clean non-supported fallbacks; ax_wait_settled + counts_settled are pure and unit-tested.
launch_platform is correctly pub(crate) for the loop's mid-flow launch; foreground-first ordering prevents synthetic input landing on the wrong window.
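
Two of the guards listed above lend themselves to a compact sketch: refusing to act unless a JSON block can be extracted (with one repair retry), and stopping after three identical actions in a row. The code below is an illustration under assumptions. `first_json_block`, `action_with_repair`, and `NoProgressGuard` are hypothetical names, and the extraction only isolates a balanced `{...}` block rather than validating it against the action schema.

```rust
/// Returns the first balanced `{...}` block in `text`, ignoring braces inside strings.
pub fn first_json_block(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (off, ch) in text[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + off + 1]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Asks the model once, then once more in "repair" mode; never returns free-form prose.
pub fn action_with_repair<F>(mut ask_model: F) -> Option<String>
where
    F: FnMut(bool) -> String,
{
    for is_repair in [false, true] {
        let reply = ask_model(is_repair);
        if let Some(block) = first_json_block(&reply) {
            return Some(block.to_string());
        }
    }
    None
}

/// Trips once the same action has been seen `limit` times in a row.
pub struct NoProgressGuard {
    last: Option<String>,
    repeats: usize,
    limit: usize,
}

impl NoProgressGuard {
    pub fn new(limit: usize) -> Self {
        Self { last: None, repeats: 0, limit }
    }

    pub fn is_stuck(&mut self, action: &str) -> bool {
        if self.last.as_deref() == Some(action) {
            self.repeats += 1;
        } else {
            self.last = Some(action.to_string());
            self.repeats = 1;
        }
        self.repeats >= self.limit
    }
}
```

Returning `None` after the single repair attempt is what keeps the loop from acting on unparseable output; the caller is expected to fail the run at that point.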

No further correctness issues. LGTM once CI is green.

M3gA-Mind added 2 commits June 4, 2026 14:09

This was referenced Jun 4, 2026

feat(agent): wire automate/ax_interact computer tools (3/8 of #3307) #3342

Merged

feat(voice): autonomous app control — automate loop, always-on (Hey Tiny), notch status, computer control (#3148) #3307

Closed

Merge remote-tracking branch 'upstream/main' into feat/voice-split-2-…

6fda92d

…accessibility # Conflicts: # src/openhuman/tools/impl/browser/screenshot.rs # src/openhuman/tools/impl/computer/main_thread.rs

M3gA-Mind marked this pull request as ready for review June 4, 2026 11:24

M3gA-Mind requested a review from a team June 4, 2026 11:24

coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jun 4, 2026

M3gA-Mind changed the title ~~feat(accessibility): AX/UIA perception + automate engine (2/7 of #3307)~~ feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) Jun 4, 2026

coderabbitai Bot requested changes Jun 4, 2026

Comment thread docs/voice-automate-plan.md Outdated

Comment thread src/openhuman/accessibility/app_fastpaths/music.rs

Comment thread src/openhuman/accessibility/app_fastpaths/music.rs

coderabbitai Bot added the agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. label Jun 4, 2026

coderabbitai Bot previously approved these changes Jun 4, 2026

style(automate): drop useless format! for a static step message

4f24626

M3gA-Mind dismissed coderabbitai[bot]’s stale review via 4f24626 June 4, 2026 12:05

coderabbitai Bot approved these changes Jun 4, 2026

M3gA-Mind merged commit 7c08704 into tinyhumansai:main Jun 4, 2026
22 checks passed

M3gA-Mind merged 5 commits into tinyhumansai:main from M3gA-Mind:feat/voice-split-2-accessibility