
feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) #3341

Merged
M3gA-Mind merged 5 commits into tinyhumansai:main from M3gA-Mind:feat/voice-split-2-accessibility
Jun 4, 2026

@M3gA-Mind
Contributor

@M3gA-Mind M3gA-Mind commented Jun 4, 2026

Summary

Slice 2/8 of #3307 — the accessibility perception + automate engine.

  • Rust-internal perceive → decide → act → settle → verify loop (accessibility/automate.rs) with poll-until-stable settle and playback verification.
  • ax_interact gains the AXEnabled diagnostics field plus the counts_settled/ax_wait_settled settle primitives.
  • Adds the Music fast-path and the Windows UIA superset.
  • Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow.
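Below is a minimal sketch of the poll-until-stable settle idea. It assumes "settled" means the AX element count stopped changing across a window of samples. `counts_settled` and `wait_settled` are hypothetical stand-ins: the PR's ax_wait_settled(app_name, stable_ms, timeout_ms) reads counts through AX, while this version takes a sampling closure so it runs on any platform.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// True when the last `window` samples are identical, i.e. the element
/// count has stopped changing. Too few samples (or window < 2) never settles.
fn counts_settled(samples: &[usize], window: usize) -> bool {
    if window < 2 || samples.len() < window {
        return false;
    }
    let tail = &samples[samples.len() - window..];
    tail.iter().all(|&c| c == tail[0])
}

/// Poll `sample` every `poll_ms` until the count has been stable for
/// `stable_ms`, or give up after `timeout_ms`. Returns true if it settled.
fn wait_settled<F: FnMut() -> usize>(
    mut sample: F,
    poll_ms: u64,
    stable_ms: u64,
    timeout_ms: u64,
) -> bool {
    // Consecutive equal samples needed to span `stable_ms` (at least 2).
    let window = ((stable_ms / poll_ms.max(1)) as usize + 1).max(2);
    let deadline = Instant::now() + Duration::from_millis(timeout_ms);
    let mut samples = Vec::new();
    loop {
        samples.push(sample());
        if counts_settled(&samples, window) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(poll_ms));
    }
}
```

Keeping `counts_settled` pure means the stability rule can be unit-tested without a live UI. Only the polling wrapper touches the clock.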

Builds on the merged Phase 1 (#3168) — these files evolve the already-merged ax_interact/launch_app.

Files (13)

accessibility/{automate,automate_tests,ax_interact,ax_interact_tests,helper,mod,uia_interact}.rs, accessibility/app_fastpaths/{mod,music,fastpaths_tests}.rs, tools/impl/system/{launch_app,mod}.rs, docs/voice-automate-plan.md.

Part of the #3307 split. PR 3307 (72 files) is being replaced by 8 small, dependency-ordered PRs (merge-train). Branches are stacked; each PR's true slice is shown once its predecessors merge and it is rebased onto main.
Stacked on slice 1 (#3340).

Summary by CodeRabbit

  • New Features
    • Rust-driven UI automation loop with deterministic fast-paths and a Music “play ” fast-path to launch, search, and start playback.
    • New UI-settling helper to reduce timing races, and improved reporting of each element's enabled state in accessibility captures.
  • Documentation
    • Added a detailed plan covering automation flow, events, testing strategy, and milestones.
  • Tests
    • Expanded unit and integration tests for fast-paths, automate loop behaviors, settling logic and AX probes.

M3gA-Mind added 2 commits June 4, 2026 14:09
Run enigo keyboard/mouse on the app main thread via a native-registry
executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and
crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge,
and downscaled screenshots so the model can see them.

Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
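The commit above marshals enigo calls onto the app main thread. Here is a minimal std-only sketch of that pattern, assuming a plain mpsc channel in place of the app's native-registry executor; `MainThreadHandle` and `run_on_main` are hypothetical names. Callers on any thread send a boxed closure and block on a one-shot reply, while the main thread drains the queue.

```rust
use std::sync::mpsc;
use std::thread;

/// A unit of work that must run on the main thread.
type Job = Box<dyn FnOnce() + Send>;

/// Cloneable handle other threads use to submit work to the main thread.
#[derive(Clone)]
struct MainThreadHandle {
    tx: mpsc::Sender<Job>,
}

impl MainThreadHandle {
    /// Run `f` on the main thread and block the caller until it returns.
    fn run_on_main<R, F>(&self, f: F) -> R
    where
        R: Send + 'static,
        F: FnOnce() -> R + Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::sync_channel(1);
        self.tx
            .send(Box::new(move || {
                let _ = reply_tx.send(f());
            }))
            .expect("main-thread loop has shut down");
        reply_rx.recv().expect("main thread dropped the job")
    }
}

/// Returns true if a closure submitted from a worker thread actually ran on
/// the thread that calls this function (our stand-in "main" thread).
fn demo() -> bool {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = MainThreadHandle { tx };
    let main_id = thread::current().id();
    let worker = thread::spawn(move || handle.run_on_main(|| thread::current().id()));
    // Drain jobs here until every sender (the worker's handle) is dropped.
    for job in rx {
        job();
    }
    worker.join().unwrap() == main_id
}
```

In the real app the "drain" side would be the host's main run loop rather than a `for` loop.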
… loop

Adds the Rust-internal automate engine (poll-until-stable settle, playback
verification), the AXEnabled diagnostics field + settle primitives on
ax_interact, the Music fast-path, and the Windows UIA superset. Exposes
launch_platform as pub(crate) so the automate loop can launch apps mid-flow.

Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).
@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b2c0000-7d57-4d2a-b1a1-bb761dc5d5ed

📥 Commits

Reviewing files that changed from the base of the PR and between 2be9825 and 4f24626.

📒 Files selected for processing (1)
  • src/openhuman/accessibility/automate.rs

📝 Walkthrough

Walkthrough

This PR adds a Rust-driven perceive→decide→act→settle→verify automation loop (automate::run) with an injectable AutomateBackend, AX settling primitives, a Music app fast-path (play ), comprehensive unit/integration test coverage, and wiring + docs for Phase 1.5.
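Below is a synchronous sketch of that loop shape. It assumes a trimmed-down backend trait; the PR's AutomateBackend is async and has more methods, and `Backend`, `Action`, `Outcome`, `run` and `Scripted` here are hypothetical. It shows the two guards discussed in review:

- a step budget, using the default of 12 steps mentioned below;
- a no-progress check that stops on 3 identical actions in a row.

```rust
/// One model-chosen step (hypothetical, simplified from the PR's `Action`).
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Launch(String),
    Press(String),
    Done,
    Fail(String),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Done { steps: usize },
    Failed(String),
}

/// Injectable side-effect boundary so tests can script the model and the UI.
trait Backend {
    fn perceive(&mut self) -> Vec<String>;
    fn decide(&mut self, goal: &str, snapshot: &[String]) -> Action;
    fn act(&mut self, action: &Action) -> Result<(), String>;
    fn settle(&mut self);
}

const STEP_BUDGET: usize = 12;
const MAX_REPEATS: usize = 3;

fn run(goal: &str, backend: &mut dyn Backend) -> Outcome {
    let mut last: Option<Action> = None;
    let mut repeats = 0;
    for step in 1..=STEP_BUDGET {
        let snapshot = backend.perceive();
        let action = backend.decide(goal, &snapshot);
        match &action {
            Action::Done => return Outcome::Done { steps: step },
            Action::Fail(why) => return Outcome::Failed(why.clone()),
            _ => {}
        }
        // No-progress guard: the same action MAX_REPEATS times in a row aborts.
        repeats = if last.as_ref() == Some(&action) { repeats + 1 } else { 1 };
        if repeats >= MAX_REPEATS {
            return Outcome::Failed(format!("stuck repeating {action:?}"));
        }
        if let Err(e) = backend.act(&action) {
            // Log and continue; the next snapshot shows the model what happened.
            eprintln!("step {step}: act failed: {e}");
        }
        backend.settle();
        last = Some(action);
    }
    Outcome::Failed(format!("step budget {STEP_BUDGET} exhausted"))
}

/// Test double: replays a fixed plan, then keeps pressing "Play" forever.
struct Scripted {
    plan: Vec<Action>,
}

impl Backend for Scripted {
    fn perceive(&mut self) -> Vec<String> {
        vec!["Play".to_string()]
    }
    fn decide(&mut self, _goal: &str, _snapshot: &[String]) -> Action {
        if self.plan.is_empty() { Action::Press("Play".into()) } else { self.plan.remove(0) }
    }
    fn act(&mut self, _action: &Action) -> Result<(), String> {
        Ok(())
    }
    fn settle(&mut self) {}
}
```

Because every side effect goes through the trait, a scripted backend can drive the happy path and the stuck-repeat guard deterministically.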

Changes

Automate Rust UI Automation System

Layer / File(s) Summary
AX Element Model & Settle Infrastructure
src/openhuman/accessibility/ax_interact.rs, src/openhuman/accessibility/ax_interact_tests.rs, src/openhuman/accessibility/helper.rs, src/openhuman/accessibility/uia_interact.rs
AXElement adds optional enabled with Serde default and Default derive; AXElement::new() added. counts_settled() decides tail stability. ax_wait_settled(app_name, stable_ms, timeout_ms) polls element counts until stable or timeout. Swift helper returns enabled via axEnabled(_:). Manual AX probe test added; Windows list() notes UIA IsEnabled TODO.
Core Automate Loop & Backend Trait
src/openhuman/accessibility/automate.rs
Adds AutomateBackend trait (perceive, decide, act_launch/press/set_value, open_url, verify_playing, settle, wait), constants (DEFAULT_STEP_BUDGET, MAX_SNAPSHOT), data contracts (Action, AutomateOutcome, AutomateOptions), progress(), render_snapshot(), parse_action() (first JSON block + one repair retry), and run() async loop with fast-path dispatch, no-progress guard, and RealBackend wiring (AX calls, memory-model routing, platform open_url, macOS verify_playing, blocking settle).
Automate Loop Test Coverage
src/openhuman/accessibility/automate_tests.rs
ScriptedBackend test double with queued model outputs and fixed AX elements; tests for happy paths (launch→list→press→done, navigation sequences, set_value), failure guards (budget exhaustion, stuck repeats), parse-repair behavior, explicit fail propagation, press-error recording, plus parse_action and render_snapshot unit tests.
Music Fast-Path Implementation
src/openhuman/accessibility/app_fastpaths/music.rs
matches() and extract_play_query() (quoted titles, trailing by, fallback after last "play", filler/pronoun rejection). Utilities: first_token(), percent_encode(), search_url(), pick_row() (exact label preferred). run() launches Music, opens search URL, perceives rows, selects row, navigates, waits for Play control via baseline counts, presses Play with retry, and verifies playback via verify_playing(). Unit tests and ignored macOS live/manual probe tests included.
Fast-Path Test Coverage
src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs, src/openhuman/accessibility/app_fastpaths/mod.rs
Adds try_fastpath(app, goal, backend) dispatcher to registered fast-paths (Music). Tests for matches()/extract_play_query() including non-play boundary cases and unicode regression; scripted Backend tests exercise full fast-path sequence, disabled-row regression, dispatch selectivity, and AutomateOutcome shape sanity.
System Integration & Wiring
src/openhuman/accessibility/mod.rs, src/openhuman/tools/impl/system/launch_app.rs, src/openhuman/tools/impl/system/mod.rs
Exports pub mod app_fastpaths; and pub mod automate;. launch_platform made pub(crate) and re-exported for internal reuse by automate loop.
Implementation Plan Documentation
docs/voice-automate-plan.md
New Phase 1.5 plan describing orchestrator→AutomateTool→automate::run architecture, perceive→decide→act→settle→verify loop, strict fast-model JSON action schema and repair retry, Music fast-path proof steps, per-step progress streaming, testing strategy, milestones (M1–M6), and open risks.
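Below is a sketch of a row-selection rule like pick_row's "exact label preferred". It assumes two things:

- Rows carry the new optional `enabled` flag, and a row explicitly reported as disabled is skipped. The disabled-row regression test suggests this matters, but the exact rule is an assumption.
- When there is no exact match, the first row containing the query is used. This substring fallback is also hypothetical.

`Row` and this `pick_row` signature are illustrative only.

```rust
/// A perceived row; `enabled` mirrors the optional AXEnabled field
/// (None = the platform did not report it, treated as usable).
#[derive(Debug, Clone, Default)]
struct Row {
    label: String,
    enabled: Option<bool>,
}

/// Index of the row to open for `query`: skip rows reported as disabled,
/// prefer an exact case-insensitive label match, else the first row whose
/// label contains the query.
fn pick_row(rows: &[Row], query: &str) -> Option<usize> {
    let q = query.trim().to_lowercase();
    if q.is_empty() {
        return None;
    }
    let usable = |r: &Row| r.enabled != Some(false);
    rows.iter()
        .position(|r| usable(r) && r.label.trim().to_lowercase() == q)
        .or_else(|| rows.iter().position(|r| usable(r) && r.label.to_lowercase().contains(&q)))
}
```

Treating `None` as usable matches the Serde-default semantics described for the field: an older capture without the flag should not hide every row.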

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

agent

Suggested reviewers

  • senamakel
  • graycyrus

Poem

🐰 I hopped through AX trees and counted tails that rest,
I listened for "Play" and matched the song request,
A loop that peeks, decides, then carefully taps,
Settles the screen, retries the clumsy gaps,
A rabbit’s tiny script that helps the music press.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307)' accurately and specifically describes the main change: implementing the accessibility perception and automate engine as part of a larger multi-PR feature.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


…accessibility

# Conflicts:
#	src/openhuman/tools/impl/browser/screenshot.rs
#	src/openhuman/tools/impl/computer/main_thread.rs
@M3gA-Mind M3gA-Mind marked this pull request as ready for review June 4, 2026 11:24
@M3gA-Mind M3gA-Mind requested a review from a team June 4, 2026 11:24
@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jun 4, 2026
@M3gA-Mind M3gA-Mind changed the title feat(accessibility): AX/UIA perception + automate engine (2/7 of #3307) feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) Jun 4, 2026
@M3gA-Mind
Contributor Author

📚 Stacked PR series (8 total) — split from #3307

Merge bottom-up; each builds on the one above it:

  1. feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307) #3340 — main-thread synthetic-input executor + CEF crash fix
  2. feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) #3341 — AX/UIA perception + automate engine
  3. feat(agent): wire automate/ax_interact computer tools (3/8 of #3307) #3342 — wire automate/ax_interact computer tools
  4. feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307) #3343 — Phase 2 always-on listening engine + RPC
  5. feat(voice): always-on Settings toggle + debug panel + i18n (5/8 of #3307) #3344 — always-on Settings toggle + debug panel + i18n
  6. feat(notch): always-visible macOS notch status pill (6/8 of #3307) #3345 — always-visible macOS notch status pill
  7. feat(voice): Phase 3 fast command router (7/8 of #3307) #3346 — Phase 3 fast command router
  8. feat(accessibility): vision-click fallback for Electron/partial-AX apps (8/8 of #3307) #3362 — vision-click fallback for Electron/partial-AX apps (Phase 1.5 complete)

Tracker: docs/voice-system-actions.md.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
src/openhuman/accessibility/automate.rs (1)

313-318: 💤 Low value

Consider downgrading per-step logging to debug level.

Line 313 logs every iteration's action, app, label, and filter at info! level. Since the loop can run up to 12 steps by default, this produces substantial log volume. As per coding guidelines, step-by-step trace logging for development diagnostics should use debug! or trace! levels.

♻️ Suggested adjustment
-        log::info!(
+        log::debug!(
             "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?} filter={:?}",
             action.action,
             action.label,
             action.filter
         );

Based on learnings: "Use log / tracing at debug / trace levels for diagnostic logging in Rust" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/automate.rs` around lines 313 - 318, The per-step
logging currently using log::info! with the LOG_PREFIX and fields action.action,
action.label, and action.filter should be downgraded to debug-level diagnostics;
replace the log::info! invocation that prints "{LOG_PREFIX} step={step}
action={:?} app={target_app:?} label={:?} filter={:?}" (and passes
action.action, action.label, action.filter) with log::debug! (or
tracing::debug!) so step-by-step iterations are only emitted in debug mode while
keeping the same message format and fields for grep-friendly tracing.
src/openhuman/accessibility/app_fastpaths/mod.rs (1)

21-29: ⚡ Quick win

Add branch-level debug tracing in fast-path dispatch.

try_fastpath currently has no debug/trace logs for match vs fallthrough, which makes dispatch diagnosis harder in automation runs.

♻️ Proposed change
 pub async fn try_fastpath(
     app: &str,
     goal: &str,
     backend: &dyn AutomateBackend,
 ) -> Option<AutomateOutcome> {
+    log::debug!("[automate::fastpath] dispatch app={app:?}");
     if music::matches(app, goal) {
+        log::debug!("[automate::fastpath] matched=music app={app:?}");
         return Some(music::run(goal, backend).await);
     }
+    log::debug!("[automate::fastpath] no-match app={app:?}");
     None
 }

As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes" and "use log / tracing at debug / trace levels for diagnostic logging in Rust".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/app_fastpaths/mod.rs` around lines 21 - 29, Add
branch-level diagnostic logging to try_fastpath: emit a trace/debug entry when
the function is entered (including app and goal), log a branch decision when
music::matches(app, goal) is true or false with a stable grep-friendly prefix
(e.g., "fastpath:match=true"/"fastpath:match=false"), and log before/after the
external call to music::run(goal, backend) (including success/returned outcome).
Update try_fastpath to use the tracing/log macros (trace! or debug!)
consistently and include the function name and relevant variables to aid
filtering.
src/openhuman/accessibility/app_fastpaths/music.rs (1)

327-327: ⚡ Quick win

Use backend wait abstraction for retry delay consistency.

This direct tokio::time::sleep bypasses backend-injected timing behavior; using backend.wait(...) keeps timing deterministic in tests and alternate backends.

♻️ Proposed fix
-                tokio::time::sleep(std::time::Duration::from_millis(700)).await;
+                backend.wait(700).await;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/accessibility/app_fastpaths/music.rs` at line 327, Replace the
direct tokio::time::sleep call with the backend wait abstraction to preserve
backend-injected timing behavior: find the
tokio::time::sleep(std::time::Duration::from_millis(700)).await usage and change
it to call backend.wait(std::time::Duration::from_millis(700)).await (preserving
the same duration and await semantics) so tests and alternate backends see
deterministic timing.
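Below is a minimal synchronous sketch of why routing the retry pause through the backend helps tests. The trait, method names and the millisecond `u64` argument are assumptions; the PR's backend methods are async. `press_with_retry`, `SleepingBackend` and `FakeBackend` are hypothetical.

```rust
use std::cell::RefCell;
use std::thread;
use std::time::Duration;

/// Side-effect boundary; timing goes through it like everything else.
trait Backend {
    fn press(&self, label: &str) -> bool;
    fn wait(&self, ms: u64);
}

/// Press `label`; if that fails, pause via the backend and try once more.
fn press_with_retry(backend: &dyn Backend, label: &str, retry_ms: u64) -> bool {
    if backend.press(label) {
        return true;
    }
    // Not a hard-coded sleep: the backend decides what "waiting" means.
    backend.wait(retry_ms);
    backend.press(label)
}

/// Production-style backend: really sleeps (press is stubbed to succeed).
struct SleepingBackend;
impl Backend for SleepingBackend {
    fn press(&self, _label: &str) -> bool {
        true
    }
    fn wait(&self, ms: u64) {
        thread::sleep(Duration::from_millis(ms));
    }
}

/// Test double: fails the first press and records waits instead of sleeping.
#[derive(Default)]
struct FakeBackend {
    presses: RefCell<u32>,
    waits: RefCell<Vec<u64>>,
}
impl Backend for FakeBackend {
    fn press(&self, _label: &str) -> bool {
        let mut n = self.presses.borrow_mut();
        *n += 1;
        *n > 1
    }
    fn wait(&self, ms: u64) {
        self.waits.borrow_mut().push(ms);
    }
}
```

With the fake, a test runs instantly and can assert that exactly one 700 ms retry was requested. A sleep hard-coded in the fast-path allows neither.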
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/voice-automate-plan.md`:
- Line 20: Add missing language identifiers to the fenced code blocks in the
docs/voice-automate-plan.md file: locate the fenced block containing the diagram
that begins with "orchestrator (chat LLM)" (the AutomateTool/automate{ app, goal
} diagram) and the other fenced block noted in the review, and change the
opening backticks from ``` to ```text so both code fences include a language
identifier (e.g., ```text).

In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Around line 56-69: The current logic finds the last "play" via lower.rfind and
only verifies the left boundary (before_ok), which lets tokens like "playback"
slip through; update the parsing in this block (variables lower, idx, before_ok,
after, q) to also verify the right boundary by checking the character
immediately after the matched "play" (i.e., the first char of after) is either
absent or not alphabetic, and only proceed to trim/assign q when both left and
right boundaries confirm "play" is a full word.
- Around line 150-163: The panic comes from slicing the Unicode-folded string
`hl = haystack.to_lowercase()` with byte offsets `i` derived from the original
`haystack`; switch to ASCII-only lowercasing to keep byte lengths stable: change
`hl` and `nl` to use `to_ascii_lowercase()` (for the ASCII needle `" by "` this
preserves byte counts), continue advancing `i` by `needle.len()` and by
`ch.len_utf8()` as before, and perform the `hl[i..].starts_with(&nl)` check
against the ASCII-lowercased `hl`/`nl` within the existing `replace_ci` function
so slicing no longer panics.

---

Nitpick comments:
In `@src/openhuman/accessibility/app_fastpaths/mod.rs`:
- Around line 21-29: Add branch-level diagnostic logging to try_fastpath: emit a
trace/debug entry when the function is entered (including app and goal), log a
branch decision when music::matches(app, goal) is true or false with a stable
grep-friendly prefix (e.g., "fastpath:match=true"/"fastpath:match=false"), and
log before/after the external call to music::run(goal, backend) (including
success/returned outcome). Update try_fastpath to use the tracing/log macros
(trace! or debug!) consistently and include the function name and relevant
variables to aid filtering.

In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Line 327: Replace the direct tokio::time::sleep call with the backend wait
abstraction to preserve backend-injected timing behavior: find the
tokio::time::sleep(std::time::Duration::from_millis(700)).await usage and change
it to call backend.wait(std::time::Duration::from_millis(700)).await (preserving
the same duration and await semantics) so tests and alternate backends see
deterministic timing.

In `@src/openhuman/accessibility/automate.rs`:
- Around line 313-318: The per-step logging currently using log::info! with the
LOG_PREFIX and fields action.action, action.label, and action.filter should be
downgraded to debug-level diagnostics; replace the log::info! invocation that
prints "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?}
filter={:?}" (and passes action.action, action.label, action.filter) with
log::debug! (or tracing::debug!) so step-by-step iterations are only emitted in
debug mode while keeping the same message format and fields for grep-friendly
tracing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29c1b63f-deed-491e-8d93-2a45a1b0b493

📥 Commits

Reviewing files that changed from the base of the PR and between e3ebaca and 6fda92d.

📒 Files selected for processing (13)
  • docs/voice-automate-plan.md
  • src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs
  • src/openhuman/accessibility/app_fastpaths/mod.rs
  • src/openhuman/accessibility/app_fastpaths/music.rs
  • src/openhuman/accessibility/automate.rs
  • src/openhuman/accessibility/automate_tests.rs
  • src/openhuman/accessibility/ax_interact.rs
  • src/openhuman/accessibility/ax_interact_tests.rs
  • src/openhuman/accessibility/helper.rs
  • src/openhuman/accessibility/mod.rs
  • src/openhuman/accessibility/uia_interact.rs
  • src/openhuman/tools/impl/system/launch_app.rs
  • src/openhuman/tools/impl/system/mod.rs

Comment thread docs/voice-automate-plan.md Outdated
Comment thread src/openhuman/accessibility/app_fastpaths/music.rs
Comment thread src/openhuman/accessibility/app_fastpaths/music.rs
…e replace

- Music fast-path now requires 'play' as a whole word on BOTH sides so
  'playback …' no longer parses as a play intent.
- replace_ci no longer indexes the lowercased copy with original byte
  offsets (to_lowercase can change byte lengths → mid-codepoint slice
  panic on Unicode); compares on the original slice instead.
- Tests for both; tag the architecture doc's fenced diagram + drop a
  stray trailing code fence (MD040).
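Below is a sketch of the fixed fallback rule: take the last "play", require a word boundary on both sides, and return what follows.

`query_after_play` is a hypothetical name and covers only the fallback path. The real extract_play_query also handles quoted titles, a trailing "by", and filler words.

Lowercasing with `to_ascii_lowercase` keeps byte offsets identical to the original string. That makes the index found in the lowercased copy safe to use for slicing the original.

```rust
/// Text after the last whole-word "play" (case-insensitive), or None when
/// "play" is only part of a word such as "playback" or "replay".
fn query_after_play(goal: &str) -> Option<&str> {
    // ASCII-only lowercasing never changes byte lengths, so offsets match `goal`.
    let lower = goal.to_ascii_lowercase();
    let idx = lower.rfind("play")?;
    let end = idx + "play".len();
    // Left boundary: start of string or a non-letter before "play".
    let before_ok = lower[..idx].chars().next_back().map_or(true, |c| !c.is_alphabetic());
    // Right boundary: end of string or a non-letter after "play".
    let after_ok = lower[end..].chars().next().map_or(true, |c| !c.is_alphabetic());
    if !(before_ok && after_ok) {
        return None;
    }
    let q = goal[end..].trim();
    if q.is_empty() { None } else { Some(q) }
}
```

Like the original, this checks only the last occurrence, so a goal ending in a word such as "playlist" falls through to None rather than searching earlier matches.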
@coderabbitai coderabbitai Bot added the agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. label Jun 4, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 4, 2026
@M3gA-Mind
Contributor Author

Independent review (beyond the CodeRabbit pass)

Reviewed the full accessibility slice — the automate perceive→decide→act→settle→verify loop (automate.rs), the ax_interact press/settle primitives, the Music fast-path, and the Windows UIA backend.

Findings & resolutions

  • (fixed, 2be9825) Music fast-path matched play as a prefix → playback … mis-triggered it. Now requires a whole-word match on both sides. +test.
  • (fixed, 2be9825) replace_ci indexed the lowercased string with the original's byte offsets → mid-codepoint slice panic on Unicode. Now compares on the original slice with an is_char_boundary guard. +Unicode test.
  • (fixed) Cosmetic clippy::useless_format for a static step message in automate.rs.
  • (fixed) Architecture-doc fenced-code language + stray orphan fence (MD040).
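Below is a sketch of a Unicode-safe case-insensitive replace in the spirit of the replace_ci fix. It compares against the original slice, and only at character boundaries. The old approach sliced a lowercased copy with the original's byte offsets, which breaks because `to_lowercase` can change byte lengths: for example, 'İ' (U+0130) lowercases to two code points.

The signature is an assumption, and the sketch only case-folds ASCII, which suits a needle such as " by ".

```rust
/// Replace every case-insensitive occurrence of the ASCII `needle` in
/// `haystack` with `with`, never slicing inside a multi-byte character.
fn replace_ci(haystack: &str, needle: &str, with: &str) -> String {
    if needle.is_empty() {
        return haystack.to_string();
    }
    let mut out = String::with_capacity(haystack.len());
    let mut i = 0; // always a char boundary in `haystack`
    while i < haystack.len() {
        let end = i + needle.len();
        // Compare on the original slice; the boundary check guards the slice.
        let hit = end <= haystack.len()
            && haystack.is_char_boundary(end)
            && haystack[i..end].eq_ignore_ascii_case(needle);
        if hit {
            out.push_str(with);
            i = end;
        } else {
            let ch = haystack[i..].chars().next().unwrap();
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    out
}
```

Advancing by `len_utf8()` on misses and by the needle length on hits keeps `i` on a character boundary throughout, so no slice can panic.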

Reviewed clean

  • parse_action never acts on unparseable output (one repair retry, then fail) — the §1.13 hallucination guard holds.
  • No-progress guard (3× identical action) + step budget bound latency/cost.
  • verify_playing / open_url / settle are macOS/Windows-gated with clean fallbacks on unsupported platforms; ax_wait_settled + counts_settled are pure and unit-tested.
  • launch_platform is correctly pub(crate) for the loop's mid-flow launch; foreground-first ordering prevents synthetic input landing on the wrong window.
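Below is a sketch of the "first JSON block, one repair retry, then fail" rule for parse_action. It assumes the model's reply may wrap its JSON in prose. `first_json_block` and `parse_with_repair` are hypothetical. A real parser would then hand the extracted block to a JSON library such as serde, which this std-only sketch omits.

```rust
/// First balanced `{ ... }` block in `text`, ignoring braces inside JSON strings.
fn first_json_block(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_str = false;
    let mut escaped = false;
    for (off, ch) in text[start..].char_indices() {
        if in_str {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_str = false;
            }
            continue;
        }
        match ch {
            '"' => in_str = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + off + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced or truncated
}

/// Extract from the first reply, else ask for exactly one repair, else fail.
/// Unparseable output is never turned into an action.
fn parse_with_repair<F: FnMut() -> String>(first: &str, mut ask_repair: F) -> Result<String, String> {
    if let Some(block) = first_json_block(first) {
        return Ok(block.to_string());
    }
    let repaired = ask_repair();
    first_json_block(&repaired)
        .map(|b| b.to_string())
        .ok_or_else(|| "unparseable model output after one repair".to_string())
}
```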

No further correctness issues. LGTM once CI is green.

@M3gA-Mind M3gA-Mind merged commit 7c08704 into tinyhumansai:main Jun 4, 2026
22 checks passed

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.


1 participant