feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307) by M3gA-Mind · Pull Request #3343 · tinyhumansai/openhuman

M3gA-Mind · 2026-06-04T09:14:15Z

Summary

Slice 4/8 of #3307 — Phase 2 always-on listening engine + config + RPC.

- Continuous cpal mic → VAD segmenter → STT → agent, no hotkey; opt-in via voice_server.always_on_enabled.
- "Hey Tiny" wake word (English-forced STT + fuzzy match); screen-lock privacy pause.
- Config schema + live-apply on the settings RPC + start_if_enabled wiring + a JSON-RPC roundtrip E2E.

Files (9)

voice/{always_on,audio_capture,mod}.rs, config/schema/voice_server.rs, config/{schemas,ops,ops_tests}.rs, credentials/ops.rs, tests/json_rpc_e2e.rs.

Part of the #3307 split. #3307 (72 files) is being replaced by 8 small, dependency-ordered PRs (a merge train). Branches are stacked; each PR's true slice is shown once its predecessors merge and it is rebased onto main.
Stacked on slice 3 (#3342).

Summary by CodeRabbit

New Features
- Enabled always-on continuous voice listening (configurable)
- Added custom wake-word support with fuzzy matching and whitespace handling
- Exposed voice-activity-detection (VAD) tuning parameters for sensitivity and duration control
- Automatic pause of voice capture when screen is locked for privacy

Tests
- Added end-to-end test validating voice server settings persistence and defaults

Run enigo keyboard/mouse on the app main thread via a native-registry executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge, and downscaled screenshots so the model can see them. Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).

… loop Adds the Rust-internal automate engine (poll-until-stable settle, playback verification), the AXEnabled diagnostics field + settle primitives on ax_interact, the Music fast-path, and the Windows UIA superset. Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow. Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).

…trator Registers the AutomateTool (multi-step UI flows in one call) and the ax_interact denylist/opt-in plumbing; adds the catalog toggle, tool definition, and orchestrator prompt guidance (automate + screenshot/ mouse/keyboard fallback for Electron apps with empty AX trees). Slice 3/7 of tinyhumansai#3307 (tool wiring + prompts).

Continuous cpal mic → VAD segmenter → STT → agent with no hotkey, opt-in via voice_server.always_on_enabled, 'Hey Tiny' wake word (English-forced STT + fuzzy match), and screen-lock privacy pause. Adds the config schema, live-apply on the settings RPC, start_if_enabled wiring, and a JSON-RPC roundtrip E2E. Slice 4/7 of tinyhumansai#3307 (always-on core).

coderabbitai · 2026-06-04T09:14:24Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3593d6b0-537e-499b-9ede-1f6f98260324

📥 Commits

Reviewing files that changed from the base of the PR and between 9ba7d73 and a834a75.

📒 Files selected for processing (4)

src/openhuman/config/ops.rs
src/openhuman/config/schemas.rs
src/openhuman/credentials/ops.rs
src/openhuman/voice/always_on.rs

🚧 Files skipped from review as they are similar to previous changes (3)

src/openhuman/config/ops.rs
src/openhuman/config/schemas.rs
src/openhuman/voice/always_on.rs

📝 Walkthrough

Walkthrough

Adds an always-on listening feature: VAD-based speech segmentation, capture thread + resampling, WAV encoding and STT transcription, wake-word extraction with fuzzy matching, config schema fields (always_on_enabled, wake_word) and RPC wiring, login-time start/stop, macOS lock privacy hook, and tests.
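
To make the "WAV encoding" step concrete, here is a minimal sketch of how 16 kHz mono 16-bit PCM WAV bytes could be built from float samples. It is not the PR's `encode_wav_16k()`: the function name, the `f32` input in [-1.0, 1.0], and the clamping behaviour are assumptions for illustration; only the 44-byte RIFF/WAVE header layout is standard.

```rust
/// Assumed sample rate for STT input ("16kHz" in the walkthrough).
const SAMPLE_RATE: u32 = 16_000;

/// Hypothetical stand-in for `encode_wav_16k()`: encodes mono f32 samples
/// as a canonical 44-byte-header, 16-bit PCM WAV buffer.
fn encode_wav_16k_sketch(samples: &[f32]) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate: u32 = SAMPLE_RATE * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // size after this field
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&SAMPLE_RATE.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        // Clamp out-of-range input, then scale to the signed 16-bit range.
        let v = (s.clamp(-1.0, 1.0) * 32767.0) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

The resulting buffer can be base64-encoded and handed to an STT backend, which matches the WAV→base64→STT path described in the Changes table.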

Changes

Always-On Voice Listening Feature

| Layer / File(s) | Summary |
| --- | --- |
| Voice server configuration schema and defaults `src/openhuman/config/schema/voice_server.rs` | `VoiceServerConfig` adds `always_on_enabled`, VAD params (`vad_onset_threshold`, `vad_hangover_ms`, `vad_min_speech_ms`, `vad_max_utterance_secs`) and `wake_word` with serde defaults and unit tests; Default impl updated. |
| Audio capture utilities for always-on pipeline `src/openhuman/voice/audio_capture.rs` | `TARGET_SAMPLE_RATE` made `pub(crate)`; `chunk_rms`, `to_mono`, `resample` made `pub(crate)`; new `encode_wav_16k()` added to emit 16kHz mono 16-bit PCM WAV bytes. |
| Always-on VAD, capture, and transcription engine `src/openhuman/voice/always_on.rs` | Implements `VadConfig`, `VadSegmenter`, `VadEvent`, capture thread (CPAL) and frame pipeline, pause on screen lock, utterance buffering, WAV→base64→STT transcription, `extract_command` wake-word fuzzy gate, macOS lock FFI, start/stop APIs, and comprehensive unit tests. |
| RPC/config API endpoints for always-on settings `src/openhuman/config/ops.rs`, `src/openhuman/config/schemas.rs` | `VoiceServerSettingsPatch` extended with `always_on_enabled` and `wake_word`; GET/SET handlers updated to include/persist these fields; handler triggers `voice::always_on::start_if_enabled` after apply. |
| Module wiring and login-time initialization `src/openhuman/voice/mod.rs`, `src/openhuman/credentials/ops.rs` | `pub mod always_on;` added; login startup calls `start_if_enabled` and logout calls `stop()` to gate runtime capture. |
| Unit and E2E tests `src/openhuman/config/ops_tests.rs`, `tests/json_rpc_e2e.rs` | Config tests updated to set new patch fields; new E2E test round-trips defaults and updates for `always_on_enabled` and `wake_word`. |
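
The four VAD parameters listed in the table suggest an onset/hangover/min-duration/max-utterance state machine. The sketch below shows one plausible shape for such a segmenter, assuming it is fed one RMS value per fixed-length frame. The type names (`VadParams`, `SegmenterSketch`, `VadEventSketch`), the millisecond units, and the choice to report only a duration are all assumptions; the PR's `VadSegmenter` may differ in every one of these.

```rust
/// Tuning knobs loosely mirroring the `vad_*` config fields (units assumed).
#[derive(Debug, Clone, Copy)]
struct VadParams {
    onset_threshold: f32,  // RMS at or above this counts as speech
    hangover_ms: u32,      // trailing silence that ends an utterance
    min_speech_ms: u32,    // utterances with less voiced time are dropped
    max_utterance_ms: u32, // force-flush once an utterance runs this long
}

#[derive(Debug, PartialEq)]
enum VadEventSketch {
    /// A finished utterance; the duration includes trailing hangover silence.
    Utterance { duration_ms: u32 },
}

#[derive(Debug)]
struct SegmenterSketch {
    params: VadParams,
    in_speech: bool,
    voiced_ms: u32,
    total_ms: u32,
    silence_ms: u32,
}

impl SegmenterSketch {
    fn new(params: VadParams) -> Self {
        Self { params, in_speech: false, voiced_ms: 0, total_ms: 0, silence_ms: 0 }
    }

    /// Abort any in-progress utterance without emitting an event.
    fn reset(&mut self) {
        *self = Self::new(self.params);
    }

    /// Feed one frame's RMS level; returns an event when an utterance closes.
    fn push(&mut self, rms: f32, frame_ms: u32) -> Option<VadEventSketch> {
        let voiced = rms >= self.params.onset_threshold;
        if !self.in_speech {
            if !voiced {
                return None; // still idle
            }
            self.in_speech = true; // onset
        }
        self.total_ms += frame_ms;
        if voiced {
            self.voiced_ms += frame_ms;
            self.silence_ms = 0; // a short pause does not split the utterance
        } else {
            self.silence_ms += frame_ms;
        }
        let ended = self.silence_ms >= self.params.hangover_ms;
        let too_long = self.total_ms >= self.params.max_utterance_ms;
        if !(ended || too_long) {
            return None;
        }
        let (voiced_ms, duration_ms) = (self.voiced_ms, self.total_ms);
        self.reset();
        if voiced_ms >= self.params.min_speech_ms {
            Some(VadEventSketch::Utterance { duration_ms })
        } else {
            None // short blip: dropped
        }
    }
}
```

Keeping the segmenter free of I/O, as here, is what makes this kind of state machine straightforward to unit-test with synthetic RMS sequences.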

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant RPCHandler
  participant ConfigOps
  participant VoiceAlwaysOn
  Client->>RPCHandler: config_update_voice_server_settings
  RPCHandler->>ConfigOps: apply_voice_server_settings (includes always_on_enabled, wake_word)
  ConfigOps->>ConfigOps: persist trimmed wake_word, always_on flag
  ConfigOps-->>RPCHandler: apply result
  RPCHandler->>VoiceAlwaysOn: start_if_enabled(config)
  VoiceAlwaysOn-->>RPCHandler: started/stopped
  RPCHandler-->>Client: success
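
The "persist trimmed wake_word, always_on flag" step in the diagram can be illustrated with a small patch-apply sketch. The struct and function names below are hypothetical simplifications (the real `VoiceServerSettingsPatch` carries more fields), and the only behaviour taken from the PR discussion is that absent fields leave the config untouched and a whitespace-only wake word collapses to the empty "no wake word" value.

```rust
/// Hypothetical two-field slice of the voice-server config.
#[derive(Debug, Clone, PartialEq)]
struct VoiceServerSettings {
    always_on_enabled: bool,
    wake_word: String,
}

/// Hypothetical patch type: `None` means "leave this field unchanged".
#[derive(Debug, Default)]
struct SettingsPatch {
    always_on_enabled: Option<bool>,
    wake_word: Option<String>,
}

/// Apply a partial update, normalising the wake word by trimming whitespace.
fn apply_patch(cfg: &mut VoiceServerSettings, patch: SettingsPatch) {
    if let Some(enabled) = patch.always_on_enabled {
        cfg.always_on_enabled = enabled;
    }
    if let Some(word) = patch.wake_word {
        // "   " becomes "", i.e. the documented "no wake word" case.
        cfg.wake_word = word.trim().to_string();
    }
}
```

After a successful apply, the handler would reload the config and pass it to `start_if_enabled`, as the diagram shows.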

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

feat: always-on voice command → system action (listen, understand, execute) #3148: Implements the always-on listening, VAD, wake-word gating, and config wiring described in the proposal.

Possibly related PRs

tinyhumansai/openhuman#178: Adds to the openhuman::voice module surface and is related to the module-level wiring used here.

Suggested reviewers

graycyrus
senamakel

Poem

🐇 I listen softly, waiting for the cue,
RMS hums and VAD wakes up anew,
Trimmed wake words whisper, fuzzy but kind,
WAVs and transcripts bring the command to mind,
Ever ready—always-on—ears for you.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing Phase 2 always-on listening engine with RPC integration, which is the core focus across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

M3gA-Mind · 2026-06-04T11:34:39Z

📚 Stacked PR series (8 total) — split from #3307

Merge in list order, starting from the base of the stack; each PR builds on the one above it:

- 1/8: #3340, feat(computer): main-thread synthetic-input executor + CEF crash fix
- 2/8: #3341, feat(accessibility): AX/UIA perception + automate engine
- 3/8: #3342, feat(agent): wire automate/ax_interact computer tools
- 4/8: #3343, feat(voice): Phase 2 always-on listening engine + RPC
- 5/8: #3344, feat(voice): always-on Settings toggle + debug panel + i18n
- 6/8: #3345, feat(notch): always-visible macOS notch status pill
- 7/8: #3346, feat(voice): Phase 3 fast command router
- 8/8: #3362, feat(accessibility): vision-click fallback for Electron/partial-AX apps (Phase 1.5 complete)

Tracker: docs/voice-system-actions.md.

Take main's versions of already-merged slice-1/2/3 files.

M3gA-Mind · 2026-06-04T13:57:17Z

Independent review (beyond the CodeRabbit pass)

Reviewed the always-on listening core — the VadSegmenter, continuous cpal capture, wake-word gating, screen-lock privacy pause, and the config/RPC wiring.

Reviewed clean

extract_command (wake word) — case/punctuation-insensitive tokenization, anchors on the longest wake token and fuzzy-matches it (levenshtein) within the first 3 tokens so STT homophones ("tiny"→"tony/tinny", "hey"→"a/ok") still trigger. The wake.iter().max_by_key(..).unwrap() at L416 is safe — the empty-wake-word case returns early at L402. Bounded match window avoids mid-sentence false triggers. Well covered (homophones / absent / empty-passes-everything).
VadSegmenter — onset/hangover/min-duration/max-utterance state machine is pure and unit-tested (short-blip drop, mid-utterance pause doesn't split, max-utterance force-flush, reset aborts without event).
Capture — cpal runs on a dedicated OS thread (not the tokio runtime); samples flow over an unbounded mpsc to an async processor. Screen-lock watcher gates delivery (macOS CGSessionCopyCurrentDictionary), so audio isn't transcribed while locked.
Config/RPC — always_on_enabled + wake_word thread through the voice-server schema and apply live via start_if_enabled; covered by the JSON-RPC E2E (json_rpc_voice_server_settings_roundtrip_always_on_and_wake_word).
Opt-in + off by default; start_if_enabled is idempotent (process-once guard).
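
The wake-word gate described above can be sketched as follows. This is an illustration under assumptions, not the code in `always_on.rs`: the maximum edit distance of 1, the helper names, and the choice to return the words after the matched anchor token are all guesses. Only the overall approach (normalise tokens, anchor on the longest wake token, Levenshtein match within the first 3 tokens, empty wake word passes everything) comes from the review text.

```rust
/// Classic two-row Levenshtein edit distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            cur.push(substitute.min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Lowercase a word and strip punctuation.
fn normalize(word: &str) -> String {
    word.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

const MATCH_WINDOW: usize = 3; // only the first 3 tokens may hold the wake word
const MAX_DISTANCE: usize = 1; // assumed fuzziness; not taken from the PR

/// Returns the command following the wake word, or `None` if it is absent.
/// An empty wake word lets every transcript through unchanged.
fn extract_command_sketch(transcript: &str, wake_word: &str) -> Option<String> {
    let wake: Vec<String> = wake_word
        .split_whitespace()
        .map(normalize)
        .filter(|t| !t.is_empty())
        .collect();
    if wake.is_empty() {
        return Some(transcript.trim().to_string());
    }
    // Safe: `wake` is non-empty because of the early return above.
    let anchor = wake.iter().max_by_key(|t| t.chars().count()).unwrap();
    let words: Vec<&str> = transcript.split_whitespace().collect();
    for (i, raw) in words.iter().enumerate().take(MATCH_WINDOW) {
        if levenshtein(&normalize(raw), anchor) <= MAX_DISTANCE {
            return Some(words[i + 1..].join(" "));
        }
    }
    None
}
```

Anchoring on the longest token means a misheard short word such as "hey" does not block the match, while the 3-token window keeps a later mention of "tiny" in an ordinary sentence from triggering.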

No correctness issues found. LGTM once CI is green.
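
For context on the capture design reviewed above, the following sketch shows the general pattern of a dedicated producer thread feeding frames over an unbounded channel to a consumer that drops frames while a lock flag is set. It swaps in `std::thread` and `std::sync::mpsc` for the real cpal callback and async processor, and the function names and the `AtomicBool` lock flag are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Drain frames until the sender hangs up, keeping only those that arrive
/// while the screen is unlocked. Stand-in for the async frame processor.
fn process_frames(rx: mpsc::Receiver<Vec<f32>>, screen_locked: Arc<AtomicBool>) -> Vec<Vec<f32>> {
    let mut delivered = Vec::new();
    for frame in rx {
        if screen_locked.load(Ordering::Relaxed) {
            continue; // privacy pause: discard audio captured while locked
        }
        delivered.push(frame);
    }
    delivered
}

/// Runs a tiny end-to-end demo and returns how many frames were delivered.
fn run_capture_demo(lock_screen: bool) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<f32>>(); // unbounded, like the PR's mpsc
    let locked = Arc::new(AtomicBool::new(lock_screen));

    // Stand-in for the audio thread: sends three frames, then drops `tx`.
    let producer = thread::spawn(move || {
        for i in 0..3 {
            tx.send(vec![i as f32; 160]).expect("receiver should be alive");
        }
    });

    let delivered = process_frames(rx, locked);
    producer.join().expect("capture thread panicked");
    delivered.len()
}
```

With the flag clear, all three frames reach the consumer; with it set, none do. Gating on the consumer side keeps the capture thread itself free of any lock-state logic.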

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (2)

src/openhuman/config/schema/voice_server.rs (1)

165-171: ⚡ Quick win

Assert all newly added defaults in the legacy-deserialization test.

This test currently verifies only a subset of the new Phase-2 fields. Adding assertions for vad_onset_threshold, vad_max_utterance_secs, and wake_word will better protect backward-compat deserialization.

Proposed test hardening

 #[test]
 fn deserializes_with_all_vad_fields_defaulted() {
     // An older config file with none of the Phase 2 keys must still load.
     let c: VoiceServerConfig = serde_json::from_str("{}").unwrap();
     assert!(!c.always_on_enabled);
+    assert_eq!(c.vad_onset_threshold, default_vad_onset_threshold());
     assert_eq!(c.vad_hangover_ms, default_vad_hangover_ms());
     assert_eq!(c.vad_min_speech_ms, default_vad_min_speech_ms());
+    assert_eq!(c.vad_max_utterance_secs, default_vad_max_utterance_secs());
+    assert_eq!(c.wake_word, default_wake_word());
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/config/schema/voice_server.rs` around lines 165 - 171, The
legacy-deserialization test deserializes an empty JSON but only asserts a subset
of new Phase-2 default VAD fields; update the test function
deserializes_with_all_vad_fields_defaulted() to also assert that
c.vad_onset_threshold == default_vad_onset_threshold(), c.vad_max_utterance_secs
== default_vad_max_utterance_secs(), and c.wake_word == default_wake_word() (or
the appropriate default accessor used elsewhere) so all newly added defaults are
verified on legacy deserialization.

src/openhuman/config/ops_tests.rs (1)

1046-1048: ⚡ Quick win

Assert the new always-on fields in this roundtrip test.

The patch now sets always_on_enabled and wake_word, but the test doesn’t verify those values in the returned/saved config.

✅ Suggested assertion additions

     let outcome = load_and_apply_voice_server_settings(patch)
         .await
         .expect("ok");
+    assert_eq!(
+        outcome.value["config"]["voice_server"]["always_on_enabled"],
+        serde_json::json!(true)
+    );
+    assert_eq!(
+        outcome.value["config"]["voice_server"]["wake_word"],
+        serde_json::json!("Hey Tiny")
+    );
     assert!(
         outcome.value["config"]["voice_server"]["min_duration_secs"]
             .as_f64()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/config/ops_tests.rs` around lines 1046 - 1048, The test that
performs the config roundtrip must assert the newly-set fields
`always_on_enabled` and `wake_word`; after you load or receive the roundtripped
config (the variable holding the returned/saved config in the roundtrip test),
add assertions that `always_on_enabled` equals Some(true) and `wake_word` equals
Some("Hey Tiny".to_string()) to ensure those values are preserved by the
roundtrip logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/config/ops.rs`:
- Around line 2044-2046: The code assigns update.wake_word directly to
config.voice_server.wake_word, allowing whitespace-only values to persist;
change the assignment to normalize by trimming surrounding whitespace from
update.wake_word (use update.wake_word.trim()), and if the trimmed string is
empty treat it as the documented “no wake word” (set
config.voice_server.wake_word to the empty string or None consistent with its
type); update the logic around update.wake_word and
config.voice_server.wake_word to use the trimmed value so whitespace-only inputs
do not become a non-empty wake word.

In `@src/openhuman/config/schemas.rs`:
- Around line 1731-1736: The current code calls
config_rpc::load_and_apply_voice_server_settings(patch).await? then attempts to
refresh live state with config_rpc::load_config_with_timeout().await but
silently ignores failures to reload; change this to log a grep-friendly error
when load_config_with_timeout() fails and only call
crate::openhuman::voice::always_on::start_if_enabled(&config).await on success;
specifically, replace the if let Ok(config) =
config_rpc::load_config_with_timeout().await block with explicit match handling
that logs an error (including the returned error) using a stable prefix like
"voice:reload:error" and, on Ok(config), logs a "voice:reload:ok" message before
calling always_on::start_if_enabled to ensure failures are visible and state
transitions are traceable.

In `@src/openhuman/credentials/ops.rs`:
- Around line 44-47: The always-on voice service is started in start_if_enabled
but never stopped during logout; update the logout teardown
(stop_login_gated_services) to symmetrically stop or disable the always-on
service by invoking the appropriate shutdown API on
crate::openhuman::voice::always_on (e.g., add a stop_if_enabled/stop function or
call the existing stop method) so microphone capture is explicitly halted on
logout; locate start_if_enabled in the always_on module and add the matching
stop call in stop_login_gated_services to ensure resources are released and
VAD/STT are disabled.

In `@src/openhuman/voice/always_on.rs`:
- Around line 224-230: The code returns early when RUNNING is true, skipping
refresh of runtime settings so the process keeps using the original
VadConfig/Config; fix by ensuring settings are refreshed before the early return
or by applying them to the running processor: move the
VadConfig::from_server_config(&app_config.voice_server) and let config =
app_config.clone() to occur before the RUNNING.swap(...) check (or,
alternatively, call a runtime update method on the running processor with the
new VadConfig/Config when RUNNING was already true), so wake-word/VAD changes
take effect without restart.
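
The last finding (settings not refreshed when the engine is already running) can be illustrated with a small sketch of the "refresh first, then check the guard" ordering. This is a simplified model, not the code in `always_on.rs`: it uses a struct field instead of a static `RUNNING`, a hypothetical `LiveSettings` type, and an `RwLock` as the shared-settings mechanism.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical subset of the settings the running processor reads.
#[derive(Debug, Clone, PartialEq)]
struct LiveSettings {
    wake_word: String,
    vad_onset_threshold: f32,
}

struct AlwaysOnSketch {
    running: AtomicBool,
    settings: Arc<RwLock<LiveSettings>>,
}

impl AlwaysOnSketch {
    fn new(initial: LiveSettings) -> Self {
        Self {
            running: AtomicBool::new(false),
            settings: Arc::new(RwLock::new(initial)),
        }
    }

    /// Returns `true` only for the call that actually starts the pipeline.
    fn start_if_enabled(&self, enabled: bool, new_settings: LiveSettings) -> bool {
        if !enabled {
            return false;
        }
        // Refresh shared settings BEFORE the guard, so an already-running
        // processor picks up wake-word and VAD changes without a restart.
        *self.settings.write().expect("settings lock poisoned") = new_settings;

        // `swap` returns the previous value: `true` means someone already started.
        if self.running.swap(true, Ordering::SeqCst) {
            return false;
        }
        // A real implementation would spawn the capture thread here.
        true
    }
}
```

A second call with changed settings returns `false` (nothing new is started) but still updates what the running processor sees, which is the behaviour the review comment asks for.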

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 315fc0e5-c3ab-4ba7-a34d-f7e429a45406

📥 Commits

Reviewing files that changed from the base of the PR and between cd31484 and 9ba7d73.

📒 Files selected for processing (9)

src/openhuman/config/ops.rs
src/openhuman/config/ops_tests.rs
src/openhuman/config/schema/voice_server.rs
src/openhuman/config/schemas.rs
src/openhuman/credentials/ops.rs
src/openhuman/voice/always_on.rs
src/openhuman/voice/audio_capture.rs
src/openhuman/voice/mod.rs
tests/json_rpc_e2e.rs

…n logout

CodeRabbit tinyhumansai#3343:
- config/ops.rs: trim wake_word so whitespace-only collapses to the documented 'empty = no wake word' case.
- config/schemas.rs: match on the live-apply config reload and warn (don't silently skip) when it fails, so the saved toggle's non-application is traceable.
- credentials/ops.rs: add symmetric always_on::stop() on logout so mic capture stops transcribing/delivering after logout (privacy). New always_on::stop() flips the ENABLED gate; covered by a unit test.

M3gA-Mind added 4 commits June 4, 2026 14:09

This was referenced Jun 4, 2026

feat(voice): always-on Settings toggle + debug panel + i18n (5/8 of #3307) #3344

Merged

feat(voice): autonomous app control — automate loop, always-on (Hey Tiny), notch status, computer control (#3148) #3307

Closed

M3gA-Mind changed the title ~~feat(voice): Phase 2 always-on listening engine + RPC (4/7 of #3307)~~ feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307) Jun 4, 2026

This was referenced Jun 4, 2026

feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307) #3340

Merged

feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307) #3341

Merged

Merge upstream/main into feat/voice-split-4-alwayson-core

9ba7d73

Take main's versions of already-merged slice-1/2/3 files.

M3gA-Mind marked this pull request as ready for review June 4, 2026 13:46

M3gA-Mind requested a review from a team June 4, 2026 13:46

coderabbitai Bot added the feature, rust-core, and agent labels Jun 4, 2026

coderabbitai Bot requested changes Jun 4, 2026

Comment thread src/openhuman/config/ops.rs

Comment thread src/openhuman/config/schemas.rs

Comment thread src/openhuman/credentials/ops.rs

Comment thread src/openhuman/voice/always_on.rs

Comment thread src/openhuman/voice/always_on.rs

coderabbitai Bot approved these changes Jun 4, 2026

senamakel merged commit f5dc9ea into tinyhumansai:main Jun 4, 2026
22 checks passed

senamakel merged 6 commits into tinyhumansai:main from M3gA-Mind:feat/voice-split-4-alwayson-core