feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307) #3343
Run enigo keyboard/mouse on the app main thread via a native-registry executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge, and downscaled screenshots so the model can see them. Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
… loop Adds the Rust-internal automate engine (poll-until-stable settle, playback verification), the AXEnabled diagnostics field + settle primitives on ax_interact, the Music fast-path, and the Windows UIA superset. Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow. Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).
…trator Registers the AutomateTool (multi-step UI flows in one call) and the ax_interact denylist/opt-in plumbing; adds the catalog toggle, tool definition, and orchestrator prompt guidance (automate + screenshot/ mouse/keyboard fallback for Electron apps with empty AX trees). Slice 3/7 of tinyhumansai#3307 (tool wiring + prompts).
Continuous cpal mic → VAD segmenter → STT → agent with no hotkey, opt-in via voice_server.always_on_enabled, 'Hey Tiny' wake word (English-forced STT + fuzzy match), and screen-lock privacy pause. Adds the config schema, live-apply on the settings RPC, start_if_enabled wiring, and a JSON-RPC roundtrip E2E. Slice 4/7 of tinyhumansai#3307 (always-on core).
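The VAD segmenter stage in the pipeline above can be sketched as a small frame-energy state machine. This is a minimal illustration only: the PR does not show the actual detection algorithm, so the RMS energy test, the `VadSketch` struct, and the frame-based units are assumptions. Only the idea of an onset threshold, a hangover window, a minimum speech length, and a maximum utterance length comes from the config keys named in this PR.

```rust
/// Hypothetical VAD settings. The concepts mirror the `vad_*` config keys in
/// this PR; expressing them in frames (not ms/secs) is an assumption.
#[derive(Clone, Copy)]
pub struct VadSketch {
    pub onset_threshold: f32,        // RMS level at or above which a frame is speech
    pub hangover_frames: usize,      // silent frames tolerated before closing a segment
    pub min_speech_frames: usize,    // shorter segments are dropped as noise
    pub max_utterance_frames: usize, // hard cap on the length of one segment
}

/// Root-mean-square energy of one frame of mono samples.
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Returns `(start_frame, end_frame_exclusive)` for each detected utterance.
pub fn segment(frames: &[Vec<f32>], cfg: VadSketch) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    let mut last_voiced = 0usize; // index of the most recent voiced frame

    for (i, frame) in frames.iter().enumerate() {
        let voiced = rms(frame) >= cfg.onset_threshold;
        match start {
            None => {
                if voiced {
                    start = Some(i);
                    last_voiced = i;
                }
            }
            Some(s) => {
                if voiced {
                    last_voiced = i;
                }
                let silence = i - last_voiced;
                let too_long = i + 1 - s >= cfg.max_utterance_frames;
                if silence > cfg.hangover_frames || too_long {
                    // Close the segment at the last voiced frame.
                    let end = last_voiced + 1;
                    if end - s >= cfg.min_speech_frames {
                        out.push((s, end));
                    }
                    start = None;
                }
            }
        }
    }

    // Flush a segment that is still open when the input ends.
    if let Some(s) = start {
        let end = last_voiced + 1;
        if end - s >= cfg.min_speech_frames {
            out.push((s, end));
        }
    }
    out
}
```

In a real engine each closed segment would be encoded and handed to STT; here the function only reports frame ranges so the gating logic is easy to inspect.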
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Adds an always-on listening feature: VAD-based speech segmentation, capture thread + resampling, WAV encoding and STT transcription, wake-word extraction with fuzzy matching, config schema fields (always_on_enabled, wake_word) and RPC wiring, login-time start/stop, macOS lock privacy hook, and tests.
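The wake-word extraction with fuzzy matching mentioned in the walkthrough could look roughly like the following. This is a sketch under stated assumptions: the PR does not show which similarity measure is used, so the Levenshtein distance, the `max_edits` tolerance, and the function names `normalize`, `edit_distance`, and `extract_command` are all hypothetical. The one behavior taken from the source is that an empty wake word means "no wake word".

```rust
/// ASCII-lowercase the text, replace punctuation with spaces, and collapse
/// runs of whitespace so "Hey Tiny," and "hey  tiny" compare equal.
pub fn normalize(text: &str) -> String {
    let cleaned: String = text
        .chars()
        .map(|c| if c.is_alphanumeric() { c.to_ascii_lowercase() } else { ' ' })
        .collect();
    cleaned.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Classic Levenshtein edit distance over chars (two-row formulation).
pub fn edit_distance(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            let best = substitute.min(delete).min(insert);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// If the transcript starts with something within `max_edits` of the wake
/// word, return the remainder (the command). An empty wake word disables
/// the gate and passes the whole transcript through.
pub fn extract_command(transcript: &str, wake_word: &str, max_edits: usize) -> Option<String> {
    let wake = normalize(wake_word);
    let text = normalize(transcript);
    if wake.is_empty() {
        return Some(text);
    }
    let words: Vec<&str> = text.split(' ').collect();
    let n = wake.split(' ').count();
    if words.len() < n {
        return None;
    }
    let head = words[..n].join(" ");
    if edit_distance(&head, &wake) <= max_edits {
        Some(words[n..].join(" "))
    } else {
        None
    }
}
```

Comparing only the leading words keeps a misheard "Hey Tony" acceptable while ignoring utterances that merely contain similar words later in the sentence.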
Sequence Diagram(s)
sequenceDiagram
participant Client
participant RPCHandler
participant ConfigOps
participant VoiceAlwaysOn
Client->>RPCHandler: config_update_voice_server_settings
RPCHandler->>ConfigOps: apply_voice_server_settings (includes always_on_enabled, wake_word)
ConfigOps->>ConfigOps: persist trimmed wake_word, always_on flag
ConfigOps-->>RPCHandler: apply result
RPCHandler->>VoiceAlwaysOn: start_if_enabled(config)
VoiceAlwaysOn-->>RPCHandler: started/stopped
RPCHandler-->>Client: success
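The flow in the diagram (apply the patch, persist a trimmed wake word, then call `start_if_enabled`) can be approximated with a synchronous sketch. The types `VoiceServerSketch`, `VoicePatch`, and `Listening`, and the `handle_update` wrapper, are hypothetical stand-ins; the real handlers are async and operate on the full config, which this example does not attempt to reproduce.

```rust
/// Hypothetical slice of the voice-server config touched by this RPC.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct VoiceServerSketch {
    pub always_on_enabled: bool,
    pub wake_word: String,
}

/// Hypothetical patch: `None` means "leave the field unchanged".
#[derive(Default)]
pub struct VoicePatch {
    pub always_on_enabled: Option<bool>,
    pub wake_word: Option<String>,
}

/// Outcome reported back to the RPC handler in the diagram.
#[derive(Debug, PartialEq)]
pub enum Listening {
    Started,
    Stopped,
}

/// Apply the patch; the wake word is trimmed so a whitespace-only value
/// collapses to the empty string ("no wake word").
pub fn apply_voice_server_settings(cfg: &mut VoiceServerSketch, patch: VoicePatch) {
    if let Some(flag) = patch.always_on_enabled {
        cfg.always_on_enabled = flag;
    }
    if let Some(word) = patch.wake_word {
        cfg.wake_word = word.trim().to_string();
    }
}

/// Stand-in for the live-apply step: a real engine would start or stop
/// capture here; this sketch only reports which branch was taken.
pub fn start_if_enabled(cfg: &VoiceServerSketch) -> Listening {
    if cfg.always_on_enabled {
        Listening::Started
    } else {
        Listening::Stopped
    }
}

/// Mirrors the diagram: persist first, then live-apply.
pub fn handle_update(cfg: &mut VoiceServerSketch, patch: VoicePatch) -> Listening {
    apply_voice_server_settings(cfg, patch);
    start_if_enabled(cfg)
}
```

Applying the patch before the live-apply step means the engine always sees the same values that were saved, which is the ordering the diagram shows.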
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 5 passed
Take main's versions of already-merged slice-1/2/3 files.
Independent review (beyond the CodeRabbit pass)
Reviewed the always-on listening core, the …
Reviewed clean
No correctness issues found. LGTM once CI is green.
Actionable comments posted: 5
🧹 Nitpick comments (2)
src/openhuman/config/schema/voice_server.rs (1)
165-171: ⚡ Quick win
Assert all newly added defaults in the legacy-deserialization test.
This test currently verifies only a subset of the new Phase-2 fields. Adding assertions for `vad_onset_threshold`, `vad_max_utterance_secs`, and `wake_word` will better protect backward-compat deserialization.
Proposed test hardening

     #[test]
     fn deserializes_with_all_vad_fields_defaulted() {
         // An older config file with none of the Phase 2 keys must still load.
         let c: VoiceServerConfig = serde_json::from_str("{}").unwrap();
         assert!(!c.always_on_enabled);
    +    assert_eq!(c.vad_onset_threshold, default_vad_onset_threshold());
         assert_eq!(c.vad_hangover_ms, default_vad_hangover_ms());
         assert_eq!(c.vad_min_speech_ms, default_vad_min_speech_ms());
    +    assert_eq!(c.vad_max_utterance_secs, default_vad_max_utterance_secs());
    +    assert_eq!(c.wake_word, default_wake_word());
     }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/schema/voice_server.rs` around lines 165 - 171, The legacy-deserialization test deserializes an empty JSON but only asserts a subset of new Phase-2 default VAD fields; update the test function deserializes_with_all_vad_fields_defaulted() to also assert that c.vad_onset_threshold == default_vad_onset_threshold(), c.vad_max_utterance_secs == default_vad_max_utterance_secs(), and c.wake_word == default_wake_word() (or the appropriate default accessor used elsewhere) so all newly added defaults are verified on legacy deserialization.
src/openhuman/config/ops_tests.rs (1)
1046-1048: ⚡ Quick win
Assert the new always-on fields in this roundtrip test.
The patch now sets `always_on_enabled` and `wake_word`, but the test doesn’t verify those values in the returned/saved config.
✅ Suggested assertion additions

     let outcome = load_and_apply_voice_server_settings(patch)
         .await
         .expect("ok");
    +assert_eq!(
    +    outcome.value["config"]["voice_server"]["always_on_enabled"],
    +    serde_json::json!(true)
    +);
    +assert_eq!(
    +    outcome.value["config"]["voice_server"]["wake_word"],
    +    serde_json::json!("Hey Tiny")
    +);
     assert!(
         outcome.value["config"]["voice_server"]["min_duration_secs"]
             .as_f64()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/config/ops_tests.rs` around lines 1046 - 1048, The test that performs the config roundtrip must assert the newly-set fields `always_on_enabled` and `wake_word`; after you load or receive the roundtripped config (the variable holding the returned/saved config in the roundtrip test), add assertions that `always_on_enabled` equals Some(true) and `wake_word` equals Some("Hey Tiny".to_string()) to ensure those values are preserved by the roundtrip logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/config/ops.rs`:
- Around line 2044-2046: The code assigns update.wake_word directly to
config.voice_server.wake_word, allowing whitespace-only values to persist;
change the assignment to normalize by trimming surrounding whitespace from
update.wake_word (use update.wake_word.trim()), and if the trimmed string is
empty treat it as the documented “no wake word” (set
config.voice_server.wake_word to the empty string or None consistent with its
type); update the logic around update.wake_word and
config.voice_server.wake_word to use the trimmed value so whitespace-only inputs
do not become a non-empty wake word.
In `@src/openhuman/config/schemas.rs`:
- Around line 1731-1736: The current code calls
config_rpc::load_and_apply_voice_server_settings(patch).await? then attempts to
refresh live state with config_rpc::load_config_with_timeout().await but
silently ignores failures to reload; change this to log a grep-friendly error
when load_config_with_timeout() fails and only call
crate::openhuman::voice::always_on::start_if_enabled(&config).await on success;
specifically, replace the if let Ok(config) =
config_rpc::load_config_with_timeout().await block with explicit match handling
that logs an error (including the returned error) using a stable prefix like
"voice:reload:error" and, on Ok(config), logs a "voice:reload:ok" message before
calling always_on::start_if_enabled to ensure failures are visible and state
transitions are traceable.
In `@src/openhuman/credentials/ops.rs`:
- Around line 44-47: The always-on voice service is started in start_if_enabled
but never stopped during logout; update the logout teardown
(stop_login_gated_services) to symmetrically stop or disable the always-on
service by invoking the appropriate shutdown API on
crate::openhuman::voice::always_on (e.g., add a stop_if_enabled/stop function or
call the existing stop method) so microphone capture is explicitly halted on
logout; locate start_if_enabled in the always_on module and add the matching
stop call in stop_login_gated_services to ensure resources are released and
VAD/STT are disabled.
In `@src/openhuman/voice/always_on.rs`:
- Around line 224-230: The code returns early when RUNNING is true, skipping
refresh of runtime settings so the process keeps using the original
VadConfig/Config; fix by ensuring settings are refreshed before the early return
or by applying them to the running processor: move the
VadConfig::from_server_config(&app_config.voice_server) and let config =
app_config.clone() to occur before the RUNNING.swap(...) check (or,
alternatively, call a runtime update method on the running processor with the
new VadConfig/Config when RUNNING was already true), so wake-word/VAD changes
take effect without restart.
---
Nitpick comments:
In `@src/openhuman/config/ops_tests.rs`:
- Around line 1046-1048: The test that performs the config roundtrip must assert
the newly-set fields `always_on_enabled` and `wake_word`; after you load or
receive the roundtripped config (the variable holding the returned/saved config
in the roundtrip test), add assertions that `always_on_enabled` equals
Some(true) and `wake_word` equals Some("Hey Tiny".to_string()) to ensure those
values are preserved by the roundtrip logic.
In `@src/openhuman/config/schema/voice_server.rs`:
- Around line 165-171: The legacy-deserialization test deserializes an empty
JSON but only asserts a subset of new Phase-2 default VAD fields; update the
test function deserializes_with_all_vad_fields_defaulted() to also assert that
c.vad_onset_threshold == default_vad_onset_threshold(), c.vad_max_utterance_secs
== default_vad_max_utterance_secs(), and c.wake_word == default_wake_word() (or
the appropriate default accessor used elsewhere) so all newly added defaults are
verified on legacy deserialization.
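The `always_on.rs` finding above (settings not refreshed when `RUNNING` is already true) can be illustrated with a small standard-library sketch. The `AlwaysOnSketch` and `LiveSettings` types are hypothetical and the worker thread is omitted; the point is only the ordering, where the shared settings are updated before the `swap` check so a second call still takes effect.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical subset of the runtime settings the worker reads.
#[derive(Debug, Clone, PartialEq)]
pub struct LiveSettings {
    pub wake_word: String,
    pub vad_hangover_ms: u32,
}

/// Hypothetical engine handle: a running flag plus shared, swappable settings.
pub struct AlwaysOnSketch {
    running: AtomicBool,
    live: Mutex<Option<LiveSettings>>,
}

impl AlwaysOnSketch {
    pub fn new() -> Self {
        Self {
            running: AtomicBool::new(false),
            live: Mutex::new(None),
        }
    }

    /// Returns true if this call would spawn the worker, false if one was
    /// already running. Either way the shared settings are refreshed first.
    pub fn start(&self, settings: LiveSettings) -> bool {
        // Refresh before the early return so a running worker picks up
        // wake-word / VAD changes without a restart.
        *self.live.lock().unwrap() = Some(settings);

        // `swap` returns the previous value: true means already running.
        if self.running.swap(true, Ordering::SeqCst) {
            return false;
        }
        // A real engine would spawn the capture thread here.
        true
    }

    /// What the worker would read at the start of each utterance.
    pub fn current(&self) -> Option<LiveSettings> {
        self.live.lock().unwrap().clone()
    }
}
```

The alternative named in the finding, pushing the new config to the running processor through an explicit update method, would work equally well; the shared `Mutex` is simply the smallest change to show.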
📒 Files selected for processing (9)
src/openhuman/config/ops.rs
src/openhuman/config/ops_tests.rs
src/openhuman/config/schema/voice_server.rs
src/openhuman/config/schemas.rs
src/openhuman/credentials/ops.rs
src/openhuman/voice/always_on.rs
src/openhuman/voice/audio_capture.rs
src/openhuman/voice/mod.rs
tests/json_rpc_e2e.rs
…n logout
CodeRabbit tinyhumansai#3343:
- config/ops.rs: trim wake_word so whitespace-only collapses to the documented 'empty = no wake word' case.
- config/schemas.rs: match on the live-apply config reload; warn (don't silently skip) when it fails so the saved toggle's non-application is traceable.
- credentials/ops.rs: add symmetric always_on::stop() on logout so mic capture stops transcribing/delivering after logout (privacy). New always_on::stop() flips the ENABLED gate; covered by a unit test.
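The logout fix described in this commit, where `always_on::stop()` flips an `ENABLED` gate, might be shaped like the sketch below. Only the names `stop` and `ENABLED` come from the commit message; `enable` and `deliver` are hypothetical helpers added for illustration, and the real module may gate at a different point in the pipeline.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide gate: when false, finished utterances are dropped instead of
/// being transcribed and delivered to the agent.
static ENABLED: AtomicBool = AtomicBool::new(false);

/// Hypothetical counterpart to `stop`, called when listening is switched on.
pub fn enable() {
    ENABLED.store(true, Ordering::SeqCst);
}

/// Called on logout: closes the gate so nothing captured afterwards is used.
pub fn stop() {
    ENABLED.store(false, Ordering::SeqCst);
}

/// Hypothetical delivery step: returns the transcript to forward, or `None`
/// when the gate is closed.
pub fn deliver(transcript: &str) -> Option<String> {
    if ENABLED.load(Ordering::SeqCst) {
        Some(transcript.to_string())
    } else {
        None
    }
}
```

A gate like this is cheap to check per utterance and takes effect immediately, even if tearing down the capture thread itself happens later.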
Summary
Slice 4/8 of #3307 — Phase 2 always-on listening engine + config + RPC.
voice_server.always_on_enabled.
start_if_enabled wiring + a JSON-RPC roundtrip E2E.
Files (9)
voice/{always_on,audio_capture,mod}.rs, config/schema/voice_server.rs, config/{schemas,ops,ops_tests}.rs, credentials/ops.rs, tests/json_rpc_e2e.rs.
Summary by CodeRabbit
New Features
Tests