fix(voice): atomic install-start guard for Whisper/Piper install RPCs#1787
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an atomic per-engine InstallSlot to serialize Whisper/Piper installs (handlers acquire a slot and hold it for the background task; on failure they return current status). Adds ComposioClient::execute_tool_once and updates the auth-retry wrapper to call it to avoid compounded retries. ChangesInstall-slot concurrency and handler integration
Composio single-call helper and retry wrapper
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
45fdf42 to
fb2eec9
Compare
|
Scope correction (force-push): dropped commit The composio fix has been saved as a patch and will be re-opened as a standalone PR against PR head is now |
fb2eec9 to
5cb2569
Compare
…bbit tinyhumansai#7) The Whisper and Piper install RPC handlers used a non-atomic read_status -> check -> write_status sequence to decide whether to spawn a background install. Two concurrent callers (a double-click, or the auto-install-on-dropdown-change firing alongside a manual button click) could both observe state != Installing and both spawn install tasks, which then raced on the same `.part` file inside `voice_install_common::download_to_file` (it deletes any pre-existing `.part` before streaming), causing mutual data corruption and the "download keeps restarting" symptom. Fix: add an engine-keyed in-flight set in `voice_install_common` guarded by a single Mutex, plus a `try_acquire_install_slot` that does the check-and-claim under one lock acquisition. Both handlers now acquire a slot before spawning; if the slot is already held the handler short-circuits and returns the in-flight status without spawning. The slot is moved into the spawned tokio task so its Drop releases it when the install task actually exits (including via panic), not when the RPC handler returns. Tests: unit coverage for the slot primitive (grant -> block -> release, per-engine independence, and a 32-way concurrent acquire that asserts exactly one winner — the unit-level analogue of the RPC race), plus handler-level regression tests that pre-hold the slot and fire two concurrent handler calls, asserting both short-circuit to the Installing status rather than spawning duplicate installs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5cb2569 to
dd56553
Compare
…tool_once auth_retry.rs called execute_tool() which already has its own post-OAuth retry loop (execute_tool_with_post_oauth_retry). This caused 4 total HTTP calls instead of the intended 2 when both layers triggered, and broke the retries_once_only_even_when_second_call_still_errors test (counter: 4, want 2). Add execute_tool_once() to ComposioClient — a single-shot execute with no built-in retry. auth_retry.rs now uses this so it owns the retry loop exclusively. All 6 auth_retry tests pass locally; execute_tool retains its built-in retry for callers that do not use the auth_retry wrapper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/composio/client.rs`:
- Around line 166-168: The outbound execute call in execute_tool_once currently
logs only entry; wrap the self.post_execute_tool(&body).await call to inspect
its Result and emit tracing debug/trace logs for both success and failure: after
creating body (variable body) call self.post_execute_tool(&body).await, match on
the Result, on Ok(log a debug message including the tool name and a concise
representation of the successful response/result), on Err(log a debug/error
message including the tool name and the error via %err), then return/propagate
the original success or error; use tracing::debug! or tracing::error! as
appropriate so execute_tool_once (and the body/post_execute_tool interaction)
has exit/outcome observability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a6ebd697-903f-4c20-bdc3-bb658739e094
📒 Files selected for processing (2)
src/openhuman/composio/auth_retry.rssrc/openhuman/composio/client.rs
Addresses CodeRabbit review comment (discussion_r3246732821): the post_execute_tool call now logs success (tool name, successful flag, has_error flag) and failure (tool name, error) via tracing::debug!/error! so exit/outcome observability matches the repo's diagnostics guidelines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Babysitter final report — merge-ready ✓Head SHA: Commits added this session
Reviewer comments
CodeRabbit re-reviewed and issued APPROVED at CI — all 21 checks greenEvery check passed on head
Deferred / out-of-scope
PR is reviewer-approved and CI fully green. Ready for maintainer merge. |
Resolves conflicts in src/openhuman/composio/{auth_retry.rs,client.rs}.
Both branches independently introduced ComposioClient::execute_tool_once
(the non-retrying primitive used by auth_retry to avoid stacking two
retry layers), which git auto-merged into a duplicate definition.
Resolution:
- auth_retry.rs: dropped the now-redundant inline comment about
using execute_tool_once; the module-level docstring already
documents the relationship with PR tinyhumansai#1707.
- client.rs: removed the duplicate definition, kept the version with
tracing::error! on failure (from the CodeRabbit observability fix
in 0c756b0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
Solid race-condition fix replacing the non-atomic read_status → check → write_status sequence with a proper Mutex<HashSet>-backed try_acquire_install_slot() and RAII InstallSlot guard. The design is clean — separate locks for status polling vs. install-start gating, correct Drop semantics (graceful on poisoned mutex), slot moved into the spawned task so it lives for the actual install duration. One copy-paste bug in the test file undermines the whisper concurrency regression test.
Change Summary
| File | Change type | Description |
|---|---|---|
src/openhuman/composio/client.rs |
Relocated + log fix | execute_tool_once moved earlier in impl block; error log upgraded from debug! to error! (CodeRabbit follow-up) |
src/openhuman/local_ai/voice_install_common.rs |
New code | InstallSlot RAII guard + try_acquire_install_slot() + IN_FLIGHT slot table + 3 unit tests |
src/openhuman/local_ai/schemas.rs |
Refactored | Both whisper/piper install handlers switched from read_status check to try_acquire_install_slot; slot moved into spawned task |
src/openhuman/local_ai/schemas_tests.rs |
New tests | 2 handler-level concurrent regression tests for whisper and piper |
Per-file Analysis
voice_install_common.rs
The core of the PR. OnceLock<Mutex<HashSet<&'static str>>> is the right choice — lazy init, no allocation on the hot path (status polls don't touch this lock), and &'static str keys avoid lifetime issues. The Drop impl correctly handles poisoned mutex without panicking (which would abort on double-panic). try_acquire_install_slot uses .expect() on the lock, which is appropriate — if the lock is poisoned at acquire time, something is catastrophically wrong and panicking is the right call. The test suite covers acquire/block/release, cross-engine independence, and 32-way concurrent racing. Well done.
schemas.rs
Both handlers follow the same pattern: try_acquire_install_slot → on None return current status, on Some proceed with install → move slot into tokio::spawn. The let _slot = slot; idiom correctly ensures the guard lives for the task's duration. Logging uses tracing::debug! with [voice-install:whisper/piper] prefixes — consistent with the existing schemas.rs pattern.
schemas_tests.rs
The piper test is correct. The whisper test has a copy-paste bug — see inline comment.
composio/client.rs
Method relocation + error log level fix. Already reviewed and confirmed by CodeRabbit — no additional findings.
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
Continuation review. My prior REQUEST_CHANGES flagged a copy-paste bug in install_whisper_handler_serializes_concurrent_calls — that finding was incorrect. @sanil-23 was right: both arms of the tokio::join! call handle_local_ai_install_whisper, not piper. Apologies for the noise.
I've re-reviewed the full diff with fresh eyes. The atomic slot guard is well-designed: OnceLock<Mutex<HashSet>> for the slot table, RAII InstallSlot with correct Drop semantics (including poisoned-mutex handling), and the slot is properly moved into the spawned task so it outlives the RPC handler. The 32-way concurrent race test is a particularly nice touch — it exercises exactly the race that CodeRabbit flagged on #1755.
Change Summary
| File | Change type | Description |
|---|---|---|
src/openhuman/composio/client.rs |
Relocated + modified | execute_tool_once moved earlier in impl block; failure log upgraded debug! → error!; doc comment refreshed |
src/openhuman/local_ai/schemas.rs |
Refactored | Both whisper/piper install handlers: read_status check → atomic try_acquire_install_slot with RAII guard moved into spawned task |
src/openhuman/local_ai/schemas_tests.rs |
New tests | 2 handler-level concurrent regression tests (whisper + piper) |
src/openhuman/local_ai/voice_install_common.rs |
New code | InstallSlot RAII guard + try_acquire_install_slot() + IN_FLIGHT slot table + 3 unit tests |
Per-file Analysis
voice_install_common.rs
The core primitive is clean. try_acquire_install_slot does check-and-insert under a single mutex acquisition — no TOCTOU window. The Drop impl correctly handles the poisoned-mutex case by logging and continuing rather than double-panicking. The drain_test_slot helper ensures test isolation for the global static. Good separation of concerns: IN_FLIGHT owns the start decision, STATUS_TABLE advertises lifecycle state.
schemas.rs
Both handlers follow the same pattern: acquire slot → write Installing status → spawn task with slot moved in. The slot's lifetime now matches the install's actual duration rather than the brief RPC handler window. Comments explain the rationale clearly.
schemas_tests.rs
Both handler-level tests pre-acquire the slot from the test side, ensuring the handlers hit the short-circuit path without touching the network. Clean test isolation with ENV_LOCK + TempDir + reset_status cleanup.
composio/client.rs
Pure relocation + log level fix. The tracing::error! upgrade for the failure path is appropriate — failed outbound calls should be visible in production logs.
Observations (non-blocking)
[minor] voice_install_common.rs uses log::debug!/log::error! while schemas.rs uses tracing::debug!. This follows the existing per-file convention in the module, so it's consistent locally, but it's a latent inconsistency worth cleaning up in a future pass when the module standardizes on one crate.
No critical or major issues found. The prior CHANGES_REQUESTED should be considered resolved — the flagged finding was a reviewer error. PR is ready for approval.
Summary
read_status→ check →write_status(Installing)sequence inhandle_local_ai_install_whisper/handle_local_ai_install_piperwith an atomictry_acquire_install_slot()guard backed by aMutex<HashSet<&'static str>>slot table.InstallSlotRAII guard is moved into thetokio::spawntask so the slot lives for the install's actual duration (download + extract + validate), not just the brief RPC handler call window..partfile. A second call sees the slot held and returns the current "installing" status without re-spawning.Problem
CodeRabbit comment #7 on PR #1755 flagged the race at
src/openhuman/local_ai/schemas.rs:1068. The handler did:read_status(engine)— non-atomic snapshotInstalling— race window opens herewrite_status(Installing(0%))— second concurrent call writes after the firsttokio::spawnthe download — both call sites spawn; downloads race on same.partRare in practice (the dropdown auto-install + Install button are user-triggered, so back-to-back concurrent calls require a deliberate double-click within ms), but the agent runtime can also call these RPCs and we don't want the engine catalogue to depend on UI throttling.
Solution
IN_FLIGHT: OnceLock<Mutex<HashSet<&'static str>>>slot table invoice_install_common.rs.try_acquire_install_slot(engine)performs check-and-claim under a single mutex acquisition — atomic by construction.InstallSlotis an RAII guard; Drop releases the slot. Releases on panic too (per Rust drop semantics), so a panicked install can't permanently block re-installs.let slot = match try_acquire_install_slot(ENGINE_X) { Some(s) => s, None => return read_status(engine) }. Moveslotinto the spawned task so it lives for the full install.Submission Checklist
try_acquire_install_slot+InstallSlot::drop+ the two handler short-circuit branches are all exercised by the 32-way concurrent-acquire test + 2 handler-level regressions firing concurrenttokio::join!calls.Refsbelow (parent Prioritize fully local speech and Composer operation #1710).Impact
Desktop only (Rust core sidecar). No frontend change. Internal contract change:
voice_install_common::write_status(Installing(0%))callers should now route throughtry_acquire_install_slot()first — there's exactly one such pattern (the two install handlers), already migrated.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
cargo test --lib openhuman::local_ai::voice_install_common11/11 pass (3 new — concurrent acquire, slot release on Drop, slot release on panic).cargo test --lib openhuman::local_ai::schemas23/23 pass (includes 2 handler-level concurrent-call regressions).cargo fmt --check+cargo checkclean.Validation Blocked
Behavior Changes
composio.local_ai_install_whisperorcomposio.local_ai_install_piperno longer race-spawn duplicate downloads.state == "installing"without disrupting the first.Parity Contract
write_status(Installing(0%))lived; the spawn that follows is unchanged.Duplicate / Superseded PR Handling
Summary by CodeRabbit
Bug Fixes
Tests
Chores