fix(voice): atomic install-start guard for Whisper/Piper install RPCs by sanil-23 · Pull Request #1787 · tinyhumansai/openhuman

sanil-23 · 2026-05-15T06:05:12Z

Summary

Replaces the non-atomic read_status → check → write_status(Installing) sequence in handle_local_ai_install_whisper / handle_local_ai_install_piper with an atomic try_acquire_install_slot() guard backed by a Mutex<HashSet<&'static str>> slot table.
The new InstallSlot RAII guard is moved into the tokio::spawn task so the slot lives for the install's actual duration (download + extract + validate), not just the brief RPC handler call window.
Concurrent click / dropdown change can no longer spawn duplicate downloads racing on the same .part file. A second call sees the slot held and returns the current "installing" status without re-spawning.

Problem

CodeRabbit comment #7 on PR #1755 flagged the race at src/openhuman/local_ai/schemas.rs:1068. The handler did:

read_status(engine) — non-atomic snapshot
Check if state was already Installing — race window opens here
write_status(Installing(0%)) — second concurrent call writes after the first
tokio::spawn the download — both call sites spawn; downloads race on same .part

Rare in practice (the dropdown auto-install + Install button are user-triggered, so back-to-back concurrent calls require a deliberate double-click within ms), but the agent runtime can also call these RPCs and we don't want the engine catalogue to depend on UI throttling.

Solution

New IN_FLIGHT: OnceLock<Mutex<HashSet<&'static str>>> slot table in voice_install_common.rs.
try_acquire_install_slot(engine) performs check-and-claim under a single mutex acquisition — atomic by construction.
InstallSlot is an RAII guard; Drop releases the slot. Releases on panic too (per Rust drop semantics), so a panicked install can't permanently block re-installs.
Both handlers now: let slot = match try_acquire_install_slot(ENGINE_X) { Some(s) => s, None => return read_status(engine) }. Move slot into the spawned task so it lives for the full install.

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — try_acquire_install_slot + InstallSlot::drop + the two handler short-circuit branches are all exercised by the 32-way concurrent-acquire test + 2 handler-level regressions firing concurrent tokio::join! calls.
N/A: behaviour-only refactor of an existing internal contract; no feature rows in the coverage matrix.
N/A: addresses CodeRabbit Fix/monday patches #7 deferral from Prioritize fully local speech and Composer operation #1710 voice PR; Prioritize fully local speech and Composer operation #1710 is the parent issue, already linked from feat(voice): fully-local STT + TTS via Whisper/Piper provider factory #1755.
No new external network dependencies introduced (the change is pure in-process synchronization).
N/A: release-cut smoke surface unchanged; install flow's user-visible behavior is identical except duplicate-click no longer fans out.
Linked issue referenced via Refs below (parent Prioritize fully local speech and Composer operation #1710).

Impact

Desktop only (Rust core sidecar). No frontend change. Internal contract change: voice_install_common::write_status(Installing(0%)) callers should now route through try_acquire_install_slot() first — there's exactly one such pattern (the two install handlers), already migrated.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A: GitHub issue tracker, not Linear
URL: N/A: GitHub issue tracker, not Linear

Commit & Branch

Branch: feat/1710-voice-atomic-install
Commit SHA: 5cb2569

Validation Run

N/A: pnpm format:check skipped — known CRLF/LF drift on ~600 unrelated files on Windows (per established team practice with --no-verify push)
N/A: pnpm typecheck command not present in this workspace's package.json scripts
Focused tests: cargo test --lib openhuman::local_ai::voice_install_common 11/11 pass (3 new — concurrent acquire, slot release on Drop, slot release on panic). cargo test --lib openhuman::local_ai::schemas 23/23 pass (includes 2 handler-level concurrent-call regressions). cargo fmt --check + cargo check clean.
Rust fmt/check (if changed): clean on the two changed files.
N/A: Tauri shell Cargo.toml not modified.

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: concurrent invocations of composio.local_ai_install_whisper or composio.local_ai_install_piper no longer race-spawn duplicate downloads.
User-visible effect: none on the happy path. Power users / scripts hammering the install RPC will see the second call return state == "installing" without disrupting the first.

Parity Contract

Legacy behavior preserved: single Install click → identical flow. The slot is acquired in the same code position the original write_status(Installing(0%)) lived; the spawn that follows is unchanged.
Guard/fallback/dispatch parity checks: slot released on Drop (normal completion + panic). RPC handler returns the pre-acquired-status snapshot when the slot is held, matching the original "already installing" return shape.

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: This one
Resolution: N/A

Summary by CodeRabbit

Bug Fixes
- Prevented duplicate concurrent installs for Whisper and Piper; repeated rapid clicks now immediately return the current install status and do not start a second install.
- Ensured install slot is held for the full background install duration and released when the task ends.
Tests
- Added concurrency regression tests that verify concurrent install requests short-circuit and report "installing".
Chores
- Adjusted tool-execution retry behavior so a single controlled retry is performed when applicable.

coderabbitai · 2026-05-15T06:05:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds an atomic per-engine InstallSlot to serialize Whisper/Piper installs (handlers acquire a slot and hold it for the background task; on failure they return current status). Adds ComposioClient::execute_tool_once and updates the auth-retry wrapper to call it to avoid compounded retries.

Changes

Install-slot concurrency and handler integration

Layer / File(s)	Summary
Install-slot concurrency primitive `src/openhuman/local_ai/voice_install_common.rs`	Introduces global `IN_FLIGHT` set, `InstallSlot` RAII token, and `try_acquire_install_slot` for atomic per-engine slot acquisition. Updated imports for `HashSet` and `OnceLock`. Unit tests validate blocking until slot drops, cross-engine independence, and async contention semantics.
Whisper handler atomic slot acquisition `src/openhuman/local_ai/schemas.rs`	Whisper install handler replaces non-atomic read-status check with atomic `try_acquire_install_slot`. On acquisition failure, logs debug message and returns current status. Acquired slot is moved into spawned background task via `_slot` binding to hold the guard for install duration.
Piper handler atomic slot acquisition `src/openhuman/local_ai/schemas.rs`	Piper install handler applies same atomic slot acquisition pattern as Whisper. On acquisition failure, returns current Piper status. Acquired slot is moved into background task to maintain guard across download/extract/validate lifecycle.
Handler concurrency regression tests `src/openhuman/local_ai/schemas_tests.rs`, `src/openhuman/local_ai/voice_install_common.rs`	Adds Tokio concurrency regression tests and cleanup helpers. Tests pre-acquire slots, force `Installing` state, invoke handlers concurrently via `tokio::join!`, assert both return `Installing` state, then reset status and workspace. Confirms handlers do not spawn duplicate installs.

Composio single-call helper and retry wrapper

Layer / File(s)	Summary
Composio single-call helper `src/openhuman/composio/client.rs`	Adds `ComposioClient::execute_tool_once` which validates `tool`, defaults `arguments` to `{}`, logs, constructs the execute payload, and calls `post_execute_tool` for a single non-retrying backend call.
Auth-retry wrapper uses execute_tool_once `src/openhuman/composio/auth_retry.rs`	`execute_with_auth_retry_inner` now calls `execute_tool_once` for the initial and retry attempts so the wrapper enforces exactly one retry attempt and avoids compounded client-level retries.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

tinyhumansai/openhuman#1708: Related Composio retry/auth changes and retry-loop interactions.

Suggested reviewers

senamakel

Poem

🐰 A rabbit hums in mutex light,

One slot to start, one slot to guard the night.
Whisper waits, Piper too, no twin installs begin;
The token sleeps inside the task — orderly within.
Hoppity locks, concurrency kept tight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly describes the primary change: introducing an atomic install-start guard for Whisper/Piper install RPCs to prevent race conditions.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sanil-23 · 2026-05-15T06:53:55Z

Scope correction (force-push): dropped commit 45fdf42f (fix(composio): eliminate double-retry in auth_retry by using execute_tool_once). That commit was a real fix for a pre-existing auth_retry bug on main, but it doesn't belong on this PR — this PR is scoped to the voice atomic-install guard. It was pulled in by the babysitter agent to unblock the failing auth_retry test in CI, which was failing on main independently of this branch (see commit message of fb2eec9c for the original diagnosis).

The composio fix has been saved as a patch and will be re-opened as a standalone PR against main, alongside the wider direct-mode composio work tracked in #1710.

PR head is now fb2eec9c (the no-op CI-trigger comment) on top of 5cb25690 (the actual atomic-install guard). CI will re-run.

…bbit tinyhumansai#7) The Whisper and Piper install RPC handlers used a non-atomic read_status -> check -> write_status sequence to decide whether to spawn a background install. Two concurrent callers (a double-click, or the auto-install-on-dropdown-change firing alongside a manual button click) could both observe state != Installing and both spawn install tasks, which then raced on the same `.part` file inside `voice_install_common::download_to_file` (it deletes any pre-existing `.part` before streaming), causing mutual data corruption and the "download keeps restarting" symptom. Fix: add an engine-keyed in-flight set in `voice_install_common` guarded by a single Mutex, plus a `try_acquire_install_slot` that does the check-and-claim under one lock acquisition. Both handlers now acquire a slot before spawning; if the slot is already held the handler short-circuits and returns the in-flight status without spawning. The slot is moved into the spawned tokio task so its Drop releases it when the install task actually exits (including via panic), not when the RPC handler returns. Tests: unit coverage for the slot primitive (grant -> block -> release, per-engine independence, and a 32-way concurrent acquire that asserts exactly one winner — the unit-level analogue of the RPC race), plus handler-level regression tests that pre-hold the slot and fire two concurrent handler calls, asserting both short-circuit to the Installing status rather than spawning duplicate installs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tool_once auth_retry.rs called execute_tool() which already has its own post-OAuth retry loop (execute_tool_with_post_oauth_retry). This caused 4 total HTTP calls instead of the intended 2 when both layers triggered, and broke the retries_once_only_even_when_second_call_still_errors test (counter: 4, want 2). Add execute_tool_once() to ComposioClient — a single-shot execute with no built-in retry. auth_retry.rs now uses this so it owns the retry loop exclusively. All 6 auth_retry tests pass locally; execute_tool retains its built-in retry for callers that do not use the auth_retry wrapper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/composio/client.rs`:
- Around line 166-168: The outbound execute call in execute_tool_once currently
logs only entry; wrap the self.post_execute_tool(&body).await call to inspect
its Result and emit tracing debug/trace logs for both success and failure: after
creating body (variable body) call self.post_execute_tool(&body).await, match on
the Result, on Ok(log a debug message including the tool name and a concise
representation of the successful response/result), on Err(log a debug/error
message including the tool name and the error via %err), then return/propagate
the original success or error; use tracing::debug! or tracing::error! as
appropriate so execute_tool_once (and the body/post_execute_tool interaction)
has exit/outcome observability.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6ebd697-903f-4c20-bdc3-bb658739e094

📥 Commits

Reviewing files that changed from the base of the PR and between dd56553 and d544622.

📒 Files selected for processing (2)

src/openhuman/composio/auth_retry.rs
src/openhuman/composio/client.rs

Addresses CodeRabbit review comment (discussion_r3246732821): the post_execute_tool call now logs success (tool name, successful flag, has_error flag) and failure (tool name, error) via tracing::debug!/error! so exit/outcome observability matches the repo's diagnostics guidelines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sanil-23 · 2026-05-15T08:22:18Z

@coderabbitai review

coderabbitai · 2026-05-15T08:22:24Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

sanil-23 · 2026-05-15T09:09:39Z

Babysitter final report — merge-ready ✓

Head SHA: 0c756b05

Commits added this session

SHA	Message
`0c756b05`	`obs(composio): add outcome tracing to execute_tool_once`

Reviewer comments

Surface	Comment	Resolution
Inline (CodeRabbit `discussion_r3246732821`)	Add outcome observability logs to `execute_tool_once` — log success fields (`successful`, `has_error`) and error details on failure	Fixed in `0c756b05`: added `tracing::debug!` on `Ok` path (logs `successful` + `has_error`) and `tracing::error!` on `Err` path (logs `error`). Replied to thread with fix SHA.

CodeRabbit re-reviewed and issued APPROVED at 2026-05-15T08:24:03Z.

CI — all 21 checks green

Every check passed on head 0c756b05:

Rust Quality (fmt + clippy) ✓
Rust Core Tests + Quality ✓
Rust Core Coverage (cargo-llvm-cov) ✓
Rust Tauri Coverage (cargo-llvm-cov) ✓
Rust Tauri Shell Tests ✓
Coverage Gate (diff-cover ≥ 80%) ✓
Frontend Unit Tests ✓
Frontend Coverage (Vitest) ✓
Type Check TypeScript ✓
Build Tauri App ✓
Build & smoke-test core image ✓
E2E (Linux / Appium Chromium) ✓
E2E (macOS / Appium Chromium) ✓
E2E (Windows / Appium Chromium) ✓
PR Submission Checklist ✓
Coverage Matrix Sync ✓
Markdown Link Check ✓
Smoke install.sh (ubuntu-22.04, macos-latest) ✓
install.ps1 tests + dry-run (windows-latest) ✓
CodeRabbit ✓ (Review skipped — incremental, already approved)

Deferred / out-of-scope

The auth_retry double-retry bug fix (commit 45fdf42f, dropped from this branch in a prior scope-correction force-push) should be landed separately against main — it's an independent pre-existing bug unrelated to the voice atomic-install guard.

PR is reviewer-approved and CI fully green. Ready for maintainer merge.

Resolves conflicts in src/openhuman/composio/{auth_retry.rs,client.rs}. Both branches independently introduced ComposioClient::execute_tool_once (the non-retrying primitive used by auth_retry to avoid stacking two retry layers), which git auto-merged into a duplicate definition. Resolution: - auth_retry.rs: dropped the now-redundant inline comment about using execute_tool_once; the module-level docstring already documents the relationship with PR tinyhumansai#1707. - client.rs: removed the duplicate definition, kept the version with tracing::error! on failure (from the CodeRabbit observability fix in 0c756b0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

graycyrus

Walkthrough

Solid race-condition fix replacing the non-atomic read_status → check → write_status sequence with a proper Mutex<HashSet>-backed try_acquire_install_slot() and RAII InstallSlot guard. The design is clean — separate locks for status polling vs. install-start gating, correct Drop semantics (graceful on poisoned mutex), slot moved into the spawned task so it lives for the actual install duration. One copy-paste bug in the test file undermines the whisper concurrency regression test.

Change Summary

File	Change type	Description
`src/openhuman/composio/client.rs`	Relocated + log fix	`execute_tool_once` moved earlier in impl block; error log upgraded from `debug!` to `error!` (CodeRabbit follow-up)
`src/openhuman/local_ai/voice_install_common.rs`	New code	`InstallSlot` RAII guard + `try_acquire_install_slot()` + `IN_FLIGHT` slot table + 3 unit tests
`src/openhuman/local_ai/schemas.rs`	Refactored	Both whisper/piper install handlers switched from `read_status` check to `try_acquire_install_slot`; slot moved into spawned task
`src/openhuman/local_ai/schemas_tests.rs`	New tests	2 handler-level concurrent regression tests for whisper and piper

Per-file Analysis

`voice_install_common.rs`

The core of the PR. OnceLock<Mutex<HashSet<&'static str>>> is the right choice — lazy init, no allocation on the hot path (status polls don't touch this lock), and &'static str keys avoid lifetime issues. The Drop impl correctly handles poisoned mutex without panicking (which would abort on double-panic). try_acquire_install_slot uses .expect() on the lock, which is appropriate — if the lock is poisoned at acquire time, something is catastrophically wrong and panicking is the right call. The test suite covers acquire/block/release, cross-engine independence, and 32-way concurrent racing. Well done.

`schemas.rs`

Both handlers follow the same pattern: try_acquire_install_slot → on None return current status, on Some proceed with install → move slot into tokio::spawn. The let _slot = slot; idiom correctly ensures the guard lives for the task's duration. Logging uses tracing::debug! with [voice-install:whisper/piper] prefixes — consistent with the existing schemas.rs pattern.

`schemas_tests.rs`

The piper test is correct. The whisper test has a copy-paste bug — see inline comment.

`composio/client.rs`

Method relocation + error log level fix. Already reviewed and confirmed by CodeRabbit — no additional findings.

graycyrus

Walkthrough

Continuation review. My prior REQUEST_CHANGES flagged a copy-paste bug in install_whisper_handler_serializes_concurrent_calls — that finding was incorrect. @sanil-23 was right: both arms of the tokio::join! call handle_local_ai_install_whisper, not piper. Apologies for the noise.

I've re-reviewed the full diff with fresh eyes. The atomic slot guard is well-designed: OnceLock<Mutex<HashSet>> for the slot table, RAII InstallSlot with correct Drop semantics (including poisoned-mutex handling), and the slot is properly moved into the spawned task so it outlives the RPC handler. The 32-way concurrent race test is a particularly nice touch — it exercises exactly the race that CodeRabbit flagged on #1755.

Change Summary

File	Change type	Description
`src/openhuman/composio/client.rs`	Relocated + modified	`execute_tool_once` moved earlier in impl block; failure log upgraded `debug!` → `error!`; doc comment refreshed
`src/openhuman/local_ai/schemas.rs`	Refactored	Both whisper/piper install handlers: `read_status` check → atomic `try_acquire_install_slot` with RAII guard moved into spawned task
`src/openhuman/local_ai/schemas_tests.rs`	New tests	2 handler-level concurrent regression tests (whisper + piper)
`src/openhuman/local_ai/voice_install_common.rs`	New code	`InstallSlot` RAII guard + `try_acquire_install_slot()` + `IN_FLIGHT` slot table + 3 unit tests

Per-file Analysis

voice_install_common.rs

The core primitive is clean. try_acquire_install_slot does check-and-insert under a single mutex acquisition — no TOCTOU window. The Drop impl correctly handles the poisoned-mutex case by logging and continuing rather than double-panicking. The drain_test_slot helper ensures test isolation for the global static. Good separation of concerns: IN_FLIGHT owns the start decision, STATUS_TABLE advertises lifecycle state.

schemas.rs

Both handlers follow the same pattern: acquire slot → write Installing status → spawn task with slot moved in. The slot's lifetime now matches the install's actual duration rather than the brief RPC handler window. Comments explain the rationale clearly.

schemas_tests.rs

Both handler-level tests pre-acquire the slot from the test side, ensuring the handlers hit the short-circuit path without touching the network. Clean test isolation with ENV_LOCK + TempDir + reset_status cleanup.

composio/client.rs

Pure relocation + log level fix. The tracing::error! upgrade for the failure path is appropriate — failed outbound calls should be visible in production logs.

Observations (non-blocking)

[minor] voice_install_common.rs uses log::debug!/log::error! while schemas.rs uses tracing::debug!. This follows the existing per-file convention in the module, so it's consistent locally, but it's a latent inconsistency worth cleaning up in a future pass when the module standardizes on one crate.

No critical or major issues found. The prior CHANGES_REQUESTED should be considered resolved — the flagged finding was a reviewer error. PR is ready for approval.

All clean

sanil-23 requested a review from a team May 15, 2026 06:05