fix(shell): bounded macOS/Linux CEF cache-lock wait + graceful init failure (TAURI-RUST-F) #3337
…RUST-F)
The macOS/Linux CEF cache-lock preflight called exit(1) the moment it saw a live lock-holder. But the dominant TAURI-RUST-F cause is a sequential relaunch race: the prior instance is mid-teardown and releases the lock within a second.
Replace the one-shot check with a bounded wait (exponential backoff, 5s budget), the macOS/Linux analogue of the existing Windows pre-CEF wait, so the relaunch resolves seamlessly instead of being killed. If the lock is still held after the budget, exit cleanly with code 0 (was 1, which made relaunch wrappers treat it as a crash).
Defense-in-depth: any failure that still reaches cef::initialize (lock freed then re-acquired, or a GPU/sandbox/permission failure) is handled by the vendored runtime's cef_init_guard, which shows a dialog and exits cleanly. Together they make the 'assertion left == right' panic impossible regardless of cause.
Verified on macOS: a second instance launched against a live lock-holder waits with backoff for 5s, then exits 0 with an actionable message and no panic; the happy path (lock free) boots normally.
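A minimal sketch of the backoff schedule this commit describes, assuming a 100ms starting delay that doubles up to a 500ms cap (the values quoted in the validation notes of this PR). `BACKOFF_BASE` and the exact signature of `backoff_delay` are assumptions for illustration; the real function in `cef_preflight.rs` may differ.

```rust
use std::time::Duration;

// Assumed constants: the PR quotes a 100ms start and a 500ms cap.
// BACKOFF_BASE is a hypothetical name; BACKOFF_CAP appears in the PR text.
const BACKOFF_BASE: Duration = Duration::from_millis(100);
const BACKOFF_CAP: Duration = Duration::from_millis(500);

/// Delay to sleep before the next poll, for a 0-based `attempt`.
/// Doubles from BACKOFF_BASE and saturates at BACKOFF_CAP.
pub fn backoff_delay(attempt: u32) -> Duration {
    // Clamp the shift so very large attempt numbers cannot overflow.
    let factor = 1u32 << attempt.min(16);
    BACKOFF_BASE.saturating_mul(factor).min(BACKOFF_CAP)
}
```

Keeping the schedule in a pure function like this is what makes it testable on the host without touching a real lock or a real clock.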
📝 Walkthrough
Adds a bounded exponential-backoff wait for CEF cache-locks on macOS/Linux (pub fn wait_for_cache_release), updates held-lock messaging, integrates the wait into app startup, adds backoff unit tests, and bumps the vendored tauri-cef SHA.
Changes
CEF Cache-Lock Wait Flow
sequenceDiagram
participant AppStartup
participant CefPreflight as cef_preflight::wait_for_cache_release
participant CacheChecker as check_default_cache
participant Process as ProcessExit
AppStartup->>CefPreflight: call wait_for_cache_release()
CefPreflight->>CacheChecker: invoke check_default_cache()
CacheChecker-->>CefPreflight: LockHeld / NoLock / NoHomeDir
alt NoHomeDir
CefPreflight-->>AppStartup: return early
else LockHeld and within budget
CefPreflight->>CefPreflight: sleep backoff_delay(attempt)
CefPreflight->>CacheChecker: recheck
else LockHeld and budget exhausted
CefPreflight->>Process: print CefLockError and exit(0)
else NoLock
CefPreflight-->>AppStartup: return and continue startup
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
…(TAURI-RUST-F)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/cef_preflight.rs`:
- Around line 294-295: The message printed for CefLockError::Held is
macOS-specific but is now reached on Linux; change the stderr output so it is
platform-aware or neutral: when handling CefLockError::Held (the variable `held`
in cef_preflight.rs) print a macOS-specific pkill suggestion only if
cfg!(target_os = "macos"), otherwise print a Linux-appropriate suggestion or a
generic “ensure previous OpenHuman instance is terminated” message; update the
eprintln! call that currently prints `held` verbatim to perform this conditional
formatting.
- Around line 289-304: The final sleep call in the CEF cache wait loop can
exceed the WAIT_BUDGET because the delay is not clamped to the remaining budget
time. After checking that start.elapsed() is less than WAIT_BUDGET, calculate
the remaining time available in the budget by subtracting start.elapsed() from
WAIT_BUDGET, then clamp the delay returned by backoff_delay(attempt) to not
exceed this remaining time before calling std::thread::sleep. This ensures the
total elapsed time never exceeds the documented WAIT_BUDGET contract.
In `@app/src-tauri/vendor/tauri-cef`:
- Line 1: Update the .github/tauri-cef-expected-sha file to match the new
tauri-cef submodule commit by replacing its contents with
8ea806ed1b444a2260b80f5cb4ec8e0dbdb51965; alternatively, if you prefer not to
change the guard, revert the tauri-cef submodule bump so the existing expected
SHA and gitlink remain in sync (ensure the change is committed in the same PR).
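The clamping fix requested in the review comment above can be sketched as follows. This is an illustration under stated assumptions, not the code in `cef_preflight.rs`: `LockState`, `WaitOutcome`, and `wait_with_budget` are hypothetical names, the checker and sleeper are injected closures, and the budget is tracked by summing the requested sleeps instead of reading `start.elapsed()`, so the loop can be exercised without real waiting.

```rust
use std::time::Duration;

const BACKOFF_CAP: Duration = Duration::from_millis(500);

/// Hypothetical stand-in for the result of `check_default_cache()`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockState {
    NoLock,
    LockHeld,
    NoHomeDir,
}

/// What the caller should do once the wait ends.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// Lock is free (or the cache path is unresolvable): continue startup.
    Proceed,
    /// Budget exhausted with the lock still held: print a message, exit(0).
    GaveUp,
}

fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(100)
        .saturating_mul(1u32 << attempt.min(16))
        .min(BACKOFF_CAP)
}

pub fn wait_with_budget(
    budget: Duration,
    mut check: impl FnMut() -> LockState,
    mut sleep: impl FnMut(Duration),
) -> WaitOutcome {
    let mut waited = Duration::ZERO;
    let mut attempt = 0u32;
    loop {
        match check() {
            LockState::NoLock | LockState::NoHomeDir => return WaitOutcome::Proceed,
            LockState::LockHeld => {}
        }
        let remaining = budget.saturating_sub(waited);
        if remaining.is_zero() {
            return WaitOutcome::GaveUp;
        }
        // Clamp to the remaining budget so the total never overshoots it.
        let delay = backoff_delay(attempt).min(remaining);
        sleep(delay);
        waited += delay;
        attempt = attempt.saturating_add(1);
    }
}
```

With a lock that is never released and a 5s budget, this sketch requests 100, 200, and 400ms, then 500ms steps, and shortens the final sleep to the 300ms left, so the requested sleeps total exactly 5s.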
ℹ️ Review info
📒 Files selected for processing (3)
- app/src-tauri/src/cef_preflight.rs
- app/src-tauri/src/lib.rs
- app/src-tauri/vendor/tauri-cef
- clamp backoff to remaining WAIT_BUDGET so total wait never exceeds the documented 5s contract (previously it could overshoot by up to BACKOFF_CAP) (CodeRabbit)
- make CefLockError::Held workaround text platform-aware; macOS .app pkill pattern no longer shown on Linux (cef_preflight compiles for both)
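One way the platform-aware workaround text could be structured is sketched below. `Platform` and `held_lock_hint` are hypothetical names, and the message strings are placeholders: the actual wording and the real pkill pattern are not shown in this PR text, so none is reproduced here.

```rust
/// Hypothetical platform tag, resolved at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    MacOs,
    Other,
}

impl Platform {
    /// Uses `cfg!` so both branches compile on every target.
    pub fn current() -> Self {
        if cfg!(target_os = "macos") {
            Platform::MacOs
        } else {
            Platform::Other
        }
    }
}

/// Placeholder workaround text for a held CEF cache lock.
pub fn held_lock_hint(platform: Platform) -> &'static str {
    match platform {
        // Placeholder only: the real message carries a concrete pkill pattern.
        Platform::MacOs => "Terminate the previous OpenHuman.app instance (for example with pkill), then relaunch.",
        Platform::Other => "Ensure the previous OpenHuman instance has terminated, then relaunch.",
    }
}
```

Passing the platform as a value, rather than branching on `cfg!` inside the formatter, lets both message variants be checked from a single host.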
…tk prelude + macOS NSAlert unsafe (TAURI-RUST-F)
Manual validation — macOS, debug
| Test | Scenario | Observed | Result |
|---|---|---|---|
| T1 | Normal launch (no lock) | preflight no SingletonLock → boots, no crash | ✅ |
| T2a | 2nd instance while 1st holds a live lock | detected live holder; backoff 100→200→400→500→500ms; waited the full 5s budget → process exits with code 0, no panic | ✅ |
| T2b | Kill the holder mid-wait (relaunch race) | waited, holder gone, poll 5 (~1.7s) → CEF cache lock released … proceeding to CEF init → boots | ✅ |
Confirmed: no assertion left == right panic on any path. The exit(1) → exit(0) change verified end-to-end (T2a give-up returns 0, so auto-update/relaunch wrappers no longer treat it as a crash). The dominant TAURI-RUST-F cause — a sequential relaunch race — resolves seamlessly (T2b), and the live exponential backoff matches the unit-tested schedule.
Not covered locally: the Linux gtk-prelude path (#18) is CI-only; the genuine cef::initialize→0 NSAlert dialog (runtime guard from tauri-cef#17/#18) is exercised by unit/CI rather than forced live.
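As a quick arithmetic check on T2b, the first five observed delays sum to 1.7s, which is consistent with the lock being seen as released at poll 5, assuming the poll count refers to the number of completed sleeps. `total_wait` is a hypothetical helper written only for this check.

```rust
use std::time::Duration;

/// Total time slept after the given per-poll delays (in milliseconds).
pub fn total_wait(delays_ms: &[u64]) -> Duration {
    let total_ms: u64 = delays_ms.iter().sum();
    Duration::from_millis(total_ms)
}
```

Calling `total_wait(&[100, 200, 400, 500, 500])` yields 1700ms.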
Summary
- Replace the one-shot macOS/Linux CEF cache-lock preflight (`exit(1)` on first collision) with a bounded wait (exponential backoff, 5s budget), the macOS/Linux analogue of the existing Windows pre-CEF wait. The dominant Sentry TAURI-RUST-F cause is a sequential relaunch race where the prior instance releases the lock within a second; waiting lets the relaunch resolve seamlessly instead of being killed.
- If the lock is still held after the budget, exit cleanly with code `0` (was `1`, which made relaunch wrappers treat it as a crash).
- Defense-in-depth: the vendored runtime's guard turns the `cef::initialize` assert into a graceful dialog + clean exit. Together they make the `assertion left == right` panic impossible on every platform, regardless of cause.
Problem
`cef::initialize()` returns `1` on success, `0` on failure. The vendored `tauri-runtime-cef` asserted `== 1`, so a failure became a fatal `panic: assertion left == right failed, left: 0, right: 1` (Sentry TAURI-RUST-F). The prior Windows-only fix (#3210) covered the Windows relaunch race, but the panic still recurs on the post-fix release 0.57.13 across Windows + Linux + macOS, because:
- The macOS/Linux preflight `exit(1)`s on the first collision (no wait for a dying prior instance), and only catches the lock case; a lock freed-then-reacquired race, or a non-lock cause (GPU/sandbox/permission), falls straight through to the assert.
Solution
- `app/src-tauri/src/cef_preflight.rs`: add `wait_for_cache_release()`, which polls `check_default_cache()` with exponential backoff (100→500ms, capped) up to a 5s budget. Lock clears → proceed; still held after budget → print the actionable message and `exit(0)`; cache path unresolvable → proceed best-effort. Pure `backoff_delay` is unit-tested on-host.
- `app/src-tauri/src/lib.rs`: swap the `check_default_cache()` + `exit(1)` block for `cef_preflight::wait_for_cache_release()`.
- … `c532a2d96`): `vendor/tauri-cef` bumped to `feat/cef` tip `5ec3d883` = merged fix(cef): import gtk prelude for Linux dialog (TAURI-RUST-F) tauri-cef#18, which carries Memory Layer Only #17 (runtime assert→graceful-exit + native dialog) plus its Linux gtk-prelude + macOS NSAlert-unsafe build fixes. The universal backstop for any failure that still reaches `cef::initialize` on any platform. `.github/tauri-cef-expected-sha` pin updated to match.
Design: the benign "another instance is running" give-up stays terminal-only (no GUI nag on transient relaunch races); the GUI dialog (runtime guard) fires only on a genuine unrecoverable init failure.
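The runtime backstop described above replaces a hard assert on the `cef::initialize` return code with a check that reports the failure and lets the caller exit cleanly. A minimal sketch of that idea follows; `InitOutcome`, `interpret_init`, and `guard_init` are hypothetical names, the dialog is an injected closure rather than a native dialog, and the only fact taken from the PR is that the call returns 1 on success and 0 on failure.

```rust
/// Interpretation of the CEF initialize return code.
#[derive(Debug, PartialEq, Eq)]
pub enum InitOutcome {
    Ready,
    Failed,
}

/// Per the PR text: 1 means success, anything else is treated as failure.
pub fn interpret_init(code: i32) -> InitOutcome {
    if code == 1 {
        InitOutcome::Ready
    } else {
        InitOutcome::Failed
    }
}

/// Hypothetical guard: instead of `assert_eq!(code, 1)`, report the failure
/// through `show_dialog` and tell the caller whether startup may continue.
pub fn guard_init(code: i32, show_dialog: impl FnOnce(&str)) -> bool {
    match interpret_init(code) {
        InitOutcome::Ready => true,
        InitOutcome::Failed => {
            // Placeholder message text, not the wording used by the runtime.
            show_dialog("The embedded browser could not be initialized.");
            false
        }
    }
}
```

A caller would exit cleanly when `guard_init` returns `false`, so an init failure surfaces as a message rather than a panic.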
Submission Checklist
- `backoff_delay` is covered by host unit tests; the wait loop + `process::exit` + live-PID lock detection are process/platform-dependent and not instrumentable on the coverage runner (same constraint as fix(shell): Windows pre-CEF cache-lock wait to stop relaunch panic (TAURI-RUST-F) #3210). Validated by manual repro instead.
- `Sentry-Issue: TAURI-RUST-F` (see Related).
Impact
- … `RunEvent::Ready`).
- `exit(1)` → `exit(0)` on give-up so auto-update/relaunch wrappers no longer see a crash.
Related
- `Sentry-Issue: TAURI-RUST-F`
- … `feat/cef` tip `5ec3d883` in `c532a2d96`.
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
- `fix/tauri-rust-f-cef-init-graceful-exit`
- … `5ec3d883` + pin update)
Validation Run
- `pnpm --filter openhuman-app format:check`: Rust fmt clean on changed files
- `pnpm typecheck`: not applicable
- `cargo test --lib cef_preflight` → 13 passed (incl. 2 new backoff tests)
- `cargo fmt --check` + `cargo check` clean (shell)
- `cargo clippy -p tauri-runtime-cef` clean (new module: zero warnings)
Validation Blocked
- `command:` live CEF-init-failure (non-lock) repro on macOS
- `error:` cannot deterministically force `cef::initialize` to fail for a non-lock reason on a dev host
- `impact:` low; covered by the lock-held repro (passed) + the runtime guard's `catch_unwind` safety; Windows/Linux glue via CI build matrix
Behavior Changes
- … `exit(1)`.
Summary by CodeRabbit
Bug Fixes
Tests
Chores