
fix(shell): bounded macOS/Linux CEF cache-lock wait + graceful init failure (TAURI-RUST-F)#3337

Merged
senamakel merged 5 commits into
tinyhumansai:mainfrom
oxoxDev:fix/tauri-rust-f-cef-init-graceful-exit
Jun 5, 2026

Conversation

Contributor

@oxoxDev oxoxDev commented Jun 4, 2026

Summary

  • Replace the one-shot macOS/Linux CEF cache-lock preflight (exit(1) on first collision) with a bounded wait (exponential backoff, 5s budget) — the macOS/Linux analogue of the existing Windows pre-CEF wait. The dominant Sentry TAURI-RUST-F cause is a sequential relaunch race where the prior instance releases the lock within a second; waiting lets the relaunch resolve seamlessly instead of being killed.
  • On give-up (still held after the budget) exit cleanly with code 0 (was 1, which made relaunch wrappers treat it as a crash).
  • Paired with the vendored-runtime change (fix(tauri-runtime-cef): exit cleanly instead of panic when cef::initialize fails (TAURI-RUST-F) tauri-cef#17) which converts the fatal cef::initialize assert into a graceful dialog + clean exit — together they make the assertion left == right panic impossible on every platform, regardless of cause.

Problem

cef::initialize() returns 1 on success, 0 on failure. The vendored tauri-runtime-cef asserted == 1, so a failure became a fatal panic: assertion left == right failed, left: 0, right: 1 (Sentry TAURI-RUST-F). The prior Windows-only fix (#3210) covered the Windows relaunch race, but the panic still recurs on the post-fix release 0.57.13 across Windows + Linux + macOS — because:

  • macOS/Linux used a one-shot SingletonLock check that exit(1)s on the first collision (no wait for a dying prior instance), and only catches the lock case; a lock freed-then-reacquired race, or a non-lock cause (GPU/sandbox/permission), falls straight through to the assert.
  • The assert itself was never removed — it is the single chokepoint every platform funnels through.
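The difference between asserting on the return code and handling it can be sketched as follows. This is a minimal illustration, not the vendored runtime's code: `CefInitFailed`, `NextStep`, `check_init_result`, and `next_step` are hypothetical names, and the only fact taken from the PR text is that `cef::initialize()` returns 1 on success and 0 on failure.

```rust
/// Hypothetical error carrying the raw return code of a failed init.
#[derive(Debug, PartialEq)]
struct CefInitFailed {
    code: i32,
}

/// Hypothetical description of what startup does next.
#[derive(Debug, PartialEq)]
enum NextStep {
    ContinueStartup,
    ShowDialogAndExit,
}

/// Convert the raw code into a Result instead of `assert_eq!(code, 1)`,
/// which would panic with "assertion `left == right` failed" on 0.
fn check_init_result(code: i32) -> Result<(), CefInitFailed> {
    if code == 1 {
        Ok(())
    } else {
        Err(CefInitFailed { code })
    }
}

/// A failure becomes an ordinary branch the caller can handle gracefully.
fn next_step(result: &Result<(), CefInitFailed>) -> NextStep {
    match result {
        Ok(()) => NextStep::ContinueStartup,
        Err(_) => NextStep::ShowDialogAndExit,
    }
}
```

Because the failure is a value rather than a panic, every cause (lock, GPU, sandbox, permission) reaches the same handled branch.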

Solution

  • app/src-tauri/src/cef_preflight.rs: add wait_for_cache_release() — polls check_default_cache() with exponential backoff (100→500ms, capped) up to a 5s budget. Lock clears → proceed; still held after budget → print the actionable message and exit(0); cache path unresolvable → proceed best-effort. Pure backoff_delay is unit-tested on-host.
  • app/src-tauri/src/lib.rs: swap the check_default_cache() + exit(1) block for cef_preflight::wait_for_cache_release().
  • Submodule bump (landed in c532a2d96): vendor/tauri-cef bumped to the feat/cef tip 5ec3d883, i.e. the merged fix(cef): import gtk prelude for Linux dialog (TAURI-RUST-F) tauri-cef#18, which carries tauri-cef#17 (runtime assert→graceful-exit + native dialog) plus its Linux gtk-prelude + macOS NSAlert-unsafe build fixes. This is the universal backstop for any failure that still reaches cef::initialize on any platform. The .github/tauri-cef-expected-sha pin is updated to match.
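A pure backoff function of the kind described above might look like the sketch below. The 100 ms base and 500 ms cap come from the PR text and the constant names from the review summary; the signature, the `u32` attempt type, and the overflow handling are assumptions, not the merged code.

```rust
use std::time::Duration;

// Values from the PR description; the exact types in the real module are assumed.
const BACKOFF_BASE: Duration = Duration::from_millis(100);
const BACKOFF_CAP: Duration = Duration::from_millis(500);

/// Delay before the next poll: BACKOFF_BASE * 2^attempt, capped at BACKOFF_CAP.
/// `checked_pow` and `saturating_mul` keep very large attempts from overflowing.
fn backoff_delay(attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    BACKOFF_BASE.saturating_mul(factor).min(BACKOFF_CAP)
}
```

Keeping the function free of sleeping and I/O is what makes it testable on the host: `backoff_delay(0)` through `backoff_delay(3)` yield 100, 200, 400, and 500 ms, and any larger attempt stays at the cap.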

Design: the benign "another instance is running" give-up stays terminal-only (no GUI nag on transient relaunch races); the GUI dialog (runtime guard) fires only on a genuine unrecoverable init failure.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy — 2 backoff unit tests; happy path + held-lock give-up verified end-to-end on macOS (below).
  • N/A (Diff coverage ≥ 80%): backoff_delay is covered by host unit tests; the wait loop + process::exit + live-PID lock detection are process/platform-dependent and not instrumentable on the coverage runner (same constraint as fix(shell): Windows pre-CEF cache-lock wait to stop relaunch panic (TAURI-RUST-F) #3210). Validated by manual repro instead.
  • N/A: behaviour-only platform-guard change — no coverage-matrix feature row added/removed/renamed.
  • N/A: no coverage-matrix feature IDs affected.
  • No new external network dependencies introduced.
  • N/A: no release-cut smoke surface changed — pre-CEF startup guard only; no user-facing flow beyond a rare error-path message.
  • N/A: Sentry-only issue, no GitHub issue to close — Sentry-Issue: TAURI-RUST-F (see Related).

Impact

  • Desktop macOS + Linux startup only (Windows path unchanged — already covered by fix(shell): Windows pre-CEF cache-lock wait to stop relaunch panic (TAURI-RUST-F) #3210). Zero cost on the common path (lock free → proceed immediately); a ≤5s bounded wait only inside the rare relaunch-race window, then proceed or clean-exit(0).
  • Removes a fatal startup crash; no behaviour change to a normal launch (verified — boots to RunEvent::Ready).
  • exit(1) → exit(0) on give-up so auto-update/relaunch wrappers no longer see a crash.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A — Sentry-only (TAURI-RUST-F)

Commit & Branch

  • Branch: fix/tauri-rust-f-cef-init-graceful-exit
  • Commit SHA: c532a2d (submodule bump to tauri-cef#18 5ec3d883 + pin update)

Validation Run

  • pnpm --filter openhuman-app format:check — Rust fmt clean on changed files
  • N/A: no TypeScript changed — pnpm typecheck not applicable
  • Focused tests: cargo test --lib cef_preflight → 13 passed (incl. 2 new backoff tests)
  • Rust fmt/check (if changed): cargo fmt --check + cargo check clean (shell)
  • Tauri fmt/check (if changed): cargo clippy -p tauri-runtime-cef clean (new module: zero warnings)

Validation Blocked

  • command: live CEF-init-failure (non-lock) repro on macOS
  • error: cannot deterministically force cef::initialize to fail for a non-lock reason on a dev host
  • impact: low — covered by the lock-held repro (passed) + the runtime guard's catch_unwind safety; Windows/Linux glue via CI build matrix

Behavior Changes

  • Intended behavior change: relaunch-race second instance waits (≤5s) for the prior instance to release the CEF cache, then proceeds or exits cleanly (code 0) — instead of panicking or exit(1).
  • User-visible effect: none on a normal launch; on a stuck relaunch race the app exits cleanly with a terminal message instead of crashing.

Summary by CodeRabbit

  • Bug Fixes

    • Improved CEF cache lock handling on macOS and Linux: the app now waits with bounded exponential backoff for cache locks, logs and returns early if user home can’t be resolved, and exits gracefully with a clear, platform-appropriate message if the lock persists.
  • Tests

    • Added unit tests to verify backoff timing, caps, and safe saturation.
  • Chores

    • Updated vendored CEF reference.

…RUST-F)

The macOS/Linux CEF cache-lock preflight called exit(1) the moment it saw a
live lock-holder. But the dominant TAURI-RUST-F cause is a sequential
relaunch race: the prior instance is mid-teardown and releases the lock
within a second. Replace the one-shot check with a bounded wait
(exponential backoff, 5s budget) — the macOS/Linux analogue of the
existing Windows pre-CEF wait — so the relaunch resolves seamlessly
instead of being killed. If the lock is still held after the budget,
exit cleanly with code 0 (was 1, which made relaunch wrappers treat it
as a crash).

Defense-in-depth: any failure that still reaches cef::initialize (lock
freed then re-acquired, or a GPU/sandbox/permission failure) is handled
by the vendored runtime's cef_init_guard, which shows a dialog and exits
cleanly. Together they make the 'assertion left == right' panic
impossible regardless of cause.

Verified on macOS: a second instance launched against a live
lock-holder waits with backoff for 5s then exits 0 with an actionable
message and no panic; the happy path (lock free) boots normally.
Contributor

coderabbitai Bot commented Jun 4, 2026


📝 Walkthrough


Adds a bounded exponential-backoff wait for CEF cache-locks on macOS/Linux (pub fn wait_for_cache_release), updates held-lock messaging, integrates the wait into app startup, adds backoff unit tests, and bumps the vendored tauri-cef SHA.

Changes

CEF Cache-Lock Wait Flow

| Layer / File(s) | Summary |
|---|---|
| Timing constants, wait logic, display update, and tests (app/src-tauri/src/cef_preflight.rs) | Adds WAIT_BUDGET, BACKOFF_BASE, BACKOFF_CAP, implements private backoff_delay(attempt) and public wait_for_cache_release() that polls check_default_cache() with capped exponential backoff (returns early on NoHomeDir, prints held-lock message and exits(0) after budget), updates CefLockError::Held display text for macOS vs Linux, and unit tests for backoff progression, clamping, and saturation. |
| Startup preflight integration (app/src-tauri/src/lib.rs) | Replaces the immediate check_default_cache() call with wait_for_cache_release() in the macOS/Linux CEF preflight during app startup. |
| Vendored dependency and expected SHA bump (app/src-tauri/vendor/tauri-cef, .github/tauri-cef-expected-sha) | Updates the vendored tauri-cef subproject commit reference and the recorded expected SHA. |

sequenceDiagram
  participant AppStartup
  participant CefPreflight as cef_preflight::wait_for_cache_release
  participant CacheChecker as check_default_cache
  participant Process as ProcessExit

  AppStartup->>CefPreflight: call wait_for_cache_release()
  CefPreflight->>CacheChecker: invoke check_default_cache()
  CacheChecker-->>CefPreflight: LockHeld / NoLock / NoHomeDir
  alt NoHomeDir
    CefPreflight-->>AppStartup: return early
  else LockHeld and within budget
    CefPreflight->>CefPreflight: sleep backoff_delay(attempt)
    CefPreflight->>CacheChecker: recheck
  else LockHeld and budget exhausted
    CefPreflight->>Process: print CefLockError and exit(0)
  else NoLock
    CefPreflight-->>AppStartup: return and continue startup
  end
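
The flow in the diagram can be sketched as a loop with the lock check and the sleep injected as closures, so it can be exercised without a real lock or real waiting. All names here (`CacheState`, `WaitOutcome`, `wait_for_release`, `backoff`) are hypothetical; the real `wait_for_cache_release()` takes no arguments and, per the PR text, prints a message and calls exit(0) on give-up. This sketch also tracks the time waited by summing sleeps rather than reading a clock, which is a simplification.

```rust
use std::time::Duration;

/// Hypothetical result of one lock check.
#[derive(Debug, PartialEq)]
enum CacheState {
    Free,
    Held,
    NoHomeDir,
}

/// Hypothetical outcome of the whole wait.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Proceed,
    GiveUp,
}

/// 100 ms doubling per attempt, capped at 500 ms (values from the PR text).
fn backoff(attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    Duration::from_millis(100)
        .saturating_mul(factor)
        .min(Duration::from_millis(500))
}

fn wait_for_release(
    mut check: impl FnMut() -> CacheState,
    mut sleep: impl FnMut(Duration),
    budget: Duration,
) -> WaitOutcome {
    let mut waited = Duration::ZERO;
    let mut attempt = 0u32;
    loop {
        match check() {
            // Lock is free, or the cache path cannot be resolved: proceed best-effort.
            CacheState::Free | CacheState::NoHomeDir => return WaitOutcome::Proceed,
            // Still held and the budget is spent: the caller prints and exits cleanly.
            CacheState::Held if waited >= budget => return WaitOutcome::GiveUp,
            CacheState::Held => {
                // Never sleep past the remaining budget.
                let delay = backoff(attempt).min(budget - waited);
                sleep(delay);
                waited += delay;
                attempt = attempt.saturating_add(1);
            }
        }
    }
}
```

In a real caller the closures would presumably be the lock check and `std::thread::sleep`, with the `GiveUp` branch printing the held-lock message before exiting with code 0.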

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A lock upon the cache so tight,
We wait with backoff, soft and light,
Attempts grow slow, then gently cap,
If time runs out we close the gap,
Then hush—the rabbit naps in app.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title directly matches the main changeset: adding a bounded wait for CEF cache-lock release on macOS/Linux with graceful init failure handling, replacing the previous one-shot check that exited with code 1. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@oxoxDev oxoxDev marked this pull request as ready for review June 5, 2026 06:56
@oxoxDev oxoxDev requested a review from a team June 5, 2026 06:56
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src-tauri/src/cef_preflight.rs`:
- Around line 294-295: The message printed for CefLockError::Held is
macOS-specific but is now reached on Linux; change the stderr output so it is
platform-aware or neutral: when handling CefLockError::Held (the variable `held`
in cef_preflight.rs) print a macOS-specific pkill suggestion only if
cfg!(target_os = "macos"), otherwise print a Linux-appropriate suggestion or a
generic “ensure previous OpenHuman instance is terminated” message; update the
eprintln! call that currently prints `held` verbatim to perform this conditional
formatting.
- Around line 289-304: The final sleep call in the CEF cache wait loop can
exceed the WAIT_BUDGET because the delay is not clamped to the remaining budget
time. After checking that start.elapsed() is less than WAIT_BUDGET, calculate
the remaining time available in the budget by subtracting start.elapsed() from
WAIT_BUDGET, then clamp the delay returned by backoff_delay(attempt) to not
exceed this remaining time before calling std::thread::sleep. This ensures the
total elapsed time never exceeds the documented WAIT_BUDGET contract.

In `@app/src-tauri/vendor/tauri-cef`:
- Line 1: Update the .github/tauri-cef-expected-sha file to match the new
tauri-cef submodule commit by replacing its contents with
8ea806ed1b444a2260b80f5cb4ec8e0dbdb51965; alternatively, if you prefer not to
change the guard, revert the tauri-cef submodule bump so the existing expected
SHA and gitlink remain in sync (ensure the change is committed in the same PR).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63b2c7aa-735f-4128-82ba-96b6c21b25d5

📥 Commits

Reviewing files that changed from the base of the PR and between 87a91ae and 4981a16.

📒 Files selected for processing (3)
  • app/src-tauri/src/cef_preflight.rs
  • app/src-tauri/src/lib.rs
  • app/src-tauri/vendor/tauri-cef

Comment thread app/src-tauri/src/cef_preflight.rs Outdated
Comment thread app/src-tauri/src/cef_preflight.rs
Comment thread app/src-tauri/vendor/tauri-cef Outdated
- clamp backoff to remaining WAIT_BUDGET so total wait never exceeds the
  documented 5s contract by up to BACKOFF_CAP (CodeRabbit)
- make CefLockError::Held workaround text platform-aware; macOS .app pkill
  pattern no longer shown on Linux (cef_preflight compiles for both)
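
One way to make the workaround text platform-aware is sketched below. The helper names are hypothetical and the message strings are placeholder wording, not the text the PR actually prints; the only points taken from the review are that the pkill suggestion is macOS-specific and that Linux should get a neutral hint.

```rust
/// Hypothetical helper: choose the workaround hint for a held CEF cache lock.
/// The strings are placeholders, not the messages used in the PR.
fn held_lock_hint(is_macos: bool) -> &'static str {
    if is_macos {
        "Quit the running OpenHuman app (for example with pkill), then relaunch."
    } else {
        "Ensure the previous OpenHuman instance has terminated, then relaunch."
    }
}

/// `cfg!` expands to a compile-time boolean, so both branches above are
/// type-checked on every target and the module still builds for macOS and Linux.
fn held_lock_hint_for_this_build() -> &'static str {
    held_lock_hint(cfg!(target_os = "macos"))
}
```

Taking the platform as a parameter keeps both variants testable on a single host, while the wrapper selects the right one for the target being built.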
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 5, 2026
…tk prelude + macOS NSAlert unsafe (TAURI-RUST-F)
Contributor Author

oxoxDev commented Jun 5, 2026

Manual validation — macOS, debug .app, vendored tauri-cef 5ec3d883 (#18)

Built the packaged bundle from this branch and exercised the preflight on macOS (Apple Silicon). Cache path resolved via the OPENHUMAN_CEF_CACHE_PATH env-first branch.

| Test | Scenario | Observed Result |
|---|---|---|
| T1 | Normal launch (no lock) | preflight no SingletonLock → boots, no crash |
| T2a | 2nd instance while 1st holds a live lock | detected live holder; backoff 100→200→400→500→500ms; waited the full 5s budget → process exits with code 0, no panic |
| T2b | Kill the holder mid-wait (relaunch race) | waited, holder gone, poll 5 (~1.7s) → CEF cache lock released … proceeding to CEF init → boots |

Confirmed: no assertion left == right panic on any path. The exit(1) → exit(0) change was verified end-to-end (T2a give-up returns 0, so auto-update/relaunch wrappers no longer treat it as a crash). The dominant TAURI-RUST-F cause, a sequential relaunch race, resolves seamlessly (T2b), and the live exponential backoff matches the unit-tested schedule.
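
The observed timings can be checked with a little arithmetic. The sketch below is a hypothetical helper, not code from the PR. It lists the sleep lengths for a 100 ms base doubling up to a 500 ms cap, with the final sleep clamped to the remaining budget as described in the review fix.

```rust
/// Sleep lengths in milliseconds: 100 ms doubling up to a 500 ms cap,
/// with the last sleep clamped so the total never exceeds `budget_ms`.
fn sleep_schedule_ms(budget_ms: u64) -> Vec<u64> {
    let mut schedule = Vec::new();
    let mut total = 0u64;
    let mut delay = 100u64;
    while total < budget_ms {
        // Clamp to whatever is left of the budget.
        let step = delay.min(budget_ms - total);
        schedule.push(step);
        total += step;
        delay = (delay * 2).min(500);
    }
    schedule
}
```

For a 5000 ms budget the first five sleeps are 100, 200, 400, 500, and 500 ms, which sum to 1700 ms and line up with the "poll 5 (~1.7s)" observation in T2b. The full schedule has twelve sleeps, the last one clamped to 300 ms so the total is exactly 5000 ms.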

Not covered locally: the Linux gtk-prelude path (#18) is CI-only; the genuine cef::initialize→0 NSAlert dialog (runtime guard from tauri-cef#17/#18) is exercised by unit/CI rather than forced live.

@senamakel senamakel merged commit ecad065 into tinyhumansai:main Jun 5, 2026
22 of 25 checks passed