fix(shell): orderly shutdown + CEF child-webview prewarm #925

senamakel merged 8 commits into tinyhumansai:main
Conversation
Closes tinyhumansai#920. Two related changes for first-paint speed and clean quit:

**Clean shutdown.** `RunEvent::Exit` only awaited `core.shutdown()`; CDP session tasks, load watchdogs, the `webview_apis` accept loop, and open `acct_*` child webviews were left to drop with the tokio runtime. On macOS that race is a strong candidate for the "abnormal exit" the OS reports against the app process. Now Exit:

1. tears down the CEF prewarm webview,
2. calls `WebviewAccountsState::shutdown_all` — aborts every `cdp_sessions` / `load_watchdogs` task, closes every `acct_*` child webview so CEF browsers tear down before `cef::shutdown()`, and unregisters CEF notification handlers,
3. calls `webview_apis::server::stop()` to abort the accept loop and release the loopback port,
4. then `core.shutdown()` (unchanged).

**CEF prewarm.** Restores the cold-start warmup that was disabled earlier — a 1×1 hidden child webview at `about:blank`, parked far off-screen on the main window and spawned 300 ms after setup. CEF's child render path is then hot when the user clicks an account, so the first real `webview_account_open` skips the cold renderer-process spinup. Gated by `OPENHUMAN_CEF_PREWARM` (default on; set to `0`/`false`/`no`/`off` to disable) so the prior "blank-webview on first onboarding open" regression is one env var away from being mitigated in the field. Torn down in the same Exit sequence above.
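The kill-switch parsing described above can be sketched with std only. `prewarm_enabled` is a hypothetical helper name for illustration; the PR's actual parsing code is not shown here:

```rust
// Hypothetical helper mirroring the OPENHUMAN_CEF_PREWARM gate described
// above: unset means enabled; 0 / false / no / off (any case) disable it.
fn prewarm_enabled(raw: Option<&str>) -> bool {
    match raw {
        None => true, // default on when the variable is unset
        Some(v) => !matches!(
            v.trim().to_ascii_lowercase().as_str(),
            "0" | "false" | "no" | "off"
        ),
    }
}

fn main() {
    // In the shell this would be fed from std::env::var("OPENHUMAN_CEF_PREWARM").
    assert!(prewarm_enabled(None)); // unset → prewarm on
    assert!(!prewarm_enabled(Some("0"))); // kill switch engaged
    assert!(!prewarm_enabled(Some(" OFF "))); // case/whitespace tolerant
    assert!(prewarm_enabled(Some("1"))); // any other value keeps it on
    println!("prewarm gate table ok");
}
```

Treating only the four documented strings as "off" keeps unknown values on the safe default, matching the "default on" contract.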
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough

Hard-wires CEF as the sole runtime, makes CEF startup/profile setup unconditional, adds a hidden 1×1 CEF prewarm webview spawned on startup and closed on exit, and centralizes per-account cleanup.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Runtime as Runtime (Tauri)
participant Prewarm as CEF prewarm webview
participant Accounts as WebviewAccountsState
participant Server as WebviewAPIs server
participant Core as Core process
Runtime->>Prewarm: spawn hidden 1×1 child (about:blank) after delay
Prewarm-->>Runtime: prewarm created
Runtime->>Runtime: RunEvent::Exit
Runtime->>Prewarm: close prewarm webview
Prewarm-->>Runtime: closed
Runtime->>Accounts: call shutdown_all(app)
Accounts->>Accounts: abort CDP sessions & watchdogs
Accounts->>Accounts: unregister CEF notification handlers
Accounts->>Runtime: close tracked child webviews via app API
Accounts-->>Runtime: accounts shutdown complete
Runtime->>Server: call webview_apis::stop()
Server->>Server: take & abort accept-loop task
Server-->>Runtime: server stopped
Runtime->>Core: proceed with core shutdown
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly Related PRs
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (3)
app/src-tauri/src/webview_accounts/mod.rs (1)
495-500: Consider clearing `notification_bypass` for completeness.

`loaded_accounts` and `requested_bounds` are cleared, but `notification_bypass` retains its state. While this is unlikely to cause issues (it's just preferences), clearing it would make `shutdown_all` a complete state reset matching the "drain-based so repeated calls do not leave prior handles intact" doc comment.

♻️ Optional: reset notification_bypass to default

```diff
     if let Ok(mut g) = self.requested_bounds.lock() { g.clear(); }
+    if let Ok(mut g) = self.notification_bypass.lock() {
+        *g = NotificationBypassPrefs::default();
+    }
     log::info!("[webview-accounts] shutdown_all complete");
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/webview_accounts/mod.rs` around lines 495-500: The shutdown_all implementation clears loaded_accounts and requested_bounds but does not reset notification_bypass; update shutdown_all to also reset notification_bypass to its default state (e.g., clear or set to false/None per its type) so the shutdown fully drains state. Locate the shutdown_all function and add logic similar to the existing mutex-guarded clears (use self.notification_bypass.lock() and set/clear its inner value), ensuring you handle the Result from lock() the same way as loaded_accounts and requested_bounds to avoid panics and preserve behavior on repeated calls.

app/src-tauri/src/webview_apis/server.rs (1)
94-97: Consider logging or warning on mutex lock failure.

If `slot.lock()` fails (poisoned mutex), the handle is silently not stored, which could lead to a port leak on shutdown. While mutex poisoning is rare, a warning would aid debugging.

🔧 Optional: log on lock failure

```diff
 let slot = ACCEPT_LOOP.get_or_init(|| Mutex::new(None));
-if let Ok(mut g) = slot.lock() {
-    *g = Some(accept_handle);
+match slot.lock() {
+    Ok(mut g) => *g = Some(accept_handle),
+    Err(e) => log::warn!("[webview_apis] failed to store accept loop handle: {e}"),
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/webview_apis/server.rs` around lines 94-97: The current code silently ignores a poisoned mutex when calling ACCEPT_LOOP.get_or_init(...); change the slot.lock() handling to log a warning on Err and still attempt to store accept_handle on a poisoned lock: replace the `if let Ok(mut g) = slot.lock() { *g = Some(accept_handle); }` with a match (or equivalent) that on Ok sets `*g = Some(accept_handle)` and on Err logs a warning (using the crate logger/tracing) and uses the poisoned guard's into_inner() to set the handle so the accept_handle is not lost; reference ACCEPT_LOOP, slot.lock(), and accept_handle when making the change.

app/src-tauri/src/lib.rs (1)
755-765: Teardown returns `Err` for missing webview — verify this is intentional.

`teardown_cef_prewarm` returns `Err("no prewarm webview")` when the webview doesn't exist (e.g., if prewarm was disabled via env var or never spawned). The caller at line 1427 logs this at `debug` level, which is appropriate since it's expected in disabled configurations. However, returning `Result` for an expected condition is slightly unconventional.

If preferred, the function could return `Ok(bool)` to distinguish "closed" vs "nothing to close":

♻️ Alternative: return Ok(bool) instead of Err for missing webview

```diff
 #[cfg(feature = "cef")]
-fn teardown_cef_prewarm<R: tauri::Runtime>(app: &AppHandle<R>) -> Result<(), String> {
+fn teardown_cef_prewarm<R: tauri::Runtime>(app: &AppHandle<R>) -> Result<bool, String> {
     let Some(wv) = app.get_webview(CEF_PREWARM_LABEL) else {
-        return Err("no prewarm webview".into());
+        return Ok(false);
     };
     wv.close().map_err(|e| e.to_string())?;
     log::info!("[cef-prewarm] teardown ok");
-    Ok(())
+    Ok(true)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/lib.rs` around lines 755 - 765, teardown_cef_prewarm currently returns Err when the prewarm webview is missing; change its signature from fn teardown_cef_prewarm<R: tauri::Runtime>(app: &AppHandle<R>) -> Result<(), String> to return Result<bool, String> (or simply Result<bool, ()>) and make it return Ok(false) if app.get_webview(CEF_PREWARM_LABEL) is None and Ok(true) after a successful close, leaving Err only for genuine close errors; then update the caller(s) that invoked teardown_cef_prewarm to treat Ok(false) as the expected "nothing to close" case (log at debug) and handle Ok(true) / Err as before.
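The tri-state shape the reviewer proposes is easy to exercise in isolation. A std-only sketch — `teardown` and `registry` are illustrative stand-ins for the real `teardown_cef_prewarm` and the app's webview lookup, not the PR's code:

```rust
// Sketch of the reviewer's proposal: Ok(true) = closed, Ok(false) = nothing
// to close (expected when prewarm is disabled), Err = genuine close failure.
// `registry` stands in for the app's webview lookup; names are illustrative.
fn teardown(registry: &mut Option<&str>, close_fails: bool) -> Result<bool, String> {
    let Some(label) = registry.take() else {
        return Ok(false); // expected in disabled configurations — not an error
    };
    if close_fails {
        return Err(format!("failed to close {label}"));
    }
    Ok(true)
}

fn main() {
    let mut none: Option<&str> = None;
    assert_eq!(teardown(&mut none, false), Ok(false)); // nothing to close

    let mut some = Some("cef-prewarm");
    assert_eq!(teardown(&mut some, false), Ok(true)); // closed

    let mut failing = Some("cef-prewarm");
    assert!(teardown(&mut failing, true).is_err()); // genuine failure
    println!("tri-state teardown ok");
}
```

The payoff is at the call site: `Ok(false)` can be logged at `debug` as the expected disabled-prewarm case, while `Err` keeps signaling real close failures.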
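The `into_inner()` recovery suggested for the poisoned-lock case in `server.rs` works because `PoisonError` still carries the guard. A minimal std-only sketch — the slot here is a plain `Mutex<Option<u32>>` stand-in, not the real accept-loop handle type:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Store a value into a Mutex<Option<T>> slot even if the mutex is poisoned,
// recovering the guard via PoisonError::into_inner as the review suggests.
fn store_even_if_poisoned(slot: &Mutex<Option<u32>>, value: u32) {
    match slot.lock() {
        Ok(mut g) => *g = Some(value),
        Err(poisoned) => {
            eprintln!("warn: slot mutex poisoned; storing anyway");
            *poisoned.into_inner() = Some(value);
        }
    }
}

fn main() {
    let slot = Arc::new(Mutex::new(None));

    // Poison the mutex by panicking in a thread while the lock is held.
    let s2 = Arc::clone(&slot);
    let _ = thread::spawn(move || {
        let _g = s2.lock().unwrap();
        panic!("poison on purpose");
    })
    .join();

    assert!(slot.lock().is_err()); // lock() now reports poisoning
    store_even_if_poisoned(&slot, 42);
    // The handle made it into the slot despite the poison, so it can still
    // be taken and aborted on shutdown instead of leaking the port.
    assert_eq!(*slot.lock().unwrap_or_else(|p| p.into_inner()), Some(42));
    println!("recovered");
}
```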
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 352dd6e6-5279-4398-bc14-749e9f5c7868
⛔ Files ignored due to path filters (1)
`app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- app/src-tauri/src/lib.rs
- app/src-tauri/src/webview_accounts/mod.rs
- app/src-tauri/src/webview_apis/server.rs
The Tauri shell has been shipping CEF as the production runtime for a while; the vendored tauri-cli only knows how to bundle CEF, and there is no longer a path that builds and runs against wry/WKWebView. Carrying both a `cef` and a `wry` cargo feature meant ~90 cfg gates across `lib.rs`, `webview_accounts/`, the scanners, `gmail/`, and `main.rs`, plus a non-cef branch in every CDP-touching code path. All of that is dead — when the user tries `pnpm dev:app` without explicit features, `default = ["cef"]` kicks in anyway.

Drop wry support entirely:

- `app/src-tauri/Cargo.toml`: remove the `wry` feature, fold the `cef` feature into unconditional dependencies (`tauri-runtime-cef`, `cef`), pin `tauri` features to include `cef` + `devtools` always.
- Strip every `#[cfg(feature = "cef")]` / `#[cfg(not(feature = "cef"))]` gate (~90 across 5 source files); collapse dual-runtime branches to the cef path; delete the `Wry` `AppRuntime` alias and the wry `tauri::Builder::<tauri::Wry>::new()` branch.
- Drop the `cfg_attr` on `tauri::cef_entry_point` in `main.rs`.
- Update playbooks / command docs that referenced the wry fallback.

No behavior change for production builds — they were already CEF-only.
`pnpm dev:app` was loading the production backend URL even though the repo-root `.env` had `VITE_BACKEND_URL=https://staging-api.tinyhumans.ai` exported. Vite only hydrates `import.meta.env` from `.env*` files in its `envDir` (defaulting to `root`, which is `app/src/` here), so the shell-exported `VITE_*` vars from `scripts/load-dotenv.sh` were invisible to the frontend, and `app/src/utils/config.ts` fell through to its production default in `BACKEND_URL`.

Set `envDir` to the repo root in `app/vite.config.ts` so Vite reads the same `.env` the Rust shell does — `VITE_BACKEND_URL` and `VITE_OPENHUMAN_APP_ENV` from the root file now reach the browser bundle without needing a per-clone `app/.env.local`.
🧹 Nitpick comments (1)
app/vite.config.ts (1)
59-67: Align env documentation/workflow with new root-level `envDir`.

This change is sensible, but it shifts frontend env discovery from `app/.env.local` conventions to the repo-root `.env`. Please update the related docs/examples/scripts so contributors don't silently set vars in the wrong location.

Based on learnings: "Load environment variables via `source scripts/load-dotenv.sh`; use `.env.example` ... and `app/.env.example` (`VITE_*` frontend variables)", plus "Read all `VITE_*` environment variables through `app/src/utils/config.ts` and re-export them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/vite.config.ts` around lines 59 - 67, Updating envDir to the repo root changes where VITE_* vars are loaded; update docs/scripts and example files so contributors know to put frontend vars at repo root and how to load them. Update README/developer docs to mention using scripts/load-dotenv.sh and the new repo-root .env (and keep .env.example and app/.env.example in sync), add a short note in the app dev workflow referencing envDir: resolve(__dirname, "..") and explain that app/src/utils/config.ts reads/re-exports VITE_* vars for the frontend, and update any dev scripts (e.g., pnpm dev:app) and examples to source the repo-root .env accordingly.
The wry-removal commit set `default = ["custom-protocol"]` after misreading the original "DO NOT REMOVE!!" comment. `custom-protocol` is the flag Tauri uses in *release* builds to swap `devUrl` (vite dev server) for the bundled `frontendDist` (`tauri://localhost`) — putting it in default makes every `pnpm dev:app` silently load the cached production bundle from `app/dist/`, which is also why the dev app was hitting prod APIs (the bundled JS was built against the prod backend).

`cargo tauri build` adds `--features custom-protocol` automatically for release, so the feature still does its job in prod without being in the default set.
Three layers of UA hacks were in place from the wry/WKWebView era,
each pinning an *older* Chrome version than CEF actually is:
- `builder.user_agent(CHROME_UA)` pinned 124 (CEF native: 146)
- `Emulation.setUserAgentOverride` pinned 136
- `ua_spoof.js` init script pinned 124 + Client Hints stub
CEF 146 advertises `Chrome/146.0.0.0` natively at the network layer
*and* in `navigator.userAgent` / `navigator.userAgentData`, so every
override was a regression. Drop all three.
The notification-permission shim that lived inside `ua_spoof.js`
(Notification.permission='granted', PushManager.permissionState,
navigator.permissions.query) is the only load-bearing piece — without
it Slack and Gmail show in-app "enable notifications" banners. That
shim already runs for every provider via `cdp/session.rs`, which
injects it through `Page.addScriptToEvaluateOnNewDocument` against
the placeholder URL before the real provider URL loads. So the JS
init-script copy is pure duplication.
Removed:
- `CHROME_UA`, `provider_user_agent`, `builder.user_agent(...)`
- `UA_SPOOF_JS`, `provider_ua_spoof`, the `spoof` block in
`build_init_script`, the `ua_spoof.js` file
- `cdp::emulation` module (`UaSpec`, `set_user_agent_override`)
and the override call in `cdp::session::run_session_cycle`
- obsolete unit tests (`zoom_registered_in_user_agent`,
`zoom_enables_ua_spoof`)
Net: -47 lines in `webview_accounts/mod.rs`, -7.1 KB of injected JS,
-100 lines of CDP UA plumbing. Brings `webview_accounts/` closer to
CLAUDE.md's "no JS injection into CEF child webviews" rule — only
the recipe.js for the 3 unmigrated providers (gmail, linkedin,
google-meet) remains as legacy injection.
The macOS Cmd+Q panic was a race between our `RunEvent::Exit` handler
and CEF's own runtime-level Exit handler. CEF's handler does:
close_all_windows() → message_loop_pump → cef::shutdown()
with a CHECK that all browsers are closed before `cef::shutdown()`.
Our Exit handler ran in the same phase, so our `wv.close()` calls
issued async close requests that were still in-flight when CEF
counted browsers, tripping the `browser_count == 0` assertion.
Also, our `Exit` handler did `block_on(core.shutdown())` which waits
up to 12 s for the core sidecar to exit gracefully. RunEvent is
delivered on the main thread, which on macOS is also CEF's UI
thread — so we were blocking the very loop CEF needed to pump for
its own teardown.
Fix:
1. Move all shell teardown to `RunEvent::ExitRequested`. That fires
before `Exit`, so our `wv.close()` requests propagate during the
message-pump phase and CEF sees zero live browsers when it gets
to `cef::shutdown()`.
2. Replace `core.shutdown()` (blocks on `child.wait`) with a new
`core.send_terminate_signal()` that just sends SIGTERM (Unix) /
`start_kill` (Windows) and returns immediately. The child runs
its own SIGTERM handler in parallel; the kernel reaps it after
Tauri exits. Core's lifecycle no longer gates Tauri's shutdown.
3. `RunEvent::Exit` becomes log-only.
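The signal-and-return shape in step 2 can be sketched with std only. Note the stand-ins: std's `Child::kill` delivers SIGKILL on Unix, not the SIGTERM the commit describes, and the example assumes a Unix host with a `sleep` binary:

```rust
use std::process::{Child, Command};
use std::time::Instant;

// Hedged sketch: fire the termination signal and return immediately,
// never blocking on child.wait() during shutdown. The commit's
// send_terminate_signal uses SIGTERM (Unix) / start_kill (Windows);
// std only exposes kill() (SIGKILL on Unix), used here as a stand-in.
fn send_terminate_signal(child: &mut Child) -> std::io::Result<()> {
    child.kill() // signal delivery only; the caller does not wait
}

fn main() -> std::io::Result<()> {
    let mut child = Command::new("sleep").arg("30").spawn()?;
    let start = Instant::now();
    send_terminate_signal(&mut child)?;
    // The call returns immediately instead of gating shutdown on the child,
    // so the main (CEF UI) thread is free to keep pumping its loop.
    assert!(start.elapsed().as_secs() < 5);
    let _ = child.wait(); // reap only to keep this sketch zombie-free
    println!("terminated without blocking shutdown");
    Ok(())
}
```

In the shell itself the final `wait()` is deliberately skipped — the kernel reaps the child after Tauri exits, which is the whole point of decoupling core's lifecycle from shutdown.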
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src-tauri/src/webview_accounts/mod.rs`:
- Around line 390-466: shutdown_all currently aborts cdp_sessions/watchdogs and
closes webviews but does not teardown per-account scanners; move the scanner
"forget" logic (the code you used in the account close/purge paths that calls
the per-account scanner unregister/forget) into a shared helper and invoke it
from shutdown_all. Concretely, add a helper (e.g., fn
teardown_account_scanners(&self, acct: &str)) that drains the scanner registries
and calls the same forget/unregister for WhatsApp/Slack/Discord/Telegram scanner
entries, then call that helper inside shutdown_all alongside the existing drains
for cdp_sessions, load_watchdogs, and browser_ids; also refactor the existing
close/purge code to call this helper so all exit/close/purge paths reuse the
same teardown logic. Ensure you reference the same registry fields used
elsewhere (the per-account scanner registries) and keep behavior identical to
the current forget calls.
- Around line 1149-1153: The comment in webview_accounts::mod.rs still mentions
the removed UA-override flow and must be updated to describe the current
behavior: explain that the placeholder webview is opened so the CDP session can
attach and inject the notification shim (and that
OPENHUMAN_DISABLE_SLACK_SCANNER=1 still skips the long-lived CDP session for
external DevTools), removing any mention of UA overrides or UA-specific
subsystems; also update the matching docs (AGENTS.md or architecture docs) to
reflect this changed behavior so rustdoc/comments and external docs remain
consistent.
- Around line 390-399: Add a unit/integration test that exercises
WebviewAccountsState::shutdown_all: construct a test WebviewAccountsState
populated with representative per-account resources (stored abort handles for
CDP session tasks, watchdog handles, registered notification handlers, and dummy
acct_* child-webview entries), call shutdown_all with a test AppHandle/Runtime,
and assert that (1) all abort handles were aborted, (2) watchdogs and CEF
notification handlers were unregistered/cleared, (3) acct_* child webviews are
closed/drained and all internal collections are empty afterwards, and (4) a
second call to shutdown_all is a safe no-op (no panics and collections remain
empty); use lightweight mocks or fakes for abort handles and webview-close
operations to observe calls rather than real CEF interaction and place the test
alongside WebviewAccountsState so it runs in CI.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c1be553f-ee0a-4897-9bc0-26ddd6bed370
📒 Files selected for processing (5)
- app/src-tauri/src/cdp/emulation.rs
- app/src-tauri/src/cdp/mod.rs
- app/src-tauri/src/cdp/session.rs
- app/src-tauri/src/webview_accounts/mod.rs
- app/src-tauri/src/webview_accounts/ua_spoof.js
💤 Files with no reviewable changes (2)
- app/src-tauri/src/cdp/emulation.rs
- app/src-tauri/src/webview_accounts/ua_spoof.js
refactor(webview_accounts): improve shutdown process by separating resource draining and webview closing

- Introduced `drain_for_shutdown` method to handle resource cleanup and background task abortion.
- Updated `shutdown_all` to utilize the new method and ensure proper webview closure.
- Enhanced documentation for clarity on shutdown behavior and resource management.
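The drain-then-close split can be sketched with std collections. The field names mirror the PR (`cdp_sessions`, `load_watchdogs`) but the handle type is a plain `String` stand-in for the real task handles:

```rust
use std::collections::HashMap;
use std::mem;
use std::sync::Mutex;

// Stand-in state: the real struct holds task/abort handles, not Strings.
struct WebviewAccountsState {
    cdp_sessions: Mutex<HashMap<String, String>>,
    load_watchdogs: Mutex<HashMap<String, String>>,
}

impl WebviewAccountsState {
    // Drain-based: move everything out under the lock, abort outside it.
    // Repeated calls see empty maps, so shutdown stays idempotent.
    fn drain_for_shutdown(&self) -> Vec<String> {
        let mut handles = Vec::new();
        if let Ok(mut g) = self.cdp_sessions.lock() {
            handles.extend(mem::take(&mut *g).into_values());
        }
        if let Ok(mut g) = self.load_watchdogs.lock() {
            handles.extend(mem::take(&mut *g).into_values());
        }
        handles // caller aborts these, then closes the child webviews
    }
}

fn main() {
    let state = WebviewAccountsState {
        cdp_sessions: Mutex::new(HashMap::from([("acct_1".into(), "cdp".into())])),
        load_watchdogs: Mutex::new(HashMap::from([("acct_1".into(), "dog".into())])),
    };
    assert_eq!(state.drain_for_shutdown().len(), 2);
    assert!(state.drain_for_shutdown().is_empty()); // second call: safe no-op
    println!("drain ok");
}
```

Draining with `mem::take` under the lock and acting on the handles afterwards is what makes "repeated calls do not leave prior handles intact" hold without double-abort hazards.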
…ck calls

- Reformatted lock calls for `browser_ids` and `loaded_accounts` to enhance code clarity and maintainability.
- No functional changes were made; this is purely a cosmetic improvement to the code structure.
Summary
- Clean shutdown: stop per-account CDP tasks, load watchdogs, `acct_*` child webviews, and the `webview_apis` accept loop before the tokio runtime drops — closes [Bug] [P1] App quit does not clean up; macOS reports an error #920 (macOS reporting OpenHuman quit as abnormal).
- CEF prewarm: a hidden `about:blank` child webview on the main window, gated by `OPENHUMAN_CEF_PREWARM` (default on) so first-account-open skips the cold renderer-process spinup.

What changed
`RunEvent::Exit` now runs in this order:

1. `teardown_cef_prewarm` (closes the warmup webview if alive).
2. `WebviewAccountsState::shutdown_all` — aborts every entry in `cdp_sessions` and `load_watchdogs`, closes every `acct_*` child webview so the CEF browser tears down before `cef::shutdown()` runs, and unregisters CEF notification handlers (`browser_ids`).
3. `webview_apis::server::stop()` — aborts the WebSocket accept loop and releases the loopback port.
4. `core.shutdown()` — unchanged.

Previously only step 4 ran, so per-account tokio tasks were holding `AppHandle` clones past runtime drop and the loopback listener stayed bound until the kernel cleaned up — both showed up as macOS "abnormal exit" diagnostics.

CEF prewarm. Earlier builds had this disabled while triaging a "blank webview on first onboarding open" report — the file comment is replaced with a working implementation that:

- spawns a hidden 1×1 child webview at `about:blank` on the main window, parked at `(-20000, -20000)` so it never paints over the real UI,
- is gated by `OPENHUMAN_CEF_PREWARM` (default on; set to `0`/`false`/`no`/`off` to disable in the field if the prior regression returns),
- is spawned ~300 ms after setup and torn down in the same Exit sequence above.

Test plan
- `pnpm tauri dev` (CEF), open an account webview, Cmd+Q. Confirm Console.app no longer surfaces an OpenHuman abnormal-exit report and the `openhuman` sidecar is gone from `ps`.
- `OPENHUMAN_CEF_PREWARM=0` to confirm the kill switch.
- `webview-account:load` arrives faster than baseline (eyeball; later add a timing log if useful).
- … (`webview_apis` port).
- `cargo check --manifest-path app/src-tauri/Cargo.toml` and `--features cef` both green.

Summary by CodeRabbit
New Features
Bug Fixes
Behavior Changes
Documentation