test(e2e): unify driver onto Appium Chromium attached to CEF CDP by senamakel · Pull Request #1696 · tinyhumansai/openhuman

senamakel · 2026-05-14T03:32:09Z

Summary

Collapse the per-platform E2E driver split (Mac2 on macOS, tauri-driver on Linux/Windows) onto a single backend: Appium Chromium driver attached to CEF's already-enabled remote-debugging port (19222, configured in app/src-tauri/src/lib.rs:1556).
One Appium session per run, all specs sequential — eliminates the per-spec Appium + app restart cost.
New unified runner app/scripts/e2e-run-session.sh; legacy e2e-run-spec.sh now a thin shim so existing pnpm test:e2e:cron-jobs etc. keep working.
Auto-pins chromedriver to CEF's bundled Chromium version (currently 146.0.7680.165) via a Chrome for Testing download cached at app/test/e2e/.cache/chromedriver/<version>-<platform>/.
Validated on macOS: smoke spec (3/3 passing in ~1s).

Problem

The harness was split into two driver backends with two separate code paths:

Linux / Windows drove the WebView through tauri-driver (W3C WebDriver, full DOM access).
macOS drove the .app bundle through Appium Mac2 (XCUITest accessibility tree, no DOM, no executeScript).

Every spec branched on isTauriDriver() and isMac2(), and helpers like element-helpers.ts carried duplicate XPath-over-accessibility code for Mac2. mega-flow.spec.ts was written specifically against Mac2's limitations (deep links + mock log + RPC, no UI assertions) because Mac2 couldn't see the DOM at all. The asymmetry made macOS coverage shallow and any unified test ergonomics impossible.

Additionally, each spec invoked e2e-run-spec.sh, which started a fresh Appium / tauri-driver and a fresh OpenHuman.app per spec — wall-clock cost dominated by setup, not assertions.

Solution

CEF (the runtime under Tauri here) already exposes a Chrome DevTools Protocol port on 127.0.0.1:19222 on every platform (enabled in app/src-tauri/src/lib.rs:1556). That makes a unified driver possible:

wdio.conf.ts: single Chromium capability with goog:chromeOptions.debuggerAddress: 127.0.0.1:19222. Same caps on macOS / Linux / Windows. maxInstances: 1 keeps everything in one session. A before: hook switches to the main app webview after attach (defensive; the runner already prunes the prewarm target).
scripts/e2e-run-session.sh (new): launches the built CEF binary once, waits for CDP, prunes the prewarm about:blank target via /json/close/<id> so chromedriver doesn't attach to it (would otherwise die mid-run when CEF GC'd that target), parses Chromium version from /json/version and downloads a matching chromedriver from Chrome for Testing, then starts Appium with the chromium driver and runs WDIO.
scripts/e2e-build.sh: now invokes pnpm tauri:ensure and uses cargo tauri build (not pnpm exec tauri build) so the .app ships the CEF framework. Without this, the bundle panics at startup inside cef::library_loader::LibraryLoader::new (the failure mode CLAUDE.md warns about).
scripts/e2e-run-spec.sh: thin shim that execs e2e-run-session.sh, so existing per-spec scripts (test:e2e:cron-jobs, etc.) keep working.
helpers/platform.ts: collapsed — isTauriDriver() returns true, isMac2() returns false, supportsExecuteScript() returns true on every platform. The ~40 existing specs that branched on these uniformly take the DOM-capable code path, which Chromium driver supports everywhere.
helpers/core-rpc.ts: drops the WebView path that did await import('@tauri-apps/api/core') inside browser.execute. That dynamic specifier only resolved under tauri-driver's renderer import map; under direct CDP it 404s. The Node-side HTTP path (port probe + bearer token from ${tmpdir}/openhuman-e2e-rpc-token) is driver-agnostic and is now the only path.
helpers/app-helpers.ts: waitForApp does a real readiness check.

This is phase 1 of a multi-phase migration. The unified driver works end-to-end (smoke spec green on macOS). Phase 2 work that is not in this PR: the mega-flow and other deep-link-heavy specs hit a real CEF interaction — the vendored CEF runtime drops and recreates the main WebviewWindow on certain events (see app/src-tauri/src/lib.rs:907), which terminates the chromedriver session bound to the original target ID. That needs either session-reattach logic in the test helpers or a behavior change in the CEF window-recreation path; the right shape isn't obvious yet and warrants its own PR.

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
N/A: this PR changes how E2E tests are driven, not new product code; coverage gate is irrelevant here. The unit + Rust coverage workflows still run against changed lines elsewhere as configured.
N/A: behaviour-only change to test harness; no feature rows added/removed.
N/A: harness change, no feature IDs.
No new external network dependencies introduced (mock backend used per Testing Strategy)
N/A: no release-surface change.
N/A: no linked issue.

Impact

Desktop (macOS, Linux, Windows): single unified E2E driver path. macOS now has full DOM access (Mac2 didn't), so specs that branched on isTauriDriver() will execute their DOM assertions on macOS for the first time.
Performance: smoke spec went from ~21s to ~1s of test time (faster Appium attach + correct target on first try). For multi-spec runs, eliminating the per-spec Appium + CEF restart cost will save many minutes per CI run.
Migration: existing pnpm test:e2e:cron-jobs etc. continue to work (legacy runner is a shim). New entry point is pnpm test:e2e:session[:full].
CI: workflows that previously installed tauri-driver and ran e2e-run-spec.sh per spec can be simplified to a single session run; that simplification is deliberately out of scope here so the foundation can be validated first.

Closes:
Follow-up PR(s)/TODOs:
- Phase 2: handle CEF window recreation so mega-flow.spec.ts and other deep-link-heavy specs run green under the unified driver.
- Dead-code sweep in helpers/element-helpers.ts (Mac2 XPath paths are now unreachable).
- Move legacy spec files into specs/legacy/ once the mega-orchestrator covers their flows so the default test:e2e:session is fast.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: e2e/unified-appium-chromium
Commit SHA: c555df8

Validation Run

pnpm --filter openhuman-app format:check
pnpm typecheck
Focused tests: pnpm test:e2e:session test/e2e/specs/smoke.spec.ts — 3/3 passing on macOS (arm64) against CEF Chromium 146.0.7680.165
Rust fmt/check (if changed): N/A — no Rust changes
Tauri fmt/check (if changed): N/A — no Tauri shell changes

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: unify the E2E driver onto Appium Chromium + CEF CDP across all three platforms; one Appium session per run.
User-visible effect: none in shipped product. E2E test runs are faster and produce a consistent driver session across macOS / Linux / Windows.

Parity Contract

Legacy behavior preserved: yes — legacy per-spec scripts (pnpm test:e2e:cron-jobs, etc.) keep working via a shim that delegates to the new session runner.
Guard/fallback/dispatch parity checks: helpers/platform.ts shims isTauriDriver() / isMac2() / supportsExecuteScript() to their now-unified values so the 40+ specs that branched on these uniformly take the DOM path without per-spec edits.

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: N/A
Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

Tests
- Unified E2E across Linux/macOS/Windows using a single Appium Chromium harness attached to the app’s debugging port.
- Added a unified session runner, new test scripts, automated chromedriver resolution/caching, improved readiness checks, and preserved retry-once behavior.
- Consolidated runner configuration and expanded artifact/log collection to include built app and Appium logs.
Chores
- Updated CI workflows and triggers to use the unified Appium Chromium path, standardized cache keys, and ignored E2E cache artifacts.

Collapses the per-platform driver split (Appium Mac2 on macOS, tauri-driver on Linux/Windows) onto a single backend: Appium Chromium driver attached to CEF's already-enabled remote-debugging port (:19222, configured in app/src-tauri/src/lib.rs:1556). One session per run, all specs sequential — eliminates the per-spec Appium + app restart cost. - wdio.conf.ts: single Chromium capability with debuggerAddress. - scripts/e2e-run-session.sh: new unified runner (launch CEF, wait for CDP, start Appium, run wdio). - scripts/e2e-run-spec.sh: thin shim delegating to the session runner. - helpers/platform.ts: shim — isTauriDriver/supportsExecuteScript now always true, isMac2 always false, so existing branched specs uniformly take the DOM path Chromium driver supports on every platform. - helpers/app-helpers.ts: waitForApp does a real readiness check. - package.json: add test:e2e:session{,:full} scripts.

…uri-cli Three follow-ups on top of the unified-driver foundation, validated by a green smoke run on macOS: - e2e-build.sh: call `pnpm tauri:ensure` and use `cargo tauri build` (not `pnpm exec tauri build`) so the .app bundle ships the CEF framework. Without this, the built binary panics at startup inside `cef::library_loader::LibraryLoader::new`. - e2e-run-session.sh: parse Chromium version from /json/version, download the matching chromedriver from Chrome for Testing into a workspace cache, and export E2E_CHROMEDRIVER_PATH. Without this, Appium's bundled chromedriver (currently v148) refuses to attach to CEF Chromium 146. - wdio.conf.ts: use the real host platformName (mac/linux/windows — the appium-chromium-driver supports those, not "chromium"), and pass the chromedriver path via `appium:executable` (the driver's actual cap name; `chromedriverExecutable` is silently dropped).

Three follow-on fixes after the unified driver landed. With these, the smoke spec passes in ~1s on macOS via the new pipeline (was ~21s before because chromedriver was attaching to the wrong CDP target). - e2e-run-session.sh: wait for the `tauri.localhost` page target to appear in CEF's /json/list, then close any `about:` page targets (the CEF prewarm child-webview slot) via /json/close/<id> so chromedriver's debuggerAddress attach binds to the real app. Without this, chromedriver attached to the prewarm `about:blank` and the session died as soon as CEF GC'd that target. - wdio.conf.ts: defensive `before:` hook switches to the main webview in case more than one page target survives the runner-side prune. - core-rpc.ts: drop the WebView path that did `await import('@tauri-apps/api/core')` inside `browser.execute`. The dynamic specifier only resolved under tauri-driver's renderer import map; under direct CDP it 404s with "Failed to resolve module specifier". The Node-side HTTP path (bearer token from tmpdir + port probe) is driver-agnostic and is now the only path. - .gitignore: ignore the per-version chromedriver cache the runner populates from Chrome for Testing.

coderabbitai · 2026-05-14T03:32:15Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR unifies E2E testing across Linux, macOS, and Windows using a single Appium Chromium driver attached to CEF's remote-debugging protocol (CDP). It adds a session orchestration script, updates build and WDIO config to attach to CDP, simplifies test helpers, and updates CI workflows and scripts to use the unified flow.

Changes

Appium Chromium E2E Unification

Layer / File(s)	Summary
App build infrastructure with CEF setup `app/scripts/e2e-build.sh`	E2E build script ensures vendored CEF-aware tauri-cli via `pnpm tauri:ensure`, sets `CEF_PATH` for runtime caching, and replaces `pnpm exec tauri build` with direct `cargo tauri build` invocations including `--bin OpenHuman` and OS-specific bundling options.
E2E session orchestration and lifecycle management `app/scripts/e2e-run-session.sh`	New comprehensive runner manages complete session: exports backend/driver ports, updates `config.toml` with mock backend URL (with backup/restore), validates frontend bundle contains mock server URL, resolves platform-specific CEF binary, launches app and polls CDP until `/json/version` responds, waits for `tauri.localhost` CDP target and closes `about:` prewarm targets, parses Chromium version, downloads matching `chromedriver`, installs Appium Chromium driver, starts Appium, and runs WDIO with EXIT trap cleanup.
WDIO configuration refactored for CDP attachment `app/test/wdio.conf.ts`	Refactored from per-OS tauri-driver/Appium launch logic to unified setup: derives `APPIUM_PORT`/`CEF_CDP_HOST`/`CEF_CDP_PORT` from environment, uses single Appium Chromium capability with `goog:chromeOptions.debuggerAddress` for CDP attachment, optionally overrides chromedriver via `E2E_CHROMEDRIVER_PATH`, adds `before` hook to select correct window target (prefers `tauri.localhost`, falls back to first non-`about:`), and removes platform-dependent capability/port logic.
Spec runner and supporting scripts `app/scripts/e2e-run-spec.sh`, `app/package.json`, `app/.gitignore`	Simplifies `e2e-run-spec.sh` to thin delegation calling `e2e-run-session.sh`; adds npm scripts `test:e2e:session` and `test:e2e:session:full` to `package.json`; adds `.gitignore` rule for `app/test/e2e/.cache/` to exclude cached artifacts.
Test helpers unified for Appium Chromium backend `app/test/e2e/helpers/platform.ts`, `core-rpc.ts`, `app-helpers.ts`	Updates platform detection to always return Tauri driver path (`isTauriDriver=true`, `isMac2=false`, `supportsExecuteScript=true`) reflecting unified Appium Chromium across all platforms; simplifies `core-rpc.ts` to unconditionally use Node-side HTTP JSON-RPC, removing WebView executeScript check; updates `app-helpers.ts` to prefer `waitForAppReady` readiness check over fixed pause and documents unified CDP-based harness.
CI workflows: agent-review and main E2E pipeline `.github/workflows/e2e-agent-review.yml`, `.github/workflows/e2e.yml`	Agent-review workflow updated to Appium Chromium setup (adds `unzip`, `curl`, `python3` dependencies; installs Appium v3 + chromium driver; runs `e2e-run-session.sh`; updates artifact logs). Main E2E workflow rewritten for unified approach: three platform jobs all install Appium + chromium driver, build via `e2e-build.sh`, run smoke/mega-flow via `e2e-run-session.sh` (Linux under `xvfb-run`), conditionally run full suite only on `workflow_dispatch` with `full=true`, standardized artifact upload on failure/cancel, unified Rust cache keys ending in `-unified`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tinyhumansai/openhuman#283: Changes core-RPC dispatch logic between WebView and Node execution; this PR removes the WebView path in favor of unconditional Node-side JSON-RPC.
tinyhumansai/openhuman#886: Related E2E runner tooling changes touching e2e-build.sh and e2e-run-spec.sh used by the unified session runner.
tinyhumansai/openhuman#925: Related CEF prewarm/target handling that aligns with e2e-run-session.sh closing about: prewarm windows.

Suggested reviewers

graycyrus

Poem

🐰 I hopped to the CI, nose twitching with glee,
One driver now dances on Mac, Win, and nix,
Appium and Chromium tune CEF's bright key,
A session script launches, CDP finds the mix,
Tests run, logs gleam — carrots saved for me.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(e2e): unify driver onto Appium Chromium attached to CEF CDP' accurately and concisely describes the main change: consolidating platform-specific E2E drivers into a single Appium Chromium driver setup attached to the CEF remote-debugging port.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…unner Every E2E job now runs the same recipe: install Appium 3 + the chromium driver, build the CEF app, and invoke `app/scripts/e2e-run-session.sh`. The script handles CDP attach, prewarm-target pruning, chromedriver version pinning to CEF's Chromium, Appium startup, and WDIO invocation. - e2e.yml: Linux/macOS/Windows now share an identical recipe. Linux runs inside Xvfb. Linux + Windows are continue-on-error until we have a green CI run; macOS is the gating signal (smoke spec validated locally in ~1s on the unified driver). The old "build-only" workarounds for Linux and Windows are gone — they couldn't actually run specs because tauri-driver doesn't speak to CEF. Now they can. - e2e-agent-review.yml: same flip — drop tauri-driver, install Appium chromium driver, call e2e-run-session.sh. mega-flow runs but is non-blocking everywhere: phase 2 work (handling CEF window recreation that kills the chromedriver session) is needed before it gates merges.

senamakel · 2026-05-14T03:37:34Z

@coderabbitai review

coderabbitai · 2026-05-14T03:37:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

senamakel · 2026-05-14T04:57:10Z

@coderabbitai reviwe

senamakel · 2026-05-14T04:57:15Z

@coderabbitai review

coderabbitai · 2026-05-14T04:57:20Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-14T04:57:22Z

@senamakel Sure! Triggering a review of the PR now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

app/scripts/e2e-run-session.sh (2)

176-201: ⚡ Quick win

Pipeline failures indicate CEF app crashes before CDP is available.

The CI logs show "CEF app exited before CDP came up" on both Windows and Linux. The script's error handling is correct (lines 192-195 detect early exit), but the root cause appears to be an app/build issue rather than a script defect.

Consider adding more diagnostic output when the app exits early to help debug:

🔍 Add diagnostic output for early app exit

   if ! kill -0 "$APP_PID" 2>/dev/null; then
     echo "ERROR: CEF app exited before CDP came up. See $APP_LOG" >&2
+    echo "--- Last 50 lines of app log ---" >&2
+    tail -50 "$APP_LOG" 2>/dev/null >&2 || true
+    echo "--- End of app log ---" >&2
     exit 1
   fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 176 - 201, When the script
detects the app exited early (the kill -0 "$APP_PID" check in the CDP wait loop)
add diagnostic steps to capture why: immediately wait for the process to reap
and capture its exit code (use wait on APP_PID), print a brief tail of APP_LOG
(e.g., last ~200 lines), and print lightweight system state (e.g., ps/pgrep for
APP_BIN and free disk space) so CI logs include the app exit code and recent app
logs; reference APP_PID, APP_LOG, CEF_CDP_PORT and CDP_VERSION_JSON to locate
the changes in the CDP-wait loop.

128-138: 💤 Low value

Consider using find instead of ls for glob expansion.

Static analysis (SC2012) flags this: ls can misbehave with non-alphanumeric filenames. While unlikely in this controlled context, find is more robust.

♻️ Suggested refactor

-DIST_JS="$(ls dist/assets/index-*.js 2>/dev/null | head -1)"
+DIST_JS="$(find dist/assets -maxdepth 1 -name 'index-*.js' -print -quit 2>/dev/null)"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 128 - 138, The use of ls to
populate DIST_JS (ls dist/assets/index-*.js | head -1) is flagged by SC2012;
replace that pipeline with a robust find-based assignment that safely locates
the first matching file (pattern dist/assets/index-*.js) and assigns it to
DIST_JS, preserving the subsequent existence check and the grep check for
"127.0.0.1:${E2E_MOCK_PORT}" against DIST_JS; ensure the find invocation returns
a single path (or empty) so the rest of the script (the if [ -z "$DIST_JS" ] and
the grep check) continues to work as before.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 85-90: The script's OS case handling is missing Windows
(MINGW/MSYS/CYGWIN/Windows_NT) cleanup so OpenHuman processes aren't killed on
Windows; update the case "$OS" in the e2e-run-session.sh to add a Windows branch
(match MINGW*|MSYS*|CYGWIN*|Windows_NT) that invokes the Windows process-kill
command (e.g. taskkill /F /IM OpenHuman.exe 2>/dev/null || true) or a
tasklist/taskkill combo that targets processes whose image or command line
contains "OpenHuman", keeping the existing pkill usage for Darwin and Linux and
preserving the 2>/dev/null || true behavior.

In `@app/test/e2e/helpers/app-helpers.ts`:
- Around line 23-29: The current blanket catch around waitForAppReady(15_000)
masks all errors; change it to catch the error into a variable, inspect it, and
only swallow/fallback to await browser.pause(5_000) for genuine
readiness/timeouts (e.g., check for a specific TimeoutError type or
error.message that indicates not-ready), otherwise rethrow the error so
execution/session/DOM failures surface; update the catch block in the helper
where waitForAppReady is called to use catch (err) { if (isReadinessError(err))
await browser.pause(5_000); else throw err; } (or equivalent check) referencing
waitForAppReady and the catch block in app-helpers.ts.

---

Nitpick comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 176-201: When the script detects the app exited early (the kill -0
"$APP_PID" check in the CDP wait loop) add diagnostic steps to capture why:
immediately wait for the process to reap and capture its exit code (use wait on
APP_PID), print a brief tail of APP_LOG (e.g., last ~200 lines), and print
lightweight system state (e.g., ps/pgrep for APP_BIN and free disk space) so CI
logs include the app exit code and recent app logs; reference APP_PID, APP_LOG,
CEF_CDP_PORT and CDP_VERSION_JSON to locate the changes in the CDP-wait loop.
- Around line 128-138: The use of ls to populate DIST_JS (ls
dist/assets/index-*.js | head -1) is flagged by SC2012; replace that pipeline
with a robust find-based assignment that safely locates the first matching file
(pattern dist/assets/index-*.js) and assigns it to DIST_JS, preserving the
subsequent existence check and the grep check for "127.0.0.1:${E2E_MOCK_PORT}"
against DIST_JS; ensure the find invocation returns a single path (or empty) so
the rest of the script (the if [ -z "$DIST_JS" ] and the grep check) continues
to work as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: db08a16c-fbea-4249-b8f9-76a080f34d5d

📥 Commits

Reviewing files that changed from the base of the PR and between 5e6073b and 1812594.

📒 Files selected for processing (11)

.github/workflows/e2e-agent-review.yml
.github/workflows/e2e.yml
app/.gitignore
app/package.json
app/scripts/e2e-build.sh
app/scripts/e2e-run-session.sh
app/scripts/e2e-run-spec.sh
app/test/e2e/helpers/app-helpers.ts
app/test/e2e/helpers/core-rpc.ts
app/test/e2e/helpers/platform.ts
app/test/wdio.conf.ts

…t at runtime - macOS workflow: install x86_64-apple-darwin target. The vendored tauri-cli's build.rs (tauri-bundler/build.rs:134) compiles a CEF helper for that target to support universal binaries, even on arm64 hosts. macos-latest is arm64 and the target wasn't installed → "can't find crate for `core`". - e2e-run-session.sh: on Linux + Windows, find the CEF distribution that the vendored tauri-cli cached under $CEF_PATH/<version>/cef_<os>_<arch>/ and prepend it to LD_LIBRARY_PATH / PATH before launching the binary. The bare --no-bundle build has no co-located libcef.so/libcef.dll, so the OS loader couldn't find them and the app exited before CDP came up. - e2e-run-session.sh: log paths now use $RUNNER_TEMP / $TMPDIR rather than hard-coded /tmp. Windows runners don't have /tmp so the artifact-upload glob never matched anything. - e2e.yml: artifact globs use ${{ runner.temp }} to match the new log paths.

The upload-artifact path mismatch on docker container runners means the $RUNNER_TEMP-based artifact glob doesn't pick up logs created inside the container view (/__w/_temp). Print the log to stderr so the CI run page has the diagnosis without needing artifact archaeology.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

app/scripts/e2e-run-session.sh (1)

85-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the missing Windows process cleanup before probing CDP.

This case "$OS" still leaves OpenHuman.exe running on Windows, so the later http://127.0.0.1:${CEF_CDP_PORT}/json/version check can attach to a stale CEF instance instead of the one this script just launched.

🔧 Minimal fix

 echo "[runner] Killing any running OpenHuman instances..."
 case "$OS" in
   Darwin) pkill -f "OpenHuman" 2>/dev/null || true ;;
   Linux)  pkill -f "OpenHuman" 2>/dev/null || true ;;
+  MINGW*|MSYS*|CYGWIN*|Windows_NT)
+    taskkill.exe //F //IM "OpenHuman.exe" >/dev/null 2>&1 || true
+    ;;
 esac
 sleep 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 85 - 90, The script's OS cleanup
misses Windows so lingering OpenHuman.exe can block the CDP probe; update the
case handling for the variable OS in e2e-run-session.sh (the block that matches
Darwin/Linux) to also handle Windows (e.g., "Windows" or "MINGW"* as used by
your OS detection) and run a Windows-safe termination command such as taskkill
/IM "OpenHuman.exe" /F 2>/dev/null || true (or an equivalent Stop-Process
invocation for PowerShell environments), ensuring you still silence errors and
fall back to || true so the script continues.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 373-389: The readiness probe can succeed against a stale Appium if
the newly started process failed with EADDRINUSE; after starting Appium (the
block using APPIUM_BIN, APPIUM_PORT, APPIUM_LOG and APPIUM_PID) add a quick
verification step: wait a second, assert the background PID ($APPIUM_PID) is
alive (kill -0) and then verify that the listener on APPIUM_PORT is owned by
that PID (use lsof -i :$APPIUM_PORT -sTCP:LISTEN -n -P or ss/netstat fallback)
before accepting the readiness curl; if the PID is dead or the port is owned by
a different PID, print an error and exit non‑zero so we fail fast instead of
driving the wrong Appium instance.
- Around line 190-217: The script currently picks a CEF_DIST_DIR with head -1
which silently chooses an arbitrary cached distribution when multiple exist;
update the Linux and Windows branches (where CEF_DIST_DIR is computed and
exported to LD_LIBRARY_PATH/PATH, and where cygpath/WIN_CEF_DIST is used) to
detect multiple matching directories and either (a) select deterministically the
newest distribution (e.g., sort by modification time) or (b) fail fast with a
clear error/warning asking the runner to clear the cache; implement this by
replacing the plain head -1 selection logic with a deterministic ls -dt ... |
head -1 (or by checking if more than one match exists and exiting non-zero with
an explicit message) so the script no longer silently picks the wrong libcef.

---

Duplicate comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 85-90: The script's OS cleanup misses Windows so lingering
OpenHuman.exe can block the CDP probe; update the case handling for the variable
OS in e2e-run-session.sh (the block that matches Darwin/Linux) to also handle
Windows (e.g., "Windows" or "MINGW"* as used by your OS detection) and run a
Windows-safe termination command such as taskkill /IM "OpenHuman.exe" /F
2>/dev/null || true (or an equivalent Stop-Process invocation for PowerShell
environments), ensuring you still silence errors and fall back to || true so the
script continues.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87764d31-200e-4e70-81db-e0fa962b6a41

📥 Commits

Reviewing files that changed from the base of the PR and between d3483b0 and 1597ca1.

📒 Files selected for processing (1)

app/scripts/e2e-run-session.sh

The openhuman_ci docker image runs as root, and Chromium refuses to start under uid 0 without --no-sandbox (crbug.com/638180). With the inline app log now in the CI run output, the failure surface was immediately visible: ERROR:zygote_host_impl_linux.cc:101] Running as root without --no-sandbox is not supported. Detect uid 0 on Linux and append --no-sandbox to the binary's argv. The flag is intentionally NOT applied on macOS/Windows where the runner is a regular user and the sandbox should stay enabled.

coderabbitai

♻️ Duplicate comments (3)

app/scripts/e2e-run-session.sh (3)

85-89: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the Windows cleanup branch here too.

This block only kills stale processes on Darwin/Linux, so reruns on Git Bash/MSYS can leave OpenHuman.exe alive and keep the CDP/app ports busy.

🔧 Minimal fix

 echo "[runner] Killing any running OpenHuman instances..."
 case "$OS" in
   Darwin) pkill -f "OpenHuman" 2>/dev/null || true ;;
   Linux)  pkill -f "OpenHuman" 2>/dev/null || true ;;
+  MINGW*|MSYS*|CYGWIN*|Windows_NT)
+    taskkill //F //IM "OpenHuman.exe" 2>/dev/null || true
+    ;;
 esac
 sleep 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 85 - 89, The cleanup only
handles Darwin/Linux so stale Windows Git Bash/MSYS processes like OpenHuman.exe
aren't killed; update the case on $OS to add a Windows branch (match
MINGW*|MSYS*|CYGWIN*|Windows) and invoke the Windows process killer (taskkill /F
/IM OpenHuman.exe or equivalent via cmd) redirecting errors to null and ensuring
the command returns success (|| true) just like the existing Darwin/Linux
branches; look for the pkill -f "OpenHuman" lines and the case "$OS" block to
insert the Windows branch.

190-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't silently pick the first cached CEF distribution.

head -1 makes warmed runners nondeterministic once multiple CEF versions exist under CEF_PATH, so Linux/Windows can load the wrong libcef and fail only intermittently.

🔧 Minimal fix

 case "$OS" in
   Linux)
-    CEF_DIST_DIR="$(ls -d "$CEF_PATH"/*/cef_linux_* 2>/dev/null | head -1)"
+    mapfile -t CEF_CANDIDATES < <(find "$CEF_PATH" -mindepth 2 -maxdepth 2 -type d -name 'cef_linux_*' | sort)
+    if [ "${`#CEF_CANDIDATES`[@]}" -gt 1 ]; then
+      echo "ERROR: multiple CEF Linux distributions found under $CEF_PATH; clear the cache or pin CEF_PATH." >&2
+      exit 1
+    fi
+    CEF_DIST_DIR="${CEF_CANDIDATES[0]:-}"
     if [ -n "$CEF_DIST_DIR" ] && [ -d "$CEF_DIST_DIR" ]; then
       export LD_LIBRARY_PATH="$CEF_DIST_DIR:${LD_LIBRARY_PATH:-}"
       echo "[runner] LD_LIBRARY_PATH includes: $CEF_DIST_DIR"
     else
       echo "[runner] Warning: no CEF Linux distribution found under $CEF_PATH — libcef.so may fail to load."
     fi
     ;;
   MINGW*|MSYS*|CYGWIN*|Windows_NT)
-    CEF_DIST_DIR="$(ls -d "$CEF_PATH"/*/cef_windows_* 2>/dev/null | head -1)"
+    mapfile -t CEF_CANDIDATES < <(find "$CEF_PATH" -mindepth 2 -maxdepth 2 -type d -name 'cef_windows_*' | sort)
+    if [ "${`#CEF_CANDIDATES`[@]}" -gt 1 ]; then
+      echo "ERROR: multiple CEF Windows distributions found under $CEF_PATH; clear the cache or pin CEF_PATH." >&2
+      exit 1
+    fi
+    CEF_DIST_DIR="${CEF_CANDIDATES[0]:-}"
     if [ -n "$CEF_DIST_DIR" ] && [ -d "$CEF_DIST_DIR" ]; then
       if command -v cygpath >/dev/null 2>&1; then
         WIN_CEF_DIST="$(cygpath -w "$CEF_DIST_DIR")"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 190 - 217, The script currently
uses head -1 to pick the first CEF distribution which is nondeterministic when
multiple versions exist; change the CEF_DIST_DIR resolution for both Linux and
Windows branches (the lines setting CEF_DIST_DIR via ls -d
"$CEF_PATH"/*/cef_linux_* and ls -d "$CEF_PATH"/*/cef_windows_*) to
deterministically pick the intended distribution (for example, pipe the ls
output to sort -V and use tail -n1 to select the highest/version-sorted entry)
so LD_LIBRARY_PATH (CEF_DIST_DIR) on Linux and PATH (CEF_DIST_DIR /
WIN_CEF_DIST) on Windows always point to a consistent, latest CEF directory
before the existing existence checks and exports.

386-402: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if APPIUM_PORT already belongs to another server.

The readiness loop accepts any /status responder on that port. If the new Appium dies with EADDRINUSE, this can still pass against a stale instance and run the suite against the wrong server.

🔧 Minimal fix

 APPIUM_LOG="$LOG_DIR/appium-e2e-${LOG_SUFFIX}.log"
+if curl -sf "http://127.0.0.1:$APPIUM_PORT/status" >/dev/null 2>&1; then
+  echo "ERROR: Appium is already listening on port $APPIUM_PORT. Stop the stale server or set APPIUM_PORT." >&2
+  exit 1
+fi
+
 echo "[runner] Starting Appium on port $APPIUM_PORT"
 echo "[runner]   Appium logs: $APPIUM_LOG"
 "$APPIUM_BIN" --port "$APPIUM_PORT" --relaxed-security > "$APPIUM_LOG" 2>&1 &
 APPIUM_PID=$!
 
 for i in $(seq 1 30); do
+  if ! kill -0 "$APPIUM_PID" 2>/dev/null; then
+    echo "ERROR: Appium exited before becoming ready. Appium log follows:" >&2
+    cat "$APPIUM_LOG" >&2 || true
+    exit 1
+  fi
   if curl -sf "http://127.0.0.1:$APPIUM_PORT/status" >/dev/null 2>&1; then
     echo "[runner] Appium is ready."
     break
   fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 386 - 402, The readiness loop
can be fooled by a prior server on APPIUM_PORT; after launching Appium (using
APPIUM_BIN to start and setting APPIUM_PID and APPIUM_LOG), immediately check
whether the background process actually bound the port and/or died with
EADDRINUSE before accepting the /status response: capture the start exit state
by waiting a short moment (e.g. sleep 0.1) and if the process is no longer
running (check APPIUM_PID), or if APPIUM_LOG contains EADDRINUSE, fail fast;
alternatively verify the socket owner matches APPIUM_PID (use lsof/ss to confirm
the listening PID is APPIUM_PID for APPIUM_PORT) and if not, print an error and
exit 1 rather than proceeding to the /status loop.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 85-89: The cleanup only handles Darwin/Linux so stale Windows Git
Bash/MSYS processes like OpenHuman.exe aren't killed; update the case on $OS to
add a Windows branch (match MINGW*|MSYS*|CYGWIN*|Windows) and invoke the Windows
process killer (taskkill /F /IM OpenHuman.exe or equivalent via cmd) redirecting
errors to null and ensuring the command returns success (|| true) just like the
existing Darwin/Linux branches; look for the pkill -f "OpenHuman" lines and the
case "$OS" block to insert the Windows branch.
- Around line 190-217: The script currently uses head -1 to pick the first CEF
distribution which is nondeterministic when multiple versions exist; change the
CEF_DIST_DIR resolution for both Linux and Windows branches (the lines setting
CEF_DIST_DIR via ls -d "$CEF_PATH"/*/cef_linux_* and ls -d
"$CEF_PATH"/*/cef_windows_*) to deterministically pick the intended distribution
(for example, pipe the ls output to sort -V and use tail -n1 to select the
highest/version-sorted entry) so LD_LIBRARY_PATH (CEF_DIST_DIR) on Linux and
PATH (CEF_DIST_DIR / WIN_CEF_DIST) on Windows always point to a consistent,
latest CEF directory before the existing existence checks and exports.
- Around line 386-402: The readiness loop can be fooled by a prior server on
APPIUM_PORT; after launching Appium (using APPIUM_BIN to start and setting
APPIUM_PID and APPIUM_LOG), immediately check whether the background process
actually bound the port and/or died with EADDRINUSE before accepting the /status
response: capture the start exit state by waiting a short moment (e.g. sleep
0.1) and if the process is no longer running (check APPIUM_PID), or if
APPIUM_LOG contains EADDRINUSE, fail fast; alternatively verify the socket owner
matches APPIUM_PID (use lsof/ss to confirm the listening PID is APPIUM_PID for
APPIUM_PORT) and if not, print an error and exit 1 rather than proceeding to the
/status loop.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2effb35f-8c36-4118-a8ac-60d8a72c8df4

📥 Commits

Reviewing files that changed from the base of the PR and between 1597ca1 and 14d8cd1.

📒 Files selected for processing (1)

app/scripts/e2e-run-session.sh

…ndle Two new failures surfaced now that the previous blockers cleared. Inline log dump made both diagnoses cheap: - Linux: CEF zygote was crashing with "Failed global descriptor lookup: 7" and a flurry of dbus errors after --no-sandbox let it past the root check. Add --disable-dev-shm-usage (docker /dev/shm is too small), --disable-gpu, and --no-zygote so Chromium stops asking for a dbus session bus that isn't there. - macOS: the .app bundle launched and exited within ~2 seconds writing zero bytes to stdout/stderr. Classic "the OS killed the helper subprocess before it could log" symptom on unsigned bundles. Adhoc-sign with `codesign --force --deep --sign -` after build so the loader stops rejecting the CEF helpers.

With the previous fixes in place CDP actually came up on Linux — "DevTools listening on ws://127.0.0.1:19222/devtools/browser/..." — but the host process then panicked inside Tauri's single-instance plugin: thread 'main' panicked at plugins/single-instance/.../linux.rs:57: Result::unwrap() on Err(Address("unsupported transport 'disabled'")) The openhuman_ci container ships with DBUS_SESSION_BUS_ADDRESS=disabled:. zbus refuses that transport, so the plugin's setup crashes the host. Start a real session bus with `dbus-launch` and inherit its address for the rest of the runner. Skip on macOS/Windows where a session bus already exists.

macOS run on d0102d0 failed with "rustup-init[EXE]: unexpected argument 'install' found" from `cargo install --locked --path .../tauri-cli`. The Swatinem cache from a prior failed run had restored a poisoned ~/.cargo state where `cargo` resolved to rustup-init. Two fixes layered for resilience: - Bump the rust-cache key to e2e-macos-unified-v2 so the bad state isn't restored on this run. - Add `rustup default 1.93.0 && which cargo && cargo --version` step after toolchain install so a future cargo/rustup-init shadow is caught in CI with a clear error instead of silently breaking ensure-tauri-cli.

Massive progress on Linux with the dbus + CEF-args + lib-path fixes — the app launched, CDP came up, prewarm pruned, chromedriver downloaded. Only `unzip` was missing in the openhuman_ci docker image: app/scripts/e2e-run-session.sh: line 402: unzip: command not found Python 3 is already required by the runner (used to parse /json/list), so use stdlib zipfile when `unzip` isn't on PATH. Avoids an apt-get install for one binary.

Runner uses `set -euo pipefail`, so expanding "${APP_ARGS[@]}" when the array is empty (macOS/Windows path — no extra CEF flags needed) errors with "APP_ARGS[@]: unbound variable" and the app never launches. Use `${APP_ARGS[@]+"${APP_ARGS[@]}"}` to expand only when set, so macOS gets the bare binary launch it wants and Linux still gets --no-sandbox et al.

Four fixes flagged on PR tinyhumansai#1696's first CodeRabbit pass: - e2e-run-session.sh: add Windows process cleanup (`taskkill /F /IM OpenHuman.exe`). The Darwin/Linux pkill branches didn't have a Windows equivalent, so stale instances would survive across runs on Windows. - helpers/app-helpers.ts: stop swallowing non-timeout errors in `waitForApp()`. The previous blanket `catch` masked session/DOM failures behind a blind 5s pause. Re-throw anything that isn't the specific "waitForAppReady timed out" message so real failures surface. - e2e-run-session.sh: error out when CEF_PATH contains more than one cef_<os>_<arch> directory instead of silently picking the first via `ls | head -1`. After a CEF upgrade the wrong libcef would load and fail only on warmed runners. - e2e-run-session.sh: fail fast when APPIUM_PORT is already in use, and check that the Appium PID is still alive on each poll. Without this, a stale Appium passes the /status probe while our just-launched process dies with EADDRINUSE — and the tests drive the wrong instance.

senamakel added 3 commits May 13, 2026 19:50

senamakel requested a review from a team May 14, 2026 03:32

coderabbitai Bot requested changes May 14, 2026

View reviewed changes

Comment thread app/scripts/e2e-run-session.sh

Comment thread app/test/e2e/helpers/app-helpers.ts

senamakel added 2 commits May 13, 2026 22:47

coderabbitai Bot requested changes May 14, 2026

View reviewed changes

Comment thread app/scripts/e2e-run-session.sh

Comment thread app/scripts/e2e-run-session.sh

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

senamakel added 6 commits May 13, 2026 23:51

coderabbitai Bot approved these changes May 14, 2026

View reviewed changes

senamakel merged commit 6386854 into tinyhumansai:main May 14, 2026
20 checks passed

senamakel mentioned this pull request May 14, 2026

test: expand unit + e2e coverage from test-map across core domains #1724

Merged

12 tasks

This was referenced May 14, 2026

Add agent task orchestration #1768

Merged

fix(e2e/linux): silence dead-session noise + local docker harness refresh #1777

Merged

Conversation

senamakel commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

senamakel commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026

Uh oh!

senamakel commented May 14, 2026

Uh oh!

senamakel commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

senamakel commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading