Skip to content

test(e2e): unify driver onto Appium Chromium attached to CEF CDP#1696

Merged
senamakel merged 13 commits into
tinyhumansai:mainfrom
senamakel:e2e/unified-appium-chromium
May 14, 2026
Merged

test(e2e): unify driver onto Appium Chromium attached to CEF CDP#1696
senamakel merged 13 commits into
tinyhumansai:mainfrom
senamakel:e2e/unified-appium-chromium

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented May 14, 2026

Summary

  • Collapse the per-platform E2E driver split (Mac2 on macOS, tauri-driver on Linux/Windows) onto a single backend: Appium Chromium driver attached to CEF's already-enabled remote-debugging port (19222, configured in app/src-tauri/src/lib.rs:1556).
  • One Appium session per run, all specs sequential — eliminates the per-spec Appium + app restart cost.
  • New unified runner app/scripts/e2e-run-session.sh; legacy e2e-run-spec.sh now a thin shim so existing pnpm test:e2e:cron-jobs etc. keep working.
  • Auto-pins chromedriver to CEF's bundled Chromium version (currently 146.0.7680.165) via a Chrome for Testing download cached at app/test/e2e/.cache/chromedriver/<version>-<platform>/.
  • Validated on macOS: smoke spec (3/3 passing in ~1s).

Problem

The harness was split into two driver backends with two separate code paths:

  • Linux / Windows drove the WebView through tauri-driver (W3C WebDriver, full DOM access).
  • macOS drove the .app bundle through Appium Mac2 (XCUITest accessibility tree, no DOM, no executeScript).

Every spec branched on isTauriDriver() and isMac2(), and helpers like element-helpers.ts carried duplicate XPath-over-accessibility code for Mac2. mega-flow.spec.ts was written specifically against Mac2's limitations (deep links + mock log + RPC, no UI assertions) because Mac2 couldn't see the DOM at all. The asymmetry made macOS coverage shallow and any unified test ergonomics impossible.

Additionally, each spec invoked e2e-run-spec.sh, which started a fresh Appium / tauri-driver and a fresh OpenHuman.app per spec — wall-clock cost dominated by setup, not assertions.

Solution

CEF (the runtime under Tauri here) already exposes a Chrome DevTools Protocol port on 127.0.0.1:19222 on every platform (enabled in app/src-tauri/src/lib.rs:1556). That makes a unified driver possible:

  • wdio.conf.ts: single Chromium capability with goog:chromeOptions.debuggerAddress: 127.0.0.1:19222. Same caps on macOS / Linux / Windows. maxInstances: 1 keeps everything in one session. A before: hook switches to the main app webview after attach (defensive; the runner already prunes the prewarm target).
  • scripts/e2e-run-session.sh (new): launches the built CEF binary once, waits for CDP, prunes the prewarm about:blank target via /json/close/<id> so chromedriver doesn't attach to it (would otherwise die mid-run when CEF GC'd that target), parses Chromium version from /json/version and downloads a matching chromedriver from Chrome for Testing, then starts Appium with the chromium driver and runs WDIO.
  • scripts/e2e-build.sh: now invokes pnpm tauri:ensure and uses cargo tauri build (not pnpm exec tauri build) so the .app ships the CEF framework. Without this, the bundle panics at startup inside cef::library_loader::LibraryLoader::new (the failure mode CLAUDE.md warns about).
  • scripts/e2e-run-spec.sh: thin shim that execs e2e-run-session.sh, so existing per-spec scripts (test:e2e:cron-jobs, etc.) keep working.
  • helpers/platform.ts: collapsed — isTauriDriver() returns true, isMac2() returns false, supportsExecuteScript() returns true on every platform. The ~40 existing specs that branched on these uniformly take the DOM-capable code path, which Chromium driver supports everywhere.
  • helpers/core-rpc.ts: drops the WebView path that did await import('@tauri-apps/api/core') inside browser.execute. That dynamic specifier only resolved under tauri-driver's renderer import map; under direct CDP it 404s. The Node-side HTTP path (port probe + bearer token from ${tmpdir}/openhuman-e2e-rpc-token) is driver-agnostic and is now the only path.
  • helpers/app-helpers.ts: waitForApp does a real readiness check.

This is phase 1 of a multi-phase migration. The unified driver works end-to-end (smoke spec green on macOS). Phase 2 work that is not in this PR: the mega-flow and other deep-link-heavy specs hit a real CEF interaction — the vendored CEF runtime drops and recreates the main WebviewWindow on certain events (see app/src-tauri/src/lib.rs:907), which terminates the chromedriver session bound to the original target ID. That needs either session-reattach logic in the test helpers or a behavior change in the CEF window-recreation path; the right shape isn't obvious yet and warrants its own PR.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • N/A: this PR changes how E2E tests are driven, not new product code; coverage gate is irrelevant here. The unit + Rust coverage workflows still run against changed lines elsewhere as configured.
  • N/A: behaviour-only change to test harness; no feature rows added/removed.
  • N/A: harness change, no feature IDs.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: no release-surface change.
  • N/A: no linked issue.

Impact

  • Desktop (macOS, Linux, Windows): single unified E2E driver path. macOS now has full DOM access (Mac2 didn't), so specs that branched on isTauriDriver() will execute their DOM assertions on macOS for the first time.
  • Performance: smoke spec went from ~21s to ~1s of test time (faster Appium attach + correct target on first try). For multi-spec runs, eliminating the per-spec Appium + CEF restart cost will save many minutes per CI run.
  • Migration: existing pnpm test:e2e:cron-jobs etc. continue to work (legacy runner is a shim). New entry point is pnpm test:e2e:session[:full].
  • CI: workflows that previously installed tauri-driver and ran e2e-run-spec.sh per spec can be simplified to a single session run; that simplification is deliberately out of scope here so the foundation can be validated first.

Related

  • Closes:
  • Follow-up PR(s)/TODOs:
    • Phase 2: handle CEF window recreation so mega-flow.spec.ts and other deep-link-heavy specs run green under the unified driver.
    • Dead-code sweep in helpers/element-helpers.ts (Mac2 XPath paths are now unreachable).
    • Move legacy spec files into specs/legacy/ once the mega-orchestrator covers their flows so the default test:e2e:session is fast.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: e2e/unified-appium-chromium
  • Commit SHA: c555df8

Validation Run

  • pnpm --filter openhuman-app format:check
  • pnpm typecheck
  • Focused tests: pnpm test:e2e:session test/e2e/specs/smoke.spec.ts — 3/3 passing on macOS (arm64) against CEF Chromium 146.0.7680.165
  • Rust fmt/check (if changed): N/A — no Rust changes
  • Tauri fmt/check (if changed): N/A — no Tauri shell changes

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: unify the E2E driver onto Appium Chromium + CEF CDP across all three platforms; one Appium session per run.
  • User-visible effect: none in shipped product. E2E test runs are faster and produce a consistent driver session across macOS / Linux / Windows.

Parity Contract

  • Legacy behavior preserved: yes — legacy per-spec scripts (pnpm test:e2e:cron-jobs, etc.) keep working via a shim that delegates to the new session runner.
  • Guard/fallback/dispatch parity checks: helpers/platform.ts shims isTauriDriver() / isMac2() / supportsExecuteScript() to their now-unified values so the 40+ specs that branched on these uniformly take the DOM path without per-spec edits.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • Tests

    • Unified E2E across Linux/macOS/Windows using a single Appium Chromium harness attached to the app’s debugging port.
    • Added a unified session runner, new test scripts, automated chromedriver resolution/caching, improved readiness checks, and preserved retry-once behavior.
    • Consolidated runner configuration and expanded artifact/log collection to include built app and Appium logs.
  • Chores

    • Updated CI workflows and triggers to use the unified Appium Chromium path, standardized cache keys, and ignored E2E cache artifacts.

Review Change Stack

senamakel added 3 commits May 13, 2026 19:50
Collapses the per-platform driver split (Appium Mac2 on macOS,
tauri-driver on Linux/Windows) onto a single backend: Appium Chromium
driver attached to CEF's already-enabled remote-debugging port (:19222,
configured in app/src-tauri/src/lib.rs:1556). One session per run, all
specs sequential — eliminates the per-spec Appium + app restart cost.

- wdio.conf.ts: single Chromium capability with debuggerAddress.
- scripts/e2e-run-session.sh: new unified runner (launch CEF, wait for
  CDP, start Appium, run wdio).
- scripts/e2e-run-spec.sh: thin shim delegating to the session runner.
- helpers/platform.ts: shim — isTauriDriver/supportsExecuteScript now
  always true, isMac2 always false, so existing branched specs uniformly
  take the DOM path Chromium driver supports on every platform.
- helpers/app-helpers.ts: waitForApp does a real readiness check.
- package.json: add test:e2e:session{,:full} scripts.
…uri-cli

Three follow-ups on top of the unified-driver foundation, validated by a
green smoke run on macOS:

- e2e-build.sh: call `pnpm tauri:ensure` and use `cargo tauri build` (not
  `pnpm exec tauri build`) so the .app bundle ships the CEF framework.
  Without this, the built binary panics at startup inside
  `cef::library_loader::LibraryLoader::new`.

- e2e-run-session.sh: parse Chromium version from /json/version, download
  the matching chromedriver from Chrome for Testing into a workspace cache,
  and export E2E_CHROMEDRIVER_PATH. Without this, Appium's bundled
  chromedriver (currently v148) refuses to attach to CEF Chromium 146.

- wdio.conf.ts: use the real host platformName (mac/linux/windows — the
  appium-chromium-driver supports those, not "chromium"), and pass the
  chromedriver path via `appium:executable` (the driver's actual cap name;
  `chromedriverExecutable` is silently dropped).
Three follow-on fixes after the unified driver landed. With these, the
smoke spec passes in ~1s on macOS via the new pipeline (was ~21s before
because chromedriver was attaching to the wrong CDP target).

- e2e-run-session.sh: wait for the `tauri.localhost` page target to
  appear in CEF's /json/list, then close any `about:` page targets
  (the CEF prewarm child-webview slot) via /json/close/<id> so
  chromedriver's debuggerAddress attach binds to the real app. Without
  this, chromedriver attached to the prewarm `about:blank` and the
  session died as soon as CEF GC'd that target.

- wdio.conf.ts: defensive `before:` hook switches to the main webview
  in case more than one page target survives the runner-side prune.

- core-rpc.ts: drop the WebView path that did
  `await import('@tauri-apps/api/core')` inside `browser.execute`. The
  dynamic specifier only resolved under tauri-driver's renderer import
  map; under direct CDP it 404s with "Failed to resolve module
  specifier". The Node-side HTTP path (bearer token from tmpdir +
  port probe) is driver-agnostic and is now the only path.

- .gitignore: ignore the per-version chromedriver cache the runner
  populates from Chrome for Testing.
@senamakel senamakel requested a review from a team May 14, 2026 03:32
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR unifies E2E testing across Linux, macOS, and Windows using a single Appium Chromium driver attached to CEF's remote-debugging protocol (CDP). It adds a session orchestration script, updates build and WDIO config to attach to CDP, simplifies test helpers, and updates CI workflows and scripts to use the unified flow.

Changes

Appium Chromium E2E Unification

Layer / File(s) Summary
App build infrastructure with CEF setup
app/scripts/e2e-build.sh
E2E build script ensures vendored CEF-aware tauri-cli via pnpm tauri:ensure, sets CEF_PATH for runtime caching, and replaces pnpm exec tauri build with direct cargo tauri build invocations including --bin OpenHuman and OS-specific bundling options.
E2E session orchestration and lifecycle management
app/scripts/e2e-run-session.sh
New comprehensive runner manages complete session: exports backend/driver ports, updates config.toml with mock backend URL (with backup/restore), validates frontend bundle contains mock server URL, resolves platform-specific CEF binary, launches app and polls CDP until /json/version responds, waits for tauri.localhost CDP target and closes about: prewarm targets, parses Chromium version, downloads matching chromedriver, installs Appium Chromium driver, starts Appium, and runs WDIO with EXIT trap cleanup.
WDIO configuration refactored for CDP attachment
app/test/wdio.conf.ts
Refactored from per-OS tauri-driver/Appium launch logic to unified setup: derives APPIUM_PORT/CEF_CDP_HOST/CEF_CDP_PORT from environment, uses single Appium Chromium capability with goog:chromeOptions.debuggerAddress for CDP attachment, optionally overrides chromedriver via E2E_CHROMEDRIVER_PATH, adds before hook to select correct window target (prefers tauri.localhost, falls back to first non-about:), and removes platform-dependent capability/port logic.
Spec runner and supporting scripts
app/scripts/e2e-run-spec.sh, app/package.json, app/.gitignore
Simplifies e2e-run-spec.sh to thin delegation calling e2e-run-session.sh; adds npm scripts test:e2e:session and test:e2e:session:full to package.json; adds .gitignore rule for app/test/e2e/.cache/ to exclude cached artifacts.
Test helpers unified for Appium Chromium backend
app/test/e2e/helpers/platform.ts, core-rpc.ts, app-helpers.ts
Updates platform detection to always return Tauri driver path (isTauriDriver=true, isMac2=false, supportsExecuteScript=true) reflecting unified Appium Chromium across all platforms; simplifies core-rpc.ts to unconditionally use Node-side HTTP JSON-RPC, removing WebView executeScript check; updates app-helpers.ts to prefer waitForAppReady readiness check over fixed pause and documents unified CDP-based harness.
CI workflows: agent-review and main E2E pipeline
.github/workflows/e2e-agent-review.yml, .github/workflows/e2e.yml
Agent-review workflow updated to Appium Chromium setup (adds unzip, curl, python3 dependencies; installs Appium v3 + chromium driver; runs e2e-run-session.sh; updates artifact logs). Main E2E workflow rewritten for unified approach: three platform jobs all install Appium + chromium driver, build via e2e-build.sh, run smoke/mega-flow via e2e-run-session.sh (Linux under xvfb-run), conditionally run full suite only on workflow_dispatch with full=true, standardized artifact upload on failure/cancel, unified Rust cache keys ending in -unified.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tinyhumansai/openhuman#283: Changes core-RPC dispatch logic between WebView and Node execution; this PR removes the WebView path in favor of unconditional Node-side JSON-RPC.
  • tinyhumansai/openhuman#886: Related E2E runner tooling changes touching e2e-build.sh and e2e-run-spec.sh used by the unified session runner.
  • tinyhumansai/openhuman#925: Related CEF prewarm/target handling that aligns with e2e-run-session.sh closing about: prewarm windows.

Suggested reviewers

  • graycyrus

Poem

🐰 I hopped to the CI, nose twitching with glee,
One driver now dances on Mac, Win, and nix,
Appium and Chromium tune CEF's bright key,
A session script launches, CDP finds the mix,
Tests run, logs gleam — carrots saved for me.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(e2e): unify driver onto Appium Chromium attached to CEF CDP' accurately and concisely describes the main change: consolidating platform-specific E2E drivers into a single Appium Chromium driver setup attached to the CEF remote-debugging port.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

…unner

Every E2E job now runs the same recipe: install Appium 3 + the chromium
driver, build the CEF app, and invoke `app/scripts/e2e-run-session.sh`.
The script handles CDP attach, prewarm-target pruning, chromedriver
version pinning to CEF's Chromium, Appium startup, and WDIO invocation.

- e2e.yml: Linux/macOS/Windows now share an identical recipe. Linux runs
  inside Xvfb. Linux + Windows are continue-on-error until we have a
  green CI run; macOS is the gating signal (smoke spec validated locally
  in ~1s on the unified driver). The old "build-only" workarounds for
  Linux and Windows are gone — they couldn't actually run specs because
  tauri-driver doesn't speak to CEF. Now they can.

- e2e-agent-review.yml: same flip — drop tauri-driver, install Appium
  chromium driver, call e2e-run-session.sh.

mega-flow runs but is non-blocking everywhere: phase 2 work (handling
CEF window recreation that kills the chromedriver session) is needed
before it gates merges.
@senamakel
Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@senamakel
Copy link
Copy Markdown
Member Author

@coderabbitai reviwe

@senamakel
Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

@senamakel Sure! Triggering a review of the PR now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
app/scripts/e2e-run-session.sh (2)

176-201: ⚡ Quick win

Pipeline failures indicate CEF app crashes before CDP is available.

The CI logs show "CEF app exited before CDP came up" on both Windows and Linux. The script's error handling is correct (lines 192-195 detect early exit), but the root cause appears to be an app/build issue rather than a script defect.

Consider adding more diagnostic output when the app exits early to help debug:

🔍 Add diagnostic output for early app exit
   if ! kill -0 "$APP_PID" 2>/dev/null; then
     echo "ERROR: CEF app exited before CDP came up. See $APP_LOG" >&2
+    echo "--- Last 50 lines of app log ---" >&2
+    tail -50 "$APP_LOG" 2>/dev/null >&2 || true
+    echo "--- End of app log ---" >&2
     exit 1
   fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 176 - 201, When the script
detects the app exited early (the kill -0 "$APP_PID" check in the CDP wait loop)
add diagnostic steps to capture why: immediately wait for the process to reap
and capture its exit code (use wait on APP_PID), print a brief tail of APP_LOG
(e.g., last ~200 lines), and print lightweight system state (e.g., ps/pgrep for
APP_BIN and free disk space) so CI logs include the app exit code and recent app
logs; reference APP_PID, APP_LOG, CEF_CDP_PORT and CDP_VERSION_JSON to locate
the changes in the CDP-wait loop.

128-138: 💤 Low value

Consider using find instead of ls for glob expansion.

Static analysis (SC2012) flags this: ls can misbehave with non-alphanumeric filenames. While unlikely in this controlled context, find is more robust.

♻️ Suggested refactor
-DIST_JS="$(ls dist/assets/index-*.js 2>/dev/null | head -1)"
+DIST_JS="$(find dist/assets -maxdepth 1 -name 'index-*.js' -print -quit 2>/dev/null)"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 128 - 138, The use of ls to
populate DIST_JS (ls dist/assets/index-*.js | head -1) is flagged by SC2012;
replace that pipeline with a robust find-based assignment that safely locates
the first matching file (pattern dist/assets/index-*.js) and assigns it to
DIST_JS, preserving the subsequent existence check and the grep check for
"127.0.0.1:${E2E_MOCK_PORT}" against DIST_JS; ensure the find invocation returns
a single path (or empty) so the rest of the script (the if [ -z "$DIST_JS" ] and
the grep check) continues to work as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 85-90: The script's OS case handling is missing Windows
(MINGW/MSYS/CYGWIN/Windows_NT) cleanup so OpenHuman processes aren't killed on
Windows; update the case "$OS" in the e2e-run-session.sh to add a Windows branch
(match MINGW*|MSYS*|CYGWIN*|Windows_NT) that invokes the Windows process-kill
command (e.g. taskkill /F /IM OpenHuman.exe 2>/dev/null || true) or a
tasklist/taskkill combo that targets processes whose image or command line
contains "OpenHuman", keeping the existing pkill usage for Darwin and Linux and
preserving the 2>/dev/null || true behavior.

In `@app/test/e2e/helpers/app-helpers.ts`:
- Around line 23-29: The current blanket catch around waitForAppReady(15_000)
masks all errors; change it to catch the error into a variable, inspect it, and
only swallow/fallback to await browser.pause(5_000) for genuine
readiness/timeouts (e.g., check for a specific TimeoutError type or
error.message that indicates not-ready), otherwise rethrow the error so
execution/session/DOM failures surface; update the catch block in the helper
where waitForAppReady is called to use catch (err) { if (isReadinessError(err))
await browser.pause(5_000); else throw err; } (or equivalent check) referencing
waitForAppReady and the catch block in app-helpers.ts.

---

Nitpick comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 176-201: When the script detects the app exited early (the kill -0
"$APP_PID" check in the CDP wait loop) add diagnostic steps to capture why:
immediately wait for the process to reap and capture its exit code (use wait on
APP_PID), print a brief tail of APP_LOG (e.g., last ~200 lines), and print
lightweight system state (e.g., ps/pgrep for APP_BIN and free disk space) so CI
logs include the app exit code and recent app logs; reference APP_PID, APP_LOG,
CEF_CDP_PORT and CDP_VERSION_JSON to locate the changes in the CDP-wait loop.
- Around line 128-138: The use of ls to populate DIST_JS (ls
dist/assets/index-*.js | head -1) is flagged by SC2012; replace that pipeline
with a robust find-based assignment that safely locates the first matching file
(pattern dist/assets/index-*.js) and assigns it to DIST_JS, preserving the
subsequent existence check and the grep check for "127.0.0.1:${E2E_MOCK_PORT}"
against DIST_JS; ensure the find invocation returns a single path (or empty) so
the rest of the script (the if [ -z "$DIST_JS" ] and the grep check) continues
to work as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: db08a16c-fbea-4249-b8f9-76a080f34d5d

📥 Commits

Reviewing files that changed from the base of the PR and between 5e6073b and 1812594.

📒 Files selected for processing (11)
  • .github/workflows/e2e-agent-review.yml
  • .github/workflows/e2e.yml
  • app/.gitignore
  • app/package.json
  • app/scripts/e2e-build.sh
  • app/scripts/e2e-run-session.sh
  • app/scripts/e2e-run-spec.sh
  • app/test/e2e/helpers/app-helpers.ts
  • app/test/e2e/helpers/core-rpc.ts
  • app/test/e2e/helpers/platform.ts
  • app/test/wdio.conf.ts

Comment thread app/scripts/e2e-run-session.sh
Comment thread app/test/e2e/helpers/app-helpers.ts
senamakel added 2 commits May 13, 2026 22:47
…t at runtime

- macOS workflow: install x86_64-apple-darwin target. The vendored tauri-cli's
  build.rs (tauri-bundler/build.rs:134) compiles a CEF helper for that target
  to support universal binaries, even on arm64 hosts. macos-latest is arm64
  and the target wasn't installed → "can't find crate for `core`".

- e2e-run-session.sh: on Linux + Windows, find the CEF distribution that the
  vendored tauri-cli cached under $CEF_PATH/<version>/cef_<os>_<arch>/ and
  prepend it to LD_LIBRARY_PATH / PATH before launching the binary. The bare
  --no-bundle build has no co-located libcef.so/libcef.dll, so the OS loader
  couldn't find them and the app exited before CDP came up.

- e2e-run-session.sh: log paths now use $RUNNER_TEMP / $TMPDIR rather than
  hard-coded /tmp. Windows runners don't have /tmp so the artifact-upload
  glob never matched anything.

- e2e.yml: artifact globs use ${{ runner.temp }} to match the new log paths.
The upload-artifact path mismatch on docker container runners means the
$RUNNER_TEMP-based artifact glob doesn't pick up logs created inside the
container view (/__w/_temp). Print the log to stderr so the CI run page
has the diagnosis without needing artifact archaeology.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
app/scripts/e2e-run-session.sh (1)

85-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the missing Windows process cleanup before probing CDP.

This case "$OS" still leaves OpenHuman.exe running on Windows, so the later http://127.0.0.1:${CEF_CDP_PORT}/json/version check can attach to a stale CEF instance instead of the one this script just launched.

🔧 Minimal fix
 echo "[runner] Killing any running OpenHuman instances..."
 case "$OS" in
   Darwin) pkill -f "OpenHuman" 2>/dev/null || true ;;
   Linux)  pkill -f "OpenHuman" 2>/dev/null || true ;;
+  MINGW*|MSYS*|CYGWIN*|Windows_NT)
+    taskkill.exe //F //IM "OpenHuman.exe" >/dev/null 2>&1 || true
+    ;;
 esac
 sleep 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 85 - 90, The script's OS cleanup
misses Windows so lingering OpenHuman.exe can block the CDP probe; update the
case handling for the variable OS in e2e-run-session.sh (the block that matches
Darwin/Linux) to also handle Windows (e.g., "Windows" or "MINGW"* as used by
your OS detection) and run a Windows-safe termination command such as taskkill
/IM "OpenHuman.exe" /F 2>/dev/null || true (or an equivalent Stop-Process
invocation for PowerShell environments), ensuring you still silence errors and
fall back to || true so the script continues.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 373-389: The readiness probe can succeed against a stale Appium if
the newly started process failed with EADDRINUSE; after starting Appium (the
block using APPIUM_BIN, APPIUM_PORT, APPIUM_LOG and APPIUM_PID) add a quick
verification step: wait a second, assert the background PID ($APPIUM_PID) is
alive (kill -0) and then verify that the listener on APPIUM_PORT is owned by
that PID (use lsof -i :$APPIUM_PORT -sTCP:LISTEN -n -P or ss/netstat fallback)
before accepting the readiness curl; if the PID is dead or the port is owned by
a different PID, print an error and exit non‑zero so we fail fast instead of
driving the wrong Appium instance.
- Around line 190-217: The script currently picks a CEF_DIST_DIR with head -1
which silently chooses an arbitrary cached distribution when multiple exist;
update the Linux and Windows branches (where CEF_DIST_DIR is computed and
exported to LD_LIBRARY_PATH/PATH, and where cygpath/WIN_CEF_DIST is used) to
detect multiple matching directories and either (a) select deterministically the
newest distribution (e.g., sort by modification time) or (b) fail fast with a
clear error/warning asking the runner to clear the cache; implement this by
replacing the plain head -1 selection logic with a deterministic ls -dt ... |
head -1 (or by checking if more than one match exists and exiting non-zero with
an explicit message) so the script no longer silently picks the wrong libcef.

---

Duplicate comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 85-90: The script's OS cleanup misses Windows so lingering
OpenHuman.exe can block the CDP probe; update the case handling for the variable
OS in e2e-run-session.sh (the block that matches Darwin/Linux) to also handle
Windows (e.g., "Windows" or "MINGW"* as used by your OS detection) and run a
Windows-safe termination command such as taskkill /IM "OpenHuman.exe" /F
2>/dev/null || true (or an equivalent Stop-Process invocation for PowerShell
environments), ensuring you still silence errors and fall back to || true so the
script continues.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87764d31-200e-4e70-81db-e0fa962b6a41

📥 Commits

Reviewing files that changed from the base of the PR and between d3483b0 and 1597ca1.

📒 Files selected for processing (1)
  • app/scripts/e2e-run-session.sh

Comment thread app/scripts/e2e-run-session.sh
Comment thread app/scripts/e2e-run-session.sh
The openhuman_ci docker image runs as root, and Chromium refuses to start
under uid 0 without --no-sandbox (crbug.com/638180). With the inline app
log now in the CI run output, the failure surface was immediately visible:

  ERROR:zygote_host_impl_linux.cc:101] Running as root without
  --no-sandbox is not supported.

Detect uid 0 on Linux and append --no-sandbox to the binary's argv. The
flag is intentionally NOT applied on macOS/Windows where the runner is
a regular user and the sandbox should stay enabled.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
app/scripts/e2e-run-session.sh (3)

85-89: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the Windows cleanup branch here too.

This block only kills stale processes on Darwin/Linux, so reruns on Git Bash/MSYS can leave OpenHuman.exe alive and keep the CDP/app ports busy.

🔧 Minimal fix
 echo "[runner] Killing any running OpenHuman instances..."
 case "$OS" in
   Darwin) pkill -f "OpenHuman" 2>/dev/null || true ;;
   Linux)  pkill -f "OpenHuman" 2>/dev/null || true ;;
+  MINGW*|MSYS*|CYGWIN*|Windows_NT)
+    taskkill //F //IM "OpenHuman.exe" 2>/dev/null || true
+    ;;
 esac
 sleep 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 85 - 89, The cleanup only
handles Darwin/Linux so stale Windows Git Bash/MSYS processes like OpenHuman.exe
aren't killed; update the case on $OS to add a Windows branch (match
MINGW*|MSYS*|CYGWIN*|Windows) and invoke the Windows process killer (taskkill /F
/IM OpenHuman.exe or equivalent via cmd) redirecting errors to null and ensuring
the command returns success (|| true) just like the existing Darwin/Linux
branches; look for the pkill -f "OpenHuman" lines and the case "$OS" block to
insert the Windows branch.

190-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't silently pick the first cached CEF distribution.

head -1 makes warmed runners nondeterministic once multiple CEF versions exist under CEF_PATH, so Linux/Windows can load the wrong libcef and fail only intermittently.

🔧 Minimal fix
 case "$OS" in
   Linux)
-    CEF_DIST_DIR="$(ls -d "$CEF_PATH"/*/cef_linux_* 2>/dev/null | head -1)"
+    mapfile -t CEF_CANDIDATES < <(find "$CEF_PATH" -mindepth 2 -maxdepth 2 -type d -name 'cef_linux_*' | sort)
+    if [ "${`#CEF_CANDIDATES`[@]}" -gt 1 ]; then
+      echo "ERROR: multiple CEF Linux distributions found under $CEF_PATH; clear the cache or pin CEF_PATH." >&2
+      exit 1
+    fi
+    CEF_DIST_DIR="${CEF_CANDIDATES[0]:-}"
     if [ -n "$CEF_DIST_DIR" ] && [ -d "$CEF_DIST_DIR" ]; then
       export LD_LIBRARY_PATH="$CEF_DIST_DIR:${LD_LIBRARY_PATH:-}"
       echo "[runner] LD_LIBRARY_PATH includes: $CEF_DIST_DIR"
     else
       echo "[runner] Warning: no CEF Linux distribution found under $CEF_PATH — libcef.so may fail to load."
     fi
     ;;
   MINGW*|MSYS*|CYGWIN*|Windows_NT)
-    CEF_DIST_DIR="$(ls -d "$CEF_PATH"/*/cef_windows_* 2>/dev/null | head -1)"
+    mapfile -t CEF_CANDIDATES < <(find "$CEF_PATH" -mindepth 2 -maxdepth 2 -type d -name 'cef_windows_*' | sort)
+    if [ "${`#CEF_CANDIDATES`[@]}" -gt 1 ]; then
+      echo "ERROR: multiple CEF Windows distributions found under $CEF_PATH; clear the cache or pin CEF_PATH." >&2
+      exit 1
+    fi
+    CEF_DIST_DIR="${CEF_CANDIDATES[0]:-}"
     if [ -n "$CEF_DIST_DIR" ] && [ -d "$CEF_DIST_DIR" ]; then
       if command -v cygpath >/dev/null 2>&1; then
         WIN_CEF_DIST="$(cygpath -w "$CEF_DIST_DIR")"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 190 - 217, The script currently
uses head -1 to pick the first CEF distribution which is nondeterministic when
multiple versions exist; change the CEF_DIST_DIR resolution for both Linux and
Windows branches (the lines setting CEF_DIST_DIR via ls -d
"$CEF_PATH"/*/cef_linux_* and ls -d "$CEF_PATH"/*/cef_windows_*) to
deterministically pick the intended distribution (for example, pipe the ls
output to sort -V and use tail -n1 to select the highest/version-sorted entry)
so LD_LIBRARY_PATH (CEF_DIST_DIR) on Linux and PATH (CEF_DIST_DIR /
WIN_CEF_DIST) on Windows always point to a consistent, latest CEF directory
before the existing existence checks and exports.

386-402: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if APPIUM_PORT already belongs to another server.

The readiness loop accepts any /status responder on that port. If the new Appium dies with EADDRINUSE, this can still pass against a stale instance and run the suite against the wrong server.

🔧 Minimal fix
 APPIUM_LOG="$LOG_DIR/appium-e2e-${LOG_SUFFIX}.log"
+if curl -sf "http://127.0.0.1:$APPIUM_PORT/status" >/dev/null 2>&1; then
+  echo "ERROR: Appium is already listening on port $APPIUM_PORT. Stop the stale server or set APPIUM_PORT." >&2
+  exit 1
+fi
+
 echo "[runner] Starting Appium on port $APPIUM_PORT"
 echo "[runner]   Appium logs: $APPIUM_LOG"
 "$APPIUM_BIN" --port "$APPIUM_PORT" --relaxed-security > "$APPIUM_LOG" 2>&1 &
 APPIUM_PID=$!
 
 for i in $(seq 1 30); do
+  if ! kill -0 "$APPIUM_PID" 2>/dev/null; then
+    echo "ERROR: Appium exited before becoming ready. Appium log follows:" >&2
+    cat "$APPIUM_LOG" >&2 || true
+    exit 1
+  fi
   if curl -sf "http://127.0.0.1:$APPIUM_PORT/status" >/dev/null 2>&1; then
     echo "[runner] Appium is ready."
     break
   fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/scripts/e2e-run-session.sh` around lines 386 - 402, The readiness loop
can be fooled by a prior server on APPIUM_PORT; after launching Appium (using
APPIUM_BIN to start and setting APPIUM_PID and APPIUM_LOG), immediately check
whether the background process actually bound the port and/or died with
EADDRINUSE before accepting the /status response: capture the start exit state
by waiting a short moment (e.g. sleep 0.1) and if the process is no longer
running (check APPIUM_PID), or if APPIUM_LOG contains EADDRINUSE, fail fast;
alternatively verify the socket owner matches APPIUM_PID (use lsof/ss to confirm
the listening PID is APPIUM_PID for APPIUM_PORT) and if not, print an error and
exit 1 rather than proceeding to the /status loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@app/scripts/e2e-run-session.sh`:
- Around line 85-89: The cleanup only handles Darwin/Linux so stale Windows Git
Bash/MSYS processes like OpenHuman.exe aren't killed; update the case on $OS to
add a Windows branch (match MINGW*|MSYS*|CYGWIN*|Windows) and invoke the Windows
process killer (taskkill /F /IM OpenHuman.exe or equivalent via cmd) redirecting
errors to null and ensuring the command returns success (|| true) just like the
existing Darwin/Linux branches; look for the pkill -f "OpenHuman" lines and the
case "$OS" block to insert the Windows branch.
- Around line 190-217: The script currently uses head -1 to pick the first CEF
distribution which is nondeterministic when multiple versions exist; change the
CEF_DIST_DIR resolution for both Linux and Windows branches (the lines setting
CEF_DIST_DIR via ls -d "$CEF_PATH"/*/cef_linux_* and ls -d
"$CEF_PATH"/*/cef_windows_*) to deterministically pick the intended distribution
(for example, pipe the ls output to sort -V and use tail -n1 to select the
highest/version-sorted entry) so LD_LIBRARY_PATH (CEF_DIST_DIR) on Linux and
PATH (CEF_DIST_DIR / WIN_CEF_DIST) on Windows always point to a consistent,
latest CEF directory before the existing existence checks and exports.
- Around line 386-402: The readiness loop can be fooled by a prior server on
APPIUM_PORT; after launching Appium (using APPIUM_BIN to start and setting
APPIUM_PID and APPIUM_LOG), immediately check whether the background process
actually bound the port and/or died with EADDRINUSE before accepting the /status
response: capture the start exit state by waiting a short moment (e.g. sleep
0.1) and if the process is no longer running (check APPIUM_PID), or if
APPIUM_LOG contains EADDRINUSE, fail fast; alternatively verify the socket owner
matches APPIUM_PID (use lsof/ss to confirm the listening PID is APPIUM_PID for
APPIUM_PORT) and if not, print an error and exit 1 rather than proceeding to the
/status loop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2effb35f-8c36-4118-a8ac-60d8a72c8df4

📥 Commits

Reviewing files that changed from the base of the PR and between 1597ca1 and 14d8cd1.

📒 Files selected for processing (1)
  • app/scripts/e2e-run-session.sh

senamakel added 6 commits May 13, 2026 23:51
…ndle

Two new failures surfaced now that the previous blockers cleared. Inline
log dump made both diagnoses cheap:

- Linux: CEF zygote was crashing with "Failed global descriptor lookup: 7"
  and a flurry of dbus errors after --no-sandbox let it past the root
  check. Add --disable-dev-shm-usage (docker /dev/shm is too small),
  --disable-gpu, and --no-zygote so Chromium stops asking for a dbus
  session bus that isn't there.

- macOS: the .app bundle launched and exited within ~2 seconds writing
  zero bytes to stdout/stderr. Classic "the OS killed the helper subprocess
  before it could log" symptom on unsigned bundles. Adhoc-sign with
  `codesign --force --deep --sign -` after build so the loader stops
  rejecting the CEF helpers.
With the previous fixes in place CDP actually came up on Linux —
"DevTools listening on ws://127.0.0.1:19222/devtools/browser/..." —
but the host process then panicked inside Tauri's single-instance plugin:

  thread 'main' panicked at plugins/single-instance/.../linux.rs:57:
  Result::unwrap() on Err(Address("unsupported transport 'disabled'"))

The openhuman_ci container ships with DBUS_SESSION_BUS_ADDRESS=disabled:.
zbus refuses that transport, so the plugin's setup crashes the host.
Start a real session bus with `dbus-launch` and inherit its address for
the rest of the runner. Skip on macOS/Windows where a session bus
already exists.
macOS run on d0102d0 failed with "rustup-init[EXE]: unexpected argument
'install' found" from `cargo install --locked --path .../tauri-cli`. The
Swatinem cache from a prior failed run had restored a poisoned ~/.cargo
state where `cargo` resolved to rustup-init.

Two fixes layered for resilience:
- Bump the rust-cache key to e2e-macos-unified-v2 so the bad state isn't
  restored on this run.
- Add `rustup default 1.93.0 && which cargo && cargo --version` step
  after toolchain install so a future cargo/rustup-init shadow is caught
  in CI with a clear error instead of silently breaking ensure-tauri-cli.
Massive progress on Linux with the dbus + CEF-args + lib-path fixes —
the app launched, CDP came up, prewarm pruned, chromedriver downloaded.
Only `unzip` was missing in the openhuman_ci docker image:

  app/scripts/e2e-run-session.sh: line 402: unzip: command not found

Python 3 is already required by the runner (used to parse /json/list),
so use stdlib zipfile when `unzip` isn't on PATH. Avoids an apt-get
install for one binary.
Runner uses `set -euo pipefail`, so expanding "${APP_ARGS[@]}" when the
array is empty (macOS/Windows path — no extra CEF flags needed) errors
with "APP_ARGS[@]: unbound variable" and the app never launches.

Use `${APP_ARGS[@]+"${APP_ARGS[@]}"}` to expand only when set, so macOS
gets the bare binary launch it wants and Linux still gets --no-sandbox
et al.
Four fixes flagged on PR tinyhumansai#1696's first CodeRabbit pass:

- e2e-run-session.sh: add Windows process cleanup (`taskkill /F /IM
  OpenHuman.exe`). The Darwin/Linux pkill branches didn't have a Windows
  equivalent, so stale instances would survive across runs on Windows.

- helpers/app-helpers.ts: stop swallowing non-timeout errors in
  `waitForApp()`. The previous blanket `catch` masked session/DOM
  failures behind a blind 5s pause. Re-throw anything that isn't the
  specific "waitForAppReady timed out" message so real failures surface.

- e2e-run-session.sh: error out when CEF_PATH contains more than one
  cef_<os>_<arch> directory instead of silently picking the first via
  `ls | head -1`. After a CEF upgrade the wrong libcef would load and
  fail only on warmed runners.

- e2e-run-session.sh: fail fast when APPIUM_PORT is already in use, and
  check that the Appium PID is still alive on each poll. Without this,
  a stale Appium passes the /status probe while our just-launched
  process dies with EADDRINUSE — and the tests drive the wrong instance.
@senamakel senamakel merged commit 6386854 into tinyhumansai:main May 14, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant