fix(e2e/linux): silence dead-session noise + local docker harness refresh#1777
Conversation
The Linux E2E job already runs in the `ghcr.io/tinyhumansai/openhuman_ci` container, but `e2e/docker-compose.yml` was stuck on the retired `yarn`/`tauri-driver` path so local repros didn't match CI. Update the compose stack so `./e2e/run-local.sh mega-flow` mirrors the `e2e-linux` job step-for-step (Appium chromium driver attached to CEF via `app/scripts/e2e-run-session.sh`). Foundation for chasing the Linux mega-flow session-death noise (WebDriverError cascade in run 25894480144) locally before pushing to CI.
…acts The Linux mega-flow run produces ~hundred lines of identical `WebDriverError: A session is either terminated or not started` plus a `Failed to trigger deep link: xdg-open …` for every sub-test that runs after CEF/Appium drops the connection. None of that noise is the actual problem — it's the deep-link helper retrying through macOS-only extension commands and then `xdg-open`, both of which always fail on the Linux CI container. Two changes: 1. `triggerDeepLink` short-circuits on a dead session. `isSessionDeadError` detects "session is either terminated or not started" / "invalid session id", and rethrows immediately with a clear "WebDriver session is dead" message. The macOS extension commands are now also gated on `process.platform === 'darwin'` so Linux never even attempts them. Linux loses the `xdg-open` fallback entirely: the CI container has no `.desktop` registration for `openhuman://`, so it has never actually worked there — it just produced noise. 2. `e2e.yml` Linux job uploads e2e artifacts on `always()` instead of `failure() || cancelled()`. Because mega-flow is `continue-on-error: true`, the job is reported success and the old condition skipped the artifact upload — leaving us with no app log / Appium log to diagnose the real session-death root cause. The next run will produce them.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR prioritizes WebView-based deep-link simulation, adds dead-session detection that fast-fails instead of retrying, and reworks triggerDeepLink fallbacks (macOS retries only; Linux no longer uses xdg-open). ChangesDeep-link helpers
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/test/e2e/helpers/deep-link-helpers.ts`:
- Around line 95-103: When detecting a dead session (isSessionDeadError(err)),
don't throw a plain wrapped Error that loses the dead-session signal; instead
create the new Error with the original error as its cause and preserve a
sentinel so downstream checks still recognize it (e.g. const e = new
Error(`WebDriver session is dead …`, { cause: err }); e.isSessionDead = true;
throw e). Ensure the same sentinel/cause-preservation approach is used in the
ready-check/ping path and the code that keys off the old regex so
isSessionDeadError(...) will return true for the wrapped error as well.
In `@e2e/run-local.sh`:
- Around line 46-61: The case/default branch currently calls ensure_built then
loops over "$@" which is empty when no arguments are given, so a bare
./e2e/run-local.sh is a no-op; update the logic so run_spec is invoked for the
default "smoke" spec when no args are passed (e.g., set a default args array or
replace for name in "$@" with for name in "${@:-smoke}") or explicitly check if
"$#" -eq 0 and populate a names list with "smoke" before the loop; keep the
ensure_built and run_spec("name") calls unchanged otherwise.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c4581298-a653-4fa2-9cb2-f9cab8924e28
📒 Files selected for processing (5)
.github/workflows/e2e.ymlapp/test/e2e/helpers/deep-link-helpers.tse2e/docker-compose.ymle2e/docker-local-bootstrap.she2e/run-local.sh
1. `trySimulateDeepLinkInWebView` ready-loop and inner `__simulateDeepLink`
call now check `isSessionDeadError` too. Previously a dead-session
thrown during polling/invocation returned `false` (or bubbled the raw
WebDriver error), which let `triggerDeepLink` fall through to the
macOS extension commands or the Linux "no .desktop registration"
error path — exactly the cascade we want to silence.
2. `isSessionDeadError` regex now also matches our own wrapped message
("WebDriver session is dead"). Without this, an outer `catch (err)`
in `triggerDeepLink` calling `isSessionDeadError(err)` would return
false after the inner helper had already replaced the WebDriver text,
defeating the rethrow.
3. `e2e/run-local.sh` no-arg invocation now actually runs the smoke spec.
The `case "${1:-smoke}"` defaulted *for matching* but the `*)` branch
iterated `"$@"` (empty), so a bare `./e2e/run-local.sh` was a no-op.
Reset positional params with `set -- smoke` before the loop.
Summary
The Linux mega-flow E2E job (e.g. run 25894480144) spams ~hundreds of lines of identical
WebDriverError: A session is either terminated or not startedplusFailed to trigger deep link: xdg-open …once CEF/Appium drops the connection mid-spec. The job is reported green only because mega-flow iscontinue-on-error: true. This PR doesn't fix the underlying session-death yet — it cleans up the noise so the real cause is diagnosable on the next run.app/test/e2e/helpers/deep-link-helpers.ts: detect "session terminated"/"invalid session id" once viaisSessionDeadError, rethrow a clear "WebDriver session is dead" message, gate the macOS-only extension commands onprocess.platform === 'darwin'so Linux never retries them, and drop the Linuxxdg-openfallback (the CI container has no.desktopregistration foropenhuman://, so it was just generating "Command failed" lines)..github/workflows/e2e.yml(Linux job): flip the artifact upload toif: always(). Because mega-flow iscontinue-on-error: truethe job is marked success, so the priorif: failure() || cancelled()upload condition skipped the artifacts — we currently have no app/Appium logs to see what crashed CEF.e2e/docker-compose.yml, newe2e/docker-local-bootstrap.shande2e/run-local.sh). The compose was stuck on the retiredyarn/tauri-driverpath; it now mirrors thee2e-linuxjob step-for-step (ghcr.io/tinyhumansai/openhuman_ci:latest+ Appium chromium driver attached to CEF on :19222 viaapp/scripts/e2e-run-session.sh). One-shot:./e2e/run-local.sh mega-flow.The real session-death root cause is still open — best guess from log reading is something in the 5th–6th
resetEverything(config_reset_local_data+writeMockConfig+ admin reset) leaves CEF or the Appium connection in a bad state. The follow-up commit will use the artifacts the next run now uploads.Test plan
pnpm typecheckclean on the changes.if: always()in.github/workflows/e2e.yml(CI step is namedUpload e2e artifacts; reviewer can verify by diff).xdg-openshell fallback removed indeep-link-helpers.ts; deep-link helper short-circuits on dead session viaisSessionDeadError.process.platform === 'darwin'); Windows path is unchanged (no shell fallback there to begin with)../e2e/run-local.sh build && ./e2e/run-local.sh mega-flownot run in this PR — that's a CEF build (~15 min) intended to be exercised by the follow-up that uses the artifacts this run now uploads to find the actual session-death root cause.Summary by CodeRabbit