test(e2e): make the harness actually exercise the UI past login#1859
Conversation
…ec as real-UI reference
Establishes the new E2E pattern: one Appium session for the whole run,
each spec begins with `resetApp(<userId>)` which calls the in-place
`openhuman.test_reset` RPC, reloads the renderer, and walks the real
onboarding UI. Specs then drive the product through clicks, not direct
RPC calls.
Rust core
- src/openhuman/cron/store.rs: `clear_all_jobs(config) -> usize`
- src/openhuman/test_support/{mod,rpc,schemas}.rs: new domain exposing
`openhuman.test_reset` — wipes auth (active_user.toml + api_key),
clears chat_onboarding_completed, deletes all cron rows in-place.
- Wired into src/core/all.rs + src/openhuman/mod.rs.
E2E harness
- app/test/e2e/helpers/reset-app.ts: `resetApp(userId, { skipAuth? })`
— RPC wipe -> storage clear + reload -> deep-link auth + onboarding
walk.
Reference spec
- app/test/e2e/specs/cron-jobs-flow.spec.ts rewritten end-to-end:
Settings -> Cron Jobs panel via real navigation; Pause/Resume/Refresh/
Remove via button clicks; UI assertions only, with one trailing
cron_list oracle to confirm the sidecar agrees.
Follow-ups: extend test_reset coverage as other domains are converted
(memory, channels, skills, accounts, threads, notifications). Add
data-testid hooks before the sweep so selectors don't depend on
Tailwind class strings.
…re-mode modal - Race the openhuman.test_reset RPC against an 8s budget. On the very first spec of a run the user-scoped sidecar hasn't spawned yet, so the RPC is unreachable — that's the same baseline state the wipe would have produced, so treat it as success and skip the wipe. - Only reload the renderer when we actually wiped something, otherwise we'd throw away the in-app "Choose core mode" acceptance and wedge on the BootCheckGate modal. - Add dismissCoreModeModalIfVisible() that retries with a synthetic MouseEvent (more reliable than raw .click() against React's onClick) and is called both after the optional reload and after the deep-link auth in case BootCheckGate re-mounts.
…river session
The packaged E2E binary was running `app.restart()` on every login (the
identity-flip path in CoreStateProvider). That destroyed the CDP target
the chromium-driver session was attached to, and every subsequent
command failed with "session terminated" — silently masking ~all
post-login assertions in the existing specs.
`vite build` always produces PROD=true / DEV=false regardless of
--mode, so flipping the build mode alone doesn't change IS_DEV. Instead
gate `restartApp` on `import.meta.env.MODE === 'development'` too, and
have the E2E build script invoke `vite build --mode development`
up-front with a no-op `beforeBuildCommand` so Tauri doesn't overwrite
the dev-mode dist with a fresh prod build.
Also harden the onboarding walker: the shared `walkOnboarding` matches
a fixed list of step *titles* and silently skips any step it doesn't
recognize ("Connect your Gmail" etc.). Use the stable
`data-testid="onboarding-next-button"` selector instead and keep
advancing until the button unmounts.
cron-jobs-flow spec now passes 4/4 in 28s end-to-end.
walkOnboarding() was matching against a hardcoded list of step titles (Welcome, Install Skills, …) so any onboarding step that wasn't in ONBOARDING_OVERLAY_TEXTS (e.g. "Connect your Gmail") silently fell through. The walker reported success while the renderer stayed wedged behind onboarding, masking post-login breakage in ~every spec that uses completeOnboardingIfVisible. Replace the body with a testid-keyed loop: wait for data-testid="onboarding-next-button" to mount, click it via synthetic MouseEvent until it unmounts, bail if it stays disabled too long. resetApp now delegates to the shared walker (no more duplicated logic). This single helper change is the leverage point — every existing spec that already calls completeOnboardingIfVisible now actually advances past current and future onboarding steps.
…-auth reload Two changes that make existing specs (which don't use resetApp) actually work end-to-end, not just up to the login screen. 1. Pre-deep-link BootCheckGate dismissal in triggerAuthDeepLinkBypass. The handler calls `waitForAuthReadiness` which can't make progress while the "Choose core mode" modal is up — it eventually fails with "Sign-in failed. Please try again." and the spec is wedged on the login screen. Inlined here (rather than imported from shared-flows) to avoid a circular dependency. 2. walkOnboarding tolerates the post-auth reload. The deep-link auth triggers an identity flip → `restartApp` → reload, which re-mounts onboarding fresh for the new user namespace. Previously the walker exited the moment the next-button briefly unmounted between steps and the spec navigated mid-reload. Now it stays in the loop while the hash is still on `#/onboarding/*`. Also factor `dismissBootCheckGateIfVisible` into shared-flows as the canonical export and have resetApp + walkOnboarding delegate to it. Settings - Developer Options now 2/3 passing (was 0/3); cron-jobs-flow still 4/4. Remaining flake on the first test of a fresh launch is a warmup race, not a structural issue.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughThis PR adds E2E testing infrastructure to streamline test isolation and startup navigation. A new backend ChangesE2E Testing Infrastructure
Sequence DiagramsequenceDiagram
participant Spec as E2E Spec
participant Reset as resetApp()
participant RPC as openhuman.test_reset
participant Renderer as App Renderer
participant Gate as BootCheckGate Modal
participant DeepLink as Auth Deep-Link
participant Onboard as Onboarding Flow
Spec->>Reset: resetApp(userId)
Reset->>RPC: Call test_reset RPC with 8s timeout
RPC->>RPC: Clear cron jobs, wipe config state
Reset->>Renderer: Clear localStorage & sessionStorage
Reset->>Renderer: Reload webview
Reset->>Gate: dismissBootCheckGateIfVisible()
Reset->>DeepLink: triggerAuthDeepLinkBypass(userId)
DeepLink->>Gate: dismissBootCheckGateIfVisibleInline()
DeepLink->>Reset: Auth complete
Reset->>Onboard: walkOnboarding()
Onboard->>Gate: dismissBootCheckGateIfVisible()
Onboard->>Onboard: Poll onboarding next button, dispatch synthetic clicks
Onboard->>Reset: Onboarding complete
Reset->>Spec: Return userId
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
app/test/e2e/specs/cron-jobs-flow.spec.ts (2)
98-102: ⚡ Quick winPosition-based button selection is brittle.
Assuming
labels[0]is the toggle button relies on DOM order which can change if the row layout is reorganized. Consider either:
- Using
data-testid="cron-job-toggle-${name}"for explicit targeting- Or filtering buttons by a known set of toggle labels (
['Pause', 'Resume'])♻️ Suggested approach using label filtering
- // We care about the toggle button (first one in the row). - return labels[0] ?? null; + // Find the toggle button by its known labels. + const toggleLabels = ['Pause', 'Resume']; + return labels.find(l => toggleLabels.includes(l)) ?? null;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/test/e2e/specs/cron-jobs-flow.spec.ts` around lines 98 - 102, The current selector assumes the first button in the row is the toggle by returning labels[0], which is brittle; update the test in cron-jobs-flow.spec.ts to select the toggle explicitly either by adding and querying a data-testid attribute (e.g. querySelector(`[data-testid="cron-job-toggle-${name}"]`) using the cron job name) or by filtering the buttons array (from container.querySelectorAll('button') mapped to labels) for known toggle texts like 'Pause' or 'Resume' and returning that match instead of labels[0]; adjust the test to prefer the data-testid approach when available and fall back to label-filtering using the labels variable.
64-70: ⚡ Quick winFragile class-based selector pattern.
The regex
/text-sm font-semibold text-stone-900/matches Tailwind utility classes which are prone to change during styling updates. Similarly,.closest('div.p-4')is coupled to layout implementation.Consider adding
data-testidattributes to the CoreJobList component's job rows and action buttons, then query by[data-testid="cron-job-row-${name}"]instead. This aligns with the testid-keyed approach used elsewhere in this PR for onboarding.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/test/e2e/specs/cron-jobs-flow.spec.ts` around lines 64 - 70, The test uses fragile class-based selectors in the anonymous row-finder (the code building rows via /text-sm font-semibold text-stone-900/ and .closest('div.p-4')), so add stable data-testid attributes in the CoreJobList component for job rows and action buttons (e.g. data-testid="cron-job-row-<name>" and data-testid="cron-job-action-<name>") and update the test's row lookup (the rows variable / container usage) to query by document.querySelector(`[data-testid="cron-job-row-${name}"]`) and related data-testid selectors instead of the regex and .closest layout-based selector.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/utils/tauriCommands/core.ts`:
- Around line 82-91: The code in core.ts reads import.meta.env.MODE and
import.meta.env.DEV directly; replace those direct env reads with the
centralized exports from app/src/utils/config.ts by importing the exported
values (e.g., MODE and DEV or a boolean like isDevMode) and use them when
computing isDevLike and in the debug message; update the reference in the
isDevLike expression (currently using IS_DEV and import.meta.env.MODE) and the
console.debug string (currently referencing import.meta.env.MODE and
import.meta.env.DEV) to use the config exports instead so no direct
import.meta.env access remains in core.ts and the logic still uses IS_DEV plus
the re-exported mode flag.
In `@src/core/all.rs`:
- Around line 185-186: The code is registering the destructive E2E RPC
openhuman.test_reset via
controllers.extend(crate::openhuman::test_support::all_test_support_registered_controllers()),
which exposes a state-wiping endpoint in production; change this so test-support
controllers are only added under a hard gate (e.g., compile-time cfg or an
explicit runtime opt-in), by removing the unconditional call to
all_test_support_registered_controllers() and only invoking it when a secure
guard is present (for example cfg(test) / cfg(feature = "e2e_test_support") or a
checked env/flag like ENABLE_E2E_TEST_SUPPORT), ensuring
openhuman::test_support::all_test_support_registered_controllers and the exposed
openhuman.test_reset remain unavailable in normal production builds.
In `@src/openhuman/test_support/rpc.rs`:
- Around line 33-77: The reset() flow lacks debug/trace instrumentation; add
log::debug or tracing::debug/trace calls at function entry/exit and before/after
each external step (Config::load_or_init, cron::clear_all_jobs, Config::save,
default_root_openhuman_dir, clear_active_user) and on branch points (values of
onboarding_was_completed, api_key_was_set) and errors, using stable
grep-friendly prefixes like "[test_reset]" and include contextual fields (e.g.,
cron_jobs_removed, root path, summary) so callers can correlate; keep final info
log but emit earlier trace/debug lines around creating ResetSummary and before
returning RpcOutcome::new to record the outcome.
---
Nitpick comments:
In `@app/test/e2e/specs/cron-jobs-flow.spec.ts`:
- Around line 98-102: The current selector assumes the first button in the row
is the toggle by returning labels[0], which is brittle; update the test in
cron-jobs-flow.spec.ts to select the toggle explicitly either by adding and
querying a data-testid attribute (e.g.
querySelector(`[data-testid="cron-job-toggle-${name}"]`) using the cron job
name) or by filtering the buttons array (from
container.querySelectorAll('button') mapped to labels) for known toggle texts
like 'Pause' or 'Resume' and returning that match instead of labels[0]; adjust
the test to prefer the data-testid approach when available and fall back to
label-filtering using the labels variable.
- Around line 64-70: The test uses fragile class-based selectors in the
anonymous row-finder (the code building rows via /text-sm font-semibold
text-stone-900/ and .closest('div.p-4')), so add stable data-testid attributes
in the CoreJobList component for job rows and action buttons (e.g.
data-testid="cron-job-row-<name>" and data-testid="cron-job-action-<name>") and
update the test's row lookup (the rows variable / container usage) to query by
document.querySelector(`[data-testid="cron-job-row-${name}"]`) and related
data-testid selectors instead of the regex and .closest layout-based selector.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 09e6a036-8b12-49ce-bae1-f2aa6700a76f
📒 Files selected for processing (14)
app/package.jsonapp/scripts/e2e-build.shapp/src/utils/tauriCommands/core.tsapp/test/e2e/helpers/deep-link-helpers.tsapp/test/e2e/helpers/reset-app.tsapp/test/e2e/helpers/shared-flows.tsapp/test/e2e/specs/cron-jobs-flow.spec.tssrc/core/all.rssrc/openhuman/cron/mod.rssrc/openhuman/cron/store.rssrc/openhuman/mod.rssrc/openhuman/test_support/mod.rssrc/openhuman/test_support/rpc.rssrc/openhuman/test_support/schemas.rs
…s CodeRabbit feedback Three review fixes from PR tinyhumansai#1859: - **Critical**: gate `openhuman.test_reset` behind a new `e2e-test-support` cargo feature. Without the feature the controller is neither registered nor declared, so `try_invoke_registered_rpc` cannot route to it and `schema_for_rpc_method` returns None — the destructive wipe RPC simply does not exist in shipped binaries. The E2E build script and the Tauri shell crate both forward the feature on. - **Major**: route `import.meta.env.MODE` through a new `IS_DEV_LIKE` export in `app/src/utils/config.ts` so consumer code (`restartApp`) doesn't reach into `import.meta.env` directly. Keeps the canonical env-reading boundary intact per the codebase convention. - **Major**: add debug/trace instrumentation across every step of the reset lifecycle in `src/openhuman/test_support/rpc.rs`. Entry/exit, per-step start/ok, with the cron-row count and the resolved root dir surfaced so failures during the wipe are diagnosable from the log. Also fixes the failing Frontend Unit Tests check: the `restartApp` test that asserted on the literal debug-log message broke when the message changed; restore the original "dev mode → window.location.reload()" text now that `IS_DEV_LIKE` carries the meaning. Defer the testid nitpick to follow-up issue tinyhumansai#1861.
Summary
vite build(prod-modeIS_DEV=false) ran the realapp.restart()on every identity flip, killing the chromium-driver's CDP target. Specs reported "passing" because most assertions were RPC oracles that ran before the dead session blocked them.vite build --mode development+ aMODE === 'development'gate inrestartApp) so identity flip falls back towindow.location.reload(). The WebDriver session survives.walkOnboardingmatched a hardcoded list of step titles and silently skipped any step it didn't recognize ("Connect your Gmail"etc.). Replace with a testid-keyed loop (data-testid="onboarding-next-button") that advances until the button truly unmounts, including across the post-auth reload."Choose core mode"modal before triggering the auth deep-link — without this,waitForAuthReadinesscan't make progress and the spec wedges on the login screen with"Sign-in failed. Please try again.".openhuman.test_resetRust RPC +resetApp(userId)E2E helper that wipe sidecar state in-place between tests (auth + onboarding flag + cron jobs) so each spec starts from a fresh-install baseline without restarting the process.cron-jobs-flow.spec.tsfully rewritten to drive real UI clicks — Settings → Cron Jobs → Pause / Resume / Refresh / Remove — with one trailingcron_listRPC as oracle. Passes 4/4 in ~30s.Problem
The E2E harness wasn't catching post-login regressions. Specs hit
triggerAuthDeepLinkBypass→ app calledapp.restart()→ CDP target died → every subsequentbrowser.execute/$()call returned"session terminated"— including the assertions. Because the failures all looked the same and most assertions were RPC oracles already happy from before the restart, broken UI shipped silently.Verified by checking out the pre-existing
cron-jobs-flow.spec.ts(commit609b6b70) against the unmodified packaged build: same "session terminated" failure pattern at the same point. Not a regression from this branch — a latent harness flaw that this branch makes visible and fixes.Solution
Three layered fixes, in order of leverage:
Dev-mode E2E bundle (
app/scripts/e2e-build.sh+app/package.json+app/src/utils/tauriCommands/core.ts). The frontend now builds withvite build --mode development, the script overrides Tauri'sbeforeBuildCommandto a no-op so it doesn't clobber the dev dist, andrestartAppgates onimport.meta.env.MODE === 'development'(sincevite buildalways producesDEV=falseregardless of--mode). Result: identity flip becomeswindow.location.reload()and the WebDriver session survives.Testid-keyed onboarding walker (
app/test/e2e/helpers/shared-flows.ts). Replaces the title-list matcher with a loop that dispatches a syntheticMouseEventon[data-testid="onboarding-next-button"]until it unmounts. Also dismisses BootCheckGate first (lifted out into the exporteddismissBootCheckGateIfVisible). Survives the post-auth reload by treatingbutton-gone + hash-still-on-/onboardingas "wait, don't claim victory yet."Pre-deep-link BootCheckGate dismissal (
app/test/e2e/helpers/deep-link-helpers.ts). Inlined intotriggerAuthDeepLinkBypass(importing from shared-flows would be circular). Confirms bundled-core mode before the auth handler can race againstwaitForAuthReadiness.openhuman.test_resetRPC (src/openhuman/test_support/). Wipes auth (active_user.toml+api_key), clearschat_onboarding_completed, and deletes all cron rows in-place.resetApp()helper races the call against an 8s budget so the very first spec of a run (sidecar not yet spawned) treats unreachability as "already pristine."Submission Checklist
cron-jobs-flow.spec.tsis the reference rewrite; the helpers and the RPC are exercised by it end-to-end.N/A: changes are all in test infrastructure (E2E helpers, test_support domain) and a build script. The cron RPCclear_all_jobsis exercised end-to-end by the reference spec. Frontend logic inrestartAppadds one branch that is intentionally hit only in the E2E build.N/A: harness-only change, no product features added/removed.## RelatedN/A: shellrestartApponly changes behavior underMODE === 'development', which is bound to the E2E build flag and not present in release bundles.Closes #NNNin the## Relatedsection — none open; this is a discovery-driven fix.Impact
restartAppadds one extra check (import.meta.env.MODE === 'development') before invoking the Taurirestart_appcommand. Production bundles built withpnpm build(no--mode development) take the existing OS-restart path — verified by inspecting the dist output (grep "restartApp: invoking restart_app"succeeds in the prod-mode bundle;grep "restartApp: dev-like"only appears in the E2E bundle).openhuman.test_resetis a destructive RPC always reachable while the sidecar is up. Acceptable for now because the sidecar binds to127.0.0.1and requires the per-launch bearer token written to${tmpdir}/openhuman-e2e-rpc-token; release builds never write that token, so unauthenticated callers can't invoke it. A follow-up issue gates it behind an explicit env flag — see follow-up issues filed alongside this PR.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Summary by CodeRabbit