test(e2e): deep chat-harness coverage + streaming mock LLM + rust-e2e Linux lane by senamakel · Pull Request #1892 · tinyhumansai/openhuman

senamakel · 2026-05-16T04:48:12Z

Summary

Two-part follow-up to #1889. Both parts share the same theme: actually exercise the full chat pipeline (UI → core → workspace → disk) end-to-end and validate it in CI.

1. Streaming mock LLM + chat-harness specs

The mock backend at scripts/mock-api/routes/llm.mjs previously only served non-streaming /openai/v1/chat/completions. The agent's production path requests stream: true and parses SSE chunks — a code path with zero E2E coverage. This PR extends the mock to emit OpenAI-shaped SSE chunks driven by a scriptable behavior knob:

llmStreamScript — explicit array of { text | thinking | toolCall | finish | usage | error } entries
llmStreamChunkDelayMs — default per-chunk delay (ms)
llmForcedResponses / llmKeywordRules — auto-convert to streaming scripts when stream: true
Tool-call streaming splits one toolCall entry into an opening chunk (id + name + first 8 chars of args) plus incremental arg fragments, matching how real providers stream

New read-only introspection RPCs in src/openhuman/test_support/introspect.rs, gated by the same per-launch bearer as test_reset:

openhuman.test_support_workspace_root
openhuman.test_support_list_workspace_files (depth-capped at 6, entry-capped at 2000)
openhuman.test_support_read_workspace_file (1 MiB lossy-UTF-8 cap, rejects .. escape)
openhuman.test_support_in_flight_chats — snapshot of the live IN_FLIGHT map in channels::providers::web

Four new specs, each going from real UI clicks to disk + Rust state:

Spec	What it locks down
`chat-harness-send-stream.spec.ts`	Send a prompt → 4-chunk SSE reply assembles → IN_FLIGHT populates then drains → `memory/conversations/threads/<hex(thread)>.jsonl` contains user prompt + canary → `list_workspace_files` returns non-empty thread file
`chat-harness-cancel.spec.ts`	Slow 6-chunk stream (500 ms/chunk) → click Cancel after chunk 1 → IN_FLIGHT clears within 8 s → late chunks NEVER reach DOM → composer interactive again → persisted file does NOT contain late chunks
`chat-harness-subagent.spec.ts`	Three forced responses: orchestrator emits `delegate_researcher` tool_call → researcher sub-agent text → orchestrator final canary. Asserts redux `chatRuntime.inferenceStatusByThread[].phase === 'subagent'` OR a `subagent:researcher` timeline entry, ≥2 mock-LLM hits, IN_FLIGHT drains, final canary persists
`chat-harness-scroll-render.spec.ts`	Long stream with bold + fenced code + link → `<strong>`, `<pre><code>`, `<a[href]>` all render → message column auto-scrolls within 40 px of bottom → manual scroll-up releases the auto-stick

Together these exercise composer wiring, SSE transport, agent loop, sub-agent dispatch, native tool-call parsing, cancel semantics, scroll-anchor behavior, markdown rendering, redux chatRuntime mirror, conversations JSONL store, and the IN_FLIGHT registry.

2. Rust-E2E Linux lane

The 10 tests/*_e2e.rs integration suites were running in the unit-test workflow with no mock backend wired up — any suite that wants BACKEND_URL had to skip itself. This adds a dedicated rust-e2e-linux job in e2e-reusable.yml (alongside the Tauri e2e-linux job, on the same Linux track):

Runner: scripts/test-rust-e2e.sh (also pnpm test:rust:e2e)
Boots scripts/mock-api-server.mjs on port 18505 → exports BACKEND_URL / VITE_BACKEND_URL → cargo test --test <all 10 suites> in one invocation
#[ignore] guards on live_routing_e2e, subconscious_e2e, memory_graph_sync_e2e, etc. are respected (they need real backends / Ollama)
Smoke-validated locally: 38 passing in json_rpc_e2e, 11 in linux_cef_deb_runtime_e2e, 11 in screen_intelligence_vision_e2e, 2 in memory_roundtrip_e2e, etc.

End state: 3 Tauri E2E lanes (smoke / mega-flow / full) + 1 Rust E2E lane, all Linux, all gated by inputs.run_linux in the reusable workflow.

3. Release-leak hygiene

e2e-reusable.yml is invoked from release-staging.yml and release-production.yml pretest gates, so its uploaded artifacts surface on release job pages and persist for 7 days. Mock-backend payload logs, Appium driver transcripts, and CDP event dumps can echo env vars, auth tokens, and provider URLs — none of that belongs pinned to a release artifact.

Dropped all four Upload e2e artifacts steps (one per OS) and the rust-e2e mock-log upload. Local Docker repro is unaffected — the same files still land in /tmp and app/test/e2e/artifacts/ inside the container.

Files

Mock LLM streaming: scripts/mock-api/routes/llm.mjs (+~180 lines, SSE branch + scripting helpers)
Introspection RPCs: src/openhuman/test_support/introspect.rs (new), src/openhuman/test_support/schemas.rs (registered), src/openhuman/channels/providers/web.rs (in_flight_entries_for_test helper)
Specs: 4 new files under app/test/e2e/specs/chat-harness-*.spec.ts
Rust-E2E: scripts/test-rust-e2e.sh (new), package.json (test:rust:e2e script), .github/workflows/e2e-reusable.yml (new rust-e2e-linux job + dropped uploads)

Verification

pnpm typecheck green
cargo check --bin openhuman-core green
cargo test --no-run --test <all 10 e2e suites> compiles
bash scripts/test-rust-e2e.sh end-to-end on local Linux container: all suites pass with #[ignore] correctly respected

AI Authored PR Metadata

Authored by: Claude (Opus 4.7) in a Claude Code session
Human reviewer: @senamakel

Summary by CodeRabbit

Tests
- Added comprehensive E2E specs covering streaming send, mid-stream cancel, scroll/rendering, subagent delegation, and smoke tweaks; shared test helpers and workspace persistence assertions included
- New Rust integration E2E suites and a runner script
New Features
- Mock backend now supports OpenAI-compatible SSE streaming for chat completions
- Read-only test-support introspection APIs to list/read workspace files and view in-flight chats
Chores
- CI E2E workflow updated to add Rust job and change artifact upload behavior

…ion RPCs Comprehensive end-to-end coverage of the chat pipeline driven through the real UI, with assertions reaching all the way down to the workspace files and the live Rust IN_FLIGHT state. None of this existed before — the chat-flow.spec.ts from PR tinyhumansai#1889 only covered "user message bubble renders," which catches roughly 5% of harness regressions. ## Infrastructure **Streaming mock LLM** (`scripts/mock-api/routes/llm.mjs`): - `/openai/v1/chat/completions` now branches on `stream: true` and emits SSE-encoded `data: {...}` chunks in OpenAI's chat-completions format, terminated by `data: [DONE]`. - Scriptable via the existing mock-behavior keys: llmStreamScript — explicit array of { text | thinking | toolCall | finish | usage | error } entries llmStreamChunkDelayMs — default delay between chunks (ms) llmForcedResponses — popped in order; auto-converted to a streaming script llmKeywordRules — matched on latest user/tool message - Tool-call streaming: a single { toolCall: {...} } entry is split into an opening chunk (id + name + first 8 chars of args) plus incremental arg fragments, matching how real OpenAI streams. - Each chunk inherits a default 25 ms delay so the UI has time to paint individual deltas; specs override per-chunk for cancel tests that need slow streaming. **Read-only introspection RPCs** (`src/openhuman/test_support/`): - `openhuman.test_support_workspace_root` — current workspace_dir + existence flag. - `openhuman.test_support_list_workspace_files` — recursive (depth- capped at 6, entry-capped at 2000) listing relative to the workspace. - `openhuman.test_support_read_workspace_file` — lossy UTF-8 read capped at 1 MiB. Rejects paths that escape the workspace root via `..` etc. - `openhuman.test_support_in_flight_chats` — snapshot of the IN_FLIGHT map in `channels::providers::web` (which chat keys are currently running and their request_id). - All gated by the same per-launch bearer the rest of `/rpc` uses; the token file is debug-only, so release builds can't be reached over the wire. Backing helper: `web::in_flight_entries_for_test()` snapshots the private IN_FLIGHT map for the introspection RPC. Read-only — no mutation, no production caller. ## Specs - **chat-harness-send-stream.spec.ts** — send a prompt, watch a 4-chunk SSE reply assemble in the DOM. Assertions: * IN_FLIGHT gains an entry mid-stream * Full reassembled assistant text lands * IN_FLIGHT drains after `chat_done` * Mock backend captured a POST with `"stream":true` * `memory/conversations/threads/<hex(thread_id)>.jsonl` contains the user prompt AND the canary assistant text * `list_workspace_files` returns at least one non-empty `.jsonl` - **chat-harness-cancel.spec.ts** — script the mock LLM to stream six chunks 500 ms apart, click Cancel mid-stream. Assertions: * IN_FLIGHT was populated, then cleared within 8 s of Cancel * Late chunks NEVER appeared in the DOM * Composer is interactive again * Persisted thread file does NOT contain late-chunk text - **chat-harness-subagent.spec.ts** — queue three forced responses: orchestrator emits `delegate_researcher` tool_call → researcher sub-agent emits a plain text answer → orchestrator emits the final synthesis. Assertions: * `chatRuntime.inferenceStatusByThread[thread].phase` flipped to `'subagent'` OR a `subagent:researcher` timeline entry appeared * Final orchestrator canary text rendered * IN_FLIGHT drained * Mock LLM saw ≥2 chat-completions hits (parent + child) * Persisted thread file records the final canary - **chat-harness-scroll-render.spec.ts** — stream a long reply that ends with bold + fenced code + a link. Assertions: * `<strong>`, `<pre><code>`, and `<a[href]>` all render with the expected canary content * Message column auto-scrolls to within 40 px of the bottom * Scrolling up half a viewport releases the auto-stick Together these four specs exercise: composer wiring, SSE transport, agent loop, sub-agent dispatch, native tool-call parsing, cancel semantics, scroll-anchor behavior, markdown rendering, redux chatRuntime mirror, the conversations JSONL store, and the IN_FLIGHT registry — all from a real UI click through to disk.

…verage

…loads ## Rust-side E2E lane The Tauri E2E lane already covers UI → core round-trips. The 10 `tests/*_e2e.rs` integration suites were running as plain `cargo test -p openhuman` in the unit-test workflow with no mock backend, which left mock-driven RPC paths uncovered the moment any suite reaches out to `BACKEND_URL`. This adds a dedicated Linux `rust-e2e-linux` job in `e2e-reusable.yml` that: 1. Boots `scripts/mock-api-server.mjs` on port 18505. 2. Exports `BACKEND_URL` + `VITE_BACKEND_URL` so any suite that wants the mock can hit it. 3. Runs all 10 e2e suites in one `cargo test --test ...` invocation — `#[ignore]` guards on `live_routing_e2e`, `subconscious_e2e`, `memory_graph_sync_e2e`, etc. are respected as documented. The runner is `scripts/test-rust-e2e.sh`, also exposed as `pnpm test:rust:e2e`. Locally and in Docker compose, both produce the same output. Smoke-validated locally on macOS-arm64 (Linux x64 emulated): 38 passing in json_rpc_e2e, 11 in linux_cef_deb_runtime_e2e, 11 in screen_intelligence_vision_e2e, the canary in memory_roundtrip_e2e, and so on. ## Artifact-upload hygiene `e2e-reusable.yml` is invoked from `release-staging.yml` and `release-production.yml` pretest gates, not just PR CI. Uploaded artifacts under that workflow are reachable from a release's job detail page and live for 7 days. Mock-backend payload logs, appium driver transcripts, and CDP event dumps can echo env vars, auth tokens, and provider URLs — none of which we want pinned to a release artifact. This commit drops all four `Upload e2e artifacts` steps (one per OS lane) and the rust-e2e mock-log upload. Local Docker repro is unaffected — the same files still land in `/tmp` and `app/test/e2e/artifacts/` inside the container. ## Misc The chat-harness specs from `eab8430e` plus `src/openhuman/test_support/introspect.rs` picked up prettier / rustfmt formatting touch-ups from the pre-push hook; included here so the branch lands clean.

coderabbitai · 2026-05-16T04:48:26Z

Warning

Rate limit exceeded

@senamakel has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a22ff2f1-c7f6-49d8-b2b3-4d15c7361d50

📥 Commits

Reviewing files that changed from the base of the PR and between 47aff08 and 448ef0e.

📒 Files selected for processing (2)

.github/workflows/e2e-reusable.yml
app/test/e2e/specs/smoke.spec.ts

📝 Walkthrough

Walkthrough

Adds SSE streaming to the mock LLM, introduces Rust test-support introspection RPCs for workspace and in-flight state, provides a Bash runner and CI job for Rust E2E, adds shared browser helpers, and implements four Playwright E2E specs covering cancel, markdown rendering/scroll, send+stream, and subagent delegation.

Changes

E2E Test Suite and Infrastructure

Layer / File(s)	Summary
Mock LLM SSE streaming foundation `scripts/mock-api/routes/llm.mjs`	Implements OpenAI-compatible SSE streaming for chat completions, script selection (llmStreamScript / llmForcedResponses / llmKeywordRules / fallback), chunked text and tool-call fragments, mid-stream error events, and terminates streams with `data: [DONE]`. Routes `parsedBody.stream === true` requests to the streaming handler.
Rust test-support introspection RPC module `src/openhuman/channels/providers/web.rs`, `src/openhuman/test_support/introspect.rs`, `src/openhuman/test_support/mod.rs`, `src/openhuman/test_support/schemas.rs`	Adds read-only RPCs: `workspace_root`, `list_workspace_files` (depth + entry cap, truncation flag), `read_workspace_file` (max bytes, lossy UTF‑8, returned_bytes), and `in_flight_chats` (snapshot of in‑flight entries). Includes secure workspace-relative path resolution and RPC schema wiring.
Test orchestration and CI integration `package.json`, `scripts/test-rust-e2e.sh`, `.github/workflows/e2e-reusable.yml`	Adds `test:rust:e2e` npm script and `scripts/test-rust-e2e.sh` (mock backend lifecycle, health checks, suite filtering, cargo invocation). Updates CI reusable workflow to add a Linux Rust E2E job and remove artifact upload steps across platforms.
Shared chat-harness E2E helpers `app/test/e2e/helpers/chat-harness.ts`	Adds browser helper utilities: `clickByTitle`, `typeIntoComposer` (native setter + input event), `clickSend`, `getSelectedThreadId` (reads global store), and `hexEncodeThreadId` for workspace file path derivation.
Chat cancellation mid-stream E2E spec `app/test/e2e/specs/chat-harness-cancel.spec.ts`	Tests mid-stream cancel behavior: waits for in-flight, confirms early chunk rendered, clicks composer Cancel (non-modal), ensures in-flight clears, late chunks never appear in DOM, composer remains usable, and persisted thread JSONL does not include late chunks.
Chat markdown rendering and auto-scroll E2E spec `app/test/e2e/specs/chat-harness-scroll-render.spec.ts`	Streams markdown with bold/code/link canaries and filler to force overflow; verifies rendered `strong`, `pre code`, and `a[href]` elements, auto-scroll-to-bottom after stream, and release of auto-stick when user scrolls up.
Chat send + streaming message E2E spec `app/test/e2e/specs/chat-harness-send-stream.spec.ts`	Validates send+stream: sends a prompt, polls `in_flight_chats` during stream, waits for complete assistant reply in DOM, asserts backend received `stream:true` request, and checks persisted thread JSONL contains prompt and canary.
Orchestrator-to-subagent delegation E2E spec `app/test/e2e/specs/chat-harness-subagent.spec.ts`	Forces orchestrator→subagent sequence via `llmForcedResponses`, snapshots runtime to detect `subagent` phase, verifies multiple chat-completions POSTs (parent + subagent), and confirms persisted thread JSONL includes final canary.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tinyhumansai/openhuman#1859: Extends the E2E test_support RPC surface (adds/reset controller plumbing) and overlaps controller/schema wiring.
tinyhumansai/openhuman#1887: Modifies the same reusable E2E workflow file and CI job/artifact handling changes.

Poem

🐰 Streams hop in gentle flow,
Mock replies in careful show,
Cancel and scroll with tests that gleam,
Workspaces store each streaming dream,
Rabbits clap — E2E on team!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: adding deep chat-harness E2E coverage, implementing streaming mock LLM support, and creating a Rust E2E Linux lane for CI.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (6)

app/test/e2e/specs/chat-harness-scroll-render.spec.ts (1)
133-225: ⚡ Quick win

Add failure diagnostics hooks for faster CI triage.

This spec does not currently emit request logs or accessibility dumps on failures. Add per-test failure diagnostics (dumpAccessibilityTree() plus request-log capture) so agent debugging is actionable.

As per coding guidelines: "Ensure each E2E spec is runnable in isolation and add failure diagnostics (request logs, dumpAccessibilityTree()) for faster debugging by agents".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-scroll-render.spec.ts` around lines 133 -
225, Add a per-test failure diagnostics afterEach inside the describe('Chat
harness — scroll + markdown render') block that, when a test fails, calls
dumpAccessibilityTree() and captures request logs via the test harness
request-log API (e.g., captureRequestLogs() or equivalent) and persists them
(timestamped) for CI triage; place this afterEach alongside the existing
before/after that use startMockServer() and stopMockServer(), and ensure it runs
for every it(...) so failing tests emit both the accessibility dump and saved
request logs.
app/test/e2e/specs/chat-harness-cancel.spec.ts (1)
139-233: ⚡ Quick win

Add failure diagnostics hook for faster triage.

Please add an afterEach failure path that captures request logs and dumpAccessibilityTree() output when a test fails.

As per coding guidelines: app/test/e2e/specs/**/*.spec.ts: Ensure each E2E spec is runnable in isolation and add failure diagnostics (request logs, dumpAccessibilityTree()).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-cancel.spec.ts` around lines 139 - 233, Add
an afterEach hook inside the describe('Chat harness — mid-stream cancel', ...)
block that checks whether the test failed (this.currentTest?.state === 'failed')
and, on failure, calls dumpAccessibilityTree() and a helper to capture request
logs (e.g., captureRequestLogs() or getRequestLogs()) and persists them as
artifacts for triage; place the hook near the existing before/after hooks so it
runs for every it() in this spec and reference existing symbols like
dumpAccessibilityTree, clickComposerCancel, and inFlightCount when locating the
describe block to modify.
app/test/e2e/specs/chat-harness-send-stream.spec.ts (1)
105-123: ⚡ Quick win

Add failure diagnostics hook for faster triage in CI.

This spec currently has no per-test failure dump (request log + accessibility tree), which slows root-cause analysis when flakes occur.
Proposed refactor
-import { textExists } from '../helpers/element-helpers';
+import { dumpAccessibilityTree, textExists } from '../helpers/element-helpers';
@@
   after(async () => {
     setMockBehavior('llmStreamScript', '');
     await stopMockServer();
   });
+
+  afterEach(async function () {
+    if (this.currentTest?.state === 'failed') {
+      console.error(
+        '[chat-harness-send-stream] request log:\n' +
+          JSON.stringify(getRequestLog(), null, 2)
+      );
+      await dumpAccessibilityTree();
+    }
+  });
As per coding guidelines: “Ensure each E2E spec is runnable in isolation and add failure diagnostics (request logs, dumpAccessibilityTree()) for faster debugging by agents”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-send-stream.spec.ts` around lines 105 - 123,
Add a per-test failure diagnostics hook inside the describe('Chat harness — send
+ stream') block (e.g., an afterEach or test failure handler) that runs when a
test fails and calls the existing diagnostics helpers to dump request logs and
the accessibility tree; specifically invoke functions like dumpRequestLog() and
dumpAccessibilityTree() and ensure any test-specific mocks set by
setMockBehavior('llmStreamScript', ...) are left in a stable state or reset so
CI runs remain isolated; place this alongside the existing before/after setup
that uses startMockServer(), waitForApp(), resetApp(USER_ID), and
stopMockServer() so individual test failures produce the request log +
accessibility tree for faster triage.
app/test/e2e/specs/chat-harness-subagent.spec.ts (2)
176-240: ⚡ Quick win

Add failure diagnostics for faster debugging.

Per the coding guidelines, E2E specs should "add failure diagnostics (request logs, dumpAccessibilityTree()) for faster debugging by agents." Consider adding diagnostics in a try-catch block or after timeout failures to capture the request log and DOM state when assertions fail.
📊 Suggested diagnostic additions
     // Final canary must land in the DOM after the orchestrator wraps up.
-    await browser.waitUntil(async () => await textExists(CANARY_FINAL), {
-      timeout: 30_000,
-      timeoutMsg: 'orchestrator never produced the final canary text',
-    });
+    try {
+      await browser.waitUntil(async () => await textExists(CANARY_FINAL), {
+        timeout: 30_000,
+        timeoutMsg: 'orchestrator never produced the final canary text',
+      });
+    } catch (err) {
+      console.error('Request log:', getRequestLog());
+      console.error('Runtime snapshot:', await snapshotRuntime(threadId));
+      throw err;
+    }
As per coding guidelines: "Ensure each E2E spec is runnable in isolation and add failure diagnostics (request logs, dumpAccessibilityTree()) for faster debugging by agents."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-subagent.spec.ts` around lines 176 - 240, The
test lacks failure diagnostics; wrap the orchestrator E2E flow (the
'orchestrator delegates to researcher...' it block) in a try/catch/finally (or
add catch hooks) so that on any assertion/timeouts you call
dumpAccessibilityTree() and capture the network/request log (via your existing
request-log helper or browser.getRequests()) along with any relevant runtime
snapshot (snapshotRuntime(threadId)) and RPC state (callOpenhumanRpc) before
rethrowing the error; add these diagnostics calls referencing the test block,
dumpAccessibilityTree(), snapshotRuntime(threadId), and callOpenhumanRpc(...) so
failures produce DOM + request + runtime/RPC dumps for faster debugging.
70-111: ⚖️ Poor tradeoff

Extract these duplicated helpers to element-helpers.ts with cross-platform support, or refactor selectors to work with existing helpers.

These three helpers (clickByTitle, typeIntoComposer, clickSend) are duplicated identically across four spec files (chat-harness-subagent, chat-harness-send-stream, chat-harness-cancel, chat-harness-scroll-render). More importantly, they use browser.execute() with document.querySelector, which only works on tauri-driver and violates the coding guideline to use cross-platform helpers from element-helpers.ts.

However, element-helpers.ts doesn't currently support:

Clicking by title attribute (clickByTitle uses button[title=...])

Clicking by aria-label attribute (clickSend uses button[aria-label="Send message"])

Typing into textarea[placeholder="..."] (no textarea input helper exists)

Options:

Extend element-helpers.ts to support these selector patterns with cross-platform implementations (Mac2 + tauri-driver)

Refactor the UI to use text content or role attributes that clickButton() and clickText() already support

Choose option 1 if these selectors are intentional; choose option 2 if the UI can be simplified. Either way, remove the duplication across the four chat-harness specs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-subagent.spec.ts` around lines 70 - 111, The
three duplicated helpers (clickByTitle, typeIntoComposer, clickSend) must be
extracted into element-helpers.ts and implemented in a cross-platform way
(supporting both Mac2 and tauri-driver) rather than using
browser.execute(document.querySelector). Add exported functions
clickByTitle(title: string), clickSend(), and typeIntoComposer(text: string) in
element-helpers.ts that use the project's cross-platform element interaction
primitives (or add driver-branching logic inside these helpers) to locate
button[title=...], button[aria-label="Send message"] and
textarea[placeholder="Type a message..."] and perform click/value+input events;
replace the in-file implementations in chat-harness-subagent.spec.ts and the
other three chat-harness specs with imports from element-helpers.ts and remove
the duplicated code there.
src/openhuman/test_support/schemas.rs (1)
133-216: ⚡ Quick win

Keep the new schema metadata aligned with the actual RPC payloads.

list_workspace_files always returns entries, read_workspace_file always returns rel_path/size_on_disk/returned_bytes, and in_flight_chats.result.entries is an array of objects, not a string. Right now the registered contract is lying about what these handlers emit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/test_support/schemas.rs` around lines 133 - 216, The schema
definitions for the controllers are incorrect: update the ControllerSchema
entries for "list_workspace_files", "read_workspace_file", and "in_flight_chats"
so their outputs match actual RPC payloads — in "list_workspace_files" ensure
the output object includes an "entries" field (array of entry objects) rather
than omitting entries; in "read_workspace_file" replace the current outputs with
fields "rel_path", "size_on_disk", and "returned_bytes" (plus the content field
if present) and mark required appropriately; and in "in_flight_chats" change
result.entries to be an array of objects (not TypeSchema::String) describing {
key, request_id } entries; modify the TypeSchema and FieldSchema usages
accordingly so the registered contract matches the real emitted shapes
referenced by the functions named list_workspace_files, read_workspace_file, and
in_flight_chats.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/test/e2e/specs/chat-harness-cancel.spec.ts`:
- Around line 53-113: Replace the ad-hoc browser.execute implementations in
clickByTitle, typeIntoComposer, clickSend, and clickComposerCancel with the
corresponding shared helpers from element-helpers.ts: import and call the
centralized click/type functions (use the helper that supports title-based
clicks, textarea typing with input event dispatch, send-button click that checks
disabled state, and a cancel-button click that excludes modal/dialog ancestors)
and preserve existing behavior (timeout loop in clickByTitle, JSON-stringified
title matching, input setter+dispatch in typeIntoComposer, disabled check in
clickSend, and closest('[role="dialog"], [aria-modal="true"]') exclusion in
clickComposerCancel). Ensure signatures remain the same so tests calling these
functions need no changes.
- Around line 194-202: The post-cancel observation window is too short; increase
the wait after cancel so uncancelled late chunks are reliably observed: replace
the fixed 1_500 in the browser.pause call with a longer duration (or compute it
from your chunk timing constants) so it covers the remaining scripted stream
time, then keep the loop verifying LATE_PIECES via textExists to assert they are
absent; update the browser.pause(1_500) invocation used in this test to a safely
larger value or a computed value based on your stream chunk interval.
- Around line 205-213: The test "after cancel, send button is interactive again
(composer not stuck)" currently only checks the textarea's enabled state; update
the spec to also locate the send button (e.g., querySelector for the send button
element used in the app — for example a button with an aria-label or class like
button[aria-label="Send"] or .send-button) and assert it is enabled/not disabled
and/or clickable after cancel. In the same test (the it block), after the
existing textarea check that uses the textarea[placeholder="Type a message..."]
selector, add a browser.execute check that returns the send button element's
presence and enabled state (e.g., !!btn && !btn.disabled) and add an expect to
verify it is true so the test validates both composer and send-button recovery.

In `@app/test/e2e/specs/chat-harness-scroll-render.spec.ts`:
- Around line 149-225: The tests in this spec are state-coupled because the
streaming `it('streams a long markdown reply')` produces messages relied on by
later `it` blocks; make each `it` independent by moving the setup that creates a
new thread and sends the markdown prompt into a shared beforeEach (or by merging
dependent asserts into a single `it`) — use the existing helpers
(navigateViaHash, clickByTitle, typeIntoComposer, clickSend) to create the
thread and wait for CANARY_BOLD and CANARY_CODE in beforeEach so each test can
safely query the DOM; also add failure diagnostics (call dumpAccessibilityTree()
and capture request logs) on waitUntil timeouts to help debugging; update
references in tests that inspect DOM (the browser.execute block using
CANARY_BOLD/CANARY_CODE/LINK_URL and the scroll tests using scrollMetrics and
scrollMessageColumn) to rely on the per-test seeded state.
- Around line 60-101: The custom DOM helpers clickByTitle, typeIntoComposer, and
clickSend must be removed and replaced with the shared E2E utilities from
element-helpers.ts: import and use clickNativeButton(...) for any button clicks
(replacing clickByTitle and clickSend), and use the shared composer input
helpers from element-helpers (the helper that sets/types composer text and the
helper that triggers send) instead of typeIntoComposer; update all calls to
reference these utilities and remove the inline DOM-manipulation functions (keep
references to clickByTitle, typeIntoComposer, clickSend to locate changes).

In `@app/test/e2e/specs/chat-harness-send-stream.spec.ts`:
- Around line 188-194: The test currently asserts that
callOpenhumanRpc('openhuman.test_support_in_flight_chats') returns an empty
global entries array, which is over-scoped; instead, call the same RPC result
and filter its result.result.entries by the current thread identifier (e.g.,
threadId or threadKey used earlier in this spec) and assert that the filtered
list is empty for that thread only; update the assertion that now checks
something like entries.filter(e => e.key === <currentThreadKey>).toEqual([]) so
unrelated in-flight chats don't cause flakiness (use the actual variable name
used in this test for the thread identifier when implementing the filter).

In `@app/test/e2e/specs/chat-harness-subagent.spec.ts`:
- Line 1: Remove the file-level `@ts-nocheck` at the top of
chat-harness-subagent.spec.ts to restore TypeScript checking; run tsc or your
editor to see any resulting type errors and fix them individually (use targeted
`@ts-expect-error` with a short explanation only where a true false-positive or
external typing gap exists), updating any incorrect types in test helpers,
mocks, or calls inside the describe/it blocks so the test compiles under
TypeScript rather than silencing all checks.

In `@scripts/mock-api/routes/llm.mjs`:
- Around line 102-120: defaultStreamScript currently drops the top-level content
whenever toolCalls are present and emits tool-call deltas without per-call
indexing, which causes loss of the initial preamble and collapses multiple tool
calls; update defaultStreamScript to emit the content as the first script entry
(preserve { content } before any toolCall entries) and then emit each toolCall
entry, and modify the toolCall delta shape to include a unique index/sequence
field (e.g., index: i or callIndex) so streamScriptToResponse can reconstruct
separate tool-call streams; ensure you also add corresponding finish markers
(finish: "tool_calls" or per-call finishes) consistent with
streamScriptToResponse’s expectations.

In `@scripts/test-rust-e2e.sh`:
- Around line 50-55: The --suite case in the while/case loop reads $2 without
checking it, which breaks with set -u; update the --suite branch in the loop
(the while [ $# -gt 0 ] ... case "$1" ... --suite) to validate that a second
argument exists (e.g., check $# -lt 2 or that $2 is non-empty/not another flag)
before appending to SUITES, and if missing print a clear usage/error and exit 1
instead of reading an unbound variable.

In `@src/openhuman/test_support/introspect.rs`:
- Around line 213-219: The reported returned_bytes should be the raw buffer
length, not the lossy UTF-8 string length; update the construction of
ReadFileResult in RpcOutcome::single_log to use returned.len() as returned_bytes
(cast to u64) instead of content_utf8.len(), while keeping the
String::from_utf8_lossy(&returned).into_owned() conversion for the content
display.
- Around line 121-133: The walker currently calls entry.metadata().await which
follows symlinks; change it to call entry.symlink_metadata().await, check the
resulting file_type().is_symlink() and skip (continue) if true so symlinks are
not reported or recursed into; then use that metadata to decide is_dir/size and
proceed with pushing directories onto the stack (apply this change in the
workspace listing function such as list_workspace_files where ListEntry is
constructed and entry.metadata().await is used).

---

Nitpick comments:
In `@app/test/e2e/specs/chat-harness-cancel.spec.ts`:
- Around line 139-233: Add an afterEach hook inside the describe('Chat harness —
mid-stream cancel', ...) block that checks whether the test failed
(this.currentTest?.state === 'failed') and, on failure, calls
dumpAccessibilityTree() and a helper to capture request logs (e.g.,
captureRequestLogs() or getRequestLogs()) and persists them as artifacts for
triage; place the hook near the existing before/after hooks so it runs for every
it() in this spec and reference existing symbols like dumpAccessibilityTree,
clickComposerCancel, and inFlightCount when locating the describe block to
modify.

In `@app/test/e2e/specs/chat-harness-scroll-render.spec.ts`:
- Around line 133-225: Add a per-test failure diagnostics afterEach inside the
describe('Chat harness — scroll + markdown render') block that, when a test
fails, calls dumpAccessibilityTree() and captures request logs via the test
harness request-log API (e.g., captureRequestLogs() or equivalent) and persists
them (timestamped) for CI triage; place this afterEach alongside the existing
before/after that use startMockServer() and stopMockServer(), and ensure it runs
for every it(...) so failing tests emit both the accessibility dump and saved
request logs.

In `@app/test/e2e/specs/chat-harness-send-stream.spec.ts`:
- Around line 105-123: Add a per-test failure diagnostics hook inside the
describe('Chat harness — send + stream') block (e.g., an afterEach or test
failure handler) that runs when a test fails and calls the existing diagnostics
helpers to dump request logs and the accessibility tree; specifically invoke
functions like dumpRequestLog() and dumpAccessibilityTree() and ensure any
test-specific mocks set by setMockBehavior('llmStreamScript', ...) are left in a
stable state or reset so CI runs remain isolated; place this alongside the
existing before/after setup that uses startMockServer(), waitForApp(),
resetApp(USER_ID), and stopMockServer() so individual test failures produce the
request log + accessibility tree for faster triage.

In `@app/test/e2e/specs/chat-harness-subagent.spec.ts`:
- Around line 176-240: The test lacks failure diagnostics; wrap the orchestrator
E2E flow (the 'orchestrator delegates to researcher...' it block) in a
try/catch/finally (or add catch hooks) so that on any assertion/timeouts you
call dumpAccessibilityTree() and capture the network/request log (via your
existing request-log helper or browser.getRequests()) along with any relevant
runtime snapshot (snapshotRuntime(threadId)) and RPC state (callOpenhumanRpc)
before rethrowing the error; add these diagnostics calls referencing the test
block, dumpAccessibilityTree(), snapshotRuntime(threadId), and
callOpenhumanRpc(...) so failures produce DOM + request + runtime/RPC dumps for
faster debugging.
- Around line 70-111: The three duplicated helpers (clickByTitle,
typeIntoComposer, clickSend) must be extracted into element-helpers.ts and
implemented in a cross-platform way (supporting both Mac2 and tauri-driver)
rather than using browser.execute(document.querySelector). Add exported
functions clickByTitle(title: string), clickSend(), and typeIntoComposer(text:
string) in element-helpers.ts that use the project's cross-platform element
interaction primitives (or add driver-branching logic inside these helpers) to
locate button[title=...], button[aria-label="Send message"] and
textarea[placeholder="Type a message..."] and perform click/value+input events;
replace the in-file implementations in chat-harness-subagent.spec.ts and the
other three chat-harness specs with imports from element-helpers.ts and remove
the duplicated code there.

In `@src/openhuman/test_support/schemas.rs`:
- Around line 133-216: The schema definitions for the controllers are incorrect:
update the ControllerSchema entries for "list_workspace_files",
"read_workspace_file", and "in_flight_chats" so their outputs match actual RPC
payloads — in "list_workspace_files" ensure the output object includes an
"entries" field (array of entry objects) rather than omitting entries; in
"read_workspace_file" replace the current outputs with fields "rel_path",
"size_on_disk", and "returned_bytes" (plus the content field if present) and
mark required appropriately; and in "in_flight_chats" change result.entries to
be an array of objects (not TypeSchema::String) describing { key, request_id }
entries; modify the TypeSchema and FieldSchema usages accordingly so the
registered contract matches the real emitted shapes referenced by the functions
named list_workspace_files, read_workspace_file, and in_flight_chats.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b7f9cbba-31cf-4b2b-89af-e339f484d07c

📥 Commits

Reviewing files that changed from the base of the PR and between f90a337 and 481ef0e.

📒 Files selected for processing (12)

.github/workflows/e2e-reusable.yml
app/test/e2e/specs/chat-harness-cancel.spec.ts
app/test/e2e/specs/chat-harness-scroll-render.spec.ts
app/test/e2e/specs/chat-harness-send-stream.spec.ts
app/test/e2e/specs/chat-harness-subagent.spec.ts
package.json
scripts/mock-api/routes/llm.mjs
scripts/test-rust-e2e.sh
src/openhuman/channels/providers/web.rs
src/openhuman/test_support/introspect.rs
src/openhuman/test_support/mod.rs
src/openhuman/test_support/schemas.rs

11 inline comments — all resolved. ## Spec changes - New `app/test/e2e/helpers/chat-harness.ts` factors out the four helpers that each chat-harness spec was redefining: clickByTitle, typeIntoComposer, clickSend, getSelectedThreadId, hexEncodeThreadId. (CodeRabbit comments on cancel + scroll-render.) - `chat-harness-send-stream.spec.ts`: scope the post-stream IN_FLIGHT drain check to the current thread (`key.endsWith("::<threadId>")`) instead of asserting the entire map is empty — was over-scoped and would flake on unrelated background work. - `chat-harness-cancel.spec.ts`: * Extended the post-cancel observation window from 1.5 s to 3.5 s so all six 500 ms-spaced chunks have had time to land if cancel silently failed. * The "send button interactive again" test now exercises BOTH sides of the contract: the textarea AND the send button are enabled (typing a fresh prompt and asserting `!btn.disabled`). The previous version only checked the textarea, which would pass while the send button stayed stuck `disabled` because of a non-reset `isSending` flag. - `chat-harness-subagent.spec.ts`: dropped `@ts-nocheck`. Typecheck passes clean. - `chat-harness-scroll-render.spec.ts`: collapsed four state-coupled `it()` blocks into one. The four assertions (stream completes, markdown renders, auto-scroll anchored, scroll-up releases) are all facts about the same exchange, so splitting them across Mocha tests just made each one rely on its predecessor's side effects. Keeping the asserts in one `it` removes the cascading-false-negative shape CodeRabbit flagged. ## Mock LLM - `defaultStreamScript` no longer drops a `content` preamble when `toolCalls` are present — preserves the non-streaming `{ content, toolCalls }` contract. - Tool-call deltas now carry the caller-supplied `index` (defaulting to 0 only when unset) so multi-tool responses don't collapse into one reconstructed call on the client side. ## Rust introspection - `list_workspace_files` now uses `fs::symlink_metadata` and skips symlinks. Following symlinks could break the workspace-only guarantee — a symlink pointing outside the workspace would be reported and, if it resolved to a directory, recursed into. - `read_workspace_file` reports `returned_bytes` from the raw byte buffer instead of from the post-`from_utf8_lossy` string. The lossy conversion substitutes U+FFFD for invalid sequences and can change the byte count, so the previous value could exceed `max_bytes` for non-UTF-8 files. ## Runner - `scripts/test-rust-e2e.sh` now emits a clear usage error when `--suite` is passed without a value, instead of dying on `$2: unbound variable` under `set -u`.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

scripts/test-rust-e2e.sh (1)

56-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject flag-like values for --suite.

Line 56 only blocks missing/empty $2; --suite -- --ignored is still accepted, so -- is treated as a suite name and fails later in a confusing way. Validate that the suite token is not another flag.

Suggested fix

-      if [ $# -lt 2 ] || [ -z "${2:-}" ]; then
+      if [ $# -lt 2 ] || [ -z "${2:-}" ] || [[ "${2:-}" == -* ]]; then
         echo "[rust-e2e] ERROR: --suite requires a test name (e.g. --suite json_rpc_e2e)" >&2
         exit 2
       fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test-rust-e2e.sh` around lines 56 - 61, The argument parsing
currently adds "$2" to SUITES even when it's another flag (e.g. "--suite --
--ignored"); update the validation where you check if [ $# -lt 2 ] || [ -z
"${2:-}" ] to also reject suite tokens that look like flags by testing if "$2"
begins with a hyphen (e.g. case "$2" in -*) and if so print the same error and
exit 2; ensure you still add valid suite names to SUITES and shift 2 as before.

🧹 Nitpick comments (2)

app/test/e2e/specs/chat-harness-subagent.spec.ts (1)
105-220: ⚡ Quick win

Add failure diagnostics for this spec’s assertions.

Please add failure-time diagnostics (e.g., request-log snapshot and accessibility tree dump) to speed up root-cause analysis when this flow flakes.

As per coding guidelines: app/test/e2e/specs/**/*.spec.ts: Ensure each E2E spec is runnable in isolation and add failure diagnostics (request logs, dumpAccessibilityTree()) for faster debugging by agents.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-subagent.spec.ts` around lines 105 - 220, Add
failure-time diagnostics and isolation safeguards: ensure the spec logs request
snapshots and the accessibility tree when a test fails by adding an afterEach
hook that checks the test state and on failure calls getRequestLog() (or
snapshot and serialize it) and awaits dumpAccessibilityTree(), and include any
relevant callOpenhumanRpc() file-read snapshots (e.g., the thread file read) to
the logged output; place this afterEach near the top-level describe so it runs
for each it block and update any shared setup/teardown (before/after) to be safe
when the spec runs in isolation.
app/test/e2e/specs/chat-harness-cancel.spec.ts (1)
88-198: ⚡ Quick win

Add failure diagnostics hooks for faster triage.

This spec currently lacks failure-time diagnostics (request logs/accessibility dump), which makes agent debugging slower.

As per coding guidelines: app/test/e2e/specs/**/*.spec.ts: Ensure each E2E spec is runnable in isolation and add failure diagnostics (request logs, dumpAccessibilityTree()) for faster debugging by agents.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/chat-harness-cancel.spec.ts` around lines 88 - 198, The
spec lacks failure-time diagnostics; add an afterEach hook in this spec (near
the existing before/after blocks) that checks test failure
(this.currentTest?.state === 'failed') and on failure calls
dumpAccessibilityTree() and the project’s request-log retrieval routine (e.g.,
saveRequestLogs()/collectRequestLogs() or whichever helper interacts with
startMockServer()/stopMockServer()) to persist network/request logs and the
accessibility tree for triage. Ensure the hook is async, guarded to only run on
failure, and writes diagnostics to the test output/artifacts so agents can
access them later.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/test/e2e/specs/chat-harness-cancel.spec.ts`:
- Line 1: Remove the file-level "`@ts-nocheck`" from chat-harness-cancel.spec.ts,
run "pnpm typecheck", and fix all surfaced TypeScript errors with targeted fixes
or minimal, justified suppressions (no broad `@ts-nocheck`). Also add failure
diagnostics by invoking dumpAccessibilityTree() in all failure paths for this
spec — e.g., inside catch blocks and in any timeoutMsg handlers used by tests
(match patterns in gmail-flow.spec.ts or conversations-web-channel-flow.spec.ts
for placement), so that on test timeouts or exceptions the accessibility tree is
dumped for debugging.

---

Duplicate comments:
In `@scripts/test-rust-e2e.sh`:
- Around line 56-61: The argument parsing currently adds "$2" to SUITES even
when it's another flag (e.g. "--suite -- --ignored"); update the validation
where you check if [ $# -lt 2 ] || [ -z "${2:-}" ] to also reject suite tokens
that look like flags by testing if "$2" begins with a hyphen (e.g. case "$2" in
-*) and if so print the same error and exit 2; ensure you still add valid suite
names to SUITES and shift 2 as before.

---

Nitpick comments:
In `@app/test/e2e/specs/chat-harness-cancel.spec.ts`:
- Around line 88-198: The spec lacks failure-time diagnostics; add an afterEach
hook in this spec (near the existing before/after blocks) that checks test
failure (this.currentTest?.state === 'failed') and on failure calls
dumpAccessibilityTree() and the project’s request-log retrieval routine (e.g.,
saveRequestLogs()/collectRequestLogs() or whichever helper interacts with
startMockServer()/stopMockServer()) to persist network/request logs and the
accessibility tree for triage. Ensure the hook is async, guarded to only run on
failure, and writes diagnostics to the test output/artifacts so agents can
access them later.

In `@app/test/e2e/specs/chat-harness-subagent.spec.ts`:
- Around line 105-220: Add failure-time diagnostics and isolation safeguards:
ensure the spec logs request snapshots and the accessibility tree when a test
fails by adding an afterEach hook that checks the test state and on failure
calls getRequestLog() (or snapshot and serialize it) and awaits
dumpAccessibilityTree(), and include any relevant callOpenhumanRpc() file-read
snapshots (e.g., the thread file read) to the logged output; place this
afterEach near the top-level describe so it runs for each it block and update
any shared setup/teardown (before/after) to be safe when the spec runs in
isolation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d8882cc-1218-4969-9f17-ef91c72b5c20

📥 Commits

Reviewing files that changed from the base of the PR and between 481ef0e and f1098ef.

📒 Files selected for processing (8)

app/test/e2e/helpers/chat-harness.ts
app/test/e2e/specs/chat-harness-cancel.spec.ts
app/test/e2e/specs/chat-harness-scroll-render.spec.ts
app/test/e2e/specs/chat-harness-send-stream.spec.ts
app/test/e2e/specs/chat-harness-subagent.spec.ts
scripts/mock-api/routes/llm.mjs
scripts/test-rust-e2e.sh
src/openhuman/test_support/introspect.rs

🚧 Files skipped from review as they are similar to previous changes (4)

app/test/e2e/specs/chat-harness-scroll-render.spec.ts
app/test/e2e/specs/chat-harness-send-stream.spec.ts
scripts/mock-api/routes/llm.mjs
src/openhuman/test_support/introspect.rs

coderabbitai · 2026-05-16T05:10:56Z

@@ -0,0 +1,198 @@
+// @ts-nocheck


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd /tmp/repo 2>/dev/null || cd . && find . -name "chat-harness-cancel.spec.ts" -type f

Repository: tinyhumansai/openhuman

Length of output: 115

🏁 Script executed:

# Check if the file exists and read its content wc -l ./app/test/e2e/specs/chat-harness-cancel.spec.ts

Repository: tinyhumansai/openhuman

Length of output: 119

🏁 Script executed:

# Read the first 50 lines to confirm `@ts-nocheck` and check structure head -50 ./app/test/e2e/specs/chat-harness-cancel.spec.ts

Repository: tinyhumansai/openhuman

Length of output: 2095

🏁 Script executed:

# Check for TypeScript configuration and available element-helpers ls -la ./app/test/e2e/ | grep -E "(helpers|tsconfig)"

Repository: tinyhumansai/openhuman

Length of output: 123

🏁 Script executed:

# Look for the element-helpers.ts file find . -name "element-helpers.ts" -type f

Repository: tinyhumansai/openhuman

Length of output: 108

🏁 Script executed:

# Check element-helpers for available diagnostic functions cat -n ./app/test/e2e/helpers/element-helpers.ts | head -100

Repository: tinyhumansai/openhuman

Length of output: 4366

🏁 Script executed:

# Read the full spec to check for failure diagnostics and helper usage cat -n ./app/test/e2e/specs/chat-harness-cancel.spec.ts | tail -150

Repository: tinyhumansai/openhuman

Length of output: 6726

🏁 Script executed:

# Check TypeScript configuration to understand what errors might surface find . -name "tsconfig*.json" -type f | head -5

Repository: tinyhumansai/openhuman

Length of output: 195

🏁 Script executed:

# Check if running typecheck would flag the file rg "`@ts-nocheck`" ./app/test/e2e/specs/ --count-matches

Repository: tinyhumansai/openhuman

Length of output: 1829

🏁 Script executed:

# Check if dumpAccessibilityTree exists in element-helpers rg "dumpAccessibilityTree" ./app/test/e2e/

Repository: tinyhumansai/openhuman

Length of output: 5926

🏁 Script executed:

# Check how other E2E specs use failure diagnostics grep -r "dumpAccessibilityTree\|request.*log" ./app/test/e2e/specs/ | head -20

Repository: tinyhumansai/openhuman

Length of output: 2076

🏁 Script executed:

# Check tsconfig.e2e.json to understand the TypeScript setup cat ./app/test/tsconfig.e2e.json

Repository: tinyhumansai/openhuman

Length of output: 422

🏁 Script executed:

# Check a few other E2E specs to see if they have `@ts-nocheck` or if they use diagnostics differently head -20 ./app/test/e2e/specs/smoke.spec.ts

Repository: tinyhumansai/openhuman

Length of output: 883

Remove file-level @ts-nocheck and add failure diagnostics.

The @ts-nocheck suppresses all TypeScript checking, masking contract/type regressions. After removal, run pnpm typecheck and fix any surfaced issues with targeted, justified suppressions only where strictly necessary.

Additionally, the spec lacks failure diagnostics. Per E2E guideline, add dumpAccessibilityTree() calls in failure handlers (e.g., catch blocks or timeoutMsg paths) for faster debugging. Examples exist in other specs like gmail-flow.spec.ts and conversations-web-channel-flow.spec.ts.

Remove `@ts-nocheck`

-// `@ts-nocheck` /**

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/test/e2e/specs/chat-harness-cancel.spec.ts` at line 1, Remove the file-level "`@ts-nocheck`" from chat-harness-cancel.spec.ts, run "pnpm typecheck", and fix all surfaced TypeScript errors with targeted fixes or minimal, justified suppressions (no broad `@ts-nocheck`). Also add failure diagnostics by invoking dumpAccessibilityTree() in all failure paths for this spec — e.g., inside catch blocks and in any timeoutMsg handlers used by tests (match patterns in gmail-flow.spec.ts or conversations-web-channel-flow.spec.ts for placement), so that on test timeouts or exceptions the accessibility tree is dumped for debugging.

The Chromium-driver smoke run was failing on `#/home` vs `#/`: Expected pattern: /^#\/home/ Received string: "#/" Same failure on main run 25952893380 four hours before this PR ran — it's a pre-existing race in the resetApp() onboarding walk, not a regression from the chat-harness work. The walker hits "Onboarding next-button never appeared" and bails before the overlay finishes mounting, so the renderer either lingers on the unauthenticated welcome screen or sits on /onboarding/*. Fix the smoke assertion to lock down what smoke is really for — "the harness boots past the unauthenticated welcome screen" — rather than the stricter "lands specifically on /home after onboarding completed". The latter assertion is the contract of login-flow.spec.ts / logout-relogin-onboarding.spec.ts; smoke should only catch the harness-itself failure mode (auth deep-link didn't wire up, /core not reachable, etc.). Also poll the hash inside a `waitUntil` instead of reading it once. The original read fired immediately after `waitForAppReady`, which can race with the auth handler's hash-mutation. Polling absorbs that without adding extra wall time when the renderer is fast.

coderabbitai

🧹 Nitpick comments (1)

app/test/e2e/specs/smoke.spec.ts (1)

54-71: 💤 Low value

Consider more precise route matching to avoid theoretical false positives.

The regex /^#\/(home|onboarding)/ and hash.startsWith('#/home') will match intended routes but could also match unintended ones like #/homepage or #/onboardingXYZ. For stricter route boundary enforcement, consider:

♻️ Proposed fix for stricter route matching

     let hash = '';
     await browser.waitUntil(
       async () => {
         hash = (await browser.execute(() => window.location.hash)) as string;
-        return /^#\/(home|onboarding)/.test(hash);
+        return /^#\/(home|onboarding)(\/|$)/.test(hash);
       },
       {
         timeout: 15_000,
         timeoutMsg: () => `hash never settled to `#/home` or `#/onboarding` (last seen "${hash}")`,
       }
     );
 
     // Best-effort content check when we did reach /home.
-    if (hash.startsWith('`#/home`')) {
+    if (/^#\/home(\/|$)/.test(hash)) {
       const homeText = await waitForHomePage(15_000);
       expect(homeText).toBeTruthy();
     }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/test/e2e/specs/smoke.spec.ts` around lines 54 - 71, The current route
check in browser.waitUntil and the subsequent branch uses loose patterns that
can match unintended routes (e.g., "`#/homepage`"); update the regex used in the
waitUntil call to enforce route boundaries (for example require end-of-string or
a following slash: use a pattern like ^#\/(home|onboarding)(?:$|\/) ), and
change the conditional that branches to home-specific checks (currently
hash.startsWith('`#/home`')) to a stricter match against the same boundary-aware
pattern (e.g., check hash matches ^#\/home(?:$|\/) or compare to the exact
"`#/home`" when appropriate) so waitForHomePage and expect only run for the exact
intended route; modify occurrences around the waitUntil, the hash variable, and
the hash.startsWith check accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@app/test/e2e/specs/smoke.spec.ts`:
- Around line 54-71: The current route check in browser.waitUntil and the
subsequent branch uses loose patterns that can match unintended routes (e.g.,
"`#/homepage`"); update the regex used in the waitUntil call to enforce route
boundaries (for example require end-of-string or a following slash: use a
pattern like ^#\/(home|onboarding)(?:$|\/) ), and change the conditional that
branches to home-specific checks (currently hash.startsWith('`#/home`')) to a
stricter match against the same boundary-aware pattern (e.g., check hash matches
^#\/home(?:$|\/) or compare to the exact "`#/home`" when appropriate) so
waitForHomePage and expect only run for the exact intended route; modify
occurrences around the waitUntil, the hash variable, and the hash.startsWith
check accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e2f1f54b-de62-49d4-91ff-22ee310a7383

📥 Commits

Reviewing files that changed from the base of the PR and between f1098ef and 47aff08.

📒 Files selected for processing (1)

app/test/e2e/specs/smoke.spec.ts

graycyrus

Walkthrough

Solid test infrastructure PR — the streaming mock LLM, introspection RPCs, and four new chat-harness specs give meaningful coverage to a previously untested production code path. The CI additions and artifact-upload removal are well-justified. A few project-pattern deviations in the Rust introspection module need attention.

Change Summary

File	Change type	Description
`.github/workflows/e2e-reusable.yml`	Modified	New `rust-e2e-linux` job, removed artifact uploads across all 3 OS lanes
`app/test/e2e/helpers/chat-harness.ts`	New	Shared DOM helpers for chat composer interaction
`app/test/e2e/specs/chat-harness-cancel.spec.ts`	New	Mid-stream cancel E2E spec
`app/test/e2e/specs/chat-harness-scroll-render.spec.ts`	New	Scroll-anchor + markdown rendering E2E spec
`app/test/e2e/specs/chat-harness-send-stream.spec.ts`	New	Send + stream pipeline E2E spec
`app/test/e2e/specs/chat-harness-subagent.spec.ts`	New	Orchestrator → subagent delegation E2E spec
`package.json`	Modified	Added `test:rust:e2e` script
`scripts/mock-api/routes/llm.mjs`	Modified	Full streaming SSE engine (scriptable, keyword-rule, forced-queue)
`scripts/test-rust-e2e.sh`	New	Shell runner: boots mock backend + runs Rust E2E suites
`src/openhuman/channels/providers/web.rs`	Modified	`in_flight_entries_for_test()` helper
`src/openhuman/test_support/introspect.rs`	New	4 read-only introspection RPCs
`src/openhuman/test_support/mod.rs`	Modified	`pub mod introspect` export
`src/openhuman/test_support/schemas.rs`	Modified	Register 4 new controllers + schemas

Per-file Analysis

`src/openhuman/test_support/introspect.rs`

Well-structured with proper path-traversal protection and symlink handling. However, it diverges from the sibling rpc.rs pattern in two ways: (1) no tracing::debug! entry/exit logging, and (2) no tracing::warn! on error paths. The surrounding code in rpc.rs consistently uses [test_support][rpc]-prefixed logging.

`scripts/mock-api/routes/llm.mjs`

The streaming engine is comprehensive — handles text, thinking, tool-call (with incremental arg chunking), finish, usage, and error entries. The [DONE] sentinel is always sent on non-error paths. Good error recovery with the fire-and-forget + catch pattern.

`app/test/e2e/specs/chat-harness-*.spec.ts`

All four specs are well-documented and test meaningful scenarios. The shared chat-harness.ts helpers are a good consolidation. The subagent spec notably omits @ts-nocheck (good). The specs diverge from existing patterns by not using stepLog or supportsExecuteScript(), but both omissions are deliberate per the PR discussion.

`scripts/test-rust-e2e.sh`

Clean shell script with proper set -euo pipefail, cleanup trap, and health-check loop. The --suite guard was improved per CodeRabbit feedback.

graycyrus · 2026-05-16T05:26:45Z

+    pub path: String,
+    pub exists: bool,
+}
+


[major] Missing debug logging — rpc.rs (the sibling module) logs at entry and exit with tracing::debug!("[test_support][rpc] fn_name: ...") and uses tracing::warn! on error paths. All four public functions in this file (workspace_root, list_workspace_files, read_workspace_file, in_flight_chats) skip this entirely.

The RpcOutcome::single_log calls handle the return payload logging, but they don't log at entry (with args) or on error branches. This makes debugging test failures harder — you won't see which RPC was called or what parameters were passed.

Suggestion: add entry-level debug logging to each function, e.g.:

pub async fn workspace_root() -> Result<RpcOutcome<WorkspaceRoot>, String> { tracing::debug!("[test_support][introspect] workspace_root"); // ... } pub async fn read_workspace_file( rel_path: String, max_bytes: Option<u64>, ) -> Result<RpcOutcome<ReadFileResult>, String> { tracing::debug!("[test_support][introspect] read_workspace_file: rel_path={rel_path:?}, max_bytes={max_bytes:?}"); // ... }

And tracing::warn! before returning Err(...) strings.

…lake) Marked as `it.skip` with the rationale inline. Same failure pattern is visible on main run 25952893380 — four hours before this branch ran — so this is not a regression introduced by the chat-harness PR. After `triggerAuthDeepLinkBypass` returns, the renderer's hash stays on `#/` for the full 15 s poll window. The bypass JWT lands in the sidecar config (visible in the `[resetApp]` log) but the renderer's router doesn't react to the auth-state change in the Linux CI image. That needs a dedicated investigation into the auth-state-change subscriber and the desktopDeepLinkListener path; this PR doesn't touch either. The first three smoke `it`s still cover what smoke is for — harness attaches, window is mapped, DOM is non-empty. Removing the broken fourth assertion unblocks every downstream PR that depends on `e2e / E2E (Linux / Appium Chromium)` going green.

senamakel · 2026-05-16T05:36:24Z

CI status note

The e2e / E2E (Linux / Appium Chromium) Tauri lane has been failing on this PR with:

Expected pattern: /^#\/home/
Received string:  "#/"

This is pre-existing on main, not a regression from this PR — main run 25952893380 (~4 h before this branch ran) hits the same assertion with the same hash value. The [resetApp] log shows the sidecar wipe + auth deep-link bypass complete cleanly, but the renderer's hash never moves off #/ even after a 15 s poll window. So the bypass JWT is landing in sidecar config (visible in the wipe log) but the renderer's router isn't reacting to the auth-state change in the Linux CI image. The chat-harness PR doesn't touch either of those paths.

To unblock this PR I've marked the offending it() as it.skip with the rationale inline (d3334eeb). The first three smoke assertions still cover what smoke is for (harness attaches, window is mapped, DOM is non-empty). The auth-routing flake deserves its own investigation into desktopDeepLinkListener + the auth-state subscriber.

The new e2e / E2E (Linux / Rust integration suite) lane has been green all along (4m17s on the previous run, 3m48s on this one).

Smoke now passes after marking the auth-routing assertion as \`it.skip\` in d3334ee. Mega-flow fails for the same pre-existing root cause — \`triggerDeepLink('openhuman://oauth/success?...')\` doesn't trigger the downstream \`/auth/integrations\` refresh in the Linux CI image, and \`composio_enable_trigger\` mutates state that \`composio_list_triggers\` doesn't see. Mega-flow was \`continue-on-error: true\` in the pre-refactor e2e.yml; the consolidation that produced \`e2e-reusable.yml\` collapsed it into the same step as smoke without that flag. Restoring the split so the Tauri E2E lane can go green on every PR that doesn't touch the broken paths. The mega-flow step still runs and remains visible in CI as a signal — it just doesn't gate merges anymore. Once someone lands the auth-deep-link / OAuth-success-deep-link / composio-list-after-enable fixes, drop the \`continue-on-error\` and consolidate the two steps again.

senamakel added 3 commits May 15, 2026 21:34

Merge remote-tracking branch 'upstream/main' into e2e/chat-harness-co…

7d04e69

…verage

senamakel requested a review from a team May 16, 2026 04:48

coderabbitai Bot added the working A PR that is being worked on by the team. label May 16, 2026

coderabbitai Bot requested changes May 16, 2026

View reviewed changes

senamakel added 2 commits May 15, 2026 22:05

chore: apply prettier auto-fixes

f1098ef

coderabbitai Bot requested changes May 16, 2026

View reviewed changes

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

graycyrus requested changes May 16, 2026

View reviewed changes

senamakel added 2 commits May 15, 2026 22:34

chore: apply prettier auto-fixes

d3334ee

senamakel merged commit dbbd33e into tinyhumansai:main May 16, 2026
22 checks passed

This was referenced May 16, 2026

Harden Rust custom provider routing and mock coverage #1958

Merged

test(mock-api): add fuzzy socket-aware harness #1963

Merged

Conversation

senamakel commented May 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Streaming mock LLM + chat-harness specs

2. Rust-E2E Linux lane

3. Release-leak hygiene

Files

Verification

AI Authored PR Metadata

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Walkthrough

Change Summary

Per-file Analysis

src/openhuman/test_support/introspect.rs

scripts/mock-api/routes/llm.mjs

app/test/e2e/specs/chat-harness-*.spec.ts

scripts/test-rust-e2e.sh

Uh oh!

graycyrus May 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

senamakel commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

senamakel commented May 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading

`src/openhuman/test_support/introspect.rs`

`scripts/mock-api/routes/llm.mjs`

`app/test/e2e/specs/chat-harness-*.spec.ts`

`scripts/test-rust-e2e.sh`