test: separate e2e tests and add backend adapter e2e coverage by teng-lin · Pull Request #16 · teng-lin/beamcode

teng-lin · 2026-02-16T18:24:11Z

Summary

Separate e2e from unit tests: npm test now excludes *.e2e.test.ts files, keeping the fast feedback loop (~7s). New test:e2e and test:all scripts added.
Add 24 new e2e tests for the three untested backend adapters (ACP, Agent SDK, Codex), bringing total e2e coverage from 35 tests to 59.
Shared test infrastructure in backend-test-utils.ts with MessageReader, mock helpers, and message factories extracted from compliance tests.

New e2e test coverage

Adapter	Tests	Scenarios
ACP	8	Full conversation, multi-turn, permission allow/deny, resume session, subprocess crash (mid-conversation + handshake), send after close
Agent SDK	8	Rich content (tool_use + tool_result), result with metadata, permission allow/deny via canUseTool, abort/interrupt, queryFn error recovery, send after close, missing queryFn
Codex	8	Streaming deltas → item done → completed, multi-turn, approval approve/deny, turn cancel, WebSocket close mid-turn, WebSocket error, send after close

Test plan

npm test excludes all *.e2e.test.ts — 79 files, 1495 tests
npm run test:e2e includes all e2e — 8 files, 59 tests
npm run test:all runs both — all green
Lint hooks pass

Split e2e tests into a dedicated vitest config so `npm test` runs only unit tests (~7s) while `npm run test:e2e` exercises integration flows. Added `test:all` script for full verification.

New e2e tests for the three untested backend adapters:
- ACP: 8 tests (conversation flow, multi-turn, permissions, resume, crash)
- Agent SDK: 8 tests (rich content, results, permissions, abort, errors)
- Codex: 8 tests (streaming, multi-turn, approvals, cancel, WS errors)

Shared test infrastructure in backend-test-utils.ts with MessageReader, mock helpers, and message factories extracted from compliance tests.

gemini-code-assist · 2026-02-16T18:24:27Z

Summary of Changes

Hello @teng-lin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing suite by restructuring how unit and end-to-end tests are executed and by expanding e2e coverage for critical backend adapters. The changes ensure that developers can run fast unit tests without e2e overhead, while also providing robust validation for the integration and behavior of the ACP, Agent SDK, and Codex adapters through comprehensive new e2e scenarios. This improves test maintainability and reliability across the project.

Highlights

Test Separation: End-to-end (e2e) tests have been separated from unit tests, with npm test now excluding *.e2e.test.ts files to maintain a fast feedback loop. New test:e2e and test:all scripts were introduced for running e2e tests and all tests, respectively.
E2E Coverage Expansion: Twenty-four new e2e tests were added, significantly increasing coverage for the previously untested backend adapters: ACP (8 tests), Agent SDK (8 tests), and Codex (8 tests). This brings the total e2e test count from 35 to 59.
Shared Test Infrastructure: A new shared test infrastructure file, backend-test-utils.ts, was created. It centralizes MessageReader, mock helpers, and message factories, which were extracted from compliance tests to be reused across all backend adapter e2e tests.

Changelog

package.json
- Added new npm scripts: test:e2e for running end-to-end tests and test:all for running both unit and end-to-end tests.
src/e2e/acp-adapter.e2e.test.ts
- Added comprehensive end-to-end tests for the ACP adapter, covering full conversation flows, multi-turn interactions, permission requests (allow/deny), session resumption, and graceful handling of subprocess crashes.
src/e2e/agent-sdk-adapter.e2e.test.ts
- Added extensive end-to-end tests for the Agent SDK adapter, including scenarios for rich content (tool_use, tool_result), result metadata, permission handling via canUseTool, abort/interrupt functionality, query function error recovery, and handling of missing query functions.
src/e2e/codex-adapter.e2e.test.ts
- Added detailed end-to-end tests for the Codex adapter, verifying streaming responses (deltas, item done, completed), multi-turn conversations, approval flows (approve/deny), turn cancellation, and robust behavior during WebSocket closures or errors.
src/e2e/helpers/backend-test-utils.ts
- Added a new utility file to centralize shared e2e test infrastructure, including MessageReader for consistent stream iteration, mock implementations for child processes and WebSockets, and helper functions for creating scripted query functions and unified messages.
vitest.config.ts
- Modified the default Vitest configuration to exclude *.e2e.test.ts files from standard unit test runs.
vitest.e2e.config.ts
- Added a new Vitest configuration file specifically for end-to-end tests, setting longer testTimeout and hookTimeout values appropriate for e2e scenarios.
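
A dedicated e2e config of the kind described above might look roughly like the following. This is a sketch, not the file from the PR: the include glob and both timeout values are assumptions. Only the file name, the `*.e2e.test.ts` pattern, and the fact that `testTimeout` and `hookTimeout` are raised come from the changelog.

```typescript
// vitest.e2e.config.ts (illustrative sketch; glob and timeout values are assumed)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Pick up only e2e files; the default config excludes this same pattern.
    include: ["src/**/*.e2e.test.ts"],
    // E2E flows drive mocked subprocesses and sockets, so allow more time.
    testTimeout: 30_000,
    hookTimeout: 30_000,
  },
});
```

Under the same assumptions, the default `vitest.config.ts` would add `"**/*.e2e.test.ts"` to `test.exclude` (spreading `configDefaults.exclude` from `vitest/config` so the built-in exclusions are kept), and the `test:e2e` script would select this file with `vitest run --config vitest.e2e.config.ts`.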

Activity

The pull request author, teng-lin, has separated e2e tests from unit tests and added new e2e coverage for ACP, Agent SDK, and Codex adapters.
The author has also introduced shared test infrastructure to streamline future e2e test development.
The author has verified the test plan, confirming that npm test excludes e2e files, npm run test:e2e includes all e2e tests, npm run test:all runs both successfully, and lint hooks pass.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

The pull request successfully separates e2e tests from unit tests and adds comprehensive e2e coverage for the ACP, Agent SDK, and Codex backend adapters. The introduction of shared test infrastructure in backend-test-utils.ts is a positive step towards maintainability and consistency across e2e tests. The new test:e2e and test:all scripts provide clear execution paths for different testing needs.

gemini-code-assist · 2026-02-16T18:26:49Z

+      const { child, stdin, stdout } = createMockChild();
+      setupStdin(stdin, stdout);
+      return child;
+    }) as unknown as SpawnFn;


The type assertion as unknown as SpawnFn bypasses TypeScript's type safety. It would be more robust to ensure the mock function's signature aligns directly with SpawnFn to avoid potential runtime issues if the SpawnFn type changes. Consider refining the mock type to match SpawnFn more closely.
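
One way to act on this suggestion is to annotate the mock with the function type itself, so the compiler checks its parameters and return value. The sketch below uses simplified stand-ins: `ChildLike`, this `SpawnFn`, and this `createMockChild` are hypothetical and much smaller than the project's real definitions.

```typescript
// Hypothetical, simplified stand-ins for the project's real types.
interface ChildLike {
  pid: number;
  kill(): boolean;
}
type SpawnFn = (command: string, args: readonly string[]) => ChildLike;

// Minimal fake child: kill() reports true only the first time it is called.
function createMockChild(): ChildLike {
  let alive = true;
  return {
    pid: 1234,
    kill: () => {
      const wasAlive = alive;
      alive = false;
      return wasAlive;
    },
  };
}

// Annotating the variable with SpawnFn makes the compiler verify the mock's
// signature. If SpawnFn changes, this fails to compile instead of the
// mismatch being hidden behind `as unknown as SpawnFn`.
const calls: Array<{ command: string; args: readonly string[] }> = [];
const mockSpawn: SpawnFn = (command, args) => {
  calls.push({ command, args });
  return createMockChild();
};
```

If the real mock implements only part of a full child process object, narrowing the return type of `SpawnFn` to the members the adapter actually uses is one option for keeping the annotation honest; whether that fits the adapter's public API is a judgment call for the author.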

gemini-code-assist · 2026-02-16T18:26:49Z

+
+    session.send(createUserMessage("trigger crash"));
+
+    const messages = await reader.collect(2, 2000);


The collect method uses a hardcoded timeout of 2000ms. Hardcoded timeouts can lead to flaky tests, especially in CI environments or under varying load. Consider making this timeout configurable or using a more dynamic waiting mechanism if possible to improve test reliability.
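
A small helper can make such timeouts configurable without touching every call site. This is a sketch under assumptions: `scaledTimeout` and the `E2E_TIMEOUT_SCALE` variable name are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: scales a base timeout by a multiplier that CI can set.
// `rawScale` would typically come from an environment variable such as
// process.env.E2E_TIMEOUT_SCALE (an assumed name, not one the project defines).
function scaledTimeout(baseMs: number, rawScale: string | undefined): number {
  const scale = Number(rawScale);
  // Fall back to the base value when the input is unset, empty, or not a
  // positive number.
  if (!Number.isFinite(scale) || scale <= 0) return baseMs;
  return Math.round(baseMs * scale);
}
```

The call in the test would then read `reader.collect(2, scaledTimeout(2000, process.env.E2E_TIMEOUT_SCALE))`, keeping the local default at 2000ms while letting a slow CI runner stretch it.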

gemini-code-assist · 2026-02-16T18:26:49Z

+
+  it("subprocess crash during handshake rejects connect()", async () => {
+    const adapter = createAdapter((_stdin, stdout) => {
+      setTimeout(() => stdout.emit("close"), 5);


Relying on a hardcoded setTimeout with a small, arbitrary delay like 5ms can introduce flakiness into tests. It's generally more reliable to wait for a specific event or condition to occur rather than assuming a fixed time will be sufficient.
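
An event-driven variant could crash the fake subprocess in response to the adapter's first write instead of after a fixed delay. The sketch below assumes the adapter writes its handshake request to stdin before waiting on stdout, which the PR does not confirm; `MockStdin` and `crashOnFirstWrite` are hypothetical names.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the mock child's stdin: records writes and emits
// a "write" event so the test can react to the adapter's first request.
class MockStdin extends EventEmitter {
  readonly written: string[] = [];

  write(chunk: string): boolean {
    this.written.push(chunk);
    this.emit("write", chunk);
    return true;
  }
}

// Close stdout as soon as the first request arrives, instead of guessing how
// long the adapter takes to send it. `once` ensures a single "close" event.
function crashOnFirstWrite(stdin: MockStdin, stdout: EventEmitter): void {
  stdin.once("write", () => stdout.emit("close"));
}
```

Because the crash is tied to the write, the test no longer depends on 5ms being long enough for the handshake to start.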

gemini-code-assist · 2026-02-16T18:26:49Z

+            }
+            abortSignal?.addEventListener("abort", () => resolve());
+            // Also resolve after a timeout to not hang the test
+            setTimeout(resolve, 2000);


The setTimeout(resolve, 2000) in the abort/interrupt test is a hardcoded timeout. This can make tests brittle. It's preferable to wait for the abortSignal to actually trigger or for a specific state change rather than a fixed time to ensure test reliability.

gemini-code-assist · 2026-02-16T18:26:50Z

+
+    // First call throws — give a moment for error to propagate
+    session.send(createUserMessage("Trigger error"));
+    await new Promise((r) => setTimeout(r, 50));


Introducing an arbitrary setTimeout delay like 50ms can lead to flaky tests, especially if the asynchronous operations take longer than expected. It's generally better to wait for a specific condition or event to ensure the test's reliability.
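
A polling helper is one common replacement for a fixed sleep: it returns as soon as the condition holds and fails loudly if it never does. The `waitFor` below is a hypothetical, minimal sketch; recent Vitest versions also provide `vi.waitFor`, which may be a better fit in the real test file.

```typescript
// Polls `condition` until it returns true, or rejects once `timeoutMs` elapses.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    // Yield to the event loop so pending async work can make progress.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the error-recovery test, the condition could be whatever observable state shows that the first call has failed, for example a call counter on the scripted query function; which signal is available depends on the helper's actual shape.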

gemini-code-assist · 2026-02-16T18:26:50Z

+    ws.emit("close");
+
+    // Should get the partial message, then stream should end
+    const messages = await collectUnifiedMessages(session, 2, 1000);


The collectUnifiedMessages call uses a hardcoded timeout of 1000ms. Hardcoded timeouts can lead to flaky tests. Consider making this configurable or using a more event-driven waiting mechanism to improve test reliability.

gemini-code-assist · 2026-02-16T18:26:50Z

+
+    // Stream should end
+    const messages = await collectUnifiedMessages(session, 1, 500);
+    // May or may not collect any messages, but stream shouldn't hang


The collectUnifiedMessages call uses a hardcoded timeout of 500ms. This is another instance of a hardcoded timeout that could lead to test flakiness. Consider making this configurable or using a more event-driven waiting mechanism.

gemini-code-assist · 2026-02-16T18:26:50Z

+            () => r({ value: undefined as unknown as UnifiedMessage, done: true }),
+            timeoutMs,
+          ),


The use of undefined as unknown as UnifiedMessage in the setTimeout callback for Promise.race is a common workaround for TypeScript's type system when a promise resolves with an undefined value on timeout. While functional, it could be slightly cleaner if the IteratorResult type could explicitly handle undefined for value when done is true, or if the timeout promise resolved with a distinct sentinel value that is then handled.
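
The sentinel approach mentioned here can be sketched as follows. `TIMED_OUT` and `nextOrTimeout` are hypothetical names, and the helper is generic rather than tied to `UnifiedMessage`, so it only illustrates the pattern.

```typescript
// Unique marker for "timed out", so no fake message value has to be cast.
const TIMED_OUT = Symbol("timed-out");

// Hypothetical helper: resolves with the next iterator result, or with
// TIMED_OUT if nothing arrives within `timeoutMs`.
async function nextOrTimeout<T>(
  iterator: AsyncIterator<T>,
  timeoutMs: number,
): Promise<IteratorResult<T> | typeof TIMED_OUT> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
  });
  try {
    return await Promise.race([iterator.next(), timeout]);
  } finally {
    // Avoid leaving a live timer behind once the race has settled.
    clearTimeout(timer);
  }
}
```

The caller then checks `result === TIMED_OUT` before touching `result.value`, which lets the type system distinguish a timeout from a stream that genuinely ended.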

gemini-code-assist · 2026-02-16T18:26:50Z

+    type: "user_message",
+    role: "user",
+    content: [{ type: "text", text }],
+    metadata: { sessionId, session_id: sessionId },


The createUserMessage function includes both sessionId and session_id in the metadata. If sessionId is the canonical key, session_id could be redundant or potentially lead to confusion. If both are necessary for distinct purposes (e.g., one for internal use, one for external API), a clarifying comment would be beneficial.
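
If `sessionId` turns out to be the canonical key, the factory could emit the snake_case alias only on request and document why in one place. The sketch below is an assumption throughout: the message shape is trimmed down, the default session value is invented, and the stated reason for the alias is a guess, not something the PR explains.

```typescript
// Hypothetical, trimmed-down message shape; the real UnifiedMessage has more fields.
interface UserMessage {
  type: "user_message";
  role: "user";
  content: Array<{ type: "text"; text: string }>;
  metadata: Record<string, string>;
}

// Always emits the canonical key; adds the snake_case alias only on request.
function createUserMessage(
  text: string,
  sessionId = "e2e-session",
  includeLegacyKey = false,
): UserMessage {
  const metadata: Record<string, string> = { sessionId };
  // Assumed reason for the alias: some backends may read `session_id`.
  if (includeLegacyKey) metadata.session_id = sessionId;
  return {
    type: "user_message",
    role: "user",
    content: [{ type: "text", text }],
    metadata,
  };
}
```

If both keys are in fact required by different adapters, keeping them and adding a one-line comment, as the review suggests, is the smaller change.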

Update docs/architecture.md:
- Replace Tier 1 table with resolved summary (#1–4 all fixed)
- Update Tier 2 #6 (pendingInitialize now set in correct post-reducer hook)
- Update Tier 3 #14 (handleInboundCommand rejections fixed via SEND_TO_CONSUMER)
- Update Tier 3 #16 (trySendRawToBackend now runtime-internal only)

Also narrow SLASH_LOCAL_RESULT signal source type to "emulated" | "cli" to match ConsumerMessage contract (caught by typecheck).

teng-lin merged commit 4946f66 into main Feb 16, 2026
4 checks passed

teng-lin deleted the feat/e2e-testing branch February 16, 2026 18:26

teng-lin merged 1 commit into main from feat/e2e-testing
