test: separate e2e tests and add backend adapter e2e coverage #16

Merged: teng-lin merged 1 commit into main from feat/e2e-testing on Feb 16, 2026

@teng-lin
Owner

Summary

  • Separate e2e from unit tests: npm test now excludes *.e2e.test.ts files, keeping the fast feedback loop (~7s). New test:e2e and test:all scripts added.
  • Add 24 new e2e tests for the three untested backend adapters (ACP, Agent SDK, Codex), bringing total e2e coverage from 35 tests to 59.
  • Shared test infrastructure in backend-test-utils.ts with MessageReader, mock helpers, and message factories extracted from compliance tests.

New e2e test coverage

| Adapter | Tests | Scenarios |
|---|---|---|
| ACP | 8 | Full conversation, multi-turn, permission allow/deny, resume session, subprocess crash (mid-conversation + handshake), send after close |
| Agent SDK | 8 | Rich content (tool_use + tool_result), result with metadata, permission allow/deny via canUseTool, abort/interrupt, queryFn error recovery, send after close, missing queryFn |
| Codex | 8 | Streaming deltas → item done → completed, multi-turn, approval approve/deny, turn cancel, WebSocket close mid-turn, WebSocket error, send after close |

Test plan

  • npm test excludes all *.e2e.test.ts — 79 files, 1495 tests
  • npm run test:e2e includes all e2e — 8 files, 59 tests
  • npm run test:all runs both — all green
  • Lint hooks pass

Split e2e tests into a dedicated vitest config so `npm test` runs only
unit tests (~7s) while `npm run test:e2e` exercises integration flows.
Added `test:all` script for full verification.
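
The split described above could look roughly like the following e2e config. This is a sketch, not the file from this PR: the `include` glob and the 30-second timeouts are assumptions, and the script commands in the comments are one plausible wiring rather than the actual `package.json` contents.

```typescript
// vitest.e2e.config.ts (illustrative sketch; values are assumptions)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Pick up only e2e files. The unit config does the inverse by adding
    // "**/*.e2e.test.ts" to `test.exclude` (for example
    // `exclude: [...configDefaults.exclude, "**/*.e2e.test.ts"]`).
    include: ["src/**/*.e2e.test.ts"],
    // Longer limits than the unit suite; the exact numbers are assumed.
    testTimeout: 30_000,
    hookTimeout: 30_000,
  },
});

// Possible package.json scripts (assumed, not copied from the PR):
//   "test":     "vitest run"
//   "test:e2e": "vitest run --config vitest.e2e.config.ts"
//   "test:all": "npm test && npm run test:e2e"
```

Keeping the e2e settings in a separate file means the default `vitest run` never needs to know about the longer timeouts.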

New e2e tests for the three untested backend adapters:
- ACP: 8 tests (conversation flow, multi-turn, permissions, resume, crash)
- Agent SDK: 8 tests (rich content, results, permissions, abort, errors)
- Codex: 8 tests (streaming, multi-turn, approvals, cancel, WS errors)

Shared test infrastructure in backend-test-utils.ts with MessageReader,
mock helpers, and message factories extracted from compliance tests.
@gemini-code-assist
Contributor

Summary of Changes

Hello @teng-lin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing suite by restructuring how unit and end-to-end tests are executed and by expanding e2e coverage for critical backend adapters. The changes ensure that developers can run fast unit tests without e2e overhead, while also providing robust validation for the integration and behavior of the ACP, Agent SDK, and Codex adapters through comprehensive new e2e scenarios. This improves test maintainability and reliability across the project.

Highlights

  • Test Separation: End-to-end (e2e) tests have been separated from unit tests, with npm test now excluding *.e2e.test.ts files to maintain a fast feedback loop. New test:e2e and test:all scripts were introduced for running e2e tests and all tests, respectively.
  • E2E Coverage Expansion: Twenty-four new e2e tests were added, significantly increasing coverage for the previously untested backend adapters: ACP (8 tests), Agent SDK (8 tests), and Codex (8 tests). This brings the total e2e test count from 35 to 59.
  • Shared Test Infrastructure: A new shared test infrastructure file, backend-test-utils.ts, was created. It centralizes MessageReader, mock helpers, and message factories, which were extracted from compliance tests to be reused across all backend adapter e2e tests.

Changelog
  • package.json
    • Added new npm scripts: test:e2e for running end-to-end tests and test:all for running both unit and end-to-end tests.
  • src/e2e/acp-adapter.e2e.test.ts
    • Added comprehensive end-to-end tests for the ACP adapter, covering full conversation flows, multi-turn interactions, permission requests (allow/deny), session resumption, and graceful handling of subprocess crashes.
  • src/e2e/agent-sdk-adapter.e2e.test.ts
    • Added extensive end-to-end tests for the Agent SDK adapter, including scenarios for rich content (tool_use, tool_result), result metadata, permission handling via canUseTool, abort/interrupt functionality, query function error recovery, and handling of missing query functions.
  • src/e2e/codex-adapter.e2e.test.ts
    • Added detailed end-to-end tests for the Codex adapter, verifying streaming responses (deltas, item done, completed), multi-turn conversations, approval flows (approve/deny), turn cancellation, and robust behavior during WebSocket closures or errors.
  • src/e2e/helpers/backend-test-utils.ts
    • Added a new utility file to centralize shared e2e test infrastructure, including MessageReader for consistent stream iteration, mock implementations for child processes and WebSockets, and helper functions for creating scripted query functions and unified messages.
  • vitest.config.ts
    • Modified the default Vitest configuration to exclude *.e2e.test.ts files from standard unit test runs.
  • vitest.e2e.config.ts
    • Added a new Vitest configuration file specifically for end-to-end tests, setting longer testTimeout and hookTimeout values appropriate for e2e scenarios.

Activity
  • The pull request author, teng-lin, has separated e2e tests from unit tests and added new e2e coverage for ACP, Agent SDK, and Codex adapters.
  • The author has also introduced shared test infrastructure to streamline future e2e test development.
  • The author has verified the test plan, confirming that npm test excludes e2e files, npm run test:e2e includes all e2e tests, npm run test:all runs both successfully, and lint hooks pass.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@teng-lin teng-lin merged commit 4946f66 into main Feb 16, 2026
4 checks passed
@teng-lin teng-lin deleted the feat/e2e-testing branch February 16, 2026 18:26

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully separates e2e tests from unit tests and adds comprehensive e2e coverage for the ACP, Agent SDK, and Codex backend adapters. The introduction of shared test infrastructure in backend-test-utils.ts is a positive step towards maintainability and consistency across e2e tests. The new test:e2e and test:all scripts provide clear execution paths for different testing needs.

  const { child, stdin, stdout } = createMockChild();
  setupStdin(stdin, stdout);
  return child;
}) as unknown as SpawnFn;

medium

The type assertion as unknown as SpawnFn bypasses TypeScript's type safety. It would be more robust to ensure the mock function's signature aligns directly with SpawnFn to avoid potential runtime issues if the SpawnFn type changes. Consider refining the mock type to match SpawnFn more closely.
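
One way to act on this, sketched under assumptions: the `SpawnFn` and `MockChild` types below are hypothetical stand-ins (the real ones live in the adapter), and `recordCalls` is an illustrative helper, not part of backend-test-utils.ts. The point is only that annotating the implementation with the target type lets the compiler check the mock.

```typescript
// Hypothetical stand-ins: the real SpawnFn and child types live in the adapter.
interface MockChild {
  pid: number;
  kill(): boolean;
}
type SpawnFn = (command: string, args: readonly string[]) => MockChild;

/** Wraps an implementation and records its calls, keeping the wrapped signature. */
function recordCalls<A extends unknown[], R>(impl: (...args: A) => R) {
  const calls: A[] = [];
  const fn = (...args: A): R => {
    calls.push(args);
    return impl(...args);
  };
  return { fn, calls };
}

// The implementation is annotated with SpawnFn, so the compiler checks it.
// No `as unknown as SpawnFn` is needed, and a change to SpawnFn fails the build.
const impl: SpawnFn = (_command, _args) => ({ pid: 1234, kill: () => true });
const mock = recordCalls(impl);
const spawnFn: SpawnFn = mock.fn;

const child = spawnFn("agent", ["--stdio"]);
console.log(child.pid, mock.calls.length);
```

With Vitest, passing a typed implementation to `vi.fn` should give a similar result, since the mock's type is inferred from the implementation.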


session.send(createUserMessage("trigger crash"));

const messages = await reader.collect(2, 2000);

medium

The collect method uses a hardcoded timeout of 2000ms. Hardcoded timeouts can lead to flaky tests, especially in CI environments or under varying load. Consider making this timeout configurable or using a more dynamic waiting mechanism if possible to improve test reliability.
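
A minimal sketch of "configurable" here, assuming a single scaling factor read from the environment. `E2E_TIMEOUT_FACTOR` and `e2eTimeout` are hypothetical names that do not come from the PR; the environment is passed in as a plain object so the helper stays easy to test.

```typescript
/** Environment shape accepted by the helper; pass `process.env` in Node. */
type Env = Record<string, string | undefined>;

/**
 * Scales a base timeout by a factor read from the environment.
 * E2E_TIMEOUT_FACTOR is a hypothetical variable name, not one the project defines.
 */
function e2eTimeout(baseMs: number, env: Env = {}): number {
  const raw = env["E2E_TIMEOUT_FACTOR"];
  const factor = raw === undefined ? 1 : Number(raw);
  // Fall back to the base value on unparsable or non-positive input.
  if (!Number.isFinite(factor) || factor <= 0) return baseMs;
  return Math.round(baseMs * factor);
}

const local = e2eTimeout(2000);
const ci = e2eTimeout(2000, { E2E_TIMEOUT_FACTOR: "3" });
const bad = e2eTimeout(2000, { E2E_TIMEOUT_FACTOR: "fast" });
console.log(local, ci, bad);
```

A call site would then read `reader.collect(2, e2eTimeout(2000, process.env))`, so a slow CI runner can widen every wait in one place.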


it("subprocess crash during handshake rejects connect()", async () => {
  const adapter = createAdapter((_stdin, stdout) => {
    setTimeout(() => stdout.emit("close"), 5);

medium

Relying on a hardcoded setTimeout with a small, arbitrary delay like 5ms can introduce flakiness into tests. It's generally more reliable to wait for a specific event or condition to occur rather than assuming a fixed time will be sufficient.
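
As a sketch of the event-driven alternative, the crash can be triggered by the first handshake write instead of a timer. This assumes the adapter writes to stdin during its handshake; the `connect` function below is a hypothetical stand-in for the adapter, not its real implementation.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for an adapter's connect(): it registers its stdout
// listeners, then writes a handshake request to stdin.
function connect(stdin: EventEmitter, stdout: EventEmitter): Promise<string> {
  return new Promise((resolve, reject) => {
    stdout.once("data", (chunk: string) => resolve(chunk));
    stdout.once("close", () => reject(new Error("subprocess exited during handshake")));
    stdin.emit("data", '{"method":"initialize"}');
  });
}

// Crash trigger: react to the first handshake write instead of a 5 ms timer.
function crashOnFirstWrite(stdin: EventEmitter, stdout: EventEmitter): void {
  stdin.once("data", () => stdout.emit("close"));
}

async function run(): Promise<string> {
  const stdin = new EventEmitter();
  const stdout = new EventEmitter();
  crashOnFirstWrite(stdin, stdout);
  try {
    await connect(stdin, stdout);
    return "connected";
  } catch (err) {
    return (err as Error).message;
  }
}

run().then((outcome) => console.log(outcome));
```

Because the close is tied to the write, the ordering no longer depends on how fast the machine schedules timers.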

}
abortSignal?.addEventListener("abort", () => resolve());
// Also resolve after a timeout to not hang the test
setTimeout(resolve, 2000);

medium

The setTimeout(resolve, 2000) in the abort/interrupt test is a hardcoded timeout. This can make tests brittle. It's preferable to wait for the abortSignal to actually trigger or for a specific state change rather than a fixed time to ensure test reliability.
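
A sketch of waiting on the signal alone, with hypothetical names (`whenAborted`, `pendingQuery`). It drops the fallback timer on the assumption that the runner's own `testTimeout` is an acceptable safety net: if abort never fires, the test fails loudly instead of passing after 2 seconds.

```typescript
/** Resolves once `signal` aborts; resolves immediately if it already has. */
function whenAborted(signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve();
      return;
    }
    signal.addEventListener("abort", () => resolve(), { once: true });
  });
}

// Hypothetical long-running query: it parks until the caller aborts, with no
// fallback timer of its own.
async function pendingQuery(signal: AbortSignal): Promise<string> {
  await whenAborted(signal);
  return "interrupted";
}

async function demo(): Promise<string> {
  const controller = new AbortController();
  const pending = pendingQuery(controller.signal);
  controller.abort(); // stands in for the session's interrupt call
  return pending;
}

demo().then((result) => console.log(result));
```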


// First call throws — give a moment for error to propagate
session.send(createUserMessage("Trigger error"));
await new Promise((r) => setTimeout(r, 50));

medium

Introducing an arbitrary setTimeout delay like 50ms can lead to flaky tests, especially if the asynchronous operations take longer than expected. It's generally better to wait for a specific condition or event to ensure the test's reliability.
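
One common replacement for a fixed sleep is to poll for the condition with an upper bound. The helper below is a hand-rolled sketch; the `session.lastError` field is hypothetical and only illustrates "some observable state that changes once the error has propagated".

```typescript
/** Polls `check` until it returns true or `timeoutMs` elapses. */
async function waitFor(
  check: () => boolean,
  timeoutMs = 1000,
  intervalMs = 5,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!check()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical session whose error surfaces asynchronously.
const session = { lastError: undefined as string | undefined };
setTimeout(() => {
  session.lastError = "queryFn failed";
}, 10);

async function demoWaitFor(): Promise<string | undefined> {
  await waitFor(() => session.lastError !== undefined);
  return session.lastError;
}

demoWaitFor().then((error) => console.log(error));
```

The test then proceeds as soon as the condition holds and fails with a clear message when it never does. Recent Vitest versions ship a comparable `vi.waitFor` utility.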

ws.emit("close");

// Should get the partial message, then stream should end
const messages = await collectUnifiedMessages(session, 2, 1000);

medium

The collectUnifiedMessages call uses a hardcoded timeout of 1000ms. Hardcoded timeouts can lead to flaky tests. Consider making this configurable or using a more event-driven waiting mechanism to improve test reliability.
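
If the session's message stream ends when the socket closes, the end of iteration is itself the event to wait on. The sketch below assumes that behaviour; `MessageStream` is a hypothetical minimal stand-in for the adapter's stream, not the project's class.

```typescript
/** Minimal async message stream: push() delivers, close() ends iteration. */
class MessageStream<T> implements AsyncIterable<T> {
  private buffer: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];
  private closed = false;

  push(value: T): void {
    if (this.closed) return;
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value, done: false });
    else this.buffer.push(value);
  }

  close(): void {
    this.closed = true;
    // Wake every pending reader so nobody waits on a dead connection.
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: async (): Promise<IteratorResult<T>> => {
        if (this.buffer.length > 0) {
          return { value: this.buffer.shift() as T, done: false };
        }
        if (this.closed) return { value: undefined, done: true };
        return new Promise<IteratorResult<T>>((resolve) => this.waiters.push(resolve));
      },
    };
  }
}

/** Reads until the stream ends; no timer involved. */
async function drain<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const seen: T[] = [];
  for await (const item of stream) seen.push(item);
  return seen;
}

async function demoDrain(): Promise<string[]> {
  const stream = new MessageStream<string>();
  const pending = drain(stream); // starts waiting before anything arrives
  stream.push("partial delta");
  stream.close(); // stands in for ws.emit("close")
  return pending;
}

demoDrain().then((messages) => console.log(messages));
```

A stream that fails to end would then hit the runner's test timeout, which reports the hang directly instead of hiding it behind a short collect window.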


// Stream should end
const messages = await collectUnifiedMessages(session, 1, 500);
// May or may not collect any messages, but stream shouldn't hang

medium

The collectUnifiedMessages call uses a hardcoded timeout of 500ms. This is another instance of a hardcoded timeout that could lead to test flakiness. Consider making this configurable or using a more event-driven waiting mechanism.

Comment on lines +47 to +49
timeoutMs,
),
),

medium

The use of undefined as unknown as UnifiedMessage in the setTimeout callback for Promise.race is a common workaround for TypeScript's type system when a promise resolves with an undefined value on timeout. While functional, it could be slightly cleaner if the IteratorResult type could explicitly handle undefined for value when done is true, or if the timeout promise resolved with a distinct sentinel value that is then handled.

Comment on lines +69 to +71
() => r({ value: undefined as unknown as UnifiedMessage, done: true }),
timeoutMs,
),

medium

The use of undefined as unknown as UnifiedMessage in the setTimeout callback for Promise.race is a common workaround for TypeScript's type system when a promise resolves with an undefined value on timeout. While functional, it could be slightly cleaner if the IteratorResult type could explicitly handle undefined for value when done is true, or if the timeout promise resolved with a distinct sentinel value that is then handled.
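
The sentinel variant mentioned in these two comments could be sketched as follows. `TIMED_OUT` and `nextOrTimeout` are hypothetical names; the sketch races a single `next()` call against a timer and makes the caller handle the timeout case explicitly.

```typescript
// Unique marker for "the timer won"; it can never collide with a real message.
const TIMED_OUT = Symbol("timed out");

/** Races `next()` against a timer and reports a timeout explicitly. */
async function nextOrTimeout<T>(
  iterator: AsyncIterator<T>,
  timeoutMs: number,
): Promise<IteratorResult<T> | typeof TIMED_OUT> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
  });
  try {
    return await Promise.race([iterator.next(), timeout]);
  } finally {
    clearTimeout(timer); // do not leave a live timer behind
  }
}

// Yields one message, then stalls forever to simulate a silent backend.
async function* slowSource(): AsyncGenerator<string> {
  yield "first";
  await new Promise<void>(() => {});
}

async function demoSentinel(): Promise<string[]> {
  const iterator = slowSource();
  const log: string[] = [];
  for (let i = 0; i < 2; i++) {
    const result = await nextOrTimeout(iterator, 20);
    if (result === TIMED_OUT) log.push("timeout");
    else if (result.done) log.push("done");
    else log.push(result.value);
  }
  return log;
}

demoSentinel().then((log) => console.log(log));
```

Separately, TypeScript's built-in `IteratorResult<T, TReturn = any>` already allows `{ value: undefined, done: true }`, so when the resolver is typed as `IteratorResult<UnifiedMessage>` the double cast may be unnecessary even without a sentinel.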

type: "user_message",
role: "user",
content: [{ type: "text", text }],
metadata: { sessionId, session_id: sessionId },

medium

The createUserMessage function includes both sessionId and session_id in the metadata. If sessionId is the canonical key, session_id could be redundant or potentially lead to confusion. If both are necessary for distinct purposes (e.g., one for internal use, one for external API), a clarifying comment would be beneficial.

teng-lin added a commit that referenced this pull request Feb 25, 2026
Update docs/architecture.md:
- Replace Tier 1 table with resolved summary (#1–4 all fixed)
- Update Tier 2 #6 (pendingInitialize now set in correct post-reducer hook)
- Update Tier 3 #14 (handleInboundCommand rejections fixed via SEND_TO_CONSUMER)
- Update Tier 3 #16 (trySendRawToBackend now runtime-internal only)

Also narrow SLASH_LOCAL_RESULT signal source type to "emulated" | "cli"
to match ConsumerMessage contract (caught by typecheck).