
Add Buildkite pipeline for AI E2E tests (simulator-llm-pilot gem)#25444

Draft
iangmaia wants to merge 11 commits into trunk from iangmaia/ci-ai-e2e-tests-gem

@iangmaia
Contributor

iangmaia commented Mar 24, 2026

Summary

  • Adds a Buildkite command script and pipeline step for running AI E2E tests using the simulator-llm-pilot gem
  • Checks for "Testing" label on PR (skips if missing to save CI resources)
  • Downloads build artifacts, installs app on simulator, installs the gem from GitHub, runs tests
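A minimal sketch of the label gate from the list above, assuming the PR's labels reach the step as a comma-separated string. Buildkite documents a `BUILDKITE_PULL_REQUEST_LABELS` environment variable for this, but check that your agent version sets it; the PR text does not say how its script reads labels, and `has_label` is a hypothetical helper. It trims whitespace and requires an exact match, so a label such as `Testing Unit` does not count as `Testing`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's actual implementation.
# ASSUMPTION: PR labels arrive as a comma-separated string,
# e.g. via BUILDKITE_PULL_REQUEST_LABELS.

# has_label WANTED LABEL_LIST
# Returns 0 if WANTED is an exact (whitespace-trimmed) entry of LABEL_LIST.
has_label() {
  local wanted="$1"
  local list="$2"
  local -a labels=()
  local label

  # No labels at all: nothing can match.
  [[ -n "$list" ]] || return 1

  # Split on commas only; read does not glob-expand the entries,
  # so labels such as "[Status] DO NOT MERGE" stay intact.
  IFS=',' read -r -a labels <<< "$list"

  for label in "${labels[@]}"; do
    # Strip leading and trailing whitespace.
    label="${label#"${label%%[![:space:]]*}"}"
    label="${label%"${label##*[![:space:]]}"}"
    if [[ "$label" == "$wanted" ]]; then
      return 0
    fi
  done
  return 1
}
```

In the command script, the gate could then be `if ! has_label "Testing" "${BUILDKITE_PULL_REQUEST_LABELS:-}"; then echo "No Testing label, skipping"; exit 0; fi`. Exiting 0 keeps a skipped run green instead of reporting it as a failure.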

The gem handles everything internally: simulator detection, WDA (WebDriverAgent) lifecycle, agent loop with sandboxed tools, context window compression, verification/cleanup enforcement, and structured results.
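The PR describes the agent as driving the simulator through a fixed set of tools (tap, swipe, type, REST API) with no arbitrary code execution. That idea can be illustrated with an allowlist dispatcher: the agent may only name a tool from a fixed set, and anything else is rejected rather than executed. This is a hypothetical shell sketch for illustration only; the gem itself is Ruby, and the tool names and stub handlers below are assumptions, not simulator-llm-pilot's API.

```shell
#!/usr/bin/env bash
# Hypothetical allowlist dispatcher illustrating "fixed tools, no arbitrary
# code execution". Tool names are stand-ins, and each handler only echoes
# what a real implementation would do.

# dispatch_tool TOOL [ARGS...]
dispatch_tool() {
  local tool="$1"
  shift
  case "$tool" in
    tap)      echo "tap at ($1, $2)" ;;
    swipe)    echo "swipe from ($1, $2) to ($3, $4)" ;;
    type)     echo "type text: $1" ;;
    rest_api) echo "REST $1 $2" ;;
    *)
      # Unknown tool names are refused; their arguments are never evaluated.
      echo "rejected tool: $tool" >&2
      return 1
      ;;
  esac
}
```

A `case` allowlist fails closed: adding a capability requires an explicit new branch, which mirrors the constraint the PR describes.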

Alternative approach: see #25443 for a Claude Code + wrapper scripts version of the same pipeline.

Ref: AINFRA-2176

Test plan

  • Run .buildkite/commands/run-ai-e2e-tests.sh locally with a booted simulator and test site credentials
  • Run a simple test case (users-screen-loads.md) end-to-end
  • Verify results.md is written with correct pass/fail status

🤖 Generated with Claude Code

@dangermattic
Collaborator

1 Message
📖 This PR is still a Draft: some checks will be skipped.

Generated by 🚫 Danger

@wpmobilebot
Contributor

wpmobilebot commented Mar 24, 2026

📲 You can test the changes from this Pull Request in WordPress by scanning the QR code below to install the corresponding build.
App Name: WordPress
Configuration: Release-Alpha
Build Number: 31834
Version: PR #25444
Bundle ID: org.wordpress.alpha
Commit: 1602fa9
Installation URL: 59e1k0sddk8ao
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

@wpmobilebot
Contributor

wpmobilebot commented Mar 24, 2026

📲 You can test the changes from this Pull Request in Jetpack by scanning the QR code below to install the corresponding build.
App Name: Jetpack
Configuration: Release-Alpha
Build Number: 31834
Version: PR #25444
Bundle ID: com.jetpack.alpha
Commit: 1602fa9
Installation URL: 3sirsgohefqko
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

iangmaia self-assigned this Mar 24, 2026
iangmaia added the Testing label (Unit and UI Tests and Tooling) Mar 25, 2026
iangmaia and others added 10 commits March 26, 2026 23:17
The gem provides a sandboxed agent that drives the simulator through a
fixed set of tools (tap, swipe, type, REST API) with no arbitrary code
execution. It handles WDA lifecycle, session management, context window
compression, and verification/cleanup enforcement internally.

The Buildkite step:
- Checks for "Testing" label (skips if missing)
- Downloads build artifacts and installs app on simulator
- Installs the simulator-llm-pilot gem from GitHub
- Runs all test cases in Tests/AgentTests/ui-tests/

Ref: AINFRA-2176

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
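The last step in the list above, running every test case under Tests/AgentTests/ui-tests/, amounts to a loop with an aggregated exit status. A minimal sketch, assuming each test case is a `*.md` file and that the runner is a command taking one test-case path and exiting non-zero on failure. The real simulator-llm-pilot invocation is not shown in this PR, so `run_all_test_cases` and its runner argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run every *.md test case in a directory with a
# caller-supplied runner and fail if any case fails (or none exist).

# run_all_test_cases DIR RUNNER
# RUNNER must be a single command name (function or executable) that takes
# one test-case path and exits non-zero when that case fails.
run_all_test_cases() {
  local dir="$1"
  local runner="$2"
  local total=0
  local failures=0
  local case_file

  for case_file in "$dir"/*.md; do
    # If nothing matches, the glob stays literal; skip that placeholder.
    [[ -e "$case_file" ]] || continue
    total=$((total + 1))
    if "$runner" "$case_file"; then
      echo "PASS: $(basename "$case_file")"
    else
      echo "FAIL: $(basename "$case_file")"
      failures=$((failures + 1))
    fi
  done

  if (( total == 0 )); then
    echo "No test cases found in $dir" >&2
    return 1
  fi

  echo "$((total - failures))/$total test cases passed"
  # Non-zero status if any case failed, so the CI step fails too.
  (( failures == 0 ))
}
```

Continuing past a failed case, instead of stopping at the first one, lets a single run report every failure, while the final arithmetic test still gives Buildkite a non-zero exit status when anything failed.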

gem build resolves spec file paths relative to cwd, so
bin/simulator-llm-pilot wasn't found when building from the
wordpress-ios repo root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
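The fix in this commit (running `gem build` from the gem's own checkout, because spec file paths resolve against the current directory) can be scoped with a subshell so the `cd` does not leak into the rest of the script. A minimal sketch; `run_in_dir`, the checkout path, and the gemspec file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command from inside DIR without changing the
# caller's working directory.

# run_in_dir DIR COMMAND [ARGS...]
run_in_dir() {
  local dir="$1"
  shift
  # The parentheses start a subshell, so the cd only affects that subshell.
  # If DIR does not exist, cd fails and the command is not run.
  ( cd "$dir" && "$@" )
}
```

For example, `run_in_dir "$GEM_CHECKOUT_DIR" gem build simulator-llm-pilot.gemspec`, where `GEM_CHECKOUT_DIR` is a placeholder for wherever the script clones the gem and the gemspec name is assumed.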

Extract WDA build to a separate build-wda.sh script for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iangmaia force-pushed the iangmaia/ci-ai-e2e-tests-gem branch from 2c6e292 to eb6d030 March 26, 2026 22:17
@sonarqubecloud

Labels

[Status] DO NOT MERGE, Testing (Unit and UI Tests and Tooling)

3 participants