Add CI script and hardened skill for AI-driven E2E tests by iangmaia · Pull Request #25443 · wordpress-mobile/WordPress-iOS

iangmaia · 2026-03-24T13:16:06Z

Summary

- CI entry point (Scripts/ci/run-ai-e2e-tests.sh) that manages the simulator and WDA lifecycle and runs Claude Code with a locked-down --allowedTools allowlist
- Wrapper scripts (wda-curl.sh, wp-api.sh, launch-app.sh) that replace raw curl: they validate methods, reject path traversal, and read credentials from env vars
- CI-specific skill (ci-test-runner) that teaches Claude how to drive E2E tests using only the wrapper scripts
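
For readers who want a feel for what "validate methods, reject path traversal, read credentials from env vars" could look like, here is a minimal sketch. It is not the code from this PR: the function names (`validate_method`, `validate_path`, `wp_api`), the allowed method list, the character allowlist, and the `WP_API_BASE_URL` / `WP_API_USER` / `WP_API_PASSWORD` variables are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the checks a wrapper such as wp-api.sh might perform.
# All names below are illustrative assumptions, not the PR's actual code.

# Allow only a fixed set of HTTP methods.
validate_method() {
  case "$1" in
    GET|POST|PUT|DELETE) return 0 ;;
    *) echo "error: method '$1' is not allowed" >&2; return 1 ;;
  esac
}

# Reject parent-directory segments, absolute URLs, and unexpected characters.
validate_path() {
  local path="$1"
  local allowed='^/?[A-Za-z0-9._/?=&%_-]+$'
  if [[ "$path" == *".."* || "$path" == *"://"* ]]; then
    echo "error: path '$path' is not allowed" >&2
    return 1
  fi
  if [[ ! "$path" =~ $allowed ]]; then
    echo "error: path '$path' contains unexpected characters" >&2
    return 1
  fi
  return 0
}

# Site URL and credentials come from the environment, never from arguments,
# so they do not appear in the command the agent types.
wp_api() {
  local method="$1" path="$2"
  validate_method "$method" || return 2
  validate_path "$path" || return 2
  : "${WP_API_BASE_URL:?must be set}" "${WP_API_USER:?must be set}" "${WP_API_PASSWORD:?must be set}"
  curl --silent --show-error --fail \
    --request "$method" \
    --user "${WP_API_USER}:${WP_API_PASSWORD}" \
    "${WP_API_BASE_URL%/}/${path#/}"
}
```

With this sketch, `wp_api GET "../../etc/passwd"` prints an error and returns 2 without ever reaching curl, which matches the rejection case listed in the test plan.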

Existing local dev skills (ai-test-runner, ios-sim-navigation) are not modified.

Ref: AINFRA-2176

Test plan

- Run ./Scripts/ci/run-ai-e2e-tests.sh locally with a booted simulator and test site credentials
- Verify wrapper scripts reject bad input (wp-api.sh GET "../../etc/passwd" → error)
- Run a simple test case (users-screen-loads.md) end-to-end
- Verify results.md is written with correct pass/fail status

🤖 Generated with Claude Code

Introduce a locked-down Claude Code setup for running AI E2E tests in CI:

- CI entry point (Scripts/ci/run-ai-e2e-tests.sh) that manages the full lifecycle: simulator, WDA, Claude Code with --allowedTools, results
- Wrapper scripts (wda-curl.sh, wp-api.sh, launch-app.sh) that replace raw curl: they validate methods, reject path traversal, and read credentials from env vars so Claude never sees them in commands
- CI-specific skill (ci-test-runner) with all WDA interaction patterns using wrapper scripts instead of raw curl

Ref: AINFRA-2176

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dangermattic · 2026-03-24T13:16:32Z

	1 Warning
⚠️	This PR is larger than 500 lines of changes. Please consider splitting it into smaller PRs for easier and faster reviews.

	1 Message
📖	This PR is still a Draft: some checks will be skipped.

Generated by 🚫 Danger

wpmobilebot · 2026-03-24T13:28:04Z

📲 You can test the changes from this Pull Request in WordPress by scanning the QR code below to install the corresponding build.

	App Name	WordPress
	Configuration	Release-Alpha
	Build Number	`31845`
	Version	`PR #25443`
	Bundle ID	`org.wordpress.alpha`
	Commit	`23c8799`
	Installation URL	1p8cigq5n80og

Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

wpmobilebot · 2026-03-24T13:28:20Z

📲 You can test the changes from this Pull Request in Jetpack by scanning the QR code below to install the corresponding build.

	App Name	Jetpack
	Configuration	Release-Alpha
	Build Number	`31845`
	Version	`PR #25443`
	Bundle ID	`com.jetpack.alpha`
	Commit	`23c8799`
	Installation URL	1abo8k6s94618

Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

Merge the CI entry point into a single .buildkite/commands script that:

- Checks for "Testing" label on PR (skips early if missing)
- Downloads build artifacts and installs app on simulator
- Runs Claude Code with locked-down --allowedTools

Added as an inline step in pipeline.yml (depends on build_jetpack, soft_fail, 30min timeout). Remove the separate Scripts/ci entry point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- BUILDKITE_PULL_REQUEST_LABELS is comma-separated, not semicolon-separated
- Fix missing spaces after [[ in conditional tests
- Install Node.js via brew if npm is not available
- Add explicit return to get_booted_udid function

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
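
The label gate and the comma-separator fix above can be sketched roughly as follows. This is an illustrative guess at the shape of the check, not the script from this PR; `has_label` is a hypothetical helper name, and only `BUILDKITE_PULL_REQUEST_LABELS` comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a comma-separated label list contains
# exactly the wanted label, 1 otherwise.
has_label() {
  local wanted="$1" labels="${2:-}" label
  local -a parts
  [[ -n "$labels" ]] || return 1
  # Split on commas only; label names may themselves contain spaces.
  IFS=',' read -r -a parts <<< "$labels"
  for label in "${parts[@]}"; do
    # Trim leading and trailing whitespace around each label.
    label="${label#"${label%%[![:space:]]*}"}"
    label="${label%"${label##*[![:space:]]}"}"
    [[ "$label" == "$wanted" ]] && return 0
  done
  return 1
}
```

A pipeline script could then call `has_label "Testing" "${BUILDKITE_PULL_REQUEST_LABELS:-}"` and `exit 0` early when it returns non-zero. Matching whole labels rather than substrings avoids treating a label such as "Unit Testing" as a match.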

Extract WDA build to a separate build-wda.sh script for clarity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-03-25T21:44:48Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mokagio · 2026-03-27T04:05:52Z

.claude/settings.json

+    "deny": [
+      "Read(./.env)",
+      "Read(./.env.*)",
+      "Read(./.git/**)",
+      "Read(./DerivedData/**)",
+      "Read(./build/**)",
+      "Read(./build-products-*.tar)",
+      "Read(./**/*.mobileprovision)",
+      "Read(./**/*.p12)",
+      "Read(./**/*secret*)"
+    ]
  }


This is something we should do in all repos. Nice!

What's the rationale behind blocking DerivedData? Performance or security?

mokagio · 2026-03-27T04:49:34Z

Interesting Claude-related failure:

I wonder how we should deal with this. Can we change the tests so that they call fewer tools? Or should we bump the tool-call threshold?

- New tap-element.sh combines find+click into a single call, cutting turns per tap from 2-3 to 1. Tries accessibility ID first, falls back to label.
- Reduce CLAUDE_MAX_TURNS from 120 to 80 so failed tests bail out faster (gem completes most tests in 15-55 turns).
- Extend Buildkite timeout from 60 to 90 minutes to ensure all 11 tests can complete.
- Update ci-test-runner skill to promote tap-element.sh as the preferred tap method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
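
The "accessibility ID first, fall back to label" flow of a combined find+click call might look something like the sketch below. It only illustrates the control flow: `find_element` and `click_element` are stub helpers invented here so the example runs on its own, and the strategy names and element IDs are placeholders. A real script would replace the stubs with WebDriverAgent requests.

```shell
#!/usr/bin/env bash
# Stub helpers (assumptions for illustration only). A real script would
# implement these with WebDriverAgent requests instead of a fake lookup.
find_element() {   # usage: find_element <strategy> <value>; prints an element id
  local strategy="$1" value="$2"
  case "$strategy:$value" in
    "accessibility id:save-button") echo "elem-1" ;;
    "label:Publish")                echo "elem-2" ;;
    *) return 1 ;;
  esac
}

click_element() {  # usage: click_element <element-id>
  echo "clicked $1"
}

# Find and click in one call: try the accessibility ID, then the label.
tap_element() {
  local target="$1" element_id
  element_id="$(find_element "accessibility id" "$target")" \
    || element_id="$(find_element "label" "$target")" \
    || { echo "error: no element matches '$target'" >&2; return 1; }
  click_element "$element_id"
}
```

Folding the lookup and the click into one command is what saves agent turns: the model issues a single tool call per tap instead of one call to find the element and another to click it.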

- Increase CLAUDE_MAX_TURNS from 80 to 100: 80 was too tight for complex tests like scheduled post that need date picker interaction.
- Hard-cap screenshots at 3 per test in take-ai-test-screenshot.sh. After the limit, the script returns a message instead of capturing.
- Strengthen the skill to make clear that screenshots are only for recording failures, never for UI inspection during normal flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
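
One simple way to enforce a per-test screenshot cap is a counter file in the test's output directory. The sketch below assumes that approach. It is not taken from take-ai-test-screenshot.sh: the counter file name, the `take_screenshot` and `capture_screenshot` names, and the messages are hypothetical, and the capture step is a stub.

```shell
#!/usr/bin/env bash
MAX_SCREENSHOTS=3

# Stub capture step (assumption). A real script might run something like
# `xcrun simctl io booted screenshot "$1"` here instead.
capture_screenshot() {
  : > "$1"
}

# Capture a screenshot into <test_dir> unless the per-test cap is reached.
take_screenshot() {
  local test_dir="$1"
  local counter_file="$test_dir/.screenshot-count"
  local count=0
  if [[ -f "$counter_file" ]]; then
    count="$(cat "$counter_file")"
  fi
  if (( count >= MAX_SCREENSHOTS )); then
    # Past the limit: report it and succeed, without capturing anything.
    echo "Screenshot limit ($MAX_SCREENSHOTS) reached; not capturing."
    return 0
  fi
  count=$(( count + 1 ))
  capture_screenshot "$test_dir/screenshot-$count.png"
  echo "$count" > "$counter_file"
  echo "Saved screenshot-$count.png"
}
```

Returning a message with a zero exit status, rather than failing, means the agent reads the limit as information and moves on instead of retrying the call.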

sonarqubecloud · 2026-03-27T18:16:18Z

Quality Gate passed

Issues
18 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

iangmaia self-assigned this Mar 24, 2026

iangmaia added the [Status] DO NOT MERGE label Mar 24, 2026

iangmaia force-pushed the iangmaia/ci-ai-e2e-tests branch from 1bf7265 to e99456e Compare March 24, 2026 13:40

iangmaia mentioned this pull request Mar 24, 2026

Add Buildkite pipeline for AI E2E tests (simulator-llm-pilot gem) #25444

Draft

3 tasks

Use [[ instead of [ for conditional tests

368db11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

iangmaia added the Testing Unit and UI Tests and Tooling label Mar 25, 2026

iangmaia and others added 3 commits March 25, 2026 21:33

Trigger CI

a1b75f0

Clone and build WebDriverAgent if not present on CI agent

e057373


iangmaia and others added 10 commits March 26, 2026 10:54

Export SIMULATOR_NAME so build-wda.sh can read it

9a17489

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Harden Claude AI E2E runner

2661cbc

Use APP_BUNDLE_ID consistently

3df0ea8

Normalize CI site URLs and extend WDA startup timeout

78f6072

Use built WDA artifacts in CI and extend AI timeout

7b1e1c0

Fix Claude CLI invocation and bash 3 status output

8a055dd

Use Claude Sonnet 4.6 by default

76b08a6

Pass Claude prompt after option terminator

6ebf690

Stream Claude AI E2E progress

40188c2

Fix Rubocop errors

0a5eb81

mokagio reviewed Mar 27, 2026


iangmaia added 3 commits March 27, 2026 12:55

Attempt to simplify Claude E2E test running

4208621

Trigger CI

7e784e0

Back to Sonnet 4.6

89b2cdb

iangmaia and others added 2 commits March 27, 2026 18:37

iangmaia wants to merge 21 commits into trunk from iangmaia/ci-ai-e2e-tests


4 participants
