fix(ci): Surface screenshot test failures and add retries #119

Merged
philprime merged 1 commit into main from fix/screenshot-test-reliability on Apr 22, 2026
@philprime (Member)

The screenshot CI job has been green on main since #116 despite the underlying ScreenshotUITests.testScreenshots failing on most runs — the failures silently disappeared, and the empty/partial artifacts never raised an alert.

Root cause in fastlane/snapshot/lib/snapshot/simulator_launchers/simulator_launcher.rb:131:

UI.crash!("Too many errors... no more retries...") if launcher_config.stop_after_first_error

Snapshot only raises on exhausted retries when stop_after_first_error is true. Without it, the error_proc falls off the end and snapshot returns as if nothing happened. Our lanes had it at the default (false).
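
As a rough illustration of the behaviour described above, here is a small self-contained Ruby model. It is not fastlane's code: `run_with_retries`, `SnapshotCrash`, and the `attempt` callable are hypothetical names, and it assumes one initial attempt followed by up to `number_of_retries` retries.

```ruby
# Simplified model of the exhausted-retry behaviour described in this PR.
# NOT fastlane source; all names here are hypothetical.
class SnapshotCrash < StandardError; end

# Calls `attempt` (a callable returning true on success) once, then retries
# up to `number_of_retries` more times. When every attempt has failed, it
# raises only if `stop_after_first_error` is true; otherwise it returns
# a result hash without raising, which is the "silent" case.
def run_with_retries(attempt, number_of_retries:, stop_after_first_error:)
  tries = 0
  loop do
    tries += 1
    return { ok: true, tries: tries } if attempt.call
    break if tries > number_of_retries
  end
  raise SnapshotCrash, "Too many errors... no more retries..." if stop_after_first_error
  { ok: false, tries: tries }
end
```

In this model, an always-failing attempt with `number_of_retries: 3` and `stop_after_first_error: false` returns `{ ok: false, tries: 4 }` without raising, so a caller that does not inspect the result carries on as if the run had succeeded. With `stop_after_first_error: true` the same call raises `SnapshotCrash`.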

Three changes:

  • stop_after_first_error: true on both capture_screenshots calls. This is the key fix — makes failures actually fail the lane, which fails the job, which blocks the PR.
  • number_of_retries: 3 on both calls. Absorbs real simulator flake (long-press + context-menu timing) without hiding real bugs.
  • Extend waitForExistence timeouts to 10s in ScreenshotUITests.testScreenshots. The previous 1–5s timeouts are fine on a dev Mac but too tight on a contended hosted runner. Lines 63/66 and 150/153 (press(forDuration:) → editButton.waitForExistence(timeout: 3)) have been the main flake source.
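
A minimal Fastfile sketch of the first two changes, for orientation only. The lane name `generate_screenshots` is one of the two lanes touched by this PR; any other arguments the real `capture_screenshots` call passes are omitted here, so this is not the actual diff.

```ruby
# Sketch only: the real lane likely passes further options
# (scheme, devices, languages, ...), which are left out here.
lane :generate_screenshots do
  capture_screenshots(
    number_of_retries: 3,         # retry the UI test run to absorb simulator flake
    stop_after_first_error: true  # crash the lane once retries are exhausted
  )
end
```

The same two options would be added to the `capture_screenshots` call in `generate_screenshots_ci`.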

The flag name stop_after_first_error is misleading: it does not stop before retries. It only controls what happens after retries are exhausted: crash (true) vs. return silently (false). Combined with number_of_retries: 3, the real behaviour is "retry 3 times, then fail loudly", which is what we want.

Depends on #118 (EDR simulator fallback) being in main — without it, retries would still fail on every attempt due to the QR accessibility issue.

Three related fixes so the screenshot CI job actually fails PRs when
ScreenshotUITests fails:

- Extend waitForExistence timeouts in ScreenshotUITests to 10s across
  the board. The 1s/2s/3s timeouts were fine locally but too tight under
  CI runner load, particularly around long-press + context-menu
  interactions (lines 63/66 and 150/153), which have been flaky.
- Set number_of_retries: 3 on both capture_screenshots calls
  (generate_screenshots and generate_screenshots_ci). Absorbs
  genuine simulator flakes without hiding real bugs.
- Set stop_after_first_error: true on both capture_screenshots calls.
  Without this, snapshot's simulator_launcher swallows exhausted-retry
  failures (see snapshot/lib/snapshot/simulator_launchers/
  simulator_launcher.rb:131) and returns as if everything succeeded —
  which is why every screenshot run on main since #116 reported ✅
  while actually producing incomplete screenshot sets.

sentry Bot commented Apr 22, 2026

📲 Install Builds

iOS

🔗 App Name App ID Version Configuration
Flinky com.techprimate.Flinky 1.1.3 (52) --

⚙️ flinky Build Distribution Settings

@philprime philprime enabled auto-merge (squash) April 22, 2026 08:27
@philprime philprime disabled auto-merge April 22, 2026 08:27
@philprime philprime merged commit 983d7a9 into main Apr 22, 2026
6 of 7 checks passed
@philprime philprime deleted the fix/screenshot-test-reliability branch April 22, 2026 08:27
philprime added a commit that referenced this pull request Apr 22, 2026
The matrix screenshot jobs use scan (run_tests) rather than snapshot
(capture_screenshots), so the retries added in #119 don't apply here.
iPad matrix jobs are consistently flaking at the long-press /
context-menu interaction (ScreenshotUITests:66) and failing the whole
release because release-upload needs the matrix.

Add number_of_retries: 3 to the run_tests call so scan retries the
test before giving up, matching the snapshot lane's behaviour.
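
A sketch of what that change might look like in the Fastfile. The lane name `screenshot_matrix` is hypothetical, and the other arguments the real `run_tests` call passes are omitted.

```ruby
# Sketch only: lane name is hypothetical; scheme, device and other
# options of the real run_tests call are not shown.
lane :screenshot_matrix do
  run_tests(
    number_of_retries: 3 # let scan retry failing tests before giving up
  )
end
```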