ci: stop refetching translations and rebuilding JS in downstream jobs#513
Conversation
XCFramework BuildThis PR's XCFramework is available for testing. Add the following to your .package(url: "https://github.com/wordpress-mobile/GutenbergKit", branch: "pr-build/513")Built from 8b403ce |
8adc99c to
825a86c
Compare
oguzkocer
left a comment
There was a problem hiding this comment.
Looks good. I raised an issue, but that's not a regression, a possibly missed opportunity instead. So, I am approving it as is.
| # Skip when `dist/` already exists *and* `src/translations/` is | ||
| # populated — translations are baked into the bundle at JS build time, |
There was a problem hiding this comment.
I'm not familiar with this pipeline, but if I am understanding it correctly, the translations are baked into the bundle we get from dist/ which means if it exists, we don't need translations at all. However, the check [ -n "$$(find src/translations -maxdepth 1 -name '*.json' -print -quit 2>/dev/null)" ] makes it so that it'll still get downloaded - and not used?
If that understanding is correct, it'd mean that we are downloading translations in downstream CI builds where we have dist/ but not the translations folder, and then simply discard the newly downloaded translations because there is no build step.
I think we can skip that check and save some time in CI steps such as Test Android Library.
The previous skip clause required both `dist/` to exist *and* `src/translations/` to be populated. The second condition was meant to guard against a stale `dist/` paired with an empty `src/translations/` silently producing a no-translations rebuild on `REFRESH_JS_BUILD=1`. In practice the defence cost more than it saved: every downstream CI job consuming `dist.tar.gz` (which never includes `src/translations/`) hit the fallback branch and re-fetched all ~50 locales, only to discard the result when `build`'s recipe short-circuited on existing `dist/`. The legitimate "I want fresh translations" workflow already has an explicit path (`REFRESH_L10N=1`) that bypasses the dist check, and `make prep-translations` direct-invocation still forces a fetch. Per code review: #513 (comment)
Cuts PR build wall-time from 13:15 → 9:00 (32% faster) and per-build agent-time by ~10 minutes. Single source of truth for translation fetching and JS compilation: only `Build React App` does either, and every other job consumes the `dist.tar.gz` artifact. Three compounding inefficiencies removed: 1. `build-react` re-fetched translations every build. `make build REFRESH_L10N=1` defeated the Makefile's existence cache, forcing all ~50 locales to be re-fetched from translate.wordpress.org on every build. 2. Every downstream `: build` consumer (`test-swift-library`, `test-swift-package`, `test-android`, `test-android-library-e2e`, `build-resources-xcframework`) re-fetched translations *and* re-ran `npm ci`, only to throw the work away when `build`'s recipe short-circuited on existing `dist/`. 3. `Test Android Library` and `Test Android Library Instrumented` had no upstream dependency, so they walked the full `: build` chain — fetch all locales, `npm ci`, `npm run build`, copy assets — before finally running gradle. Pipeline (.buildkite/pipeline.yml): - `Build React App` backgrounds `make prep-translations` while running `make npm-dependencies` in the foreground, then `wait`s on the fetch before `make build`. `set -euo pipefail`, plus log redirect/replay so the backgrounded fetch's section markers don't interleave with `npm ci`'s and confuse Buildkite log folding. - `Test Android Library` and `Test Android Library Instrumented` now `depends_on: build-react` and download `dist.tar.gz`. Makefile: - `prep-translations` skips when `dist/` exists *and* `src/translations/` is populated. `REFRESH_L10N=1` and direct invocation still force the fetch. - `build` drops `npm-dependencies` from its prereq list and invokes it via `$(MAKE) _RECURSIVE_INVOKE=1 npm-dependencies` inside the rebuild branch. Downstream `: build` consumers no longer pay for an `npm ci` they don't need. - `npm-dependencies` uses the `_RECURSIVE_INVOKE=1` sentinel to distinguish recursive calls from user-direct invocations, so `make npm-dependencies` still installs even when node_modules exists (matches existing user expectation), while the recursive call from `build` short-circuits. - `test-android` and `test-android-library-e2e` copy `dist/` into Android assets unconditionally — `build`'s recipe short-circuits `copy-dist-android` when `dist/` exists, so CI agents that download a dist artifact would otherwise have stale assets. bin/prep-translations.js: - Drop `import fetch from 'node-fetch'` in favour of Node 20's built-in global `fetch`. Lets the script run before `npm ci` populates `node_modules`, enabling the parallel-with-npm-ci pattern in the CI step. Also drops `node-fetch` from package.json / package-lock.json (98 lines of lockfile).
The previous skip clause required both `dist/` to exist *and* `src/translations/` to be populated. The second condition was meant to guard against a stale `dist/` paired with an empty `src/translations/` silently producing a no-translations rebuild on `REFRESH_JS_BUILD=1`. In practice the defence cost more than it saved: every downstream CI job consuming `dist.tar.gz` (which never includes `src/translations/`) hit the fallback branch and re-fetched all ~50 locales, only to discard the result when `build`'s recipe short-circuited on existing `dist/`. The legitimate "I want fresh translations" workflow already has an explicit path (`REFRESH_L10N=1`) that bypasses the dist check, and `make prep-translations` direct-invocation still forces a fetch. Per code review: #513 (comment)
66f97c7 to
8b403ce
Compare
Summary
Build React Appdoes either, and every other job consumes thedist.tar.gzartifact.node-fetchin favour of Node 20's built-infetchso the translation fetch can run in parallel withnpm ci.Speedup
Compared against trunk #2251 (last clean run before this PR):
Build XCFrameworkwas on the wall-time critical path; trimmingnpm ci+ JS build + translation fetch from it accounts for most of the headline win.Root Cause
Three compounding inefficiencies:
1.
build-reactre-fetched translations every buildmake build REFRESH_L10N=1 STRICT_L10N=1defeated the Makefile's existence-based cache (Makefile:48), forcing all ~50 locales to be re-fetched from translate.wordpress.org on every CI build. Buildkite agents don't share state, so every build paid the full GlotPress round-trip cost.2. Every downstream `: build` consumer re-fetched translations and re-ran `npm ci`
test-swift-library,test-swift-package,test-android,test-android-library-e2e, andbuild-resources-xcframeworkall declarebuildas a Make prereq.build: npm-dependencies prep-translations, so both ran first. On agents that just extracted an upstreamdist.tar.gz:dist/is populated →build's recipe short-circuited the JS buildsrc/translations/is empty (not indist.tar.gz) →prep-translationsran the full fetchernode_modulesis empty →npm-dependenciesrannpm cidist/already had everything baked in3. Two Android jobs rebuilt
dist/from scratchTest Android LibraryandTest Android Library Instrumentedhad no upstream dependency, so they walked the full: buildchain — fetch all locales,npm ci,npm run build, copy assets — before finally running gradle.Fix
.buildkite/pipeline.ymlBuild React App: backgroundsmake prep-translations STRICT_L10N=1while runningmake npm-dependenciesin the foreground, thenwaits on the fetch beforemake build REFRESH_JS_BUILD=1. Shell vars escaped with$$for Buildkite interpolation.Test Android LibraryandTest Android Library Instrumented: gaindepends_on: build-reactand downloaddist.tar.gz.Makefileprep-translations: gains a "skip ifdist/exists" clause. Translations are baked into the bundle at JS build time, so downstream consumers don't need a fresh fetch.REFRESH_L10N=1and direct invocation still force the fetch.build: dropsnpm-dependenciesfrom its prereq list and invokes it via$(MAKE)inside the rebuild branch. Downstream: buildconsumers (which run gradle / xcodebuild / swift, not anything fromnode_modules/.bin) no longer pay for annpm cithey don't need.npm-dependencies: drops the "direct invocation forces install" clause —build's recipe now invokes it via$(MAKE)which would otherwise trigger the force.REFRESH_DEPS=1remains as the explicit force.test-androidandtest-android-library-e2e: copydist/into Android assets unconditionally.build's recipe short-circuitscopy-dist-androidwhendist/already exists, so CI agents that download a dist artifact would otherwise have stale assets. Matches whattest-android-e2e:329-331andbuild-resources-xcframework:131already do.bin/prep-translations.jsDrop
import fetch from 'node-fetch'in favour of Node 20's built-in globalfetch. The script's surface (response.ok,response.status,response.headers.get,response.json()) is API-compatible with WHATWG Fetch in both implementations. Also dropsnode-fetchfrompackage.jsonandpackage-lock.json(98 lines of lockfile).Decision matrix for
prep-translationsdist/src/translations/REFRESH_L10Nbuild(or anything)build(or anything)prep-translationsprep-translations1What we explored
1. CI-level cache with a weekly key ❌
Cache
src/translations/with a key that rolls weekly. Faster on most builds but accepts up to a week of staleness in shipped strings. Rejected — fresh-per-build is cheap once we stop fetching per-job.2. Run
prep-translationsas a separate Buildkite step ❌The first iteration of this PR did exactly that. It bought the agent-time win but added 2:39 of serial wall-time on the critical path because the fetch waited for its own agent startup + npm install. Replaced by the in-step parallel approach (see commit
9b4e35c1).3. Swap hand-rolled fetcher for
Fastlane::Helper::GlotpressDownloaderSeparate concern — doesn't change CI speed. Tracked as a follow-up; see the release-toolkit helper.
Test plan
Library Tests,iOS Simulator Tests,Test Android Library,Build XCFramework, andTest Android Library Instrumented— confirms thedist/-based skip is firingmake prep-translationsstill fetches when invoked directly (explicit user intent)make test-swift-libraryafter a successfulmake buildlogs both "Skipping translations fetch (dist/ already built…)" and "Skipping NPM dependencies installation (node_modules already exists)" — and does not invokenpm ciSTRICT_L10N=1still fails the prep step on a real GlotPress error (not exercised by this PR's builds — verified previously)