ci: Mitigate translation download 429 errors #320
Remove `prep-translations` from `postinstall` in package.json so that `npm ci` no longer triggers translation downloads in lint/test CI steps. Translations are already handled by `make` targets that need them. Restructure the Buildkite pipeline so that the Build React App step uploads `dist/` as an artifact, and the Publish Android Library step downloads it instead of running its own full build. This reduces translation requests from ~192 to ~48 per CI run (75% reduction).
Add STRICT_L10N=1 to the Build React App step so the build fails if translations cannot be fetched, preventing incomplete artifacts from reaching the Android publish step.
Force-pushed from 8a29b52 to 1cefb70.
The translation download script was failing in CI due to HTTP 429 responses from translate.wordpress.org. The previous batch size of 10 with 1-2s retry backoffs was too aggressive: retries would also get 429'd immediately.

Changes:
- Reduce batch size from 10 to 5 concurrent requests
- Add 2s delay between batches to spread load
- Add RateLimitError class to distinguish 429s from other errors
- Respect Retry-After header (delta-seconds and HTTP-date formats)
- Use exponential backoff for 429s without Retry-After (5s, 10s, 20s, 40s)
- Increase max retries from 3 to 5
- Add 60s max backoff safety limit to avoid blocking CI indefinitely
- Make all parameters configurable via environment variables
- Log configuration at startup for CI visibility
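The batching described in this commit (a small number of concurrent requests, then a pause before the next batch) could look roughly like the sketch below. `runInBatches`, `sleep`, and the option names are hypothetical and not taken from `bin/prep-translations.js`; only the default values (5 per batch, 2000ms between batches) come from the commit message.

```javascript
// Illustrative sketch only; not the PR's actual implementation.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process items in fixed-size batches, pausing between batches to spread load.
async function runInBatches(items, worker, { batchSize = 5, batchDelayMs = 2000 } = {}) {
  const results = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Items within one batch run concurrently; allSettled records each
    // outcome so one failed download does not hide the others.
    const settled = await Promise.allSettled(batch.map((item) => worker(item)));
    results.push(...settled);
    const isLastBatch = start + batchSize >= items.length;
    if (!isLastBatch) await sleep(batchDelayMs); // no pause after the final batch
  }
  return results;
}
```

With 48 locales and the defaults above, this would issue 10 batches (nine of 5 and one of 3) with nine pauses between them.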
Set L10N_BATCH_SIZE=2 and L10N_BATCH_DELAY_MS=5000 on the Build React App step to match the parameters that succeeded locally. The defaults (batch=5, delay=2000ms) are still too aggressive for translate.wordpress.org in CI, where en-nz got 429'd on all 5 retry attempts in build 1346.
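For illustration, reading these overrides could be sketched as below. The environment variable names and default values are the ones mentioned in this PR; `intFromEnv`, `loadConfig`, and `describeConfig` are hypothetical helpers, and the real script may validate its input differently.

```javascript
// Illustrative sketch only; helper names are assumptions.
// Parse an integer >= min from the environment, falling back to the default.
function intFromEnv(env, name, fallback, min = 1) {
  const parsed = Number.parseInt(env[name] ?? '', 10);
  return Number.isInteger(parsed) && parsed >= min ? parsed : fallback;
}

function loadConfig(env = process.env) {
  return {
    batchSize: intFromEnv(env, 'L10N_BATCH_SIZE', 5),
    batchDelayMs: intFromEnv(env, 'L10N_BATCH_DELAY_MS', 2000, 0),
    maxRetries: intFromEnv(env, 'L10N_MAX_RETRIES', 5),
    maxBackoffMs: intFromEnv(env, 'L10N_MAX_BACKOFF_MS', 60000),
  };
}

// One-line summary suitable for logging at startup.
function describeConfig(config) {
  return `batch=${config.batchSize}, delay=${config.batchDelayMs}ms, ` +
    `retries=${config.maxRetries}, maxBackoff=${config.maxBackoffMs}ms`;
}
```

Calling `describeConfig(loadConfig())` at startup would make the effective settings visible in the CI log, which helps when comparing a failing build against a local run.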
This reverts commit 2c5d488.
| key: build-react | ||
| command: | | ||
| make build REFRESH_L10N=1 REFRESH_JS_BUILD=1 STRICT_L10N=1 | ||
| tar -czf dist.tar.gz dist/ | ||
| buildkite-agent artifact upload dist.tar.gz |
Upload the build artifact for later tasks to avoid unnecessary duplicate builds.
| depends_on: build-react | ||
| command: | | ||
| make build REFRESH_L10N=1 REFRESH_JS_BUILD=1 | ||
| buildkite-agent artifact download dist.tar.gz . | ||
| tar -xzf dist.tar.gz | ||
| rm -rf ./android/Gutenberg/src/main/assets/ | ||
| cp -r ./dist/. ./android/Gutenberg/src/main/assets |
Rely upon the single build created by an earlier task.
Expand the script robustness to include more sophisticated backoff logic when 429 errors occur.
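A minimal sketch of that backoff logic follows, assuming the error carries the parsed `Retry-After` value in milliseconds. The class name `RateLimitError` comes from this PR; its constructor shape, `parseRetryAfter`, `backoffDelayMs`, and the `jitterMs` parameter are assumptions made for illustration.

```javascript
// Illustrative sketch only; signatures are assumptions, not the PR's code.
class RateLimitError extends Error {
  constructor(message, retryAfterMs = null) {
    super(message);
    this.name = 'RateLimitError';
    this.retryAfterMs = retryAfterMs; // null when the header is absent or unparseable
  }
}

// Retry-After is either delta-seconds ("120") or an HTTP-date.
// Returns the wait in milliseconds, or null if the value cannot be parsed.
function parseRetryAfter(headerValue, now = Date.now()) {
  if (headerValue == null) return null;
  const value = String(headerValue).trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - now); // a date in the past means "retry now"
}

// attempt is zero-based: 5s, 10s, 20s, 40s, then capped at maxBackoffMs.
// A server-provided Retry-After takes precedence over the exponential schedule.
function backoffDelayMs(attempt, retryAfterMs, { baseMs = 5000, maxBackoffMs = 60000, jitterMs = 0 } = {}) {
  const wanted = retryAfterMs != null ? retryAfterMs : baseMs * 2 ** attempt;
  return Math.min(wanted + jitterMs, maxBackoffMs);
}
```

A caller would catch `RateLimitError`, sleep for `backoffDelayMs(attempt, error.retryAfterMs)`, and retry until the configured retry limit is reached. The cap keeps a large `Retry-After` from blocking CI indefinitely.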
| "generate-version": "node bin/generate-version.js", | ||
| "lint:js": "eslint . --ext js,jsx --report-unused-disable-directives --max-warnings 0", | ||
| "lint:js:fix": "eslint . --ext js,jsx --report-unused-disable-directives --max-warnings 0 --fix", | ||
| "postinstall": "patch-package && npm run prep-translations && npm run generate-version", |
Remove the default translation downloads that occur in npm's postinstall script.
The downside is that someone who runs `npm install` on a new clone, instead of the project's documented `make dev-server` or `make build`, ends up without translation strings.
The upside is that avoiding unnecessary or duplicate translation string fetches becomes far simpler.
| command: make build REFRESH_L10N=1 REFRESH_JS_BUILD=1 | ||
| key: build-react | ||
| command: | | ||
| make build REFRESH_L10N=1 REFRESH_JS_BUILD=1 STRICT_L10N=1 |
Add STRICT_L10N=1 to ensure the release build fails if downloading translation strings fails. Otherwise, the release may lack expected translations.
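As a rough illustration of the strict-mode behavior, the script's final exit decision might be sketched as below. This assumes `STRICT_L10N` reaches the Node script as an environment variable; how the Makefile actually forwards it is not shown here, and `exitCodeFor` is a hypothetical name.

```javascript
// Illustrative sketch only; the real script may structure this differently.
// failedLocales: locale codes whose translations could not be fetched.
function exitCodeFor(failedLocales, env = process.env) {
  if (failedLocales.length === 0) return 0;
  const strict = env.STRICT_L10N === '1';
  // Strict mode fails the build; otherwise the build continues
  // with whatever translations were fetched.
  return strict ? 1 : 0;
}
```

Keeping strict mode opt-in would let local builds proceed during an outage while release builds fail loudly.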
What?
Reduces translation download requests from ~192 to ~48 per CI run by building once and sharing artifacts and by removing translations from `postinstall`, and adds robust 429 rate limit handling so the remaining 48 requests recover gracefully when throttled.

Why?
Fix CMM-1250.

Every Buildkite CI run makes ~192 HTTP requests to `translate.wordpress.org` because 4 of 6 pipeline steps independently download all 48 translation locales. This causes 429 (rate limiting) errors. The pipeline had no artifact sharing: the "Build React App" and "Publish Android Library" steps each ran a full `make build REFRESH_L10N=1`, and the lint/test steps triggered downloads via `npm ci`'s `postinstall` hook.

Even after reducing to ~48 requests, builds were still failing because the download script's retry logic was too aggressive: a batch size of 10 with 1-2s backoff meant retries also got 429'd immediately.
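The request counts above follow from simple arithmetic, sketched here for reference. The variable names are illustrative, and the figures ignore retries.

```javascript
// Request counts per CI run, ignoring retries.
const locales = 48;
const stepsDownloadingBefore = 4; // build, Android publish, and the lint/test steps
const stepsDownloadingAfter = 1;  // only the Build React App step

const requestsBefore = locales * stepsDownloadingBefore; // 192
const requestsAfter = locales * stepsDownloadingAfter;   // 48
const reductionPercent = ((requestsBefore - requestsAfter) / requestsBefore) * 100; // 75
```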
How?
1. Remove `prep-translations` from `postinstall` in `package.json`

Translation downloads are a build concern, not an install concern. The `make` targets (`make build`, `make dev-server`, etc.) already handle translations via the `prep-translations` dependency. This eliminates the translation downloads that were happening in lint/test CI steps (which run `npm ci` but don't need translations).

2. Restructure `.buildkite/pipeline.yml`

- Build React App: adds `key: build-react` and uploads `dist/` as a Buildkite artifact
- Lint/test steps: `npm ci` no longer triggers translation downloads
- Publish Android Library: adds `depends_on: build-react`, downloads the pre-built `dist/` artifact instead of running its own `make build`

3. Fail build on translation fetch errors in `bin/prep-translations.js`

Previously, translation fetch failures were silently swallowed. The script now propagates errors so CI fails visibly rather than producing a build with missing translations.

4. Improve 429 rate limit handling in `bin/prep-translations.js`

- Add a `RateLimitError` class that carries the parsed `Retry-After` value
- Respect the `Retry-After` header (supports both delta-seconds and HTTP-date formats)
- Use exponential backoff for 429s without `Retry-After`: 5s, 10s, 20s, 40s + jitter
- Make parameters configurable via environment variables (`L10N_BATCH_SIZE`, `L10N_BATCH_DELAY_MS`, `L10N_MAX_RETRIES`, `L10N_MAX_BACKOFF_MS`)

Testing Instructions
- […] `dist.tar.gz`
- […] `npm ci` without downloading translations
- […] `npm ci`
- […] `translate.wordpress.org` are recovered via backoff and retry
- Run `npm ci` and confirm translations are not downloaded
- Run `npm run prep-translations -- --force` and confirm translations download with the new config logged (batch=5, delay=2000ms, retries=5, maxBackoff=60000ms)
- Run `L10N_BATCH_SIZE=2 L10N_BATCH_DELAY_MS=5000 npm run prep-translations -- --force` and confirm env var overrides are respected in the config log
- Run `make dev-server` and confirm the editor works with translations