ci: shell-quote env values in system-integration workflow #536
Merged
Two bugs in the env file step prevented the weekly System Integration Tests from ever passing since #427 (2026-01-26):

1. The deletion regex `/^KEY=/` did not match the template's `export KEY=replace-me` lines, leaving stale `replace-me` placeholders in the file alongside the appended real values.
2. Appended values were not shell-quoted, so `source ./env` in deployment/crs-architecture.sh treated whitespace and metacharacters as shell syntax. The `OTEL_TOKEN` secret (a `Bearer <token>` header) tripped this with `command not found` on the token component. The same gap was a code-execution sink: a secret containing `$(...)` or backticks would execute on the runner at source time.

Fix: strip with `(export[[:space:]]+)?` so template lines are actually removed, and write with `printf 'export %s=%q\n'` so every value is shell-escaped, defending against both whitespace and injection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
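The regex mismatch from point 1 can be reproduced in isolation. This is only a sketch: the two-line `template` string and the `OTHER_KEY` name are hypothetical stand-ins, not the contents of the real `ci-env.template`, and it filters through a pipe rather than editing a file in place with `sed -i`.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the template: every line carries an `export` prefix.
template='export OTEL_TOKEN=replace-me
export OTHER_KEY=keep-me'

# Old pattern: anchored at line start, so a line beginning with `export ` never matches.
old_result=$(printf '%s\n' "$template" | sed '/^OTEL_TOKEN=/d')

# Pattern with an optional `export` prefix (extended regex, hence -E).
new_result=$(printf '%s\n' "$template" | sed -E '/^(export[[:space:]]+)?OTEL_TOKEN=/d')

printf 'old pattern leaves:\n%s\n\n' "$old_result"
printf 'new pattern leaves:\n%s\n' "$new_result"
```

With the old pattern both lines survive; with the optional-prefix pattern only the `OTHER_KEY` line remains.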
The placeholder lines (`export KEY=replace-me`) only existed because the pre-rewrite step substituted into them with `sed -i "s|KEY=.*|...|"`. The new `printf %q` write path doesn't need them, so removing them at the source eliminates the entire delete-then-rewrite dance.

Drops `strip_var` and the for-loop. The workflow step is now: copy the static template, then append shell-quoted assignments for every secret the env: block injects.

GHCR_AUTH is no longer defined at all, so the `[ -n "$GHCR_AUTH" ]` warning branch in crs-architecture.sh:63 fires as intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ret2libc
approved these changes
Apr 27, 2026
Summary
The weekly System Integration Tests workflow has been failing on every Sunday cron since 2026-02-01 — every run we have history for. The job dies in the very first deploy step:
Root cause
Two compounding bugs in the `Configure env file for minikube` step:

1. Deletion regex never matched. The step ran `sed -i "/^KEY=/d" env` for each secret key, but `ci-env.template` has every line prefixed with `export` (e.g. `export OTEL_TOKEN=replace-me`). The anchor `^OTEL_TOKEN=` never matches `export OTEL_TOKEN=`, so the placeholder lines stayed in the file alongside the appended real values. For most secrets (plain alphanumeric) this was harmless because the second assignment shadowed the first at `source` time.
2. Appended values were not shell-quoted. `deployment/crs-architecture.sh:16` does `source ./env`, so each line is parsed as bash. The step appended values via `echo "OTEL_TOKEN=${OTEL_TK}"`, with no quoting. The `OTEL_TOKEN` secret is a `Bearer <token>` header; the space causes bash to parse `OTEL_TOKEN=Bearer` as the assignment and then attempt to execute `<token>` as a command. Hence `command not found` and exit 127.

The same gap was a code-execution sink: a secret containing `$(...)` or backticks would have run on the runner at `source` time. Reproduction confirmed `OTEL_TK="$(touch /tmp/PWNED)"` produced the file under the previous code.
Fix
Two changes that reinforce each other:
- `.github/ci-env.template`: drop the entire `# Secrets/Env replaced` block. Those `export KEY=replace-me` lines only existed because the pre-rewrite approach used `sed -i "s|KEY=.*|KEY=...|"`, which needed substitution targets. The current approach doesn't, so they're vestigial. Removing them eliminates the whole delete-then-rewrite dance and the regex bug class with it.
- `.github/workflows/system-integration.yml`: write each secret with `printf 'export %s=%q\n'`. `%q` produces a value that bash can re-parse as the literal string, regardless of whitespace, quotes, `$()`, backticks, or newlines.

After the change, the step is:
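A minimal sketch of the `printf %q` write path follows, not the workflow's actual step body. The `write_env_var` helper, the temporary file, the `INJECTION_PROBE` name, and both sample values are hypothetical and exist only to show the round trip through `source`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one shell-quoted `export` line to an env file.
write_env_var() {
  local file=$1 name=$2 value=$3
  printf 'export %s=%q\n' "$name" "$value" >> "$file"
}

env_file=$(mktemp)
marker="${env_file}.pwned"

# Illustrative values only: one with a space, one with a command substitution.
write_env_var "$env_file" OTEL_TOKEN 'Bearer abc123'
write_env_var "$env_file" INJECTION_PROBE "\$(touch $marker)"

# Sourcing parses each line as bash; %q makes both values read back as literals.
source "$env_file"

printf 'OTEL_TOKEN=%s\n' "$OTEL_TOKEN"
printf 'INJECTION_PROBE=%s\n' "$INJECTION_PROBE"
if [ ! -e "$marker" ]; then echo "no command executed"; fi

rm -f "$env_file"
```

Because `%q` is a bash `printf` extension, this assumes the step runs under bash rather than a strictly POSIX `sh`.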
`GHCR_AUTH` is no longer defined anywhere, so `[ -n "$GHCR_AUTH" ]` in `crs-architecture.sh:63` takes the warning-only else branch as intended (Docker pulls from a public ghcr.io image set in this CI configuration, no authentication required).
Verification
- `actionlint` (with embedded shellcheck) clean on both changed files.
- New code: `source ./env` succeeds, value reads back intact, no leftover `replace-me` lines, `GHCR_AUTH` correctly unset.
- Old code: `source` fails with `unexpected EOF` and `replace-me` placeholders remain in the file (matches production failure pattern).
- Injection: under the old code `$(touch ...)` executed the command on `source`; under the new code it is escaped and stored as a literal string.
Test plan
- `workflow_dispatch` to confirm `make deploy` proceeds past the env-source step before relying on the next Sunday cron.

🤖 Generated with Claude Code