Skip to content

fix(#166, #168): installer survives corporate proxy + sanitizes credential input#167

Merged
saadqbal merged 10 commits into
developfrom
fix/166-installer-corporate-proxy
Jun 1, 2026
Merged

fix(#166, #168): installer survives corporate proxy + sanitizes credential input#167
saadqbal merged 10 commits into
developfrom
fix/166-installer-corporate-proxy

Conversation

@saadqbal
Copy link
Copy Markdown
Contributor

@saadqbal saadqbal commented Jun 1, 2026

Summary

Two installer-side fixes bundled together (both surfaced while validating the corporate-proxy fix):

  1. #166i.sh fails behind corporate HTTP/HTTPS proxies (Charité install, P1 customer blocker). Four bugs in scripts/lib/cluster.sh.
  2. #168 — Helm rejects the generated values.yaml with "control characters are not allowed" when bracketed-paste-enabled terminals leak ESC sequences into clientId/clientPassword. One bug in scripts/lib/install-client-helm.sh.

#166 — corporate-proxy fixes (scripts/lib/cluster.sh)

  1. k3d API bound to 0.0.0.0 — corporate proxies intercept the 0.0.0.0 connection and return Temporary Redirect to kubectl. Pin explicitly to 127.0.0.1:6550.
  2. No proxy propagation to k3d containersHTTP_PROXY/HTTPS_PROXY/NO_PROXY on the host weren't passed into k3s. Loop over both cases; skip vars whose value contains @ (k3d's filter-syntax conflict — recreate wouldn't help).
  3. _merge_kubeconfig clobbered manual sed fix on re-run — defensive https://0.0.0.0:https://127.0.0.1: rewrite after the merge. Targets the file k3d actually writes to (first entry of colon-separated KUBECONFIG, or ~/.kube/config).
  4. Misleading "Check Docker" error — replaced with a three-cause diagnostic naming the real culprits, with the ${KUBECONFIG} path in the workaround hint and macOS-compatible sed -i.bak form.

Also adds _check_existing_cluster_proxy: docker inspects the server container on re-run and warns per-var (including NO_PROXY drift) when the cluster is missing proxy env the host has. @-containing values are excluded since recreate wouldn't bake them either.

#168 — credential sanitization (scripts/lib/install-client-helm.sh)

  • New _sanitize_credential strips C0 controls (0x00-0x1F) + DEL (0x7F) from TB_CLIENT_ID / TB_CLIENT_PASSWORD after every read. UTF-8 bytes (0x80+) preserved so international passwords work. Warns the user when anything was stripped.
  • _extract_yaml_value also strips on read so a previously-corrupted values.yaml self-heals on the next install (without it, the default-fill on re-run reproduces the bad value).

Chart version

client/Chart.yaml bumped 1.4.1 → 1.4.2 per #166 acceptance criteria.

Backward compatibility

  • Bug 1: localhost-only API bind. Only affects users connecting to k3d's API from another host on their LAN — vanishingly rare for a dev-machine installer. Existing kubeconfigs with 0.0.0.0:6550 get auto-normalized on the next run.
  • Bug 2: pure additive. Loop guarded by [[ -n "${!var:-}" ]] — strict no-op when no proxy is set.
  • Bug 3: sed pattern anchored to https://0\.0\.0\.0: — won't touch CIDR ranges. sed -i.bak works on BSD + GNU.
  • Bug 4: cosmetic text only.
  • bug: installer values.yaml rejected by Helm — control chars from credential paste #168 sanitization: strips only non-printable C0 + DEL. Valid credentials (printable ASCII + UTF-8) pass through unchanged.

Validation

  • bash -n clean on both touched files.
  • shellcheck -x -s bash shows only two pre-existing SC2034 warnings on cluster.sh (unused logfile, attempt — present before this PR).
  • Verified no other repo file encodes 0.0.0.0:6550 (CIDRs in NetworkPolicy templates unchanged).

Test plan

Follow-ups (deferred per issues)

Closes #166
Closes #168

🤖 Generated with Claude Code


Note

Medium Risk
Changes affect every install path (k3d API bind, kubeconfig rewrite, proxy env) though defaults stay localhost-only when no proxy is set; credential stripping could alter unusual passwords that relied on control characters.

Overview
The installer and chart release are updated so installs work behind corporate HTTP proxies and credential prompts no longer poison values.yaml.

cluster.sh binds the k3d API to 127.0.0.1:6550, forwards host HTTP_PROXY / HTTPS_PROXY / NO_PROXY (both cases) into new clusters via k3d --env, and warns when proxy URLs contain @ or when an existing cluster’s env drifts from the host. After kubeconfig merge, https://0.0.0.0: server URLs are rewritten to 127.0.0.1; API wait failures now point at Docker, proxy NO_PROXY, and kubeconfig fixes.

install-client-helm.sh strips bracketed-paste / CSI escapes and C0 controls from client ID/password (and when re-reading values.yaml) so Helm no longer rejects “control characters.”

client/Chart.yaml is 1.4.1 → 1.4.2; Helm CI also runs on develop.

Reviewed by Cursor Bugbot for commit 0eab8ed. Bugbot is set up for automated code reviews on this repo. Configure here.

Four bugs caused i.sh to fail behind corporate proxies (surfaced during
a Charité install — P1 customer blocker):

1. k3d --api-port defaulted to 0.0.0.0:6550 — corporate proxies
   intercept the 0.0.0.0 connection. Pin explicitly to 127.0.0.1:6550.
2. HTTP_PROXY/HTTPS_PROXY/NO_PROXY never propagated to k3d containers,
   so k3s couldn't reach external registries. Propagate when set
   (no-op when unset — fully backward compatible).
3. _merge_kubeconfig clobbered manual sed fixes on every re-run.
   Apply a defensive 0.0.0.0 → 127.0.0.1 normalization after the merge,
   anchored to https://0.0.0.0: so CIDR ranges are untouched.
4. _wait_for_api error message blamed Docker when the real cause was
   the API being unreachable. Rewrite with the three actual causes.

Also bump client chart to v1.4.2.

Closes #166

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@saadqbal saadqbal self-assigned this Jun 1, 2026
@LukasWodka
Copy link
Copy Markdown
Contributor

👋 Heads-up — Code review queue is at 11 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

Comment thread scripts/lib/cluster.sh
Comment thread scripts/lib/cluster.sh
saadqbal and others added 2 commits June 1, 2026 12:10
BSD sed (macOS) requires an argument to -i; the previous workaround
text would error out for the audience most likely to see this message
(local dev installs on macOS). Use the `sed -i.bak ... && rm ...bak`
form, which works on both BSD and GNU sed — same form the script itself
uses for the Bug 3 normalization.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two bugbot findings on PR #167:

1. (high) k3d's --env KEY=VALUE@NODEFILTER syntax breaks when the proxy
   value itself contains '@' (e.g. embedded creds: user:pass@host) —
   k3d cluster create fails before the cluster exists. Detect '@' in
   the value, skip propagation for that var, and warn the user with
   actionable guidance (strip creds, or configure proxy auth inside
   the cluster).

2. (medium) Proxy env was only injected on fresh cluster creation;
   re-runs against an existing cluster left registries unreachable
   because k3d bakes --env into containers at create time. Add
   _check_existing_cluster_proxy: docker-inspect the server container,
   compare against host proxy env, warn the user to delete+recreate
   when the cluster is missing proxy env. Silent no-op when Docker is
   down or no proxy env is set on the host.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread scripts/lib/cluster.sh Outdated
Comment thread scripts/lib/cluster.sh
Two more bugbot findings on 8a12455:

1. (medium) The Bug 4 error message hardcoded ~/.kube/config in the
   (c) sed-workaround hint, but _merge_kubeconfig honors $KUBECONFIG.
   Users with a custom KUBECONFIG were being directed to the wrong
   file. Use ${KUBECONFIG:-~/.kube/config}, and if it's colon-separated
   (multi-file), point at the first entry.

2. (medium) _check_existing_cluster_proxy only checked HTTP/S_PROXY in
   the cluster env, so a partial mismatch (cluster has HTTP_PROXY but
   not NO_PROXY) silently passed the gate. Switch to a per-var check:
   for each proxy var set on the host, verify the cluster has it, and
   list specifically which ones are missing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread scripts/lib/cluster.sh
Comment thread scripts/lib/cluster.sh
…malize

Two consistency bugs introduced by earlier follow-ups on 5ad0c96:

1. (medium) _check_existing_cluster_proxy flagged any host proxy var
   missing from the cluster as drift — including '@'-containing values
   that _create_new_cluster deliberately skips. The advice ("delete +
   recreate") wouldn't bake them either; the warning was lying. Filter
   '@' values out of the drift list, matching what propagation actually
   does. Users with embedded-creds proxies already saw the skip warning
   at first-create time.

2. (medium) _merge_kubeconfig's defensive normalization checked
   `-f "$KUBECONFIG"`, which is false for a colon-separated path list.
   k3d's --kubeconfig-merge-default writes into the first entry of
   KUBECONFIG (or ~/.kube/config if unset), so the rewrite needs to
   target that file. Resolve with ${kc%%:*}, same idiom we use in the
   _wait_for_api error hint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
aptracebloc
aptracebloc previously approved these changes Jun 1, 2026
Helm rejects values.yaml with 'control characters are not allowed' when
the user's terminal has bracketed-paste enabled and they paste creds:
the paste is wrapped in ESC[200~ ... ESC[201~ (ESC = 0x1B), which `read`
captures verbatim and the heredoc embeds into clientId/clientPassword.

Surfaced today during an EC2 install:
  Error: failed to parse /home/ec2-user/.tracebloc/values.yaml:
  error converting YAML to JSON: yaml: control characters are not allowed

Fix:
- New _sanitize_credential helper strips 0x00-0x1F and 0x7F from the
  string. UTF-8 bytes (0x80+) preserved so international passwords work.
- Apply after every TB_CLIENT_ID / TB_CLIENT_PASSWORD read, before the
  empty check and before the YAML single-quote escape.
- Also strip in _extract_yaml_value so a previously-corrupted values.yaml
  self-heals on the next install instead of perpetuating the bad value.
- Warn the user when anything was stripped so they know we touched their
  input.

Closes #168
Independent of #166 but bundled here at the user's request.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@saadqbal saadqbal changed the title fix(#166): installer survives corporate HTTP/HTTPS proxy fix(#166, #168): installer survives corporate proxy + sanitizes credential input Jun 1, 2026
saadqbal and others added 2 commits June 1, 2026 14:16
Two regressions in c237721:

1. `warn` in _sanitize_credential wrote to STDOUT, but the function is
   called from `TB_CLIENT_PASSWORD=$(_sanitize_credential "$PW")` —
   command substitution captures stdout, so the warning message landed
   inside the password value and ended up in values.yaml verbatim.
   Surfaced when a customer's clientPassword became literally:
     '  ⚠  Stripped non-printable characters from input...
     [CrypveP-sumwac-5siqra'

   Fix: warn explicitly to >&2 in this caller. Keeps the global warn()
   behavior unchanged (other callers print to stdout intentionally).

2. The C0-strip removed the ESC byte but left the literal `[200~` /
   `[201~` bracketed-paste markers visible in the value. Add a shared
   _strip_paste_garbage helper that matches both the ESC-prefixed form
   and the standalone-marker form, then C0-strips the remainder. Use it
   from both _sanitize_credential (input path) and _extract_yaml_value
   (defaults-from-prior-file path) so self-heal works against either
   corruption shape.

Verified with bash unit tests against four input shapes (full wrapper,
literal-marker remnant, clean input, capture-isolation check).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Customer report: password 'rypveP-sumwac-5siqra' stored as
'[CrypveP-sumwac-5siqra'. The leading '[C' is the post-C0-strip
remnant of ESC[C (cursor-right arrow key) — a stray arrow key press
during the password prompt. d22a381 only matched ESC[200~/ESC[201~
(bracketed paste), so generic CSI sequences still leaked through.

Replace the narrow paste-marker strip with an iterative match over the
full ANSI CSI shape:  ESC '[' <params> <final-byte>
where params ∈ [0-9;] and final ∈ [A-Za-z~]. Catches arrow keys,
cursor moves, function keys, modifier-combo CSIs, and bracketed paste
— anything `read -r -s` can capture from a terminal.

Keep the standalone-marker self-heal narrow (only [200~/[201~);
generic `[X]` shapes could plausibly be real password content.

Verified against eight input shapes:
  bracketed-paste wrapper, cursor-right prefix, mid-string arrow keys,
  multi-param CSI (Ctrl-Right), orphan markers, clean ASCII (no-op),
  UTF-8 preservation, warn-to-stderr isolation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
aptracebloc
aptracebloc previously approved these changes Jun 1, 2026
The Helm Chart CI workflow (Lint, Template render, Unit tests, Schema
validation) only triggered on main- and openshift-targeted PRs. The
required-check ruleset on develop still expects the `Lint` context, so
develop-targeted PRs land in "Expected — Waiting for status to be
reported" forever and can't merge.

Adding `develop` to both push and pull_request branch filters fixes
this PR's merge block and closes the gap going forward — chart issues
will be caught at the develop integration point rather than slipping
through until the develop→main release sync.

Paths filter unchanged: only PRs that touch client/**, ingestor/**, or
this workflow file itself trigger CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9257a74. Configure here.

Comment thread scripts/lib/cluster.sh Outdated
aptracebloc
aptracebloc previously approved these changes Jun 1, 2026
Previous round filtered '@'-containing host proxy vars out of the drift
check (correctly — recreating wouldn't bake them). But if the host has
ONLY @-style proxy URLs, the function went silent on every re-run. The
create-time warn in _create_new_cluster doesn't fire on the existing-
cluster path, so the user got zero installer signal that image pulls
might fail.

Partition host vars into two buckets:
  • drift_candidates — clean values; cluster should have these
  • at_skipped       — @-values; recreate alone won't bake them

@-skipped vars now get their own dedicated warning that fires on every
re-run, with remedies that actually work (strip creds, or front the
corporate proxy with a credential-less local auth-proxy). Drift check
runs unchanged against the clean-value bucket.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@saadqbal saadqbal merged commit b6b077b into develop Jun 1, 2026
11 checks passed
saadqbal added a commit that referenced this pull request Jun 1, 2026
…oxy (#169)

fix(#166, #168): installer survives corporate proxy + sanitizes credential input
LukasWodka added a commit that referenced this pull request Jun 1, 2026
… 0.0.0.0 detect, Windows parity)

A customer running behind an authenticated corporate HTTP proxy hit install
failures. The 0.0.0.0 kubeconfig headline was fixed for the bash path in
#166/#167, but adverse testing (a forward proxy + real k3d on Linux VMs)
surfaced three remaining gaps plus a Windows parity hole:

- Gap A: authenticated proxies (http://user:pass@host) were silently SKIPPED —
  k3d's --env KEY=VALUE@FILTER can't carry an '@' in the value. Now propagated
  via a k3d --config file (structured YAML env) so credentials survive intact.
  Verified on k3d v5.8.3 (it merges the --config env with the existing CLI flags).
- Gap B: NO_PROXY was propagated verbatim. Now auto-augmented with the
  cluster-internal ranges (loopback + RFC1918 + .svc/.cluster.local +
  host.k3d.internal), both into the cluster and host-side, so in-cluster traffic
  never routes through the proxy — fixes the misroute AND the observed
  `k3d cluster create --wait` hang.
- Gap C: a cluster created outside the installer and bound to 0.0.0.0 is now
  detected (serverlb HostIp) and flagged with a non-destructive recreate remedy.
- Windows parity: install-k8s.ps1::New-K3dCluster had NONE of the bash fixes —
  it still bound --api-port 0.0.0.0:6550 (the original headline bug, still live
  on Windows), normalized only host.docker.internal in the kubeconfig, and
  propagated zero proxy env. Now mirrors bash: 127.0.0.1:6550, a 0.0.0.0->127.0.0.1
  kubeconfig rewrite, and Get-EffectiveNoProxy + Write-K3dProxyConfig (auth +
  augmented NO_PROXY, written UTF-8 without a BOM).

Tests: new scripts/tests/cluster.bats (15) + Pester for the two ps1 helpers (6),
both green. Verified end-to-end on Linux VMs: auth creds propagated into the node,
no startup hang behind an unreachable proxy, and 0.0.0.0 detection firing.

Stacked on #171 (the installer test scaffolding + final install-k8s.ps1 live there).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants