Installer: verify readiness & credentials before reporting success; RHEL-family support #171
Open
LukasWodka wants to merge 4 commits into
Docker: install docker-ce from the official Docker CentOS dnf repo on AlmaLinux/Rocky/Oracle, which get.docker.com rejects as unsupported.
k3d: preserve PATH through sudo so the post-install lookup survives RHEL secure_path (which omits /usr/local/bin).
conntrack: use the conntrack apt package on Debian/Ubuntu and conntrack-tools elsewhere.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
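The distro branching described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names `conntrack_pkg_for`, `docker_install_method_for` and `os_release_id` are hypothetical, and the `ID` values for Rocky (`rocky`) and Oracle Linux (`ol`) are assumed from those distributions' `/etc/os-release` files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the distro branching; names are not from the PR.

# Pick the conntrack package name for a given /etc/os-release ID.
conntrack_pkg_for() {
  case "$1" in
    debian|ubuntu) echo conntrack ;;
    *)             echo conntrack-tools ;;
  esac
}

# Decide how Docker gets installed: RHEL clones that get.docker.com rejects
# use the Docker CentOS dnf repo; everything else uses the convenience script.
docker_install_method_for() {
  case "$1" in
    almalinux|rocky|ol) echo dnf-centos-repo ;;
    *)                  echo get-docker-script ;;
  esac
}

# Read ID from an os-release style file (defaults to the real one).
# The subshell keeps the sourced variables out of the caller's environment.
os_release_id() {
  local file="${1:-/etc/os-release}"
  ( . "$file" && printf '%s\n' "${ID:-unknown}" )
}
```

For the k3d step, the commit's approach amounts to running the downloaded install script as `sudo env "PATH=$PATH" bash <script>` instead of plain `sudo bash <script>`, so the caller's PATH (including /usr/local/bin) is still in effect when the script looks up the freshly installed binary.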
…ccess (#716, #717)
Credentials entered at the prompt are validated against the backend api-token-auth endpoint (the same call jobs-manager makes) with a re-prompt loop, so a wrong Client ID or password is caught immediately instead of after a full deploy.
After helm apply, wait_for_client_ready polls rollout status and classifies the outcome; print_summary reports connected, starting, bad_creds, image_pull or crash, and prints the data-never-leaves message only when the client is verifiably connected. Exit code now reflects the real outcome.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
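A credential check with a re-prompt loop could look roughly like the sketch below. It is illustrative only: the function names, the form field names (`username`, `password`), the status-code mapping and the attempt limit are all assumptions, and the real endpoint URL and request shape are whatever jobs-manager uses.

```shell
#!/usr/bin/env bash
# Sketch only: field names, status mapping and attempt limit are assumptions.

# Map the HTTP status returned by the auth endpoint to a verdict.
classify_auth_status() {
  case "$1" in
    200)         echo valid ;;
    400|401|403) echo invalid ;;
    *)           echo unreachable ;;   # includes 000 (curl could not connect)
  esac
}

# POST the credentials and print the verdict. --data-urlencode keeps
# special characters in the password intact.
check_credentials() {
  local url="$1" client_id="$2" password="$3" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    --data-urlencode "username=$client_id" \
    --data-urlencode "password=$password" \
    "$url") || code=000
  classify_auth_status "$code"
}

# Re-prompt until the backend accepts the credentials or attempts run out.
prompt_until_valid() {
  local url="$1" max="${2:-3}" attempt=1 id pw verdict
  while [ "$attempt" -le "$max" ]; do
    printf 'Client ID: ' >&2; read -r id
    printf 'Password: ' >&2
    stty -echo 2>/dev/null; read -r pw; stty echo 2>/dev/null; echo >&2
    verdict=$(check_credentials "$url" "$id" "$pw")
    [ "$verdict" = valid ] && return 0
    echo "Credential check: $verdict (attempt $attempt of $max)" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

Separating `unreachable` from `invalid` lets the prompt distinguish a typo from a network problem. Note that this sketch passes the password on the curl command line, which a production script may prefer to avoid.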
…716, #717)
In install-k8s.ps1, Test-Credentials, Wait-ForClientReady and Get-NotReadyState mirror the bash logic, and Print-Summary is now state-branched. Validated with the PowerShell 7.4 parser; runtime behavior still needs a check on a Windows host.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
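The state classification that both the bash and PowerShell paths implement can be pictured with a sketch like the one below. The state names (connected, starting, bad_creds, image_pull, crash) come from the PR; everything else is an assumption made for illustration, including the function names, the idea of detecting bad credentials by matching a 401 in pod logs, and the specific exit codes.

```shell
#!/usr/bin/env bash
# Sketch only: the mapping rules and exit codes are assumptions, not the PR's code.

# Classify a not-ready release from a container "waiting" reason and,
# optionally, a log excerpt.
diagnose_not_ready() {
  local reason="$1" logs="${2:-}"
  case "$reason" in
    ImagePullBackOff|ErrImagePull) echo image_pull ;;
    CrashLoopBackOff)
      # Assumption: an auth failure shows up in the pod logs as a 401.
      if printf '%s' "$logs" | grep -Eqi '401|unauthorized'; then
        echo bad_creds
      else
        echo crash
      fi ;;
    *) echo starting ;;
  esac
}

# Only "connected" counts as success; "starting" gets its own code so
# callers can tell "not yet" apart from "failed".
exit_code_for_state() {
  case "$1" in
    connected) echo 0 ;;
    starting)  echo 2 ;;
    *)         echo 1 ;;
  esac
}

# Wait for the rollout, then classify. Treating a successful rollout as
# "connected" is a simplification made for this sketch.
wait_and_classify() {
  local deploy="$1" ns="$2" timeout="${3:-180s}" reason logs
  if kubectl rollout status "deployment/$deploy" --namespace "$ns" \
       --timeout="$timeout" >/dev/null 2>&1; then
    echo connected
    return 0
  fi
  reason=$(kubectl get pods --namespace "$ns" \
    -o jsonpath='{.items[*].status.containerStatuses[*].state.waiting.reason}' \
    2>/dev/null | awk '{print $1}')
  logs=$(kubectl logs "deployment/$deploy" --namespace "$ns" --tail=50 2>/dev/null)
  diagnose_not_ready "$reason" "$logs"
}
```

A summary function can then branch on the returned state and print the "data never leaves" line only in the `connected` case.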
👋 Heads-up: the Code review queue is at 14 / 8, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
scripts/tests/: 64 bats tests (summary, install-client-helm, setup-linux, common) + 32 Pester tests for install-k8s.ps1; all mocked, no Docker/k3d/network needed.
Changed-line coverage measured with kcov (bash 96.2%) and Pester (PowerShell 97.4%); residual lines are the real RHEL Docker-install commands + the guarded main() orchestration, exercised by the integration E2E.
A TB_PESTER guard lets the suite dot-source install-k8s.ps1 without running the installer.
New installer-tests job in helm-ci.yaml runs both suites on PRs (scripts/ added to path filters).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
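The guard idea (load the script's functions in a test without running the installer) has a straightforward bash analogue. The sketch below assumes a hypothetical `TB_BATS` variable modelled on the TB_PESTER guard; `missing_tools` and the body of `main` are placeholders for illustration, while `has` follows the helper name mentioned elsewhere in this PR.

```shell
#!/usr/bin/env bash
# Sketch: TB_BATS is a hypothetical variable, modelled on the TB_PESTER guard.

# True when a command is available on PATH.
has() { command -v "$1" >/dev/null 2>&1; }

# Print each required tool that is missing, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    has "$tool" || echo "$tool"
  done
}

# Placeholder orchestration: the real installer would do its work here.
main() {
  local missing
  missing=$(missing_tools sh)
  if [ -n "$missing" ]; then
    echo "missing tools: $missing" >&2
    return 1
  fi
  echo "all required tools present"
}

# Run the installer only when the test harness has not set the guard,
# so a bats file can source this script and call the helpers directly.
if [ -z "${TB_BATS:-}" ]; then
  main "$@"
fi
```

A bats test would export `TB_BATS=1` before sourcing the script and then stub `has` or place mock executables on PATH. Another common idiom is comparing `${BASH_SOURCE[0]}` with `$0`, which needs no extra variable but is bash-specific.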
Tests added (follow-up, commit 5e41bd3)
Added an automated test suite + a CI gate for the installer (the PR previously had only manual verification).
Changed-line coverage (measured, not estimated): bash 96.2% (kcov), PowerShell 97.4% (Pester).
CI: new …
LukasWodka added a commit that referenced this pull request Jun 1, 2026
… 0.0.0.0 detect, Windows parity)
A customer running behind an authenticated corporate HTTP proxy hit install failures. The 0.0.0.0 kubeconfig headline was fixed for the bash path in #166/#167, but adverse testing (a forward proxy + real k3d on Linux VMs) surfaced three remaining gaps plus a Windows parity hole:
- Gap A: authenticated proxies (http://user:pass@host) were silently SKIPPED, because k3d's --env KEY=VALUE@FILTER can't carry an '@' in the value. Now propagated via a k3d --config file (structured YAML env) so credentials survive intact. Verified on k3d v5.8.3 (it merges the --config env with the existing CLI flags).
- Gap B: NO_PROXY was propagated verbatim. Now auto-augmented with the cluster-internal ranges (loopback + RFC1918 + .svc/.cluster.local + host.k3d.internal), both into the cluster and host-side, so in-cluster traffic never routes through the proxy. This fixes the misroute AND the observed `k3d cluster create --wait` hang.
- Gap C: a cluster created outside the installer and bound to 0.0.0.0 is now detected (serverlb HostIp) and flagged with a non-destructive recreate remedy.
- Windows parity: install-k8s.ps1::New-K3dCluster had NONE of the bash fixes. It still bound --api-port 0.0.0.0:6550 (the original headline bug, still live on Windows), normalized only host.docker.internal in the kubeconfig, and propagated zero proxy env. Now mirrors bash: 127.0.0.1:6550, a 0.0.0.0->127.0.0.1 kubeconfig rewrite, and Get-EffectiveNoProxy + Write-K3dProxyConfig (auth + augmented NO_PROXY, written UTF-8 without a BOM).
Tests: new scripts/tests/cluster.bats (15) + Pester for the two ps1 helpers (6), both green. Verified end-to-end on Linux VMs: auth creds propagated into the node, no startup hang behind an unreachable proxy, and 0.0.0.0 detection firing.
Stacked on #171 (the installer test scaffolding + final install-k8s.ps1 live there).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
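Gaps A and B can be illustrated with the sketch below. The function names are hypothetical bash counterparts of Get-EffectiveNoProxy and Write-K3dProxyConfig, not the PR's code. The exact list of appended entries, the `k3d.io/v1alpha5` config schema version and the `server:*` node filter are assumptions; check them against the k3d version in use.

```shell
#!/usr/bin/env bash
# Sketch only: entry list, schema version and node filter are assumptions.

# Append cluster-internal ranges to an existing NO_PROXY value, skipping
# duplicates. The subshell body keeps the IFS and noglob changes local.
effective_no_proxy() (
  extras='localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local,host.k3d.internal'
  set -f          # entries such as *.corp must not glob-expand
  IFS=','
  result=''
  for entry in ${1:-} $extras; do
    entry=$(printf '%s' "$entry" | tr -d '[:space:]')
    [ -n "$entry" ] || continue
    case ",$result," in
      *",$entry,"*) ;;                               # already present
      *) result="${result:+$result,}$entry" ;;
    esac
  done
  printf '%s\n' "$result"
)

# Emit a YAML double-quoted scalar, escaping backslashes and quotes.
yaml_dq() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Write a k3d config file whose env entries carry the proxy settings.
# Structured YAML avoids the KEY=VALUE@FILTER parsing that breaks on '@'.
write_k3d_proxy_config() {
  local file="$1" proxy="$2" no_proxy="$3" var
  {
    printf 'apiVersion: k3d.io/v1alpha5\nkind: Simple\nenv:\n'
    for var in "HTTP_PROXY=$proxy" "HTTPS_PROXY=$proxy" "NO_PROXY=$no_proxy"; do
      printf '  - envVar: %s\n    nodeFilters:\n      - server:*\n' "$(yaml_dq "$var")"
    done
  } > "$file"
}
```

The generated file would then be passed alongside the usual flags, for example `k3d cluster create <name> --config <file>`.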
This was referenced Jun 1, 2026
LukasWodka added a commit that referenced this pull request Jun 1, 2026
…eal preflight probes)
Measured changed-line coverage of the stack had dropped (bash ~84% vs #171's ~96%) because the new code added integration-only branches the mocked unit suites skipped. Recover the unit-testable portion:
- diagnose.bats: exercise the kubectl/docker/helm collection path (has()=true + mocked tools) -> diagnose.sh 64% -> 90%.
- preflight.bats: test the REAL _pf_probe_url curl-exit-code -> token mapping, the missing-curl path, and the _pf_ncpu/_pf_total_mem_kb/_pf_free_kb readers (re-sourced past the setup stubs) -> preflight.sh 79% -> 90%.
Bash changed-line coverage: 83.6% -> 92.3% (kcov, 383/415). The residual ~8% is integration-only (real k3d/docker create + macOS/Windows-specific branches + MAIN orchestration), validated by the live VM E2Es (reboot recovery, auth-proxy, preflight blocked-egress/arm64, diagnose redaction grep).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
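A curl-exit-code to token mapping of the kind the preflight tests exercise might look like the sketch below. The function names and token names are hypothetical and are not the ones `_pf_probe_url` uses; the exit codes themselves are curl's documented ones (5/6 resolution failures, 7 connection failure, 28 timeout, 35/60 TLS problems).

```shell
#!/usr/bin/env bash
# Sketch only: token names are hypothetical, not those of _pf_probe_url.

# Translate a curl exit code into a short diagnostic token.
probe_token_for_exit() {
  case "$1" in
    0)     echo ok ;;
    5|6)   echo dns ;;       # could not resolve proxy / host
    7)     echo refused ;;   # failed to connect
    28)    echo timeout ;;   # operation timed out
    35|60) echo tls ;;       # TLS handshake or certificate problem
    *)     echo error ;;
  esac
}

# Probe a URL and print the token; reports no_curl when curl is absent.
probe_url() {
  local url="$1" max_time="${2:-5}" rc=0
  command -v curl >/dev/null 2>&1 || { echo no_curl; return 0; }
  curl -s -o /dev/null --max-time "$max_time" "$url" || rc=$?
  probe_token_for_exit "$rc"
}
```

Keeping the mapping in a pure function is what makes it unit-testable without network access: a test can call `probe_token_for_exit` directly, or put a fake `curl` on PATH that exits with a chosen code.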
Summary
Running the curl/irm installer on real Ubuntu 24.04 / AlmaLinux 9 / CentOS Stream 9 VMs + macOS (and tracing the Windows path) surfaced two classes of problems for the people who actually run it — clinical researchers and hospital IT:
- helm upgrade --install ran with no --wait, verify_cluster only logged, and print_summary printed "installed successfully — your data never leaves" unconditionally, even while pods were still ContainerCreating/CrashLoopBackOff. Wrong credentials produced the same green screen, because credentials were never validated.
- Docker install failed on AlmaLinux/Rocky/Oracle (get.docker.com rejects them), and k3d install died on all RHEL-family incl. CentOS/RHEL (sudo secure_path omits /usr/local/bin). Plus a wrong conntrack package name on Debian/Ubuntu.
This PR makes the installer claim success only when the client is verifiably connected, verify credentials at entry, and actually complete on Ubuntu, RHEL-family, macOS, and Windows.
Closes #716, #717, #718, #719, #720.
Changes
Verify before reporting — bash + PowerShell (#716, #717)
- Credentials entered at the prompt are validated against the backend api-token-auth/ endpoint, the same call jobs-manager makes (client-runtime/controller.py), via the already-present curl (Invoke-WebRequest on Windows), with a re-prompt loop. A wrong Client ID/password is caught immediately instead of after a full deploy. Resolves the backend per CLIENT_ENV (dev/stg/prod).
- After helm applies, a readiness gate (wait_for_client_ready / Wait-ForClientReady) waits on kubectl rollout status and classifies the outcome; the summary reports connected / starting / bad_creds / image_pull / crash. The "secure compute environment / your data never leaves" line prints only when verifiably connected. The exit code now reflects the real outcome.
RHEL-family support — bash (#718, #719, #720)
- install_docker_engine: install docker-ce from the official Docker CentOS dnf repo on AlmaLinux/Rocky/Oracle (get.docker.com rejects these as "Unsupported distribution").
- install_k3d: run the k3d script via sudo env "PATH=$PATH" so its post-install command -v k3d survives RHEL secure_path (which omits /usr/local/bin).
- install_system_deps: use the conntrack package on Debian/Ubuntu, conntrack-tools elsewhere.
Type
Bug fix + reliability. No chart changes.
Test plan — verified this session
- _backend_url maps prod/dev/stg; live backend rejects bad creds → invalid ✅
- … connected); _diagnose_not_ready returns bad_creds against a real crash-looping release (jobs-manager, 45 restarts) ✅
- … secure_path without /usr/local/bin), the new method installs k3d v5.8.3 where stock sudo bash failed ✅
- ID="almalinux" matches the clone branch; the dnf-repo method installs Docker ✅
- conntrack is a valid apt package; conntrack-tools is not ✅
- bash -n clean on all scripts; install-k8s.ps1 parses clean under PowerShell 7.4.
The PowerShell changes (install-k8s.ps1) mirror the bash logic and pass the PS parser, but I could not execute them (no Windows host here). Please validate on Windows + WSL2 (admin PowerShell): the credential re-prompt loop, the readiness gate, and the state-branched summary.
Follow-ups (out of scope)
Moving credential collection ahead of cluster build (fail in ~2s); the broader items from the review (cluster-admin auto-upgrade RBAC, egress allowlist docs, OVA/air-gapped appliance).
🤖 Generated with Claude Code