Skip to content

Helm chart auto-update — deployed clients stay frozen on installed version #69

@LukasWodka

Description

@LukasWodka

Problem

When a customer deploys the tracebloc client via helm install, their cluster is pinned to whatever chart version was current at install time. Once the workspace is running, there is no mechanism to:

  • Poll the published Helm repo (https://tracebloc.github.io/client) for newer chart versions
  • Notify the operator when a new version is available
  • Apply chart upgrades automatically

The image-tag layer inside the chart partially handles in-place updates: when images.jobsManager.digest is unset, imagePullPolicy: Always re-pulls the floating prod/dev tag — but only when a pod restarts, and pods do not restart on their own. So image updates are also bottlenecked behind a chart upgrade trigger that never fires.

Net result: customers sit on the version they first installed (e.g. client 1.0.8) indefinitely, missing security fixes, MySQL stability fixes (v1.1.0), and other improvements.

Where

  • tracebloc/client Helm chart (client/)
  • Install entry: scripts/lib/install-client-helm.sh runs helm repo update && helm upgrade --install once at install time only — never again

Fix

Ship a chart-side mechanism that closes the loop. Headline approach: a CronJob inside the chart that:

  1. Periodically (e.g. daily) runs helm repo update against tracebloc.github.io/client
  2. Compares the published chart version against the deployed Release version
  3. Either notifies the operator (workspace UI banner / email / kubectl describe) or self-applies, based on the chosen UX
  4. Reports the result back to the tracebloc backend so the version is visible in the workspace UI

Options

  • (A) Notify-and-approve — CronJob detects a new version, surfaces it in the workspace UI / emails the admin, customer clicks "apply". Safer for breaking changes; aligns with how enterprise infrastructure typically handles updates.
  • (B) Auto-upgrade — CronJob runs helm upgrade directly when a new chart is detected. Lowest friction, highest risk for breaking-change rollouts on customer infra.
  • (C) Hybrid — auto-apply patch versions (1.1.0 → 1.1.1), notify-and-approve for minor/major bumps. Closest to standard SaaS update patterns.

Why

Every existing customer is silently drifting away from the latest secure, stable version. Security work (training-pod hardening from #57, the security-docs branch) and stability fixes (MySQL kill-loop fix in v1.1.0) only land for new deployments. There is no remediation channel for already-deployed clients short of asking each customer to run helm upgrade manually.

Benefits

  • Existing customers receive security and stability fixes without manual operator action
  • Removes the support burden of nudging customers to upgrade after every release
  • Visibility — backend knows what version each customer is running, enabling distribution dashboards
  • Foundation for a maintenance / SLA story in enterprise sales conversations

Improvement

After this lands, every running client either auto-upgrades or shows a clear "new version available" prompt within ~24h of a release. Backend can also report client-version distribution (how many customers on 1.0.8 vs 1.1.0).

Open decisions before implementation

  • UX: notify-and-approve vs auto-upgrade vs hybrid
  • Where the notification surfaces: workspace UI in frontend-app (needs backend support too), email, or both
  • Cadence: daily / hourly / Helm hook
  • Priority label

Status

Open

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions