Sync develop → main for v1.4.0 chart release #157

Merged: saadqbal merged 3 commits into main from sync/develop-to-main-v1.4.0 on May 25, 2026

@saadqbal saadqbal commented May 25, 2026

Summary

Brings the image-refresh CronJob feature to main ahead of cutting the v1.4.0 release tag.

What's included

Together, #155 and #156 close #154: existing customers' jobs-manager (and pods-monitor) deployments now auto-refresh when a Docker Hub tag is published, with no manual kubectl rollout restart needed. The source of truth is a tracebloc.io/last-refreshed-<image>-digest annotation on the deployment, which is compared against the registry's HEAD digest on each tick.

Dev verification

Smoke-tested end-to-end on tb-client-dev-templates EKS:

| Test | Expected | Actual |
| --- | --- | --- |
| Fresh helm upgrade from 1.3.5 → 1.4.0 with --reset-then-reuse-values | new image-refresh resources install, nothing else churns | |
| First image-refresh tick (no annotation) | records both digests, no restart | |
| Second tick (annotation matches) | digest unchanged; no-op, no annotate, no rollout | |
| Forced annotation clobber (simulates new image push) | digest changed → restart needed, rollout, then re-annotate to current digest | |
| Rollout history | bumped by 1 only on the forced-change test | |
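
The tick outcomes exercised above reduce to a three-way decision on the recorded and current digests. The following is a minimal sketch of that decision and the resulting actions, not the chart's actual script: the function names `decide_action` and `apply_action`, their argument order, and the 300s rollout timeout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of one image-refresh tick; not the chart's actual script.

# decide_action RECORDED CURRENT
# Prints "record" (first observation), "noop" (unchanged) or "restart" (changed).
decide_action() {
  recorded=$1
  current=$2
  if [ -z "$recorded" ]; then
    echo record
  elif [ "$recorded" = "$current" ]; then
    echo noop
  else
    echo restart
  fi
}

# apply_action ACTION NAMESPACE DEPLOYMENT ANNOTATION_KEY CURRENT_DIGEST
# "noop" touches nothing; "restart" rolls the deployment and waits for it;
# both "record" and a successful "restart" end by writing the annotation.
apply_action() {
  action=$1 ns=$2 deploy=$3 key=$4 digest=$5
  case "$action" in
    noop) return 0 ;;
    restart)
      kubectl rollout restart "deployment/$deploy" -n "$ns" || return 1
      # The 300s timeout is an assumed value.
      kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=300s || return 1
      ;;
  esac
  kubectl annotate deployment "$deploy" -n "$ns" "$key=$digest" --overwrite
}
```

Annotating only after `rollout status` succeeds means a failed rollout leaves the old digest recorded, so the next tick would retry rather than silently skip the update.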

After merging

Cut v1.4.0 release tag from main — that fires release-helm-chart.yaml which publishes client-1.4.0.tgz to gh-pages. Existing customers' autoUpgrade CronJobs pick it up at the next :23 tick.

Closes #154

🤖 Generated with Claude Code


Note

Medium Risk
The CronJob can restart the jobs-manager deployment and holds patch RBAC on it; blast radius is one deployment in one namespace, but unintended restarts or registry/API failures could cause brief outages.

Overview
Releases the Helm chart at v1.4.0 and adds an image-refresh path so jobs-manager (and pods-monitor via the same deployment) can pick up new Docker Hub images under the floating CLIENT_ENV tag without manual kubectl rollout restart.

A scheduled CronJob runs a shell script that sends HEAD requests to Docker Hub for manifest digests and compares them to the tracebloc.io/last-refreshed-*-digest annotations on the deployment (not the pod imageID). On the first run it records the digests without restarting; on a change it runs kubectl rollout restart, then rollout status, before updating the annotations. Rendering is gated by tracebloc.imageRefreshEnabled (off when disabled or when both images are digest-pinned), and namespace-scoped RBAC limits the job to patch/watch on that deployment. The PR also adds imageRefresh values and schema defaults, plus helm-unittest coverage for the script/RBAC contracts and for upgrade safety when imageRefresh is missing.
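
For readers unfamiliar with the registry side, the digest lookup described above might look roughly like the sketch below. It is an illustration under assumptions, not the script shipped in the chart: the function names `extract_digest` and `registry_digest` are hypothetical, the repository is assumed to be public (anonymous pull token), and `curl` and `jq` are assumed to be present in the job image.

```shell
#!/bin/sh
# Hypothetical sketch: fetch a tag's manifest digest from Docker Hub via HEAD.

# extract_digest: read HTTP response headers on stdin and print the value of
# the Docker-Content-Digest header (header names are matched case-insensitively).
extract_digest() {
  tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" { print $2; exit }'
}

# registry_digest REPO TAG
# REPO is a Docker Hub repository such as "namespace/image" (placeholder).
registry_digest() {
  repo=$1 tag=$2
  # Anonymous pull token; this only works for public repositories.
  token=$(curl -fsS \
    "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" \
    | jq -r '.token') || return 1
  # HEAD the manifest; accept both multi-arch indexes and single manifests.
  curl -fsSI \
    -H "Authorization: Bearer ${token}" \
    -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
    -H "Accept: application/vnd.oci.image.index.v1+json" \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    -H "Accept: application/vnd.oci.image.manifest.v1+json" \
    "https://registry-1.docker.io/v2/${repo}/manifests/${tag}" \
    | extract_digest
}
```

Keeping the header parsing in its own function lets it be checked against canned responses without network access.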

Reviewed by Cursor Bugbot for commit 15e78d9.

saadqbal and others added 3 commits May 25, 2026 12:57
feat(#154): auto-refresh jobs-manager image on Docker Hub publish
…tion (#156)

Caught in dev-cluster smoke test on tb-client-dev-templates.

`kubectl get -o jsonpath="{.metadata.annotations['key.with.dots']}"`
returns empty for keys containing `.` or `/` on kubectl-go 1.30.x —
verified directly: `kubectl get ... -o jsonpath='{.metadata.annotations}'`
showed `tracebloc.io/last-refreshed-jobs-manager-digest:sha256:f913...`
present and persisted, but the script's bracket-notation read returned
empty. Effect: every tick saw `recorded=<unset>` and re-entered the
first-observation path, re-annotating the deployment with the same
digest on every run. No spurious restarts (first-observation skips
restart), but the script was incapable of ever transitioning to the
"unchanged; no-op" path, and a real image update would have been
indistinguishable from first observation.

Fix: read via `kubectl get -o json | jq -r --arg k "$_key"
'.metadata.annotations[$k] // empty'`. alpine/k8s ships jq, so this
adds no new dependency. The dot-escape jsonpath form
(`.tracebloc\.io/...`) also works but is fragile against future
kubectl version changes; jq's behaviour is locked.
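
As a small illustration of the fix, the jq filter quoted above can be wrapped in a helper that reads an object's JSON on stdin. The helper name `read_annotation` and the `$NAMESPACE` variable in the usage comment are hypothetical; the filter itself is the one given in this commit message.

```shell
#!/bin/sh
# Sketch of the jq-based annotation read described above.

# read_annotation KEY: read a Kubernetes object's JSON on stdin and print the
# named annotation's value, or print nothing if the annotation is absent.
# Passing the key via --arg means dots and slashes need no escaping.
read_annotation() {
  jq -r --arg k "$1" '.metadata.annotations[$k] // empty'
}

# Hypothetical usage against a live cluster:
#   kubectl get deployment jobs-manager -n "$NAMESPACE" -o json \
#     | read_annotation "tracebloc.io/last-refreshed-jobs-manager-digest"
```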

Regression tests:
* The script must include `jq -r --arg k` (positive match).
* The script must NOT include `annotations['$_key']` bracket-notation
  jsonpath (negative match — the regression vector itself).
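
The actual regression tests are helm-unittest assertions on the rendered script. As a rough shell equivalent of the same positive and negative match, assuming the rendered script has been written to a file, a check could look like this (the function name `check_script_contract` is hypothetical):

```shell
#!/bin/sh
# Hypothetical grep-based equivalent of the two regression assertions.

# check_script_contract FILE: succeed only if the script reads annotations
# via jq and does not contain the bracket-notation jsonpath form.
check_script_contract() {
  # Positive match: the jq read must be present.
  grep -qF -- 'jq -r --arg k' "$1" || { echo "missing jq read" >&2; return 1; }
  # Negative match: the literal text annotations['$_key'] must be absent.
  if grep -qF -- "annotations['\$_key']" "$1"; then
    echo "bracket-notation jsonpath present" >&2
    return 1
  fi
}
```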

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Brings the image-refresh CronJob feature to main for release:
- #155: feat(#154) auto-refresh jobs-manager image on Docker Hub publish
- #156: fix(#154) read annotations via jq (kubectl jsonpath bracket
  notation returns empty for keys containing dots/slashes)

Verified end-to-end on the tb-client-dev-templates EKS dev cluster:
fresh upgrade installs the new resources cleanly, first-tick records
both annotations without restart, second tick is a no-op, and a forced
digest mismatch triggers the expected rollout-restart-then-annotate
sequence. Rollout history bumps as expected.

Closes #154.
@saadqbal saadqbal self-assigned this May 25, 2026
@LukasWodka

👋 Heads-up — Code review queue is at 17 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

@waqaskhanroghani waqaskhanroghani self-requested a review May 25, 2026 09:19
@saadqbal saadqbal merged commit ccbfb44 into main May 25, 2026
15 of 16 checks passed
@saadqbal saadqbal added the skip-fr-gate Bypass FR gate for this PR (use only for bootstrap or emergencies — visible in audit) label May 25, 2026
@saadqbal

Adding skip-fr-gate label to bypass the FR gate's self-check on this sync PR.

The gate checks each referenced PR (#155, #156) AND the current PR (#157). Both #155 and #156 are correctly at Ready for prod after the /fr-pass commands. But the kanban automation has moved #157 itself to Prod (the natural state for a release-sync PR), so the gate's required = Ready for prod check fails on #157 by being too far along.

This is the gate's own documented emergency override, audit-logged via the `::warning::FR gate bypassed via 'skip-fr-gate' label.` line in the workflow output.
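
To make the failure mode concrete, the gate's behaviour as described here can be approximated with the sketch below. The gate's real workflow is not shown in this PR, so the function name `fr_gate`, its argument layout, and the `::error::` message text are all assumptions; only the `Ready for prod` requirement, the `skip-fr-gate` label, and the `::warning::` line come from the comment above.

```shell
#!/bin/sh
# Hypothetical sketch of the FR gate decision; the real workflow is not shown here.

# fr_gate LABELS STATUS...
# LABELS is a space-separated list of label names on the PR; each STATUS is
# the kanban column of one checked PR (referenced PRs plus the current one).
fr_gate() {
  labels=$1; shift
  case " $labels " in
    *" skip-fr-gate "*)
      echo "::warning::FR gate bypassed via 'skip-fr-gate' label."
      return 0 ;;
  esac
  for status in "$@"; do
    if [ "$status" != "Ready for prod" ]; then
      echo "::error::FR gate failed: status '$status' is not 'Ready for prod'"
      return 1
    fi
  done
}
```

With the statuses from this PR, `fr_gate "" "Ready for prod" "Ready for prod" "Prod"` fails on the third entry (#157 itself), while adding the label short-circuits the check and emits the warning.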

Development

Successfully merging this pull request may close these issues.

Auto-refresh jobs-manager and pods-monitor images on Docker Hub publish