Auto-refresh ingestor image digest without requiring a chart release #158

@saadqbal

Problem

Today, shipping a new `ghcr.io/tracebloc/ingestor` image requires a full `client` chart release:

  1. Push new image to GHCR (get digest)
  2. Bump `images.ingestor.digest` in `client/values.yaml`
  3. Bump `client/Chart.yaml` version
  4. PR → develop → sync to main → cut release tag
  5. autoUpgrade picks up the new chart at next `:23` tick → jobs-manager redeploys with new `INGESTOR_IMAGE_DIGEST`

That's hours of overhead per ingestor image bump, versus roughly 15 minutes for jobs-manager, which uses image-refresh on a floating tag. The asymmetry hurts when the ingestor changes frequently.

Proposed solution

Extend the existing image-refresh CronJob (shipped in #155 / v1.4.0) to handle the ingestor image as a third entry in its image list, but with different semantics from jobs-manager/pods-monitor:

  • jobs-manager / pods-monitor: registry-HEAD vs annotation; on change → `kubectl rollout restart` + update annotation.
  • ingestor: registry-HEAD vs annotation; on change → `kubectl set env deployment/-jobs-manager -c api INGESTOR_IMAGE_DIGEST=` + update annotation. The env change triggers a natural rollout (deployment spec mutates, new ReplicaSet rolls out).
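
To make the two semantics concrete, here is a minimal Python sketch of how one tick could translate a digest change into `kubectl` invocations. It is not the actual image-refresh script: the function name `refresh_commands`, the `kind` values, and the deployment name `demo-jobs-manager` are hypothetical, and the annotation key is passed in because the issue only names the ingestor's key. It also assumes a previous digest is already recorded.

```python
from typing import List

def refresh_commands(kind: str, deployment: str, annotation_key: str,
                     registry_digest: str, recorded_digest: str) -> List[List[str]]:
    """Return the kubectl argv lists to run for one image (hypothetical helper)."""
    if registry_digest == recorded_digest:
        return []  # nothing changed upstream, so no rollout and no annotation write

    target = f"deployment/{deployment}"
    if kind == "restart":
        # jobs-manager / pods-monitor: floating tag, so a restart re-pulls the image
        action = ["kubectl", "rollout", "restart", target]
    elif kind == "set-env":
        # ingestor: mutate the env var; the spec change triggers the rollout
        action = ["kubectl", "set", "env", target, "-c", "api",
                  f"INGESTOR_IMAGE_DIGEST={registry_digest}"]
    else:
        raise ValueError(f"unknown kind: {kind}")

    # Record the digest only after the action, so a failed action is retried next tick
    annotate = ["kubectl", "annotate", "--overwrite", target,
                f"{annotation_key}={registry_digest}"]
    return [action, annotate]

cmds = refresh_commands(
    "set-env", "demo-jobs-manager",
    "tracebloc.io/last-refreshed-ingestor-digest",
    "sha256:bbb", "sha256:aaa",
)
for argv in cmds:
    print(" ".join(argv))
```

Returning argv lists rather than running them keeps the decision testable without a cluster; a caller could pass each list to `subprocess.run(argv, check=True)`.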

Audit trail is preserved: the digest still flows through the deployment's env (still inspectable via `kubectl get deployment -o yaml`), and the `tracebloc.io/last-refreshed-ingestor-digest` annotation records every successful refresh — same auditability as the current pin-in-values approach, just sourced from the registry instead of the chart.
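
As an illustration of that audit trail, the sketch below pulls both values out of a deployment object. It assumes the JSON form (`kubectl get deployment -o json`) rather than YAML to avoid a third-party parser; the function name `ingestor_audit` and the sample document are hypothetical.

```python
from typing import Optional, Tuple

ANNOTATION = "tracebloc.io/last-refreshed-ingestor-digest"
ENV_NAME = "INGESTOR_IMAGE_DIGEST"

def ingestor_audit(deployment: dict, container: str = "api") -> Tuple[Optional[str], Optional[str]]:
    """Return (digest in the container env, digest recorded in the annotation)."""
    annotations = deployment.get("metadata", {}).get("annotations") or {}
    recorded = annotations.get(ANNOTATION)

    env_digest = None
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for c in pod_spec.get("containers", []):
        if c.get("name") != container:
            continue
        for env in c.get("env") or []:
            if env.get("name") == ENV_NAME:
                env_digest = env.get("value")
    return env_digest, recorded

# Hypothetical, trimmed-down deployment document for demonstration only
sample = {
    "metadata": {"annotations": {ANNOTATION: "sha256:bbb"}},
    "spec": {"template": {"spec": {"containers": [
        {"name": "api", "env": [{"name": ENV_NAME, "value": "sha256:bbb"}]},
    ]}}},
}
env_digest, recorded = ingestor_audit(sample)
print("in sync" if env_digest == recorded else "drift")
```

A mismatch between the two values would suggest a refresh that patched the env but failed before writing the annotation, or a manual edit.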

Why this is a separate ticket from #154

#154 closed with the deliberate "ingestor is out of scope" decision because the ingestor's post-install Job hook couldn't be `rollout restart`ed. This ticket sidesteps that — we're refreshing the IMAGE the spawned ingestion Jobs USE, not the hook itself. The hook stays untouched; the parent jobs-manager deployment gets the env-var patch.

Acceptance criteria

  • New value `images.ingestor.tag` (default `prod`) — a floating tag to poll on GHCR.
  • Extend image-refresh script: add ingestor entry, switch its action from `rollout restart` to `kubectl set env`.
  • GHCR auth handshake (different from Docker Hub — `auth.ghcr.io` / `ghcr.io/v2/...`). Anonymous for public images.
  • Annotation `tracebloc.io/last-refreshed-ingestor-digest` on the jobs-manager deployment.
  • First-observation contract: missing annotation → record without restarting (same as #154, "Auto-refresh jobs-manager and pods-monitor images on Docker Hub publish").
  • Per-image opt-out: when `images.ingestor.digest` is explicitly set in values, skip auto-refresh (operator opted into reproducibility-by-pin).
  • When ingestor, jobs-manager, and pods-monitor are ALL pinned: render no CronJob at all (extend the existing `tracebloc.imageRefreshEnabled` helper).
  • Unit tests pin: `set env` rather than `rollout restart` for ingestor; GHCR endpoints; annotation key.
  • Dev-cluster smoke test before release.
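
The opt-out, first-observation, and all-pinned criteria above can be summarized as a small decision table. The Python below is only a sketch of that logic, useful as a reference for the unit tests: the real gating lives in the Helm helper and the refresh script, and the names `Action`, `decide`, and `cronjob_needed` are hypothetical. How jobs-manager and pods-monitor express a pin is not stated here, so pins are modeled as an optional digest per image.

```python
from enum import Enum
from typing import Dict, Optional

class Action(Enum):
    SKIP_PINNED = "skip-pinned"   # operator set an explicit digest in values
    RECORD_ONLY = "record-only"   # first observation: annotate, do not roll out
    NO_CHANGE = "no-change"       # registry digest matches the annotation
    REFRESH = "refresh"           # digest moved: act, then update the annotation

def decide(pinned_digest: Optional[str], recorded: Optional[str], registry: str) -> Action:
    if pinned_digest:
        return Action.SKIP_PINNED
    if recorded is None:
        return Action.RECORD_ONLY
    if recorded == registry:
        return Action.NO_CHANGE
    return Action.REFRESH

def cronjob_needed(pins: Dict[str, Optional[str]]) -> bool:
    """True if at least one image is unpinned and therefore needs polling."""
    return any(not digest for digest in pins.values())

print(decide(None, None, "sha256:aaa").value)
print(cronjob_needed({"ingestor": "sha256:aaa", "jobs-manager": None, "pods-monitor": None}))
```

Checking the pin first means an explicitly pinned image is never touched, even if a stale annotation from an earlier unpinned period is still present.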

Notes

  • RBAC stays unchanged — `kubectl set env` is a patch on the deployment, which we already have.
  • Steady-state rate-limit usage adds 1 HEAD per tick to GHCR (well under any anonymous limits).
  • If the change frequency is also high for chart-template changes to the ingestor subchart itself, that's a separate follow-up — this ticket only addresses the image-bump case.
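
For the GHCR handshake and the single HEAD per tick, one option is to avoid hardcoding the token endpoint and instead follow the `WWW-Authenticate` challenge that a registry returns on an unauthenticated request, as described by the Docker Registry v2 token authentication scheme. The sketch below only builds URLs and headers and does no network I/O; the helper names and the sample challenge (with host `registry.example`) are hypothetical and not taken from a real GHCR response.

```python
import re
from urllib.parse import urlencode

# Media types to accept so the registry returns the digest of the tag's top-level manifest
MANIFEST_ACCEPT = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])

def parse_bearer_challenge(header: str) -> dict:
    """Parse 'Bearer realm="...",service="...",scope="..."' into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))

def token_url(challenge: dict) -> str:
    """Build the anonymous token request URL from the parsed challenge."""
    query = {k: challenge[k] for k in ("service", "scope") if k in challenge}
    return challenge["realm"] + "?" + urlencode(query)

def manifest_url(registry: str, repository: str, tag: str) -> str:
    """URL to HEAD; the digest is returned in the Docker-Content-Digest header."""
    return f"https://{registry}/v2/{repository}/manifests/{tag}"

# Made-up challenge for illustration only
sample = ('Bearer realm="https://registry.example/token",'
          'service="registry.example",scope="repository:tracebloc/ingestor:pull"')
print(token_url(parse_bearer_challenge(sample)))
print(manifest_url("ghcr.io", "tracebloc/ingestor", "prod"))
```

A caller would GET the token URL, read the token from the JSON response, then send the HEAD request with `Authorization: Bearer <token>` and `Accept: MANIFEST_ACCEPT`.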

Follow-up to #154.
