fix(#130): default idempotency key to install-time stamp, not release revision by saadqbal · Pull Request #132 · tracebloc/client

saadqbal · 2026-05-19T13:34:01Z

Closes #130.

Summary

ingestor/templates/_helpers.tpl: when .Values.idempotencyKey is unset, derive the key from <release>-<unix-epoch> instead of <release>-<revision>. Helm resets revisions to 1 after helm uninstall, so the old default collided on the very common "uninstall → fix something → reinstall" loop and tripped jobs-manager's 409 guard.
ingestor/values.yaml: doc-comment flipped to describe the new default and call out why a revision-derived default is the wrong shape here. Override path is reframed as opt-in for at-most-once semantics.
ingestor/README.md: frequently-overridden table row updated to match.
ingestor/Chart.yaml: bumped 0.1.0 → 0.1.1.

The opt-in path (--set idempotencyKey=...) is unchanged — the value is still passed through verbatim, so callers who want stable replays still get them.

Why `%s-%s`, not `%s-%d`

Sprig's unixEpoch returns a string, not an int. The first cut of this patch used %d and rendered my-dataset-%!d(string=1779196370) — caught by the local helm template smoke test before commit.

Test plan

Template-level (run locally, passes):

helm lint ./ingestor --set-file ingestConfig=... → clean.
Two consecutive helm template invocations (default path) produce different idempotency_key values. Verified: my-dataset-1779196406 then my-dataset-1779196408 two seconds later.
Render with a different release name picks up the new name: other-name-1779196408.
Override path is verbatim: --set idempotencyKey=stable-uuid-abc → "idempotency_key":"stable-uuid-abc".

Cluster-level (owner of jobs-manager runtime needs to validate against a live cluster — I can't reach one from this branch):

helm install my-dataset ./ingestor --set-file ingestConfig=./test.yaml → uninstall → reinstall with the same release name + same ingest.yaml → second install creates a fresh ingestor Job (different ingest-job-<hash>), not a 409.
helm install my-dataset ./ingestor --set-file ingestConfig=./test.yaml --set idempotencyKey=foo → uninstall → reinstall with the same key + same config → jobs-manager returns 200 replay, no second Job created.

Refs

Sibling to #125 / #126 and #127 / #128, which together with this PR finish polishing the customer-facing ingestor install loop.
The 409 itself is enforced by jobs-manager in client-runtime — that behavior is correct and unchanged; this PR only changes what the chart hands to it.

🤖 Generated with Claude Code

Note

Low Risk
Low risk Helm-template change that only affects the default idempotencyKey generation; primary risk is behavior change for users who relied on revision-based dedupe semantics implicitly.

Overview
Updates the ingestor Helm chart so the default idempotencyKey is derived from <release>-<unix-epoch> (via now | unixEpoch) instead of <release>-<revision>, preventing collisions when a release is uninstalled and reinstalled under the same name.

Documentation is aligned to the new default and the override semantics (stable UUID for at-most-once replay), and the chart version is bumped to 0.1.1.

^{Reviewed by Cursor Bugbot for commit 0b9ae1a. Bugbot is set up for automated code reviews on this repo. Configure here.}

… revision `ingestor.idempotencyKey` previously fell back to `<release>-<revision>` when `.Values.idempotencyKey` was unset. Helm restarts revisions at 1 after `helm uninstall`, so reinstalling under the same release name produced the same key. If anything dedupe-relevant changed in between (image digest is the dominant case during testing), jobs-manager correctly rejected the second submission with a 409 — but to a customer following the README it looked like the chart was broken. Default to `<release>-<unix-epoch>` instead. Each install gets a fresh key; the opt-in stable-UUID path remains for callers who actually want at-most-once semantics across reinstalls. Note on the printf format: Sprig's `unixEpoch` returns a string (not an int), so the formatter is `%s-%s`, not `%s-%d`. Bumps ingestor subchart 0.1.0 → 0.1.1 (default-behavior change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

LukasWodka · 2026-05-19T13:34:46Z

👋 Heads-up — Code review queue is at 20 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

client#131 — feat(Move ingestor ServiceAccount into parent client chart (shared resource) #129): parent client chart owns the shared ingestor ServiceAccount (1.3.4) · author: @saadqbal · no reviewer assigned
client-runtime#36 — Bump requests from 2.32.3 to 2.33.0 in /Node-deploy · author: @dependabot · no reviewer assigned
client-runtime#37 — Bump black from 25.1.0 to 26.3.1 in /Node-deploy · author: @dependabot · no reviewer assigned
client-runtime#38 — Bump requests from 2.32.3 to 2.33.0 · author: @dependabot · no reviewer assigned
client-runtime#39 — Bump black from 25.1.0 to 26.3.1 · author: @dependabot · no reviewer assigned
data-ingestors#96 — Add release-time smoke test for ingestor image · author: @saadqbal · no reviewer assigned
design-system#19 — fix: un-track coverage/ and node_modules/ from git · author: @LukasWodka · no reviewer assigned
docs#3 — Block bots from crawling Mintlify static assets · author: @LukasWodka · reviewer: @saadqbal
model-zoo#74 — Add 9 MLM model variants and remove warmstart · author: @shujaatTracebloc · reviewer: @divyasinghds
tracebloc-client#296 — chore(deps): bump ray from 2.44.1 to 2.55.0 in /use_cases · author: @dependabot · no reviewer assigned

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

* Merge pull request #88 from tracebloc/ci/add-wip-limit-caller ci: add WIP-limit-check caller workflow * feat(requests-proxy): register requests-proxy in Helm chart (#95) * feat(requests-proxy): register requests-proxy in Helm chart - Add requests-proxy Deployment and Service templates - Auto-generate requests-proxy-admin token on first install (preserved across upgrades via lookup; override with requestsProxyAdminToken) - Inject REQUESTS_PROXY_ADMIN_TOKEN into jobs-manager via the same secret - Add images.requestsProxy and resources.requestsProxy values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Update order of setting request proxy admin token * Bugbot Fix YAML * Bugbot fix add validation for request proxy --------- Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Merge pull request #106 from tracebloc/docs/drop-stale-helm-charts-refs-105 docs: drop stale tracebloc-helm-charts references in INSTALL.md * ci: add FR-pass comment caller for multi-stage kanban flow * ci: add FR gate caller for staging/main promotions * chore: sync main → develop after misrouted docs PRs (#108) * docs: fix README Deploy section (Helm not docker), surface in-repo docs The Deploy section opened with `docker pull tracebloc/client:latest`, but this repo ships a Helm chart — the actual install is `helm install`. External walkthrough URLs (`/local-linux`, `/local-macos`, `/aws`, `/deployment-overview`) didn't match any path in the tracebloc/docs tree, so they 404. The in-repo documentation (`docs/INSTALL.md`, `docs/MIGRATIONS.md`, `docs/migration-tools/README.md`, `client/MIGRATION.md`) was never linked from the README despite being the operational source of truth. Surgical change — the rest of the README stays as-is: - Replace `docker pull` with `helm repo add` + `helm install` (matches docs/INSTALL.md) - Call out chart version (v1.3.1) and platform support (AKS / EKS / bare-metal / OpenShift) up front - Table linking every in-repo operational doc - Fix external URLs to match actual tracebloc/docs paths (local-deployment-guide-linux, local-deployment-guide-macos, eks-client-deployment-guide, azure-deployment-guide) - Pull NetworkPolicy/CNI prerequisite into a callout Closes #101 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: surface standalone installer in README and INSTALL.md The standalone installer (bash <(curl -fsSL tracebloc.io/i.sh) / irm tracebloc.io/i.ps1 | iex) is the one-command path for evaluation, local dev, and first-time installs — it provisions a cluster, detects GPU drivers, and deploys the client. Today it isn't documented anywhere reachable from this repo, so readers see the multi-step helm install flow as the only option. README: - New "Quick install" subsection at the top of Deploy with macOS/Linux and Windows commands, brief description of what it does, and a pointer to the local helper scripts under scripts/ - Existing helm flow relabeled as "Helm install (production)" — now positioned as the option for existing production clusters docs/INSTALL.md: - Top-of-doc callout pointing at the standalone installer for non-production users - Production-focused content untouched Closes #103 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: reframe Quick install — same client, different cluster path Previous wording ("Best for evaluation, local dev, and first-time installs" / "Just trying it out? For local dev or a quick evaluation") implied the standalone installer produces a lesser/demo client. It doesn't — it produces the same full client, just on a cluster the script provisions for you. Reframes the differentiator around cluster ownership instead of install quality: - README: "Use this when you don't already have a cluster — the result is a full client install, not a demo." Helm subsection retitled from "Helm install (production)" to just "Helm install" with "For existing Kubernetes clusters". - INSTALL.md: callout opens with "Don't have a Kubernetes cluster yet?" and emphasizes "a full tracebloc client". Refs #103 * docs: explicit https:// on installer URLs (security) curl and PowerShell's irm both default to HTTP when no scheme is specified, so `curl -fsSL tracebloc.io/i.sh` and `irm tracebloc.io/i.ps1` issue plaintext requests. The downloaded body is piped straight into bash / iex, so a network-level attacker between the user and tracebloc.io could MITM the response and inject arbitrary code. Add explicit `https://` to every installer URL in README.md and docs/INSTALL.md so the request is encrypted from the first byte. Refs #103 * ci: bootstrap FR-pass caller on main * ci: bootstrap FR gate caller on main --------- Co-authored-by: Lukas Wuttke <lukas@tracebloc.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: lukasWuttke <54042461+LukasWodka@users.noreply.github.com> * chore(auto-upgrade): run cronjob hourly at :23 (#112) Switches the auto-upgrade CronJob default schedule from "23 2 * * *" (daily 02:23 UTC) to "23 * * * *" (hourly at :23). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Merge pull request #115 from tracebloc/chore/bump-chart-1.3.2-develop chore(client): bump chart 1.3.1 -> 1.3.2 (develop sync) * ci: drop push-tags trigger from release-helm-chart workflow (#117) * ci: drop push-tags trigger from release-helm-chart workflow `gh release create v<x.y.z>` (the established release path per `gh release list`) fires both `push` (tag) and `release` (published) events, which causes two parallel workflow runs to race for the gh-pages push. The slower run fails with non-fast-forward. Most recent example: v1.3.2 cut today — run 25492826437 (release event) failed; run 25492826350 (push event) succeeded. Artifacts landed fine, but the failed sibling shows up as a red X on the release and is noise for anyone debugging future releases. Keeping only `release: published` removes the race. The `Upload chart to GitHub Release (on tag)` step's `startsWith(github.ref, 'refs/tags/')` guard still evaluates true for release events (`github.ref` is the tag ref), so the upload step behaviour is preserved. Closes #116 * ci: harden release-asset upload against actions/runner#2788 With the push-tags trigger removed, the upload step's `if: startsWith(github.ref, 'refs/tags/')` guard is the only thing keeping the upload from running, but it silently evaluates to false when `github.ref` arrives empty — a known intermittent runner bug (actions/runner#2788, still open as of 2026-05). The same bug also affects `github.ref_name`, which softprops/action-gh-release@v2 uses by default to derive the tag, so the action itself can target the wrong release (or fail) when the bug fires. Drop the now-redundant `if:` guard (the workflow only runs on `release: published`, so every run is by definition a release event) and pass `tag_name` explicitly from the release event payload, which is unaffected by the bug. * ci: pin checkout ref to release tag (actions/runner#2788 hardening) actions/checkout@v4 defaults `ref` to github.ref, which is the same field hit by actions/runner#2788 — the still-open intermittent bug where github.ref arrives empty on release-triggered runs. Per the action's docs, when "checking out the repository that triggered a workflow, this defaults to the reference or SHA for that event. Otherwise, uses the default branch." So an empty github.ref would fall back to the repo default branch (develop here), and we'd package the chart from develop's HEAD instead of the tagged commit. Pin ref explicitly to github.event.release.tag_name, which is part of the release event payload and is unaffected by the runner bug. * Add MySQL Host to request proxy yaml file (#118) Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local> * Add request proxy url to jobs manager yaml file (#119) Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local> * Remove REQUESTS_PROXY_ADMIN_TOKEN (#120) Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local> * Reduce dependency on values.yaml file for requests proxy (#122) Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local> * feat(#86): ingestor Helm subchart + companion RBAC/service/authz for new ingestion endpoint (#123) * feat: companion chart changes for ingestion endpoint (client-runtime#21) Wires the cluster side of the new ingestion flow into the main client chart so the upcoming ingestor subchart can actually reach jobs-manager. Five small changes: 1. **rbac.yaml** — adds three permissions to jobs-manager's RBAC: - authentication.k8s.io/tokenreviews create - configmaps create - secrets create The endpoint validates caller SA tokens via TokenReview and creates a per-run ConfigMap (ingest.yaml) + Secret (BACKEND_TOKEN) before spawning the ingestor Job. `tokenreviews` is cluster-scoped and only added to the ClusterRole branch; customers with `clusterScope: false` won't have the ingestion endpoint authenticate. Documented in the rule comments. 2. **jobs-manager-service.yaml** (new) — ClusterIP exposing port 8080 at the stable name `jobs-manager`, so the ingestor subchart's post-install hook doesn't need to discover Pod IPs. 3. **jobs-manager-deployment.yaml** — adds containerPort 8080 on the `api` container, mounts the ingestion-authz ConfigMap at `/etc/tracebloc/ingestion-authz.yaml`, declares the corresponding pod-level volume. 4. **ingestion-authz-configmap.yaml** (new) — renders the `ingestionAuthz.allowed` policy customers configure in values.yaml. Mounted into jobs-manager and read at startup by `submit_ingestion_run.load_authz_policy`. Each entry maps (namespace, service_account) → allowed table_prefixes; omitted `namespace` defaults to .Release.Namespace. 5. **values.yaml** — adds the `ingestionAuthz.allowed` default that permits the ingestor subchart's default SA (named `ingestor`) to ingest into any table. Customers tighten via overrides. Verified ──────── - helm lint passes (only pre-existing icon-recommended INFO). - helm template renders all five resources cleanly with expected values (Service name, RBAC verbs, container port, volume mount). - helm unittest: 116/116 tests pass (existing snapshots unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(#86): ingestor Helm subchart (post-install hook submits to jobs-manager) The customer-facing chart that finally closes the end-to-end loop: helm install my-dataset tracebloc/ingestor --namespace tracebloc \ --set-file ingestConfig=./my-ingest.yaml \ --set image.digest=sha256:<digest> Renders the customer's ingest.yaml into a ConfigMap, then a post-install hook Job POSTs `{ingest_config, idempotency_key, image_digest}` to jobs-manager's `/internal/submit-ingestion-run` endpoint (client-runtime#21). jobs-manager validates the SA token via TokenReview, validates the YAML against ingest.v1, mints a backend token, creates the per-run ConfigMap + Secret + Job, returns 201 (or 200 on replay). Layout ────── ingestor/ ├── Chart.yaml appVersion: 0.3.0-rc1 (the data-ingestors release) ├── values.yaml ingestConfig (required, --set-file), image.digest │ (required, sha256), jobsManager.endpoint, │ serviceAccount.create, hook resources, idempotency ├── README.md ownership boundaries + verification commands ├── .helmignore └── templates/ ├── _helpers.tpl ├── serviceaccount.yaml default name "ingestor" ├── configmap-ingest-config.yaml hook-weight 0 └── post-install-job.yaml hook-weight 1, runs as the SA, reads its own token, POSTs. Ownership boundary ────────────────── Per #86's acceptance criteria, the README spells out what `helm uninstall` does and doesn't clean up: This chart owns: ConfigMap (ingest.yaml), the hook Job, the SA. jobs-manager owns: the per-run ConfigMap, Secret, ingestor Job. The cluster owns: the ingested data + metadata POSTed to the backend. `helm uninstall my-dataset` removes only the chart's footprint. The running ingestor Job and its data persist. This is deliberate — uninstall is not a cancel button. The README documents the kubectl command to cancel a run if needed. Implementation choices ────────────────────── - **post-install hook, not a long-lived resource.** The hook is the whole point of this chart — fire once, exit. - **automountServiceAccountToken: true** for the hook Job. That's the whole authentication mechanism — TokenReview on the SA token. Every other tracebloc workload disables automount; this one needs it. - **`hook-delete-policy: before-hook-creation`**, NOT `hook-succeeded`. Keeps the completed Job around so operators can `kubectl logs` the POST response after install. Cleaned up only on the next install under the same release. - **curlimages/curl** as the hook image — small, official, and ships python3 which we use to JSON-encode the multi-line YAML body safely (jq has a JSON-escape edge case for YAML newlines that's easier to side-step than handle). - **idempotencyKey defaults to `<release>-<revision>`** so a `helm upgrade` submits a fresh run. Customers override to a stable UUID if they want strict at-most-once across reinstalls. Verified ──────── - helm lint passes. - helm template renders all four resources (ConfigMap, Job, SA, and the inline templates expand cleanly with --set-file ingestConfig). - Required-value gates fire correctly: missing image.digest fails template; missing ingestConfig fails template. Closes #86 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#86): pre-render JSON body in ConfigMap, drop python3 + shell JSON escape Three bugbot findings on the first ingestor-chart pass, all real: 1. HIGH — curlimages/curl runtime layer doesn't include python3 (only in the build stage; stripped in the final image). The hook's `python3 -c ...` JSON encoder would fail with "python3: not found" on every install. 2. HIGH — even if python3 were available, the shell syntax `python3 -c "..." VAR=value` puts the assignments AFTER the command, which makes them positional argv, not env. The `os.environ['INGEST_CONFIG']` lookup would raise KeyError. 3. MEDIUM — `nindent 4` after literal template-source indentation puts a leading blank line into the YAML block scalar, so the customer's ingest.yaml gets a "\n" prefix that block-scalar parsers tolerate but is wrong. Structural fix rather than tweaking the script: the three POST-body fields (ingest_config, idempotency_key, image_digest) are ALL known at helm-template time. Render the JSON body in the ConfigMap as `body.json` using Helm's `toJson` filter — which handles multi-line string escaping correctly — then the hook becomes a one-line `curl --data-binary @body.json`. No python3 needed, no shell-side JSON construction at all. Eliminates both HIGH bugs as a category, not just instance-by-instance. For bug 3: use the left-trim action delimiter (dash inside braces) before the `required ... | nindent 4` action so it eats the leading whitespace cleanly. Verified via `helm template` that the rendered `ingest.yaml` now starts cleanly with `apiVersion:`. Verified ──────── - helm lint passes on both client/ and ingestor/. - helm template renders the JSON body with correct escaping (multi-line YAML → "\n"-escaped scalar in JSON). - helm template renders ingest.yaml with no leading blank line. - helm unittest client/: 116/116 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#86): track ingestor/values.yaml (was silently .gitignored) bugbot caught a serious oversight: `ingestor/values.yaml` exists in the working tree but never made it into the repository. Every `git add ingestor/` silently dropped it because the repo's .gitignore at line 119 has `/*/values*.yaml` — an anti-leak pattern for operator values files — which matches `ingestor/values.yaml`. Without the file the chart is broken on `helm install`: every template references `.Values.hookImage.repository`, `.Values.jobsManager.endpoint`, etc., and Helm renders nil-pointer errors when the keys are absent. Two-line fix: - Add `!ingestor/values.yaml` to .gitignore (mirrors the existing `!client/values*.yaml` exception for the main chart). Documents *why* the exception exists, so a future cleanup pass doesn't re-introduce the bug. - Commit the actual values.yaml file with the defaults already referenced by the README and the templates. Local verification before pushing: helm template my-dataset ingestor/ --namespace tracebloc \ --set ingestConfig=... --set image.digest=sha256:... \ # renders ServiceAccount, ConfigMap, Job correctly. Lesson for future runs: `git add <dir>/` is *not* a verification that files were added — gitignore patterns can silently drop them. Should have verified with `git status` before commit; would have caught this before bugbot did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: nil-guard ingestionAuthz access for --reuse-values upgrade path (#124) #123's ingestion-authz ConfigMap template did unguarded nested access: {{- range .Values.ingestionAuthz.allowed }} This crashes with "nil pointer evaluating interface {}.allowed" when `.Values.ingestionAuthz` is absent — which is exactly what `helm upgrade --reuse-values` produces against a pre-#123 release. The stored values from the previous deploy don't have the key, and `--reuse-values` doesn't pick up new chart defaults, so the upgrade fails before any of the new resources are created. A real user hit this immediately after #123 merged: Error: UPGRADE FAILED: template: client/templates/ ingestion-authz-configmap.yaml:20:21: executing "..." at <.Values.ingestionAuthz.allowed>: nil pointer evaluating interface {}.allowed Fix: collapse the missing-parent and missing-child cases to an empty list with `default dict` + `default list`. The rendered ConfigMap becomes `allowed:` (empty), which the authz policy parser treats as "no SAs authorized" — fail-safe, matches the intent of "operator hasn't configured this yet". The recommended `helm upgrade` recipe is still `--reset-then-reuse-values` (picks up new defaults including the non-empty `ingestionAuthz.allowed` default), but the template no longer requires that — it renders correctly under either path. Verified ──────── - helm template renders cleanly with default values (full policy), with `--set ingestionAuthz=null` (empty allowed list), and with `--set ingestionAuthz.allowed=null` (same). - helm unittest client/: 116/116 pass, no snapshot changes. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(#125): wire INGESTOR_IMAGE_DIGEST; drop digest requirement from ingestor subchart (#126) * feat(#125): wire INGESTOR_IMAGE_DIGEST; drop digest requirement from ingestor subchart Companion to tracebloc/client-runtime#41 (which made the endpoint treat the request body's `image_digest` as an optional override of a cluster-configured default). With this PR the ingestor image fits the same auto-update model as every other component in the chart: client/values.yaml + images.ingestor.digest: "" The auto-upgrade cronjob bumps this when a new chart version is published; jobs-manager re-rolls and the new env takes effect. client/templates/jobs-manager-deployment.yaml + INGESTOR_IMAGE_DIGEST env, nil-guarded for --reuse-values from a pre-this-PR release. Empty value renders cleanly (no nil pointer), endpoint then accepts only request-body overrides until the operator sets the chart value. ingestor/values.yaml + templates/configmap-ingest-config.yaml + image.digest is now an OPTIONAL override, not required. + body.json renders without `image_digest` when none is set; the key is included only when the customer explicitly pinned via --set image.digest=... (the override path: reproducing old runs, testing pre-rollout versions, air-gapped mirrors). ingestor/README.md + Removes image.digest from "Required values". + Adds "Pinning a specific image version" section explaining the override use cases and when to reach for them. + Top-of-README install snippet drops --set image.digest=... — the dominant path is now `helm install --set-file ingestConfig=...`. Once both PRs land, the bootstrap step is a one-line bump of client/values.yaml's images.ingestor.digest to the current ghcr.io/tracebloc/ingestor release digest, plus a chart version bump so the auto-upgrade cronjob promotes it. Future ingestor releases follow the same pattern — bump digest + chart version, customers' auto-upgrade picks it up on the next tick. Verified ──────── - helm lint passes on both charts. - helm template renders: - env populated when images.ingestor.digest is set - env empty (nil-guard) when images.ingestor key absent entirely (simulates --reuse-values from pre-this-PR release) - body.json without image_digest when no override - body.json with image_digest when explicit --set image.digest=... - helm unittest client/: 116/116 pass. Closes #125 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bootstrap ingestor digest + bump chart version 1.3.2 → 1.3.3 Activates the auto-update model introduced by the rest of this PR. Without the value set, jobs-manager runs with `INGESTOR_IMAGE_DIGEST=""` and the ingestion endpoint returns 503 for every call that doesn't include a body override — which is the *opposite* of the "customer doesn't have to think about digests" UX this PR is supposed to enable. Two coupled bumps: client/Chart.yaml version: 1.3.2 → 1.3.3 appVersion: 1.3.2 → 1.3.3 Required for the auto-upgrade cronjob to detect this release. `helm search repo` orders by version; without a bump customers stay on 1.3.2 and never see the new env wiring. client/values.yaml images.ingestor.digest = "sha256:e6639b084d0d377072dc908db376050914ebd49c730ddaa13f838d10f5482ea9" The data-ingestors v0.3.0-rc1 release. Future ingestor releases bump both this and Chart.yaml's version; eventually a workflow in tracebloc/data-ingestors can raise the PR automatically when a new image is published. After this lands and the chart is published to gh-pages, a `helm upgrade --reset-then-reuse-values` on the customer's cluster (or the daily auto-upgrade cronjob's next tick) rolls jobs-manager with the env populated, and `helm install tracebloc/ingestor --set-file ingestConfig=...` — no `--set image.digest=...` — works. Verified ──────── - helm lint client/ clean. - helm template shows INGESTOR_IMAGE_DIGEST env populated with the real digest. - helm unittest client/: 116/116 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#127): ingestor chart auto-resolves jobs-manager endpoint to release namespace (#128) The ingestor subchart's default jobsManager.endpoint hardcoded "tracebloc" as the parent release's namespace: http://jobs-manager.tracebloc.svc.cluster.local:8080 Any release in a non-"tracebloc" namespace failed the post-install hook with `curl: (6) Could not resolve host: …`, blocking end-to-end ingestion. Surfaced today during real-cluster validation on a release deployed to `tracebloc-templates`. Fix shape: leave the values.yaml default empty; have the post-install hook template the endpoint to use `.Release.Namespace` when no value is set. The override path (cross-namespace install) keeps working — set `jobsManager.endpoint` explicitly and it wins over the default. values.yaml jobsManager.endpoint: "" (was hardcoded to tracebloc namespace) + comment explaining the auto-resolve + override semantics templates/post-install-job.yaml JOBS_MANAGER_ENDPOINT defaults to http://jobs-manager.<.Release.Namespace>.svc.cluster.local:8080 when .Values.jobsManager.endpoint is empty. README.md Frequently-overridden-values entry corrected. Verified ──────── - helm template into namespace `tracebloc-templates` → http://jobs-manager.tracebloc-templates.svc.cluster.local:8080 - helm template into namespace `some-other-ns` → http://jobs-manager.some-other-ns.svc.cluster.local:8080 - helm template with --set jobsManager.endpoint=http://port-forward.localhost:8888 → wins over the default. - helm lint clean. Closes #127 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(#129): parent client chart owns the shared ingestor ServiceAccount (#131) The ingestor ServiceAccount is shared by every `tracebloc/ingestor` subchart release in a namespace, but it was owned by the first such release. Concurrent installs of a second ingestor release collided with Helm's "cannot import into current release"; uninstalling the first release ripped the SA out from under all the others. Move the SA into this parent chart, which already owns the matching `ingestionAuthz` ConfigMap, so the SA + policy have the same lifecycle and every ingestor release in the namespace shares the SA cleanly. Plumb the name through `ingestionAuthz.serviceAccountName` as a single source of truth — both the new SA template and the default `allowed` entry in the authz ConfigMap dereference it via the new `tracebloc.ingestorServiceAccountName` helper. The helper nil-guards pre-#129 `--reuse-values` upgrades by defaulting to "ingestor". Document the SA adoption path in `client/MIGRATION.md` for clusters that already have an `ingestor` SA owned by a 0.1.0 subchart release — re-annotate before upgrading the parent chart so Helm doesn't refuse the import. Bumps chart to 1.3.4. Pair with tracebloc/ingestor 0.2.0, which flips `serviceAccount.create` default to `false` so subchart releases stop trying to own the SA themselves. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#130): default idempotency key to install-time stamp, not release revision (#132) `ingestor.idempotencyKey` previously fell back to `<release>-<revision>` when `.Values.idempotencyKey` was unset. Helm restarts revisions at 1 after `helm uninstall`, so reinstalling under the same release name produced the same key. If anything dedupe-relevant changed in between (image digest is the dominant case during testing), jobs-manager correctly rejected the second submission with a 409 — but to a customer following the README it looked like the chart was broken. Default to `<release>-<unix-epoch>` instead. Each install gets a fresh key; the opt-in stable-UUID path remains for callers who actually want at-most-once semantics across reinstalls. Note on the printf format: Sprig's `unixEpoch` returns a string (not an int), so the formatter is `%s-%s`, not `%s-%d`. Bumps ingestor subchart 0.1.0 → 0.1.1 (default-behavior change). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(#129)!: default serviceAccount.create=false; parent chart owns the SA (#133) The ingestor SA is shared across every `tracebloc/ingestor` release in the namespace. The previous per-release ownership made the second concurrent install collide with Helm's "cannot import into current release" error, and uninstalling the first release deleted the SA out from under any sibling release that worked around the collision with `serviceAccount.create=false`. The parent `tracebloc/client` chart 1.3.4 now owns the SA, exposing its name via `ingestionAuthz.serviceAccountName`. This subchart's default flips to `create: false` so it consumes that shared SA. The `name` value is still required so the post-install hook Job knows which SA's token to mount. `serviceAccount.create=true` remains available as an escape hatch for operators on a pre-1.3.4 parent chart, with a comment in values.yaml explaining when (and only when) to flip it back on. Breaking change: bumps chart to 0.2.0. Pair with the 1.3.4 parent chart bump; see the parent's MIGRATION.md "Upgrading to 1.3.4" section for the SA-adoption procedure on clusters where a 0.1.0 release already created the SA. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(chart): bump ingestor digest to v0.3.0 + chart to 1.3.5 (#134) v0.3.0 is the first production-ready ingestor release (signed + SBOM), validated end-to-end against EKS on 2026-05-19 (6 files in PVC + 576 MySQL rows via the declarative chart path). The previous default (v0.3.0-rc1) had three real-cluster bugs that landed as tracebloc/data-ingestors#106: - #103 wheel + sdist were missing schema/ingest.v1.json - #104 image-resolution validator tuple-vs-list comparison - #105 _has_extension dot/case normalization (no more cat1.jpeg.jpeg) Chart bumped to 1.3.5 so the auto-upgrade cronjob (#69) detects the change and rolls customers onto v0.3.0 on the next tick. ingestor image: ghcr.io/tracebloc/ingestor@sha256:463e2367...07a4a cosign verify available; release notes contain the verification command. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#135): publish ingestor subchart alongside parent chart (#136) The customer-facing install path is helm repo add tracebloc https://tracebloc.github.io/client helm install my-dataset tracebloc/ingestor \ --namespace tracebloc-templates \ --set-file ingestConfig=./my.yaml For `tracebloc/ingestor` to resolve from that helm repo, the ingestor subchart must be packaged into gh-pages alongside the parent client chart. Before this PR, `release-helm-chart.yaml` only ran `helm package ./client`, so the second install path returned `Error: chart "ingestor" not found`. helm-ci.yaml also only lints the parent chart, so any future regression in `ingestor/templates/` would land on develop without CI noticing. Three changes: 1. release-helm-chart.yaml: package + index BOTH client and ingestor into a single shared index.yaml. Attach both tgzs to the GitHub release for download-by-tag pinning. 2. helm-ci.yaml: lint the ingestor subchart on every PR alongside the per-platform client lints. Plain `helm lint --strict ./ingestor` is enough — its only required value (ingestConfig) emits INFO not FAIL, and the chart's templates don't branch on platform so the per-platform values-file matrix doesn't apply. 3. ingestor/Chart.yaml: bump appVersion 0.3.0-rc1 → 0.3.0 to match the tracebloc/data-ingestors v0.3.0 release that just shipped. Chart version (0.2.0) is unchanged; appVersion is descriptive. Validated locally: both charts package cleanly (client-1.3.5.tgz, ingestor-0.2.0.tgz), all four platform-specific client lints pass, ingestor lint passes. Closes #135. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(ingestor): explain image vs chart update lifecycle (#138) Customers ask: "the cluster has an auto-upgrade cronjob — does that mean my ingestor chart updates too?" The answer is nuanced: the image auto-updates (via INGESTOR_IMAGE_DIGEST on jobs-manager, kept current by the cronjob), but the chart on your workstation is independent — Helm's repo cache doesn't refresh itself. Add a "How updates work" section that explains the two-layer model and the strong property that the image you run is decoupled from the chart version that submitted the request. Plus an explicit FAQ on previously-installed ingestor releases (nothing to upgrade — fire-and-forget). No code change. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix three bugbot findings from PR #137 review (#142) * fix(#139): preserve idempotency key across helm upgrade The ingestor.idempotencyKey helper defaulted to "<release>-<unix-epoch>" and re-stamped on every render. `helm upgrade --reuse-values` preserves the stored value "" (not the previously-rendered key), so the template re-evaluated `now | unixEpoch` and produced a NEW key each upgrade — accidentally creating duplicate ingestion runs from what customers expected to be no-op upgrades. Contradicts the documented behavior in ingestor/README.md added in #138. Look up the existing post-install hook ConfigMap from the previous render and reuse its idempotency_key. On fresh install (or after uninstall) the lookup returns empty and we fall through to the now-based default. `helm template` (no cluster connection) returns empty for lookup too, so local previews still get a fresh key per render — matches the in-cluster install path the first time. Caught by bugbot on PR #137 review. Closes #139. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#140): read requests-proxy resources from values The requests-proxy deployment hardcoded its container resources, ignoring the resources.requestsProxy schema entry that values.schema.json has defined since the requests-proxy was added. Every other component (jobsManager, podsMonitor, mysql) reads from .Values.resources.<name>.* with defaults — bring requestsProxy in line with that pattern. Adds the resources.requestsProxy block to values.yaml with the existing hardcoded defaults so behavior on a fresh install is unchanged. The template uses the default-through-dict nil-guard idiom so `helm upgrade --reuse-values` from a pre-1.3.6 release (where the value didn't exist) still renders cleanly without crashing on a nil parent. Caught by bugbot on PR #137 review. Closes #140. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#141): add images.ingestor entry to values.schema.json values.yaml has had images.ingestor.digest since #126, and the jobs-manager template surfaces it as INGESTOR_IMAGE_DIGEST, but the schema didn't validate it — every other image (jobsManager, podsMonitor, resourceMonitor, requestsProxy, mysqlClient, busybox) has an entry. An operator setting --set images.ingestor.digest=foo (not the canonical sha256:<64-hex>) bypassed schema validation and failed only later inside submit_ingestion_run.py. Add the missing entry mirroring the other image entries' shape. helm template now rejects malformed digests at chart-template time ("values don't meet the specifications of the schema(s)... Does not match pattern '^(sha256:[a-f0-9]{64})?$'") rather than waiting for runtime. Caught by bugbot on PR #137 review. Closes #141. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: lukasWuttke <54042461+LukasWodka@users.noreply.github.com> Co-authored-by: Syed Is Saqlain <saqlain.syed007@gmail.com> Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Lukas Wuttke <lukas@tracebloc.io>

saadqbal self-assigned this May 19, 2026

This was referenced May 19, 2026

feat(#129)!: ingestor subchart default serviceAccount.create=false (0.2.0) #133

Merged

refactor(#97): make Config lazy and drop laptop-path defaults tracebloc/data-ingestors#98

Merged

shujaatTracebloc approved these changes May 19, 2026

View reviewed changes

aptracebloc approved these changes May 19, 2026

View reviewed changes

saadqbal merged commit 9d556bb into develop May 19, 2026
4 checks passed

This was referenced May 19, 2026

fix(ingestor): default idempotency key to install-time stamp, not release revision #130

Closed

Sync develop → main for v1.3.5 chart release #137

Merged

saadqbal deleted the fix/ingestor-idempotency-default-130 branch May 25, 2026 07:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(#130): default idempotency key to install-time stamp, not release revision#132

fix(#130): default idempotency key to install-time stamp, not release revision#132
saadqbal merged 1 commit into
developfrom
fix/ingestor-idempotency-default-130

saadqbal commented May 19, 2026 •

edited by cursor Bot

Loading

Uh oh!

LukasWodka commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

saadqbal commented May 19, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why %s-%s, not %s-%d

Test plan

Refs

Uh oh!

LukasWodka commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

saadqbal commented May 19, 2026 •

edited by cursor Bot

Loading

Why `%s-%s`, not `%s-%d`