Skip to content

ci: add WIP-limit-check caller workflow#88

Merged
saadqbal merged 1 commit into
developfrom
ci/add-wip-limit-caller
Apr 30, 2026
Merged

ci: add WIP-limit-check caller workflow#88
saadqbal merged 1 commit into
developfrom
ci/add-wip-limit-caller

Conversation

@LukasWodka
Copy link
Copy Markdown
Contributor

Summary

Adds a 9-line caller workflow that triggers the org-wide reusable workflow wip-limit-check.yml on PR open / ready_for_review.

Behavior

When a PR opens:

  • Reads the engineer kanban's Code review column
  • If count > 8 (the WIP limit), posts a comment on the new PR with the list of open PRs already in queue
  • Does not block the PR — informational nudge only

Why

Surfaces team kanban discipline at decision time: pull from review before opening new work. Same shape as the other tracebloc/.github reusable workflows.

🤖 Generated with Claude Code

@saadqbal saadqbal merged commit c9a5e8e into develop Apr 30, 2026
2 checks passed
saadqbal added a commit that referenced this pull request May 20, 2026
* Merge pull request #88 from tracebloc/ci/add-wip-limit-caller

ci: add WIP-limit-check caller workflow

* feat(requests-proxy): register requests-proxy in Helm chart (#95)

* feat(requests-proxy): register requests-proxy in Helm chart

- Add requests-proxy Deployment and Service templates
- Auto-generate requests-proxy-admin token on first install (preserved
  across upgrades via lookup; override with requestsProxyAdminToken)
- Inject REQUESTS_PROXY_ADMIN_TOKEN into jobs-manager via the same secret
- Add images.requestsProxy and resources.requestsProxy values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update order of setting request proxy admin token

* Bugbot Fix YAML

* Bugbot fix add validation for request proxy

---------

Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Merge pull request #106 from tracebloc/docs/drop-stale-helm-charts-refs-105

docs: drop stale tracebloc-helm-charts references in INSTALL.md

* ci: add FR-pass comment caller for multi-stage kanban flow

* ci: add FR gate caller for staging/main promotions

* chore: sync main → develop after misrouted docs PRs (#108)

* docs: fix README Deploy section (Helm not docker), surface in-repo docs

The Deploy section opened with `docker pull tracebloc/client:latest`,
but this repo ships a Helm chart — the actual install is `helm install`.
External walkthrough URLs (`/local-linux`, `/local-macos`, `/aws`,
`/deployment-overview`) didn't match any path in the tracebloc/docs
tree, so they 404. The in-repo documentation (`docs/INSTALL.md`,
`docs/MIGRATIONS.md`, `docs/migration-tools/README.md`,
`client/MIGRATION.md`) was never linked from the README despite being
the operational source of truth.

Surgical change — the rest of the README stays as-is:
- Replace `docker pull` with `helm repo add` + `helm install` (matches
  docs/INSTALL.md)
- Call out chart version (v1.3.1) and platform support (AKS / EKS /
  bare-metal / OpenShift) up front
- Table linking every in-repo operational doc
- Fix external URLs to match actual tracebloc/docs paths
  (local-deployment-guide-linux, local-deployment-guide-macos,
  eks-client-deployment-guide, azure-deployment-guide)
- Pull NetworkPolicy/CNI prerequisite into a callout

Closes #101

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface standalone installer in README and INSTALL.md

The standalone installer (bash <(curl -fsSL tracebloc.io/i.sh) /
irm tracebloc.io/i.ps1 | iex) is the one-command path for evaluation,
local dev, and first-time installs — it provisions a cluster, detects
GPU drivers, and deploys the client. Today it isn't documented anywhere
reachable from this repo, so readers see the multi-step helm install
flow as the only option.

README:
- New "Quick install" subsection at the top of Deploy with macOS/Linux
  and Windows commands, brief description of what it does, and a
  pointer to the local helper scripts under scripts/
- Existing helm flow relabeled as "Helm install (production)" — now
  positioned as the option for existing production clusters

docs/INSTALL.md:
- Top-of-doc callout pointing at the standalone installer for
  non-production users
- Production-focused content untouched

Closes #103

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: reframe Quick install — same client, different cluster path

Previous wording ("Best for evaluation, local dev, and first-time
installs" / "Just trying it out? For local dev or a quick evaluation")
implied the standalone installer produces a lesser/demo client. It
doesn't — it produces the same full client, just on a cluster the
script provisions for you.

Reframes the differentiator around cluster ownership instead of install
quality:
- README: "Use this when you don't already have a cluster — the result
  is a full client install, not a demo." Helm subsection retitled
  from "Helm install (production)" to just "Helm install" with
  "For existing Kubernetes clusters".
- INSTALL.md: callout opens with "Don't have a Kubernetes cluster
  yet?" and emphasizes "a full tracebloc client".

Refs #103

* docs: explicit https:// on installer URLs (security)

curl and PowerShell's irm both default to HTTP when no scheme is
specified, so `curl -fsSL tracebloc.io/i.sh` and `irm tracebloc.io/i.ps1`
issue plaintext requests. The downloaded body is piped straight into
bash / iex, so a network-level attacker between the user and tracebloc.io
could MITM the response and inject arbitrary code.

Add explicit `https://` to every installer URL in README.md and
docs/INSTALL.md so the request is encrypted from the first byte.

Refs #103

* ci: bootstrap FR-pass caller on main

* ci: bootstrap FR gate caller on main

---------

Co-authored-by: Lukas Wuttke <lukas@tracebloc.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: lukasWuttke <54042461+LukasWodka@users.noreply.github.com>

* chore(auto-upgrade): run cronjob hourly at :23 (#112)

Switches the auto-upgrade CronJob default schedule from
"23 2 * * *" (daily 02:23 UTC) to "23 * * * *" (hourly at :23).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Merge pull request #115 from tracebloc/chore/bump-chart-1.3.2-develop

chore(client): bump chart 1.3.1 -> 1.3.2 (develop sync)

* ci: drop push-tags trigger from release-helm-chart workflow (#117)

* ci: drop push-tags trigger from release-helm-chart workflow

`gh release create v<x.y.z>` (the established release path per
`gh release list`) fires both `push` (tag) and `release` (published)
events, which causes two parallel workflow runs to race for the
gh-pages push. The slower run fails with non-fast-forward.

Most recent example: v1.3.2 cut today — run 25492826437 (release event)
failed; run 25492826350 (push event) succeeded. Artifacts landed fine,
but the failed sibling shows up as a red X on the release and is noise
for anyone debugging future releases.

Keeping only `release: published` removes the race. The
`Upload chart to GitHub Release (on tag)` step's
`startsWith(github.ref, 'refs/tags/')` guard still evaluates true for
release events (`github.ref` is the tag ref), so the upload step
behaviour is preserved.

Closes #116

* ci: harden release-asset upload against actions/runner#2788

With the push-tags trigger removed, the upload step's
`if: startsWith(github.ref, 'refs/tags/')` guard is the only thing
keeping the upload from running, but it silently evaluates to false
when `github.ref` arrives empty — a known intermittent runner bug
(actions/runner#2788, still open as of 2026-05). The same bug also
affects `github.ref_name`, which softprops/action-gh-release@v2 uses
by default to derive the tag, so the action itself can target the
wrong release (or fail) when the bug fires.

Drop the now-redundant `if:` guard (the workflow only runs on
`release: published`, so every run is by definition a release event)
and pass `tag_name` explicitly from the release event payload, which
is unaffected by the bug.

* ci: pin checkout ref to release tag (actions/runner#2788 hardening)

actions/checkout@v4 defaults `ref` to github.ref, which is the same
field hit by actions/runner#2788 — the still-open intermittent bug
where github.ref arrives empty on release-triggered runs. Per the
action's docs, when "checking out the repository that triggered a
workflow, this defaults to the reference or SHA for that event.
Otherwise, uses the default branch." So an empty github.ref would
fall back to the repo default branch (develop here), and we'd
package the chart from develop's HEAD instead of the tagged commit.

Pin ref explicitly to github.event.release.tag_name, which is part of
the release event payload and is unaffected by the runner bug.

* Add MySQL Host to request proxy yaml file (#118)

Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local>

* Add request proxy url to jobs manager yaml file (#119)

Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local>

* Remove REQUESTS_PROXY_ADMIN_TOKEN (#120)

Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local>

* Reduce dependency on values.yaml file for requests proxy (#122)

Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local>

* feat(#86): ingestor Helm subchart + companion RBAC/service/authz for new ingestion endpoint (#123)

* feat: companion chart changes for ingestion endpoint (client-runtime#21)

Wires the cluster side of the new ingestion flow into the main client
chart so the upcoming ingestor subchart can actually reach jobs-manager.

Five small changes:

1. **rbac.yaml** — adds three permissions to jobs-manager's RBAC:
     - authentication.k8s.io/tokenreviews   create
     - configmaps                            create
     - secrets                               create
   The endpoint validates caller SA tokens via TokenReview and creates
   a per-run ConfigMap (ingest.yaml) + Secret (BACKEND_TOKEN) before
   spawning the ingestor Job.

   `tokenreviews` is cluster-scoped and only added to the ClusterRole
   branch; customers with `clusterScope: false` won't have the
   ingestion endpoint authenticate. Documented in the rule comments.

2. **jobs-manager-service.yaml** (new) — ClusterIP exposing port 8080
   at the stable name `jobs-manager`, so the ingestor subchart's
   post-install hook doesn't need to discover Pod IPs.

3. **jobs-manager-deployment.yaml** — adds containerPort 8080 on the
   `api` container, mounts the ingestion-authz ConfigMap at
   `/etc/tracebloc/ingestion-authz.yaml`, declares the corresponding
   pod-level volume.

4. **ingestion-authz-configmap.yaml** (new) — renders the
   `ingestionAuthz.allowed` policy customers configure in values.yaml.
   Mounted into jobs-manager and read at startup by
   `submit_ingestion_run.load_authz_policy`. Each entry maps
   (namespace, service_account) → allowed table_prefixes; omitted
   `namespace` defaults to .Release.Namespace.

5. **values.yaml** — adds the `ingestionAuthz.allowed` default that
   permits the ingestor subchart's default SA (named `ingestor`) to
   ingest into any table. Customers tighten via overrides.

Verified
────────
- helm lint passes (only pre-existing icon-recommended INFO).
- helm template renders all five resources cleanly with expected
  values (Service name, RBAC verbs, container port, volume mount).
- helm unittest: 116/116 tests pass (existing snapshots unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(#86): ingestor Helm subchart (post-install hook submits to jobs-manager)

The customer-facing chart that finally closes the end-to-end loop:

  helm install my-dataset tracebloc/ingestor --namespace tracebloc \
    --set-file ingestConfig=./my-ingest.yaml \
    --set image.digest=sha256:<digest>

Renders the customer's ingest.yaml into a ConfigMap, then a
post-install hook Job POSTs `{ingest_config, idempotency_key,
image_digest}` to jobs-manager's `/internal/submit-ingestion-run`
endpoint (client-runtime#21). jobs-manager validates the SA token via
TokenReview, validates the YAML against ingest.v1, mints a backend
token, creates the per-run ConfigMap + Secret + Job, returns 201
(or 200 on replay).

Layout
──────
  ingestor/
  ├── Chart.yaml          appVersion: 0.3.0-rc1 (the data-ingestors release)
  ├── values.yaml         ingestConfig (required, --set-file), image.digest
  │                       (required, sha256), jobsManager.endpoint,
  │                       serviceAccount.create, hook resources, idempotency
  ├── README.md           ownership boundaries + verification commands
  ├── .helmignore
  └── templates/
      ├── _helpers.tpl
      ├── serviceaccount.yaml             default name "ingestor"
      ├── configmap-ingest-config.yaml    hook-weight 0
      └── post-install-job.yaml           hook-weight 1, runs as the SA,
                                          reads its own token, POSTs.

Ownership boundary
──────────────────
Per #86's acceptance criteria, the README spells out what `helm uninstall`
does and doesn't clean up:

  This chart owns:    ConfigMap (ingest.yaml), the hook Job, the SA.
  jobs-manager owns:  the per-run ConfigMap, Secret, ingestor Job.
  The cluster owns:   the ingested data + metadata POSTed to the backend.

`helm uninstall my-dataset` removes only the chart's footprint. The
running ingestor Job and its data persist. This is deliberate — uninstall
is not a cancel button. The README documents the kubectl command to
cancel a run if needed.

Implementation choices
──────────────────────
- **post-install hook, not a long-lived resource.** The hook is the
  whole point of this chart — fire once, exit.
- **automountServiceAccountToken: true** for the hook Job. That's the
  whole authentication mechanism — TokenReview on the SA token. Every
  other tracebloc workload disables automount; this one needs it.
- **`hook-delete-policy: before-hook-creation`**, NOT `hook-succeeded`.
  Keeps the completed Job around so operators can `kubectl logs` the
  POST response after install. Cleaned up only on the next install
  under the same release.
- **curlimages/curl** as the hook image — small, official, and ships
  python3 which we use to JSON-encode the multi-line YAML body safely
  (jq has a JSON-escape edge case for YAML newlines that's easier
  to side-step than handle).
- **idempotencyKey defaults to `<release>-<revision>`** so a
  `helm upgrade` submits a fresh run. Customers override to a stable
  UUID if they want strict at-most-once across reinstalls.

Verified
────────
- helm lint passes.
- helm template renders all four resources (ConfigMap, Job, SA, and
  the inline templates expand cleanly with --set-file ingestConfig).
- Required-value gates fire correctly: missing image.digest fails
  template; missing ingestConfig fails template.

Closes #86

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#86): pre-render JSON body in ConfigMap, drop python3 + shell JSON escape

Three bugbot findings on the first ingestor-chart pass, all real:

  1. HIGH — curlimages/curl runtime layer doesn't include python3
     (only in the build stage; stripped in the final image). The
     hook's `python3 -c ...` JSON encoder would fail with
     "python3: not found" on every install.

  2. HIGH — even if python3 were available, the shell syntax
     `python3 -c "..." VAR=value` puts the assignments AFTER the
     command, which makes them positional argv, not env. The
     `os.environ['INGEST_CONFIG']` lookup would raise KeyError.

  3. MEDIUM — `nindent 4` after literal template-source indentation
     puts a leading blank line into the YAML block scalar, so the
     customer's ingest.yaml gets a "\n" prefix that block-scalar
     parsers tolerate but is wrong.

Structural fix rather than tweaking the script: the three POST-body
fields (ingest_config, idempotency_key, image_digest) are ALL known
at helm-template time. Render the JSON body in the ConfigMap as
`body.json` using Helm's `toJson` filter — which handles multi-line
string escaping correctly — then the hook becomes a one-line
`curl --data-binary @body.json`. No python3 needed, no shell-side
JSON construction at all. Eliminates both HIGH bugs as a category,
not just instance-by-instance.

For bug 3: use the left-trim action delimiter (dash inside braces)
before the `required ... | nindent 4` action so it eats the
leading whitespace cleanly. Verified via `helm template` that the
rendered `ingest.yaml` now starts cleanly with `apiVersion:`.

Verified
────────
- helm lint passes on both client/ and ingestor/.
- helm template renders the JSON body with correct escaping
  (multi-line YAML → "\n"-escaped scalar in JSON).
- helm template renders ingest.yaml with no leading blank line.
- helm unittest client/: 116/116 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#86): track ingestor/values.yaml (was silently .gitignored)

bugbot caught a serious oversight: `ingestor/values.yaml` exists in
the working tree but never made it into the repository. Every `git
add ingestor/` silently dropped it because the repo's .gitignore at
line 119 has `/*/values*.yaml` — an anti-leak pattern for operator
values files — which matches `ingestor/values.yaml`.

Without the file the chart is broken on `helm install`: every template
references `.Values.hookImage.repository`, `.Values.jobsManager.endpoint`,
etc., and Helm renders nil-pointer errors when the keys are absent.

Two-line fix:

  - Add `!ingestor/values.yaml` to .gitignore (mirrors the existing
    `!client/values*.yaml` exception for the main chart). Documents
    *why* the exception exists, so a future cleanup pass doesn't
    re-introduce the bug.
  - Commit the actual values.yaml file with the defaults already
    referenced by the README and the templates.

Local verification before pushing:

  helm template my-dataset ingestor/ --namespace tracebloc \
    --set ingestConfig=... --set image.digest=sha256:... \
  # renders ServiceAccount, ConfigMap, Job correctly.

Lesson for future runs: `git add <dir>/` is *not* a verification that
files were added — gitignore patterns can silently drop them. Should
have verified with `git status` before commit; would have caught this
before bugbot did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: nil-guard ingestionAuthz access for --reuse-values upgrade path (#124)

#123's ingestion-authz ConfigMap template did unguarded nested access:

    {{- range .Values.ingestionAuthz.allowed }}

This crashes with "nil pointer evaluating interface {}.allowed" when
`.Values.ingestionAuthz` is absent — which is exactly what `helm
upgrade --reuse-values` produces against a pre-#123 release. The
stored values from the previous deploy don't have the key, and
`--reuse-values` doesn't pick up new chart defaults, so the upgrade
fails before any of the new resources are created.

A real user hit this immediately after #123 merged:

    Error: UPGRADE FAILED: template: client/templates/
    ingestion-authz-configmap.yaml:20:21:
    executing "..." at <.Values.ingestionAuthz.allowed>:
    nil pointer evaluating interface {}.allowed

Fix: collapse the missing-parent and missing-child cases to an empty
list with `default dict` + `default list`. The rendered ConfigMap
becomes `allowed:` (empty), which the authz policy parser treats as
"no SAs authorized" — fail-safe, matches the intent of "operator
hasn't configured this yet".

The recommended `helm upgrade` recipe is still
`--reset-then-reuse-values` (picks up new defaults including the
non-empty `ingestionAuthz.allowed` default), but the template no
longer requires that — it renders correctly under either path.

Verified
────────
- helm template renders cleanly with default values
  (full policy), with `--set ingestionAuthz=null` (empty allowed
  list), and with `--set ingestionAuthz.allowed=null` (same).
- helm unittest client/: 116/116 pass, no snapshot changes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(#125): wire INGESTOR_IMAGE_DIGEST; drop digest requirement from ingestor subchart (#126)

* feat(#125): wire INGESTOR_IMAGE_DIGEST; drop digest requirement from ingestor subchart

Companion to tracebloc/client-runtime#41 (which made the endpoint
treat the request body's `image_digest` as an optional override of a
cluster-configured default). With this PR the ingestor image fits the
same auto-update model as every other component in the chart:

  client/values.yaml
    + images.ingestor.digest: ""
      The auto-upgrade cronjob bumps this when a new chart version is
      published; jobs-manager re-rolls and the new env takes effect.

  client/templates/jobs-manager-deployment.yaml
    + INGESTOR_IMAGE_DIGEST env, nil-guarded for --reuse-values from
      a pre-this-PR release. Empty value renders cleanly (no nil
      pointer), endpoint then accepts only request-body overrides
      until the operator sets the chart value.

  ingestor/values.yaml + templates/configmap-ingest-config.yaml
    + image.digest is now an OPTIONAL override, not required.
    + body.json renders without `image_digest` when none is set; the
      key is included only when the customer explicitly pinned via
      --set image.digest=... (the override path: reproducing old runs,
      testing pre-rollout versions, air-gapped mirrors).

  ingestor/README.md
    + Removes image.digest from "Required values".
    + Adds "Pinning a specific image version" section explaining the
      override use cases and when to reach for them.
    + Top-of-README install snippet drops --set image.digest=... — the
      dominant path is now `helm install --set-file ingestConfig=...`.

Once both PRs land, the bootstrap step is a one-line bump of
client/values.yaml's images.ingestor.digest to the current
ghcr.io/tracebloc/ingestor release digest, plus a chart version bump
so the auto-upgrade cronjob promotes it. Future ingestor releases
follow the same pattern — bump digest + chart version, customers'
auto-upgrade picks it up on the next tick.

Verified
────────
- helm lint passes on both charts.
- helm template renders:
    - env populated when images.ingestor.digest is set
    - env empty (nil-guard) when images.ingestor key absent entirely
      (simulates --reuse-values from pre-this-PR release)
    - body.json without image_digest when no override
    - body.json with image_digest when explicit --set image.digest=...
- helm unittest client/: 116/116 pass.

Closes #125

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bootstrap ingestor digest + bump chart version 1.3.2 → 1.3.3

Activates the auto-update model introduced by the rest of this PR.
Without the value set, jobs-manager runs with `INGESTOR_IMAGE_DIGEST=""`
and the ingestion endpoint returns 503 for every call that doesn't
include a body override — which is the *opposite* of the "customer
doesn't have to think about digests" UX this PR is supposed to enable.

Two coupled bumps:

  client/Chart.yaml
    version: 1.3.2 → 1.3.3
    appVersion: 1.3.2 → 1.3.3
      Required for the auto-upgrade cronjob to detect this release.
      `helm search repo` orders by version; without a bump customers
      stay on 1.3.2 and never see the new env wiring.

  client/values.yaml
    images.ingestor.digest = "sha256:e6639b084d0d377072dc908db376050914ebd49c730ddaa13f838d10f5482ea9"
      The data-ingestors v0.3.0-rc1 release. Future ingestor releases
      bump both this and Chart.yaml's version; eventually a workflow
      in tracebloc/data-ingestors can raise the PR automatically when
      a new image is published.

After this lands and the chart is published to gh-pages, a
`helm upgrade --reset-then-reuse-values` on the customer's cluster
(or the daily auto-upgrade cronjob's next tick) rolls jobs-manager
with the env populated, and `helm install tracebloc/ingestor
--set-file ingestConfig=...` — no `--set image.digest=...` — works.

Verified
────────
- helm lint client/ clean.
- helm template shows INGESTOR_IMAGE_DIGEST env populated with the
  real digest.
- helm unittest client/: 116/116 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#127): ingestor chart auto-resolves jobs-manager endpoint to release namespace (#128)

The ingestor subchart's default jobsManager.endpoint hardcoded
"tracebloc" as the parent release's namespace:

    http://jobs-manager.tracebloc.svc.cluster.local:8080

Any release in a non-"tracebloc" namespace failed the post-install
hook with `curl: (6) Could not resolve host: …`, blocking end-to-end
ingestion. Surfaced today during real-cluster validation on a release
deployed to `tracebloc-templates`.

Fix shape: leave the values.yaml default empty; have the post-install
hook template the endpoint to use `.Release.Namespace` when no value
is set. The override path (cross-namespace install) keeps working —
set `jobsManager.endpoint` explicitly and it wins over the default.

  values.yaml
    jobsManager.endpoint: "" (was hardcoded to tracebloc namespace)
    + comment explaining the auto-resolve + override semantics

  templates/post-install-job.yaml
    JOBS_MANAGER_ENDPOINT defaults to
      http://jobs-manager.<.Release.Namespace>.svc.cluster.local:8080
    when .Values.jobsManager.endpoint is empty.

  README.md
    Frequently-overridden-values entry corrected.

Verified
────────
- helm template into namespace `tracebloc-templates` →
  http://jobs-manager.tracebloc-templates.svc.cluster.local:8080
- helm template into namespace `some-other-ns` →
  http://jobs-manager.some-other-ns.svc.cluster.local:8080
- helm template with --set jobsManager.endpoint=http://port-forward.localhost:8888
  → wins over the default.
- helm lint clean.

Closes #127

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(#129): parent client chart owns the shared ingestor ServiceAccount (#131)

The ingestor ServiceAccount is shared by every `tracebloc/ingestor`
subchart release in a namespace, but it was owned by the first such
release. Concurrent installs of a second ingestor release collided
with Helm's "cannot import into current release"; uninstalling the
first release ripped the SA out from under all the others.

Move the SA into this parent chart, which already owns the matching
`ingestionAuthz` ConfigMap, so the SA + policy have the same lifecycle
and every ingestor release in the namespace shares the SA cleanly.

Plumb the name through `ingestionAuthz.serviceAccountName` as a single
source of truth — both the new SA template and the default `allowed`
entry in the authz ConfigMap dereference it via the new
`tracebloc.ingestorServiceAccountName` helper. The helper nil-guards
pre-#129 `--reuse-values` upgrades by defaulting to "ingestor".

Document the SA adoption path in `client/MIGRATION.md` for clusters
that already have an `ingestor` SA owned by a 0.1.0 subchart release —
re-annotate before upgrading the parent chart so Helm doesn't refuse
the import.

Bumps chart to 1.3.4. Pair with tracebloc/ingestor 0.2.0, which flips
`serviceAccount.create` default to `false` so subchart releases stop
trying to own the SA themselves.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#130): default idempotency key to install-time stamp, not release revision (#132)

`ingestor.idempotencyKey` previously fell back to `<release>-<revision>`
when `.Values.idempotencyKey` was unset. Helm restarts revisions at 1
after `helm uninstall`, so reinstalling under the same release name
produced the same key. If anything dedupe-relevant changed in between
(image digest is the dominant case during testing), jobs-manager
correctly rejected the second submission with a 409 — but to a customer
following the README it looked like the chart was broken.

Default to `<release>-<unix-epoch>` instead. Each install gets a fresh
key; the opt-in stable-UUID path remains for callers who actually want
at-most-once semantics across reinstalls.

Note on the printf format: Sprig's `unixEpoch` returns a string (not an
int), so the formatter is `%s-%s`, not `%s-%d`.

Bumps ingestor subchart 0.1.0 → 0.1.1 (default-behavior change).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(#129)!: default serviceAccount.create=false; parent chart owns the SA (#133)

The ingestor SA is shared across every `tracebloc/ingestor` release in
the namespace. The previous per-release ownership made the second
concurrent install collide with Helm's "cannot import into current
release" error, and uninstalling the first release deleted the SA out
from under any sibling release that worked around the collision with
`serviceAccount.create=false`.

The parent `tracebloc/client` chart 1.3.4 now owns the SA, exposing
its name via `ingestionAuthz.serviceAccountName`. This subchart's
default flips to `create: false` so it consumes that shared SA. The
`name` value is still required so the post-install hook Job knows
which SA's token to mount.

`serviceAccount.create=true` remains available as an escape hatch for
operators on a pre-1.3.4 parent chart, with a comment in values.yaml
explaining when (and only when) to flip it back on.

Breaking change: bumps chart to 0.2.0. Pair with the 1.3.4 parent
chart bump; see the parent's MIGRATION.md "Upgrading to 1.3.4" section
for the SA-adoption procedure on clusters where a 0.1.0 release
already created the SA.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(chart): bump ingestor digest to v0.3.0 + chart to 1.3.5 (#134)

v0.3.0 is the first production-ready ingestor release (signed +
SBOM), validated end-to-end against EKS on 2026-05-19 (6 files in
PVC + 576 MySQL rows via the declarative chart path). The previous
default (v0.3.0-rc1) had three real-cluster bugs that landed as
tracebloc/data-ingestors#106:

- #103 wheel + sdist were missing schema/ingest.v1.json
- #104 image-resolution validator tuple-vs-list comparison
- #105 _has_extension dot/case normalization (no more cat1.jpeg.jpeg)

Chart bumped to 1.3.5 so the auto-upgrade cronjob (#69) detects the
change and rolls customers onto v0.3.0 on the next tick.

ingestor image: ghcr.io/tracebloc/ingestor@sha256:463e2367...07a4a
cosign verify available; release notes contain the verification
command.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#135): publish ingestor subchart alongside parent chart (#136)

The customer-facing install path is

    helm repo add tracebloc https://tracebloc.github.io/client
    helm install my-dataset tracebloc/ingestor \
      --namespace tracebloc-templates \
      --set-file ingestConfig=./my.yaml

For `tracebloc/ingestor` to resolve from that helm repo, the ingestor
subchart must be packaged into gh-pages alongside the parent client
chart. Before this PR, `release-helm-chart.yaml` only ran
`helm package ./client`, so the second install path returned
`Error: chart "ingestor" not found`. helm-ci.yaml also only lints the
parent chart, so any future regression in `ingestor/templates/` would
land on develop without CI noticing.

Three changes:

1. release-helm-chart.yaml: package + index BOTH client and ingestor
   into a single shared index.yaml. Attach both tgzs to the GitHub
   release for download-by-tag pinning.

2. helm-ci.yaml: lint the ingestor subchart on every PR alongside the
   per-platform client lints. Plain `helm lint --strict ./ingestor`
   is enough — its only required value (ingestConfig) emits INFO not
   FAIL, and the chart's templates don't branch on platform so the
   per-platform values-file matrix doesn't apply.

3. ingestor/Chart.yaml: bump appVersion 0.3.0-rc1 → 0.3.0 to match
   the tracebloc/data-ingestors v0.3.0 release that just shipped.
   Chart version (0.2.0) is unchanged; appVersion is descriptive.

Validated locally: both charts package cleanly
(client-1.3.5.tgz, ingestor-0.2.0.tgz), all four platform-specific
client lints pass, ingestor lint passes.

Closes #135.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(ingestor): explain image vs chart update lifecycle (#138)

Customers ask: "the cluster has an auto-upgrade cronjob — does that
mean my ingestor chart updates too?" The answer is nuanced: the
image auto-updates (via INGESTOR_IMAGE_DIGEST on jobs-manager,
kept current by the cronjob), but the chart on your workstation
is independent — Helm's repo cache doesn't refresh itself.

Add a "How updates work" section that explains the two-layer model
and the strong property that the image you run is decoupled from
the chart version that submitted the request. Plus an explicit FAQ
on previously-installed ingestor releases (nothing to upgrade —
fire-and-forget).

No code change.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix three bugbot findings from PR #137 review (#142)

* fix(#139): preserve idempotency key across helm upgrade

The ingestor.idempotencyKey helper defaulted to "<release>-<unix-epoch>"
and re-stamped on every render. `helm upgrade --reuse-values`
preserves the stored value "" (not the previously-rendered key), so
the template re-evaluated `now | unixEpoch` and produced a NEW key
each upgrade — accidentally creating duplicate ingestion runs from
what customers expected to be no-op upgrades. Contradicts the
documented behavior in ingestor/README.md added in #138.

Look up the existing post-install hook ConfigMap from the previous
render and reuse its idempotency_key. On fresh install (or after
uninstall) the lookup returns empty and we fall through to the
now-based default. `helm template` (no cluster connection) returns
empty for lookup too, so local previews still get a fresh key per
render — matches the in-cluster install path the first time.

Caught by bugbot on PR #137 review.

Closes #139.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#140): read requests-proxy resources from values

The requests-proxy deployment hardcoded its container resources,
ignoring the resources.requestsProxy schema entry that values.schema.json
has defined since the requests-proxy was added. Every other component
(jobsManager, podsMonitor, mysql) reads from .Values.resources.<name>.*
with defaults — bring requestsProxy in line with that pattern.

Adds the resources.requestsProxy block to values.yaml with the
existing hardcoded defaults so behavior on a fresh install is
unchanged. The template uses the default-through-dict nil-guard
idiom so `helm upgrade --reuse-values` from a pre-1.3.6 release
(where the value didn't exist) still renders cleanly without
crashing on a nil parent.

Caught by bugbot on PR #137 review.

Closes #140.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#141): add images.ingestor entry to values.schema.json

values.yaml has had images.ingestor.digest since #126, and the
jobs-manager template surfaces it as INGESTOR_IMAGE_DIGEST, but the
schema didn't validate it — every other image (jobsManager,
podsMonitor, resourceMonitor, requestsProxy, mysqlClient, busybox)
has an entry. An operator setting --set images.ingestor.digest=foo
(not the canonical sha256:<64-hex>) bypassed schema validation and
failed only later inside submit_ingestion_run.py.

Add the missing entry mirroring the other image entries' shape.
helm template now rejects malformed digests at chart-template time
("values don't meet the specifications of the schema(s)... Does not
match pattern '^(sha256:[a-f0-9]{64})?$'") rather than waiting for
runtime.

Caught by bugbot on PR #137 review.

Closes #141.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lukasWuttke <54042461+LukasWodka@users.noreply.github.com>
Co-authored-by: Syed Is Saqlain <saqlain.syed007@gmail.com>
Co-authored-by: Syed Saqlain <syedsaqlain@MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Lukas Wuttke <lukas@tracebloc.io>
@saadqbal saadqbal deleted the ci/add-wip-limit-caller branch May 25, 2026 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants