tracebloc · saadqbal · May 20, 2026
diff --git a/README.md b/README.md
@@ -50,7 +50,7 @@ For the threat model, defense layers, per-platform caveats, operator responsibil
 
 ## Deploy
 
-This repo ships the **tracebloc** unified Helm chart (currently `v1.3.1`) — one chart for AKS, EKS, bare-metal, and OpenShift.
+This repo ships the **tracebloc** unified Helm chart (currently `v1.3.5`) — one chart for AKS, EKS, bare-metal, and OpenShift.
 
 ### Quick install
 
@@ -77,16 +77,48 @@ For existing Kubernetes clusters:
 ```bash
 helm repo add tracebloc https://tracebloc.github.io/client
 helm repo update
-helm install my-tracebloc tracebloc/tracebloc \
+helm install my-tracebloc tracebloc/client \
   --namespace tracebloc --create-namespace \
   -f my-values.yaml
 ```
 
 Full deployment guide → **[docs/INSTALL.md](docs/INSTALL.md)** (prerequisites, required values, upgrade & rollback, air-gapped install).
 
+## Ingest a dataset
+
+Once the client is running, you can load a dataset into your cluster's local MySQL with ~8 lines of YAML and a single `helm install`. No Dockerfile, no Python script: the platform maintains the official ingestor image, and you only describe what to ingest.
+
+The flow has two steps. **First**, stage your raw files on the cluster's shared PVC (`client-pvc` by default, mounted at `/data/shared/` inside the ingestor Pod). The chart doesn't transport data into the cluster; it points at data the cluster can already see. The simplest approach is a throwaway Pod that mounts the PVC, which you then `kubectl cp` files into. The [ingestor README](ingestor/README.md) includes the manifest.
+
+**Second**, describe the dataset and install:
+
+```yaml
+# my-cats-dogs.yaml
+apiVersion: tracebloc.io/v1
+kind: IngestConfig
+category: image_classification
+table: cats_dogs_train
+intent: train
+csv: /data/shared/cats-dogs/labels.csv
+images: /data/shared/cats-dogs/images/
+label: label
+```
+
+```bash
+helm install my-cats-dogs tracebloc/ingestor \
+  --namespace tracebloc \
+  --set-file ingestConfig=./my-cats-dogs.yaml
+```
+
+The ingestor runs once: it validates the data, copies files into the destination directory on the PVC, inserts rows into the cluster's MySQL, sends metadata to the tracebloc backend, and exits. The chart's artifacts (a ConfigMap and a post-install hook Job) then sit inert; nothing keeps running. Install one release per dataset.
+
+Full ingestor docs → **[ingestor/README.md](ingestor/README.md)** (data staging patterns, every supported category, the schema, the update model, verification, override knobs).
+
 | Topic | Where to look |
 |---|---|
 | Production install + required values | [docs/INSTALL.md](docs/INSTALL.md) |
+| Ingest a dataset (declarative YAML) | [ingestor/README.md](ingestor/README.md) |
+| Available ingestion categories + example YAMLs | [tracebloc/data-ingestors templates](https://github.com/tracebloc/data-ingestors/tree/master/templates) |
 | Threat model & operator responsibilities | [docs/SECURITY.md](docs/SECURITY.md) |
 | Migrating from `eks-1.0.x` / `aks-*` charts to `client-1.x` | [docs/MIGRATIONS.md](docs/MIGRATIONS.md) |
 | Per-tenant migration runbook | [docs/migration-tools/README.md](docs/migration-tools/README.md) |

diff --git a/docs/INSTALL.md b/docs/INSTALL.md
@@ -33,7 +33,7 @@ helm repo add tracebloc https://tracebloc.github.io/client
 helm repo update
 
 # Install with a release name and namespace
-helm install my-tracebloc tracebloc/tracebloc \
+helm install my-tracebloc tracebloc/client \
   --namespace tracebloc \
   --create-namespace \
   -f my-values.yaml
@@ -104,7 +104,7 @@ For platform-specific settings (AKS, EKS, bare-metal, OpenShift), see `client/ci
 ```bash
 # Upgrade to a new chart version (repo install)
 helm repo update
-helm upgrade my-tracebloc tracebloc/tracebloc -n tracebloc -f my-values.yaml
+helm upgrade my-tracebloc tracebloc/client -n tracebloc -f my-values.yaml
 
 # Upgrade when using a tgz
-helm upgrade my-tracebloc ./tracebloc-2.0.1.tgz -n tracebloc -f my-values.yaml
+helm upgrade my-tracebloc ./client-<version>.tgz -n tracebloc -f my-values.yaml
@@ -228,7 +228,7 @@ After that, users can run:
 
 ```bash
 helm repo add tracebloc https://tracebloc.github.io/client
-helm install my-tracebloc tracebloc/tracebloc -n tracebloc -f my-values.yaml
+helm install my-tracebloc tracebloc/client -n tracebloc -f my-values.yaml
 ```
 
 ---
@@ -241,3 +241,37 @@ helm install my-tracebloc tracebloc/tracebloc -n tracebloc -f my-values.yaml
 - [ ] Namespace created or `--create-namespace` used.
 - [ ] Resource requests/limits and storage sizes reviewed in `values.yaml` (e.g. `pvc.mysql`, `pvc.logs`, `pvc.data`).
 - [ ] Lint and template checked: `helm lint ./client -f my-values.yaml` and `helm template my-tracebloc ./client -f my-values.yaml`.
+
+---
+
+## Next: ingest your first dataset
+
+With the client running, the typical next step is to load a dataset into the cluster's local MySQL so training jobs can read it. The `tracebloc/ingestor` chart wraps that flow: you describe the dataset in ~8 lines of YAML and run a single `helm install`. No Dockerfile, no Python script.
+
+The chart **does not transport data into the cluster**; it reads data already present on the cluster's shared PVC (`client-pvc` by default, mounted at `/data/shared/` inside the ingestor Pod). Stage your CSV and any image, text, or annotation files there first. The ingestor chart README documents the `kubectl cp` pattern and alternatives for syncing larger, production datasets.
+
+Example: once you've staged a cats-vs-dogs image classification dataset under `/data/shared/cats-dogs/` on the PVC, an ingest config such as `my-cats-dogs.yaml` describes what's there:
+
+```yaml
+# my-cats-dogs.yaml
+apiVersion: tracebloc.io/v1
+kind: IngestConfig
+category: image_classification
+table: cats_dogs_train
+intent: train
+csv: /data/shared/cats-dogs/labels.csv
+images: /data/shared/cats-dogs/images/
+label: label
+```
+
+```bash
+helm install my-cats-dogs tracebloc/ingestor \
+  --namespace tracebloc \
+  --set-file ingestConfig=./my-cats-dogs.yaml
+```
+
+The ingestor runs once: it validates the data, copies files into the destination directory on the PVC, inserts rows into MySQL, sends metadata to the tracebloc backend, and exits. Install one release per dataset.
+
+Full ingestor documentation, including the schema for every supported category, the auto-update model that keeps the ingestor image current without per-install overrides, and verification commands → **[ingestor/README.md](../ingestor/README.md)**.
+
+Category-specific YAML examples (image classification, object detection, tabular regression, semantic segmentation, text classification, masked language modeling, etc.) → **[tracebloc/data-ingestors templates](https://github.com/tracebloc/data-ingestors/tree/master/templates)**.
diff --git a/ingestor/README.md b/ingestor/README.md
@@ -24,6 +24,75 @@ The SA is shared by every `tracebloc/ingestor` release in the namespace
 which broke as soon as a second ingestor release tried to install
 ([tracebloc/client#129](https://github.com/tracebloc/client/issues/129)).
 
+## Stage your data on the shared PVC
+
+This chart **does not transport data into the cluster.** It reads data already present on the cluster's shared PVC (`client-pvc` by default, mounted at `/data/shared/` inside every Pod that uses it, including the ingestor Pod that jobs-manager spawns).
+
+Before running `helm install <release> tracebloc/ingestor`, place your raw files (the CSV plus any images, texts, annotations, masks, or sequences the category requires) under `/data/shared/<your-prefix>/` on that PVC. The `csv:`, `images:`, and other path fields in your `ingest.yaml` are paths *inside the ingestor Pod's filesystem*, where the PVC is mounted.
+
+How to stage depends on dataset size and your environment. Two common patterns:
+
+### Pattern 1: `kubectl cp` via a pvc-shell pod (small datasets, one-off)
+
+Spin up a throwaway pod that mounts the PVC, copy files in, tear it down:
+
+```yaml
+# /tmp/pvc-shell.yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: pvc-shell
+  namespace: tracebloc
+spec:
+  restartPolicy: Never
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 65534
+    seccompProfile:
+      type: RuntimeDefault
+  containers:
+    - name: shell
+      image: alpine:3.19
+      command: ["sleep", "3600"]
+      securityContext:
+        allowPrivilegeEscalation: false
+        capabilities:
+          drop: ["ALL"]
+      volumeMounts:
+        - name: shared
+          mountPath: /data/shared
+  volumes:
+    - name: shared
+      persistentVolumeClaim:
+        claimName: client-pvc
+```
+
+```bash
+kubectl apply -f /tmp/pvc-shell.yaml
+kubectl -n tracebloc wait --for=condition=Ready pod/pvc-shell --timeout=60s
+
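+kubectl -n tracebloc exec pvc-shell -- \
+  mkdir -p /data/shared/my-dataset
+
+# Copy to a not-yet-existing images/ path so it receives the contents of
+# ./local-images/ (an existing images/ dir would give images/local-images/).
+kubectl -n tracebloc cp ./local-images     pvc-shell:/data/shared/my-dataset/images
+kubectl -n tracebloc cp ./local-labels.csv pvc-shell:/data/shared/my-dataset/labels.csv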
+
+# Verify what landed
+kubectl -n tracebloc exec pvc-shell -- ls /data/shared/my-dataset/
+
+kubectl -n tracebloc delete pod pvc-shell
+```
+
+Now `csv: /data/shared/my-dataset/labels.csv` and `images: /data/shared/my-dataset/images/` in your `ingest.yaml` will resolve.
+
+### Pattern 2: Init container with cloud-storage sync (production / large datasets)
+
+For datasets too large to `kubectl cp` (and any production workflow with versioned data), run a one-shot Pod whose init or main container pulls from S3, GCS, or Azure Blob into the PVC. Customers typically wire this into their CI or GitOps tooling so the data syncs before the ingestion `helm install` runs. The chart itself stays out of this: staging data is a precondition, not a chart responsibility.
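+
+A minimal sketch of the S3 case, not shipped with the chart. Every name in it is an assumption: the public `amazon/aws-cli` image (whose entrypoint is `aws`, so `args` form the subcommand), the hypothetical bucket path `s3://my-bucket/my-dataset/`, and a hypothetical Secret `aws-creds` holding `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`. With IRSA or another workload-identity setup you would drop the Secret and attach a ServiceAccount instead.
+
+```yaml
+# /tmp/pvc-sync.yaml (hypothetical example, not part of the chart)
+apiVersion: v1
+kind: Pod
+metadata:
+  name: pvc-sync
+  namespace: tracebloc
+spec:
+  restartPolicy: Never
+  containers:
+    - name: sync
+      image: amazon/aws-cli:latest  # pin a specific tag in production
+      # Entrypoint is `aws`; this runs `aws s3 sync <source> <dest>`.
+      # Source bucket/prefix is a placeholder; dest is on the shared PVC.
+      args: ["s3", "sync", "s3://my-bucket/my-dataset/", "/data/shared/my-dataset/"]
+      envFrom:
+        - secretRef:
+            name: aws-creds  # hypothetical Secret with AWS credentials + region
+      securityContext:
+        allowPrivilegeEscalation: false
+      volumeMounts:
+        - name: shared
+          mountPath: /data/shared
+  volumes:
+    - name: shared
+      persistentVolumeClaim:
+        claimName: client-pvc
+```
+
+Apply it, wait for the Pod to finish (on kubectl 1.23+, `kubectl -n tracebloc wait --for=jsonpath='{.status.phase}'=Succeeded pod/pvc-sync --timeout=30m`), check `kubectl -n tracebloc logs pvc-sync` if it does not succeed, then delete the Pod before running the ingestion `helm install`.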
+
+### Where the PVC name comes from
+
+The default `client-pvc` name comes from the client chart's PVC block (see `values.yaml#pvc`). If your install renamed it, the ingestor Pod mounts whatever name the client chart passes to jobs-manager via `CLIENT_PVC`. In that case, `kubectl -n tracebloc get pvc` shows which claims are actually bound; use that name as `claimName:` in the pvc-shell manifest above.
+
 ## What this chart owns
 
 | Resource | Owner | Lifecycle |