docs: make declarative-ingest staging self-contained (data-ingestors#131 B/C) #46
divyasinghds wants to merge 1 commit into
Fixes the docs side of data-ingestors#131:

- B2 / B1: section 2 of the declarative path linked out for the staging recipe and described `kubectl cp` while the Detailed Setup section (further down on the same page) prescribed a host-path `cp -R`. Replaced section 2 with an inline host-path recipe that matches the Detailed Setup section, and demoted `kubectl cp` to a Note for multi-node / EKS deployments. The recipe now uses a `<prefix>` subdirectory so the path lines up with the `/data/shared/<prefix>/...` style used in ingest.yaml examples.
- C2: section 4 was silent on where CLIENT_ID / CLIENT_PASSWORD come from in the declarative path. Added a sentence noting the ingestor Pod inherits them from the Kubernetes Secret the parent tracebloc/client chart creates in <workspace> at install time; no creds are passed on the `helm install` line.
- C5: section 4 mentioned the run-twice rule only as a trailing parenthetical. Promoted it to bolded prose and added a worked train + test pair (two `helm install` invocations, distinct release names + `table:` + `intent:`) so the rule is concrete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Overview

Make your data available to the Kubernetes cluster so it can be used for training and evaluation. Regardless of where your client runs on Azure, AWS, Google Cloud, or a local Minikube setup, the process of ingesting datasets works the same way.
Please change to "Whether your client runs on Azure, AWS, Google Cloud ...". The current sentence is cumbersome and harder to follow.
The `tracebloc/client` parent chart bootstraps the cluster (jobs-manager, MySQL, RBAC). The `tracebloc/ingestor` subchart submits per-dataset ingestion runs against it.

<Note>
If you installed the client via the one-liner (`bash <(curl -fsSL https://tracebloc.io/i.sh)`), use `--reset-then-reuse-values` so the helm upgrade doesn't drop the values the installer applied:
This note should not come after the commands. As written, the user runs them and only later finds out about side effects, such as those of not setting the `--reset-then-reuse-values` flag. This part should be at the beginning of the section.
@saadqbal WDYT?
Summary
Companion to tracebloc/data-ingestors#133, addressing the docs-side items from data-ingestors#131.
- B2 / B1: section 2 of the declarative path described `kubectl cp` while the Detailed Setup section (further down on the same page) prescribed a host-path `cp -R`. Same page, two different staging mechanisms. Replaced section 2 with the inline host-path recipe that matches the Detailed Setup section, and demoted `kubectl cp` to a Note for multi-node / EKS deployments. The recipe now uses a `<prefix>` subdirectory so the path lines up with the `/data/shared/<prefix>/...` style used in the YAML examples.
- C2: section 4 was silent on where `CLIENT_ID` / `CLIENT_PASSWORD` come from. Added a sentence noting the ingestor Pod inherits them from the Kubernetes Secret the parent `tracebloc/client` chart creates in `<workspace>` at install time. No creds are passed on the `helm install` command.
- C5: section 4 mentioned the run-twice rule only as a trailing parenthetical. Promoted it to bolded prose and added a worked train + test pair: two `helm install` invocations with distinct release names, `table:`, and `intent:` values.

Out of scope
`object_id` / `object_count` columns in `object_detection` README) and C-series polish items C1/C3/C4 still need maintainer attention in the data-ingestors repo.

Test plan
- `mint dev` and walk through the declarative section start to finish; the staging recipe, the example YAML, and the train+test install commands should all reference compatible paths and names.
- `mint broken-links` clean.

🤖 Generated with Claude Code
Note
Low Risk
Documentation-only edits to prepare-dataset.mdx with no runtime or security behavior changes.
Overview
The declarative Prepare Data path in `prepare-dataset.mdx` is tightened so reviewers can follow staging → YAML → Helm without bouncing to external READMEs for the default case.

Section 2 (staging) now documents the single-node default inline: copy into `~/.tracebloc/<workspace>/data/<prefix>` and reference `/data/shared/<prefix>/...` in `ingest.yaml`. `kubectl cp` / init-container staging is relegated to a Note for multi-node or EKS.

Section 4 (install) leads with the train/test rule, shows two `helm install` examples (`cats-dogs-train` / `cats-dogs-test` with separate config files), and states that `CLIENT_ID` / `CLIENT_PASSWORD` come from the parent `tracebloc/client` Secret, so there is nothing to pass on the Helm CLI.

Reviewed by Cursor Bugbot for commit c7d1118. Bugbot is set up for automated code reviews on this repo.
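
For readers following the thread, the single-node staging step this PR documents could be sketched roughly as below. This is not the recipe from `prepare-dataset.mdx` itself, only an illustration built from the paths quoted in this PR. The function name `stage_dataset` and the `TRACEBLOC_HOME` override are hypothetical, introduced here so the sketch can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# stage_dataset SRC_DIR WORKSPACE PREFIX
# Copies the contents of SRC_DIR into <root>/<workspace>/data/<prefix> on the host.
# TRACEBLOC_HOME is a hypothetical override for this sketch; the default root
# is ~/.tracebloc, as quoted in the PR description.
stage_dataset() {
  local src="$1" workspace="$2" prefix="$3"
  local root="${TRACEBLOC_HOME:-$HOME/.tracebloc}"
  local dest="$root/$workspace/data/$prefix"

  [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }

  mkdir -p "$dest"
  # "src/." copies the directory's contents (dotfiles included), not the directory itself.
  cp -R "$src"/. "$dest"/

  # Print the path prefix that ingest.yaml would reference, per the PR description.
  echo "/data/shared/$prefix"
}
```

A call such as `stage_dataset ./my-dataset my-workspace cats-dogs` (all three values are placeholders) would then print `/data/shared/cats-dogs`, the prefix to use for paths in `ingest.yaml`. Multi-node or EKS deployments are out of scope for this sketch, since the PR moves `kubectl cp` staging into a separate Note.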