
feat: add dataset list command #53

Merged
aptracebloc merged 2 commits into develop from feat/dataset-list on Jun 4, 2026

@aptracebloc (Contributor) commented Jun 4, 2026

Adds tracebloc dataset list — a read-only listing of the datasets ingested into the cluster (the tables in training_test_datasets). Scope A: names only.

Behaviour

  • Bare tracebloc dataset list runs against your current kubeconfig context and its namespace. --kubeconfig/--context/--namespace are optional overrides (zero-value-safe, same as cluster info); --output-json emits {namespace, release, count, datasets[]} on stdout, while human-readable output goes to stderr.
  • Empty state points at dataset push; --help links the dashboard (https://ai.tracebloc.io/metadata) for the full catalog.

Mechanism

Reuses the dataset rm exec seam — push.SPDYExecutor + findRunningPod + IngestionDatabase — to run one query in the mysql pod:

SELECT table_name FROM information_schema.tables
WHERE table_schema = 'training_test_datasets' ORDER BY table_name;

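Querying information_schema (rather than SHOW TABLES) means a never-pushed cluster, where the training_test_datasets schema may not exist yet, returns an empty list instead of an error. The raw mysql output is turned into a []string by a pure, unit-tested parser.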

Exit codes

0 = listed (including an empty list) · 3 = kubeconfig error · 4 = no parent release in the namespace · 7 = cluster query failed.

Verification

  • make ci green. Cluster-free tests: parseDatasetList, renderDatasetList (empty + populated), writeDatasetListJSON (shape + nil→[]).
  • Live on EKS dev: human listing rendered 69 datasets; --output-json produced clean JSON on stdout (banner on stderr).

🤖 Generated with Claude Code


Note

Low Risk
Read-only cluster query reusing the existing mysql exec seam; no data mutations or auth changes.

Overview
Adds tracebloc dataset list, a read-only way to see ingested dataset table names in the cluster. It is registered under dataset, mentioned on the root home screen, and follows the same kubeconfig / namespace flags and exit codes (3, 4, 7) as dataset push and cluster info.

The command loads the parent release, then calls the new push.ListDatasets, which execs into the mysql pod (the same SPDYExecutor / findRunningPod path as teardown) and runs an information_schema query, so a never-pushed cluster returns an empty list instead of failing. Human output lists the names (with an empty-state hint pointing to dataset push); --output-json writes {namespace, release, count, datasets[]} to stdout with the banner on stderr and, like dataset push, also emits a JSON error object on early failures.

Unit tests cover mysql output parsing, rendering, JSON shape, and the JSON-on-failure contract.

Reviewed by Cursor Bugbot for commit 13f004c.

Lists the datasets ingested into the cluster (tables in training_test_datasets), reusing the dataset rm exec seam (SPDYExecutor + findRunningPod) to query the mysql pod via information_schema — so a never-pushed cluster lists empty instead of erroring. Bare 'tracebloc dataset list' uses the current kubeconfig context; --kubeconfig/--context/--namespace override it, and --output-json emits {namespace,release,count,datasets[]} on stdout (human output → stderr). Wired into the dataset subtree, the home screen, and the parent doc comment.

Tests are cluster-free: parseDatasetList (raw mysql output → []string), renderDatasetList (empty + populated), writeDatasetListJSON (shape + nil→[]).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aptracebloc requested a review from saadqbal June 4, 2026 14:03
@aptracebloc self-assigned this Jun 4, 2026
@LukasWodka
Contributor

👋 Heads-up — Code review queue is at 18 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)


@cursor (Bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 860c271.

Comment thread internal/cli/dataset_list.go
dataset list --output-json now emits a JSON error object on early-failure paths (kubeconfig, no parent release, cluster query), not just on success, mirroring the dataset push fix from #49. runDatasetList uses a named error return, a jsonEmitted flag, and a deferred check, so any failure that occurs before JSON has been written still produces an error object; the commit also adds writeDatasetListErrorJSON. Covered by TestRunDatasetList_OutputJSONEarlyFailureEmitsJSON.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aptracebloc merged commit 1ee73d1 into develop Jun 4, 2026
13 checks passed


3 participants