dataset rm: remove the central backend catalog entry (server-side delete path) #39

@aptracebloc

Description

Follow-up from #30 (dataset rm). Spun out so the CLI-direct teardown can ship without blocking on cross-repo work.

Problem

A dataset push creates three artifacts:

  1. the MySQL table (training_test_datasets.<table>),
  2. the dataset's files on the shared PVC, and
  3. a central tracebloc backend catalog entry (the "edge label metadata" the in-cluster ingestor syncs to the backend after a successful ingest).

dataset rm (#30) removes (1) and (2) in-cluster via exec, but the CLI has no direct line to the central backend, so (3) is left orphaned: a catalog entry pointing at a table and files that no longer exist.

This is observable today: the two datasets that fully ingested during v0.1 testing (clitest_reg_train, clitest_tc_train) reported records "sent to API", i.e. they created catalog entries; the manual cleanup dropped the in-cluster artifacts but not those entries.
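To make the orphaned state concrete, here is a minimal sketch of how it could be detected, assuming both the backend catalog and the cluster can be listed by table name. No such listing exists in the CLI today as far as this issue knows; `find_orphaned_entries` and both inputs are hypothetical.

```python
from typing import Iterable, List


def find_orphaned_entries(
    catalog_tables: Iterable[str],
    cluster_tables: Iterable[str],
) -> List[str]:
    """Return catalog entries whose backing table is gone from the cluster.

    Both arguments are hypothetical listings: table names known to the
    central backend catalog, and table names still present in-cluster.
    """
    # An entry is orphaned when the catalog still knows the table
    # but the cluster no longer has it.
    return sorted(set(catalog_tables) - set(cluster_tables))


if __name__ == "__main__":
    # Dataset names taken from the v0.1 test run described above;
    # "still_present" is a made-up name for a dataset that was not removed.
    catalog = ["clitest_reg_train", "clitest_tc_train", "still_present"]
    cluster = ["still_present"]
    print(find_orphaned_entries(catalog, cluster))
```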

Options

  • A jobs-manager delete-ingestion endpoint (mirroring submit-ingestion-run) that removes all three artifacts server-side, including the backend call. This is the cleanest option, but it needs work in tracebloc/client and data-ingestors.
  • A backend delete API the CLI could call directly (if one exists / is acceptable to expose).
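A rough sketch of what the first option's server-side teardown might look like, to make the "removes all three" contract concrete. This is not the jobs-manager implementation: `delete_ingestion`, `TeardownResult`, and the three step callables are hypothetical stand-ins for the real table drop, PVC cleanup, and backend call, and each step is assumed to be idempotent so that a retry after a partial failure is safe.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TeardownResult:
    """Which of the three artifacts were removed and which failed."""
    removed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failed


def delete_ingestion(
    table: str,
    drop_table: Callable[[str], None],            # hypothetical: drops the MySQL table
    remove_files: Callable[[str], None],          # hypothetical: removes files on the PVC
    delete_catalog_entry: Callable[[str], None],  # hypothetical: backend delete call
) -> TeardownResult:
    """Attempt every teardown step and report the outcome per artifact.

    All steps are attempted even if an earlier one fails, so a single
    failure does not leave the remaining artifacts behind untouched.
    """
    steps = [
        ("table", drop_table),
        ("files", remove_files),
        ("catalog", delete_catalog_entry),
    ]
    result = TeardownResult()
    for name, step in steps:
        try:
            step(table)
            result.removed.append(name)
        except Exception:
            # Record the failure; the caller can retry the whole teardown.
            result.failed.append(name)
    return result
```

Reporting per-artifact results rather than a single success flag would let dataset rm tell the user exactly what is left behind, which is the gap this issue describes. Whether the steps should run in this order is an open design question, not something settled here.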

Acceptance

Refs #30, #31.
