Follow-up from #30 (dataset rm). Spun out so the CLI-direct teardown can ship without blocking on cross-repo work.
Problem
A dataset push creates three artifacts:
1. the MySQL table (training_test_datasets.<table>),
2. the dataset's files on the shared PVC, and
3. a central tracebloc backend catalog entry (the "edge label metadata" the in-cluster ingestor syncs to the backend after a successful ingest).
dataset rm (#30) removes (1) and (2) in-cluster via exec — but the CLI has no direct line to the central backend, so (3) is left orphaned: a catalog entry pointing at a table/files that no longer exist.
This is observable today: the two datasets that fully ingested during v0.1 testing (clitest_reg_train, clitest_tc_train) reported records "sent to API", i.e. they created catalog entries; the manual cleanup dropped the in-cluster artifacts but not those entries.
Options
- A jobs-manager delete-ingestion endpoint (mirrors submit-ingestion-run) that removes all three server-side, including the backend call. Cleanest; needs work in tracebloc/client + data-ingestors.
- A backend delete API the CLI could call directly (if one exists / is acceptable to expose).
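For discussion, here is a minimal sketch of what a server-side teardown covering all three artifacts might look like. Everything in it is hypothetical: the `delete_ingestion` function, the `TeardownResult` type, and the step names are illustrations, not existing jobs-manager or backend code. It assumes each artifact can be removed by a callable that raises on failure, and it runs every step regardless of earlier failures so that a partial teardown is reported rather than silently leaving an orphan.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical: a (name, callable) pair; the callable removes one artifact
# for the given dataset and raises an exception if it cannot.
Step = Tuple[str, Callable[[str], None]]


@dataclass
class TeardownResult:
    removed: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # True only when every artifact was removed.
        return not self.failed


def delete_ingestion(dataset: str, steps: List[Step]) -> TeardownResult:
    """Run every teardown step, recording failures instead of stopping early."""
    result = TeardownResult()
    for name, step in steps:
        try:
            step(dataset)
            result.removed.append(name)
        except Exception as exc:  # report the failure, keep tearing down
            result.failed.append((name, str(exc)))
    return result


if __name__ == "__main__":
    # In-memory stand-ins for the backend catalog, MySQL, and the PVC.
    catalog = {"clitest_reg_train"}
    tables = {"clitest_reg_train"}
    files = {"clitest_reg_train"}
    steps = [
        ("catalog entry", catalog.remove),
        ("mysql table", tables.remove),
        ("pvc files", files.remove),
    ]
    result = delete_ingestion("clitest_reg_train", steps)
    print(result.ok, result.removed)
```

Putting the catalog entry first is an assumption, not a requirement: it means a failed backend call shows up in the result alongside whatever in-cluster cleanup did succeed, which is the case this issue is about.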
Acceptance
- dataset rm (or its server-side counterpart) removes the central catalog entry for a successfully-ingested dataset, leaving no orphan.
Refs #30, #31.