Problem
The CI Schema drift check (scripts/sync-schema.sh --check) is failing on develop itself, and therefore on every PR. Pre-existing, not caused by any one PR.
Cause: the vendored internal/schema/ingest.v1.json has drifted from data-ingestors master. Upstream added a seq2seq task category (enum entry + an if/then requiring texts, and membership in the self-supervised text group; the shared texts description was updated to mention it). The drift is purely additive — it does not change validation for any already-supported category.
Fix (the cli#103 pattern)
The CLI cannot push seq2seq yet (no discover/build; no code references it; no support ticket). So we recognize it but don't claim support, exactly as #103 did for causal_language_modeling:
- Re-vendor ingest.v1.json via scripts/sync-schema.sh → --check passes.
- Register seq2seq in internal/push/category.go as CLISupported: false + an UnsupportedNote (Family: text; it uses the texts layout like CLM). The push accept-gate then rejects dataset push --category=seq2seq cleanly instead of leaving a schema⇄registry gap (the TestRegistryCoversSchemaCategories parity test would otherwise fail).
- Update the two parity tests (TestRegistryKnownCategories want-list; TestSupportedCategories unsupported-with-note list).
Scope
Schema re-vendor + registry recognition + parity tests only. Full seq2seq push support (discover/build for the source\ttarget texts layout, flip CLISupported) is a follow-up feature — the sibling of cli#105 (which does the same for causal_language_modeling).
Acceptance
- scripts/sync-schema.sh --check green.
- gofmt / go build ./... / go test ./... green.
- dataset push --category=seq2seq is rejected with the unsupported note (not a raw backend error).
Work Type: Chore · Area: SDK (CLI) · Related: #103, cli#105, data-ingestors seq2seq