
chore(schema): re-vendor ingest.v1.json — register seq2seq (recognize-but-unsupported), fix drift check #117

Description

@saadqbal

Problem

The CI Schema drift check (scripts/sync-schema.sh --check) is failing on develop itself, and therefore on every PR. Pre-existing, not caused by any one PR.

Cause: the vendored internal/schema/ingest.v1.json has drifted from data-ingestors master. Upstream added a seq2seq task category (enum entry + an if/then requiring texts, and membership in the self-supervised text group; the shared texts description was updated to mention it). The drift is purely additive — it does not change validation for any already-supported category.
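One way to sanity-check the "purely additive" claim by hand is a set difference over the task-category enums of the vendored and upstream schemas. The sketch below is illustrative only: `diffEnums` is a hypothetical helper, not part of the CLI, and it assumes the enum values have already been extracted into string slices. It covers the enum only, not the if/then rule or the text-group membership.

```go
package main

import (
	"fmt"
	"sort"
)

// diffEnums reports which enum values exist only upstream (added) and
// which exist only in the vendored copy (removed). Hypothetical helper.
func diffEnums(vendored, upstream []string) (added, removed []string) {
	old := make(map[string]bool, len(vendored))
	for _, v := range vendored {
		old[v] = true
	}
	cur := make(map[string]bool, len(upstream))
	for _, u := range upstream {
		cur[u] = true
		if !old[u] {
			added = append(added, u)
		}
	}
	for _, v := range vendored {
		if !cur[v] {
			removed = append(removed, v)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	// Illustrative subsets; the real enums contain more categories.
	vendored := []string{"causal_language_modeling"}
	upstream := []string{"causal_language_modeling", "seq2seq"}

	added, removed := diffEnums(vendored, upstream)
	// An additive drift has entries in added and none in removed.
	fmt.Println("added:", added, "removed:", removed)
}
```

An empty `removed` list with `seq2seq` as the only `added` entry is what an additive drift looks like under these assumptions.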

Fix (the cli#103 pattern)

The CLI cannot push seq2seq yet (no discover/build; no code references it; no support ticket). So we recognize it but don't claim support, exactly as #103 did for causal_language_modeling:

  • Re-vendor ingest.v1.json via scripts/sync-schema.sh so that --check passes.
  • Register seq2seq in internal/push/category.go as CLISupported: false + an UnsupportedNote (Family: text — it uses the texts layout like CLM). The push accept-gate then rejects dataset push --category=seq2seq cleanly instead of leaving a schema⇄registry gap (the TestRegistryCoversSchemaCategories parity test would otherwise fail).
  • Update the two parity tests (TestRegistryKnownCategories want-list; TestSupportedCategories unsupported-with-note list).

Scope

Schema re-vendor + registry recognition + parity tests only. Full seq2seq push support (discover/build for the source\ttarget texts layout, flip CLISupported) is a follow-up feature — the sibling of cli#105 (which does the same for causal_language_modeling).

Acceptance

  • scripts/sync-schema.sh --check green.
  • gofmt / go build ./... / go test ./... green.
  • dataset push --category=seq2seq is rejected with the unsupported note (not a raw backend error).

Work Type: Chore · Area: SDK (CLI) · Related: #103, cli#105, data-ingestors seq2seq
