
mcp-data-platform-v1.75.0


github-actions released this on 31 May 07:41 (commit 4d1b6e3)

Indexing dashboard: communicate state, not raw job rows

The admin Indexing dashboard now answers "is indexing healthy?" at a glance, and every red element on it is either actionable or self-resolving. Previously it rendered three structurally different metrics (vector coverage, per-unit job state, and recent job activity) side by side, with no labels reconciling them. As a result, a healthy, fully indexed instance could look broken: "34 / 34 indexed - 100%" appeared next to "0 succeeded" and "last activity never". Failures from a past deploy lingered in triage indefinitely, and clicking Retry had no visible effect.

Highlights

  • Summary-first health verdict per kind. Each kind leads with a single plain-language verdict computed server-side: Healthy, Indexing, Degraded, or Idle (complete). The verdict is derived from the same counts and coverage the cards show, so the lead word and the detail metrics cannot disagree.
  • Fully indexed and idle no longer reads as broken. A kind at full coverage with no recent jobs (for example, because its vectors were seeded before the job queue existed) now reads "fully indexed, idle" instead of "last activity never".
  • The two metric families are labelled. Vector coverage (how much is indexed) is kept visually and textually distinct from per-unit job state, so "1 succeeded" reads as "1 unit, last run succeeded", not a one-job history.
  • Self-resolving failure triage. A failure clears automatically once a later job for the same unit succeeds. Retry re-indexes the unit and the card clears on the next success; an explicit Dismiss is the fallback for a failure that no future success will supersede (for example a removed consumer's leftover rows).
  • Triage cards carry time and a drill-in. Each card shows:
      • first-seen and last-seen timestamps;
      • when the unit last succeeded;
      • occurrence and attempt counts;
      • an expandable drill-in to the un-redacted error and the underlying job id.
    Together these distinguish a one-hour-old tombstone from an active incident.
  • Routine heartbeats are collapsed. Timer-driven reconciler successes for a unit (which each replica re-runs on its own schedule) fold into a single "synced xN" row instead of an unbounded firehose of duplicate rows.
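The summary-first verdict can be sketched as a pure function over a kind's counts. Because it reads only the counts the card displays, the lead word cannot contradict the detail metrics. This is a minimal Go sketch under stated assumptions, not the platform's code. The following are all hypothetical, chosen for illustration:

  • `KindSummary` and its fields, including the `Pending` and `RecentJobs` counts;
  • the `DeriveVerdict` function;
  • the precedence order: unresolved failures first, then queued or incomplete work, then idle.

Only the four verdict words and the "unresolved failures trigger Degraded" rule come from these notes.

```go
package main

import "fmt"

// Verdict is the plain-language word a kind's card leads with.
type Verdict string

const (
	Healthy      Verdict = "Healthy"
	Indexing     Verdict = "Indexing"
	Degraded     Verdict = "Degraded"
	IdleComplete Verdict = "Idle (complete)"
)

// KindSummary is a hypothetical per-kind summary; the fields mirror the
// counts the dashboard cards display.
type KindSummary struct {
	Indexed            int // units with vectors (coverage numerator)
	Total              int // units expected to have vectors (coverage denominator)
	Pending            int // units whose latest job is queued or running (assumed field)
	RecentJobs         int // jobs finished inside the activity window (assumed field)
	UnresolvedFailures int // distinct units with an open failed job
}

// DeriveVerdict picks the lead word from the same counts the card shows,
// so the verdict cannot contradict the detail metrics. The precedence
// order below is an illustrative assumption.
func DeriveVerdict(s KindSummary) Verdict {
	switch {
	case s.UnresolvedFailures > 0:
		// Open failures are the Degraded trigger.
		return Degraded
	case s.Pending > 0 || s.Indexed < s.Total:
		// Work is queued or coverage is still incomplete.
		return Indexing
	case s.RecentJobs == 0:
		// Full coverage and no recent jobs: fully indexed, idle.
		return IdleComplete
	default:
		return Healthy
	}
}

func main() {
	// The case from the notes: 34 / 34 indexed, no jobs since the queue existed.
	fmt.Println(DeriveVerdict(KindSummary{Indexed: 34, Total: 34}))
}
```

For the scenario at the top of these notes (34 of 34 indexed, no recent jobs, no failures), the sketch prints `Idle (complete)` rather than an alarming verdict. Checking failures first means a kind that is mid-reindex but also has an open failure reads Degraded. That ordering is a judgment call in this sketch.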

New and changed API

  • GET /api/v1/admin/index-jobs - the per-kind summary gains verdict and unresolved_failures. The succeeded and failed fields remain per-unit latest-status counts: the number of units whose last run succeeded or failed, respectively. unresolved_failures counts distinct units with an open failed job and is the trigger for the Degraded verdict.
  • GET /api/v1/admin/index-jobs/failures - new. Returns the units with open (unresolved) failures, one per unit, most-recently-failed first, with latest_job_id, un-redacted last_error, attempts, occurrences, first_failed_at/last_failed_at, and optional last_succeeded_at.
  • POST /api/v1/admin/index-jobs/dismiss - new. Resolves every open failure for one unit. Idempotent.

Database migration

This release adds migration 000055_index_jobs_resolved_at, which adds a nullable resolved_at column to index_jobs plus a partial index on unresolved failures. It is additive and backward-compatible: every existing failed row is treated as unresolved until a later success supersedes it or an operator dismisses it. The migration runs automatically on startup through the platform's migration runner; no manual step is required.

Upgrade notes

  • No configuration changes are required.
  • On first start after upgrade the new migration applies automatically. A failed row now becomes "resolved" (and leaves the triage surface) when a later job for the same (source_kind, source_id) succeeds, so historical tombstones clear as their units are re-indexed.

Out of scope (planned follow-up)

The "no consumer registered for source_kind" failures occur during a rolling deploy, when an old replica claims a job for a kind it has not registered. They are best fixed by having the worker requeue or skip the job rather than terminate it. This release covers only how the dashboard presents those tombstones: they auto-resolve once the kind is re-registered and a job succeeds, and Dismiss clears any that never will. The worker-side fix is tracked as a separate follow-up against pkg/indexjobs/worker.go.

Changelog

  • feat(indexing): dashboard communicates state, not raw job rows (#509) (#510) (@cjimti)

Full changelog: v1.74.0...v1.75.0

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.75.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.75.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.75.0_linux_amd64.tar.gz