mcp-data-platform-v1.75.0
Indexing dashboard: communicate state, not raw job rows
The admin Indexing dashboard now answers "is indexing healthy?" at a glance, and every red item on it is either actionable or self-resolving. Previously it rendered three structurally different metrics (vector coverage, per-unit job state, recent job activity) side by side with no labels reconciling them, so a healthy, fully indexed instance could read as broken: "34 / 34 indexed - 100%" next to "0 succeeded" and "last activity never". Failures from a past deploy lingered in triage forever, and clicking Retry did nothing visible.
Highlights
- Summary-first health verdict per kind. Each kind leads with a single plain-language verdict computed server-side: Healthy, Indexing, Degraded, or Idle (complete). The verdict is derived from the same counts and coverage the cards show, so the lead word and the detail metrics cannot disagree.
- Fully indexed and idle no longer reads as broken. A kind at full coverage with no recent jobs (vectors seeded before the queue existed) now reads "fully indexed, idle" instead of "last activity never".
- The two metric families are labelled. Vector coverage (how much is indexed) is kept visually and textually distinct from per-unit job state, so "1 succeeded" reads as "1 unit, last run succeeded", not a one-job history.
- Self-resolving failure triage. A failure clears automatically once a later job for the same unit succeeds. Retry re-indexes the unit and the card clears on the next success; an explicit Dismiss is the fallback for a failure that no future success will supersede (for example a removed consumer's leftover rows).
- Triage cards carry time and a drill-in. Each card shows first-seen and last-seen timestamps, last-succeeded context, occurrence and attempt counts, and an expandable drill-in to the un-redacted error and the underlying job id, so a one-hour-old tombstone is distinguishable from an active incident.
- Routine heartbeats are collapsed. Timer-driven reconciler successes for a unit (which each replica re-runs on its own schedule) fold into a single "synced xN" row instead of an unbounded firehose of duplicate rows.
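The verdict rule described in the highlights above can be sketched as a small pure function. This is only an illustration under stated assumptions: the release notes name the four verdicts and say that unresolved failures trigger Degraded, but the precedence between verdicts, the `KindSummary` type, and the `in_flight` and `recent_jobs` fields are hypothetical and not taken from the platform's code.

```python
from dataclasses import dataclass


@dataclass
class KindSummary:
    """Per-kind counts. The type and most field names are hypothetical."""
    indexed: int              # units that currently have vectors
    total: int                # units that should have vectors
    in_flight: int            # queued or running jobs (assumed field)
    recent_jobs: int          # jobs seen in the recent window (assumed field)
    unresolved_failures: int  # distinct units with an open failed job


def verdict(s: KindSummary) -> str:
    """Return one lead word for a kind. The precedence is an assumption."""
    # An open failure is the degraded trigger named in the release notes.
    if s.unresolved_failures > 0:
        return "Degraded"
    # Assumed: outstanding work or incomplete coverage reads as Indexing.
    if s.in_flight > 0 or s.indexed < s.total:
        return "Indexing"
    # Full coverage with no recent jobs: complete and idle, not broken.
    if s.recent_jobs == 0:
        return "Idle (complete)"
    return "Healthy"
```

Because the function reads only the counts the cards display, the lead word cannot contradict the detail metrics: the "34 / 34 indexed" instance with no recent jobs and no failures yields Idle (complete) rather than an alarming state.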
New and changed API
- GET /api/v1/admin/index-jobs - the per-kind summary gains verdict and unresolved_failures. succeeded / failed remain per-unit latest-status counts (units by last run), and unresolved_failures counts distinct units with an open failed job (the verdict's degraded trigger).
- GET /api/v1/admin/index-jobs/failures - new. Returns the units with open (unresolved) failures, one per unit, most-recently-failed first, with latest_job_id, un-redacted last_error, attempts, occurrences, first_failed_at / last_failed_at, and optional last_succeeded_at.
- POST /api/v1/admin/index-jobs/dismiss - new. Resolves every open failure for one unit. Idempotent.
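To make the "one per unit, most-recently-failed first" shape of the failures listing concrete, here is a minimal sketch of how open failed rows could be folded into per-unit summaries. It is not the server's implementation. The output keys follow the field names listed above, while the input row keys (`id`, `error`, `failed_at`), the helper name `open_failures`, and the reading of `occurrences` as the number of open failed jobs for a unit are assumptions. `attempts` and `last_succeeded_at` are left out because their exact semantics are not stated.

```python
from collections import defaultdict


def open_failures(failed_jobs):
    """Fold open failed job rows into one summary per unit.

    Each row is a dict with source_kind, source_id, id, error and failed_at.
    failed_at is assumed to be an ISO 8601 UTC string, so string order
    matches time order.
    """
    by_unit = defaultdict(list)
    for job in failed_jobs:
        by_unit[(job["source_kind"], job["source_id"])].append(job)

    summaries = []
    for (kind, source_id), jobs in by_unit.items():
        jobs.sort(key=lambda j: j["failed_at"])  # oldest first
        latest = jobs[-1]
        summaries.append({
            "source_kind": kind,
            "source_id": source_id,
            "latest_job_id": latest["id"],
            "last_error": latest["error"],
            "occurrences": len(jobs),  # assumed meaning: open failed jobs
            "first_failed_at": jobs[0]["failed_at"],
            "last_failed_at": latest["failed_at"],
        })
    # Most-recently-failed unit first.
    summaries.sort(key=lambda s: s["last_failed_at"], reverse=True)
    return summaries
```

Grouping before sorting keeps a unit that failed many times from crowding out other units, which is the property the triage view relies on.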
Database migration
This release adds migration 000055_index_jobs_resolved_at, which adds a nullable resolved_at column to index_jobs plus a partial index on unresolved failures. It is additive and backward-compatible: every existing failed row is treated as unresolved until a later success supersedes it or an operator dismisses it. The migration runs automatically on startup through the platform's migration runner; no manual step is required.
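The shape of this change can be tried out with a small standalone sketch. It uses an in-memory SQLite database purely so that it runs anywhere; the notes do not state which database engine the platform uses. The simplified table layout, the index name, the indexed columns, and the 'failed' status literal are all assumptions rather than the contents of the real migration file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the pre-upgrade table; the real schema has more columns.
conn.execute("""
    CREATE TABLE index_jobs (
        id INTEGER PRIMARY KEY,
        source_kind TEXT NOT NULL,
        source_id TEXT NOT NULL,
        status TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO index_jobs (source_kind, source_id, status) "
    "VALUES ('doc', 'a', 'failed')"
)

# The additive change: a nullable column plus a partial index that only
# covers failed rows that are still unresolved. The index name is hypothetical.
conn.execute("ALTER TABLE index_jobs ADD COLUMN resolved_at TEXT")
conn.execute("""
    CREATE INDEX idx_index_jobs_unresolved_failures
    ON index_jobs (source_kind, source_id)
    WHERE status = 'failed' AND resolved_at IS NULL
""")

# Existing failed rows get resolved_at = NULL, so they count as unresolved.
unresolved = conn.execute(
    "SELECT COUNT(*) FROM index_jobs "
    "WHERE status = 'failed' AND resolved_at IS NULL"
).fetchone()[0]
```

Since the new column is nullable and has no default, rows written before the upgrade need no backfill, which is what makes the change backward-compatible.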
Upgrade notes
- No configuration changes are required.
- On first start after upgrade the new migration applies automatically. A failed row now becomes "resolved" (and leaves the triage surface) when a later job for the same (source_kind, source_id) succeeds, so historical tombstones clear as their units are re-indexed.
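The resolution rule in the last note can be illustrated with a short sketch over in-memory job rows. It is a hedged model of the behaviour, not the platform's code: the function name `resolve_superseded`, the `finished_at` key, the 'succeeded' status literal, and the choice to stamp `resolved_at` with the success time are assumptions.

```python
def resolve_superseded(jobs):
    """Mark failed jobs resolved once a later job for the same unit succeeds.

    `jobs` is a list of dicts in chronological order with source_kind,
    source_id, status, finished_at and resolved_at (None while open).
    Rows are updated in place; the number of rows resolved is returned.
    """
    open_by_unit = {}  # (source_kind, source_id) -> open failed rows so far
    resolved = 0
    for job in jobs:
        unit = (job["source_kind"], job["source_id"])
        if job["status"] == "failed" and job["resolved_at"] is None:
            open_by_unit.setdefault(unit, []).append(job)
        elif job["status"] == "succeeded":
            # A success supersedes every earlier open failure for the unit.
            for failed in open_by_unit.pop(unit, []):
                failed["resolved_at"] = job["finished_at"]
                resolved += 1
    return resolved
```

A failure that occurs after the most recent success stays open, and a unit that never succeeds again keeps its failure until an operator dismisses it.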
Out of scope (planned follow-up)
The "no consumer registered for source_kind" failures (an old replica claiming a job for a kind it has not registered during a rolling deploy) are best fixed by the worker requeuing or skipping the job rather than terminating it. This release covers only the dashboard's presentation of those tombstones: they auto-resolve once the kind is re-registered and a job succeeds, and Dismiss clears any that never will. The worker-side fix is tracked as a separate follow-up against pkg/indexjobs/worker.go.
Changelog
Full changelog: v1.74.0...v1.75.0
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platform
Claude Code CLI
claude mcp add mcp-data-platform -- mcp-data-platform
Docker
docker pull ghcr.io/txn2/mcp-data-platform:v1.75.0
Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-data-platform_1.75.0_linux_amd64.tar.gz.sigstore.json \
mcp-data-platform_1.75.0_linux_amd64.tar.gz