mcp-data-platform-v1.73.0

github-actions released this 31 May 03:24
78bff93

Overview

This release adds an admin-only Indexing dashboard: one cross-kind view of embedding-index health for every consumer of the shared index-jobs framework (introduced in v1.71.0, made multi-consumer in v1.72.0). It surfaces, in one place, whether indexing is healthy, what is covered, what failed, and why, and lets an operator re-index or retry. It also wires indexjobs.Counts(), which was added in v1.71.0 but had no surface until now.

There is no database migration and no configuration change. No existing tool, admin endpoint, or config key changes behavior. The dashboard reads existing index_jobs and vector-table state.

Why

Embedding work runs off the request path through the shared queue, with two consumers today (api-catalog operation vectors and tool descriptors) and more planned. Until now the only embedding status in the portal was per-catalog badges inside the Catalogs panel; the tools consumer had no operator visibility at all. Embedding can fail off the request path (provider down, model dimension mismatch, repeated retries hitting the attempt cap, vectors lost) and silently degrade ranking=semantic/hybrid to lexical with only a log line as signal. This release gives operators a single place to see and act on that.

What it does

A new Indexing tab in the admin Dashboard (alongside MCP, API Gateway, Health, and Events; deep-link /portal/admin#indexing) renders cross-kind health from real index_jobs data, polling every 5 seconds so it reflects work as the worker, reconciler, and reaper complete it. It is system-wide and admin-only by platform convention (operators see all indexing; it is not a per-persona capability), and it serves every index-jobs consumer uniformly, so a new consumer gets visibility here for free the moment it registers.

The tab includes:

  • Provider health banner - configured provider, model, and dimension, with a clear degraded state (noop or unconfigured) since a bad provider makes the whole index meaningless and pauses indexing.
  • Index state by kind - a custom d3 heatmap (the centerpiece): kind rows by job-state columns (pending, running, succeeded, failed), cells colored by count, so a failing or sparse corner is obvious at a glance.
  • Per-kind health cards - status distribution, coverage where derivable (api-catalog shows indexed vs expected from operation_count; the tools kind shows an indexed count with an in-sync / re-syncing indicator, since it re-syncs continuously and stamps no expected count), last-activity time, and a Re-index button.
  • Throughput timeline - a d3 area chart of completed jobs over time, so an operator can see whether indexing is keeping up or stalling.
  • Embed latency - per-kind started-to-completed duration (p50 with a p95 marker), surfacing slow passes such as the CPU-only embedder case.
  • In flight - running jobs with worker id, lease countdown, and items-done progress for long passes.
  • Retry backoff - pending jobs with attempt count and next run time.
  • Failure triage - failed jobs grouped by error signature, each with a one-click Retry.
  • Jobs drill-down - a filterable table (by kind and status) with trigger, attempts, last update, and error.

The existing per-catalog embedding badges in the API Catalogs panel are unchanged; this tab is the cross-kind superset.

API

Three new admin endpoints back the dashboard. All degrade gracefully when no queue is wired (no database or no configured embedding provider): the read endpoints return provider status with an empty kinds list rather than an error.

  • GET /api/v1/admin/index-jobs - provider health plus a per-kind rollup (per-state counts, last activity, and coverage where derivable).
  • GET /api/v1/admin/index-jobs/jobs?kind=&status=&source_id=&limit= - the job list / drill-down, newest first (limit default 50, max 500).
  • POST /api/v1/admin/index-jobs/reindex - accepts {kind, source_id?} and enqueues manual-retry jobs: one unit when source_id is given, every out-of-sync unit when it is omitted. The call is idempotent and returns 202 on success, 404 for an unknown kind, and 409 when no queue is wired.

coverage.expected_known is true only for kinds that stamp an expected count (api-catalog's operation_count); the tools kind re-syncs continuously and reports false, in which case the dashboard renders a sync indicator from the latest job status instead of an indexed/expected ratio.
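
As a rough client-side illustration of the endpoints above, the following sketch builds a jobs drill-down URL and interprets the documented re-index status codes. The paths, query parameters, limit bounds, and status codes come from these notes; the type and function names, the example host, and the kind value used in the usage note are hypothetical, and admin authentication is deliberately left out because it is not described here.

```go
package main

import (
	"net/http"
	"net/url"
	"strconv"
)

// Bounds documented for the jobs endpoint's limit parameter.
const (
	defaultLimit = 50
	maxLimit     = 500
)

// JobsQuery holds the optional filters of the jobs drill-down (hypothetical type).
type JobsQuery struct {
	Kind     string
	Status   string
	SourceID string
	Limit    int
}

// clampLimit mirrors the documented default and maximum on the client side.
// How the server treats out-of-range values is not stated in the notes.
func clampLimit(n int) int {
	if n <= 0 {
		return defaultLimit
	}
	if n > maxLimit {
		return maxLimit
	}
	return n
}

// JobsURL builds the GET URL for the job list, omitting empty filters.
func JobsURL(base string, q JobsQuery) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/admin/index-jobs/jobs"

	v := url.Values{}
	if q.Kind != "" {
		v.Set("kind", q.Kind)
	}
	if q.Status != "" {
		v.Set("status", q.Status)
	}
	if q.SourceID != "" {
		v.Set("source_id", q.SourceID)
	}
	v.Set("limit", strconv.Itoa(clampLimit(q.Limit)))
	u.RawQuery = v.Encode() // Encode sorts parameters by key

	return u.String(), nil
}

// ReindexOutcome maps the documented re-index status codes to a short label.
func ReindexOutcome(status int) string {
	switch status {
	case http.StatusAccepted:
		return "enqueued"
	case http.StatusNotFound:
		return "unknown kind"
	case http.StatusConflict:
		return "no queue wired"
	default:
		return "unexpected status " + strconv.Itoa(status)
	}
}
```

For example, JobsURL("https://platform.example.com", JobsQuery{Kind: "api-catalog", Status: "failed", Limit: 9999}) yields a URL whose limit is clamped to 500. Whether "api-catalog" is the exact registered kind string is an assumption.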

How it works

  • pkg/indexjobs gains a Coverage type and an optional CoverageReporter sink interface, plus a Reporter that aggregates per-kind counts, coverage, the job list, and the re-index command over the shared Store and Registry. A new consumer gets dashboard coverage by implementing the optional interface; the generic queue never learns kind-specific table names.
  • Last activity is a true MAX(activity) aggregate computed in the counts query, not the newest-by-id row, so an out-of-order completion of an older job is not missed.
  • Per-sink coverage: the api-catalog sink reports real indexed vs expected from operation_count; the tools sink reports an indexed-only count. These read each kind's own vector table, so coverage stays a per-kind concern.
  • The admin handler depends on a small IndexJobsService interface satisfied by the reporter and wired through Platform.IndexJobsReporter(), which is nil when no queue is wired.

Upgrade notes

No operator action is required. There is no migration and no config change. On a deployment with a database and a configured embedding provider, the Indexing tab populates from existing index-jobs state on first load. On deployments without an embedding provider, the queue is dormant and the tab renders an informative degraded/empty state rather than an error.

Compatibility

  • Existing MCP tools, configuration, and all prior admin endpoints are unchanged.
  • The api-catalog and tools embedding consumers are behaviorally unchanged; this release only adds a read and re-index surface over the queue they already use.
  • The endpoints and tab are admin-only.

Changelog

  • feat(admin): cross-kind Indexing dashboard for embedding health (#505) (#506) (@cjimti)

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.73.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.73.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.73.0_linux_amd64.tar.gz