mcp-data-platform-v0.25.0

github-actions released this 22 Feb 00:48
· 332 commits to main since this release
c62d67f

What's New

v0.25.0 focuses on reducing agent round-trips and closing the knowledge feedback loop. Search results now carry enough context for agents to write SQL immediately, column enrichment is filtered to what matters, and agents can contribute curated queries back to the catalog.

Schema Preview in Search Results (#139, #145)

DataHub search results now include a bounded column-name+type preview in query_context for available tables. Agents no longer need an intermediate datahub_get_schema or trino_describe_table call before writing SQL — the search result itself has what they need.

  • Primary key columns listed first, then remaining columns up to the configured max
  • total_columns field indicates when the preview is truncated
  • Preview omitted (not empty array) when schema is unavailable — no blocking, no errors
  • Configurable: search_schema_preview: true (default) and schema_preview_max_columns: 15 (default)

{
  "query_context": {
    "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)": {
      "available": true,
      "query_table": "hive.sales.orders",
      "estimated_rows": 1500000,
      "schema_preview": [
        {"name": "order_id", "type": "integer"},
        {"name": "customer_id", "type": "integer"},
        {"name": "order_date", "type": "date"},
        {"name": "total_amount", "type": "decimal(10,2)"}
      ],
      "total_columns": 42
    }
  }
}
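The ordering and truncation rules above can be sketched roughly as follows. This is an illustration only, not the project's implementation: the function name build_schema_preview, the is_primary_key flag on the input columns, and the choice to always emit total_columns are assumptions made for the example.

```python
def build_schema_preview(columns, max_columns=15):
    """Return the preview fields for one table, or {} when no schema is known.

    `columns` is a list of dicts with "name" and "type" keys plus an optional
    "is_primary_key" flag (a hypothetical input shape for this sketch).
    """
    if not columns:
        # Schema unavailable: omit the preview rather than emit an empty array.
        return {}

    # Primary key columns first, then the remaining columns in original order.
    primary = [c for c in columns if c.get("is_primary_key")]
    rest = [c for c in columns if not c.get("is_primary_key")]
    ordered = (primary + rest)[:max_columns]

    return {
        "schema_preview": [{"name": c["name"], "type": c["type"]} for c in ordered],
        # Comparing this with len(schema_preview) reveals truncation.
        "total_columns": len(columns),
    }
```

A consumer can treat the preview as truncated whenever total_columns is larger than the number of entries in schema_preview.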

Column Context Filtering (#132, #140)

Semantic column enrichment from DataHub is now filtered to only columns referenced in the SQL query. A query touching 3 of 70 columns no longer dumps all 70 column descriptions into the response.

  • Dialect-agnostic SQL lexer extracts identifiers in a single pass — handles Trino ES raw_query with embedded JSON, CROSS JOIN UNNEST, double-quoted identifiers, block/line comments
  • Safety-relevant columns (PII, sensitive, critical tags) are always included regardless of SQL references
  • Graceful degradation: a SELECT * query, or a query whose identifiers match no known columns, returns all columns
  • Only applies to trino_query; trino_describe_table always shows all columns
  • Configurable: column_context_filtering: true (default)
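The filtering rules listed above can be approximated in a few lines. The sketch below is a simplified stand-in, not the lexer shipped in this release: it uses a single regular-expression pass, and the names referenced_identifiers, filter_column_context, and the shape of the columns mapping are assumptions for illustration.

```python
import re

# Tags that force a column to be kept (taken from the release notes).
SAFETY_TAGS = {"pii", "sensitive", "critical"}

# One left-to-right pass: comments and string literals are matched first so
# that identifiers inside them (for example embedded JSON) are never extracted.
_TOKEN = re.compile(
    "|".join([
        r"--[^\n]*",                  # line comment
        r"/\*.*?\*/",                 # block comment
        r"'(?:[^']|'')*'",            # string literal
        r'"((?:[^"]|"")+)"',          # double-quoted identifier -> group 1
        r"([A-Za-z_][A-Za-z0-9_]*)",  # bare identifier or keyword -> group 2
        r"\*",                        # star
    ]),
    re.DOTALL,
)


def referenced_identifiers(sql):
    """Return (lower-cased identifiers, whether a SELECT * was seen)."""
    names, select_star, prev = set(), False, ""
    for m in _TOKEN.finditer(sql):
        quoted, bare = m.group(1), m.group(2)
        if quoted is not None:
            names.add(quoted.replace('""', '"').lower())
            prev = "<quoted>"
        elif bare is not None:
            names.add(bare.lower())
            prev = bare.lower()
        elif m.group(0) == "*":
            # Only a star directly after SELECT counts; count(*) does not.
            select_star = select_star or prev == "select"
            prev = "*"
    return names, select_star


def filter_column_context(sql, columns):
    """Keep referenced and safety-tagged columns; fall back to all columns."""
    names, select_star = referenced_identifiers(sql)
    matched = {name for name in columns if name.lower() in names}
    if select_star or not matched:
        return dict(columns)
    return {
        name: meta
        for name, meta in columns.items()
        if name in matched
        or SAFETY_TAGS & {t.lower() for t in meta.get("tags", [])}
    }
```

SQL keywords end up in the identifier set as well, which is harmless here because the set is only intersected with known column names. The sketch errs toward including a column rather than dropping one.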

Curated Query Discovery (#133, #142)

DataHub search results now surface curated query availability. When datasets have saved queries, a curated_query_context block appears with the count — reducing the discovery path from 3 calls to 2.

{
  "curated_query_context": {
    "urn:li:dataset:(...)": {
      "has_curated_queries": true,
      "curated_query_count": 3
    }
  }
}

Agent-Driven Curated Query Creation (#141, #144)

Agents can now contribute reusable SQL queries to DataHub through the knowledge pipeline. The new add_curated_query change type on capture_insight and apply_knowledge lets agents suggest query patterns they discover during sessions, which admins can review and apply to the catalog.

  • query_sql and query_description fields on SuggestedAction and ApplyChange
  • CreateCuratedQuery on DataHubWriter delegates to mcp-datahub v0.8.0's CreateQuery() GraphQL API
  • Apply response includes created_query_urns for reference
  • Upstream dependency bumped: mcp-datahub v0.7.4 → v0.8.0

Enrichment Metrics and Discovery Analysis (#134, #143)

New admin API endpoints for measuring enrichment effectiveness and agent discovery patterns.

  • GET /audit/metrics/enrichment — Enrichment rate, mode breakdown (full/summary/reference/none), token usage and savings estimates
  • GET /audit/metrics/discovery — Session-level analysis: discovery-before-query rate, DataHub/Trino usage breakdown, top discovery tools
  • New enrichment_mode column in audit_logs (migration 000012) tracks which enrichment mode was used per tool call
  • Both endpoints support start_time/end_time query parameters
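As a rough illustration of calling these endpoints with a time window, the helper below builds the request URL. The base URL, the helper name metrics_url, and the ISO 8601 timestamp format are assumptions; the release notes only state the paths and the start_time/end_time parameter names, so check the admin API documentation for the accepted format and any path prefix.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def metrics_url(base_url, metric, start, end):
    """Build a GET URL for /audit/metrics/<metric> bounded by a time window."""
    if metric not in ("enrichment", "discovery"):
        raise ValueError(f"unknown metric: {metric!r}")
    query = urlencode({
        # Timestamp format is an assumption, not confirmed by the release notes.
        "start_time": start.isoformat(),
        "end_time": end.isoformat(),
    })
    return f"{base_url.rstrip('/')}/audit/metrics/{metric}?{query}"


if __name__ == "__main__":
    # Hypothetical local deployment and arbitrary example dates.
    print(metrics_url(
        "http://localhost:8080",
        "discovery",
        datetime(2025, 2, 1, tzinfo=timezone.utc),
        datetime(2025, 2, 2, tzinfo=timezone.utc),
    ))
```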

Housekeeping

  • Removed CHANGELOG.md and docs/support/changelog.md — GitHub Releases is the single source of truth

Migration Notes

  • Database migration 000012 adds enrichment_mode VARCHAR(20) to audit_logs. Runs automatically on startup. Non-breaking (new column with empty default).
  • mcp-datahub v0.8.0 is required for curated query creation. If you pin this dependency separately, update it.
  • New config options need no config changes for the new behavior: search_schema_preview and column_context_filtering default to true, and schema_preview_max_columns defaults to 15. Set either boolean option to false to disable its feature.

Closed Issues

  • #132 — Filter column context to only columns referenced in query
  • #133 — Surface curated query availability in DataHub search results
  • #134 — Instrument enrichment metrics for baseline measurement
  • #139 — Anticipatory context in search results to eliminate intermediate calls
  • #141 — Agent-driven curated query creation via knowledge pipeline

Stats

  • 78 files changed, +3,543 / -238
  • 5 features across enrichment, knowledge, and observability

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.25.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.25.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.25.0_linux_amd64.tar.gz