mcp-data-platform-v0.25.0
What's New
v0.25.0 focuses on reducing agent round-trips and closing the knowledge feedback loop. Search results now carry enough context for agents to write SQL immediately, column enrichment is filtered to what matters, and agents can contribute curated queries back to the catalog.
Schema Preview in Search Results (#139, #145)
DataHub search results now include a bounded preview of column names and types in query_context for available tables. Agents no longer need an intermediate datahub_get_schema or trino_describe_table call before writing SQL: the search result itself has what they need.
- Primary key columns listed first, then remaining columns up to the configured max
- total_columns field indicates when the preview is truncated
- Preview omitted (not an empty array) when the schema is unavailable; no blocking, no errors
- Configurable: search_schema_preview: true (default) and schema_preview_max_columns: 15 (default)
```json
{
  "query_context": {
    "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)": {
      "available": true,
      "query_table": "hive.sales.orders",
      "estimated_rows": 1500000,
      "schema_preview": [
        {"name": "order_id", "type": "integer"},
        {"name": "customer_id", "type": "integer"},
        {"name": "order_date", "type": "date"},
        {"name": "total_amount", "type": "decimal(10,2)"}
      ],
      "total_columns": 42
    }
  }
}
```
Column Context Filtering (#132, #140)
Semantic column enrichment from DataHub is now filtered to only columns referenced in the SQL query. A query touching 3 of 70 columns no longer dumps all 70 column descriptions into the response.
- A dialect-agnostic SQL lexer extracts identifiers in a single pass, handling Trino ES raw_query with embedded JSON, CROSS JOIN UNNEST, double-quoted identifiers, and block/line comments
- Safety-relevant columns (PII, sensitive, critical tags) are always included regardless of SQL references
- Graceful degradation: SELECT * or zero matches returns all columns
- Only applies to trino_query; trino_describe_table always shows all columns
- Configurable: column_context_filtering: true (default)
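The filtering rules above can be sketched in Go as a single-pass scan followed by a filter. This is not the project's lexer: ColumnInfo, safetyTags, and the helper names are hypothetical, the safety tag names are assumed, and the scan treats single-quoted literals (such as the JSON passed to an Elasticsearch raw_query) as opaque rather than parsing them.

```go
package main

import (
	"fmt"
	"strings"
)

// ColumnInfo is a hypothetical column descriptor carrying DataHub-style tags.
type ColumnInfo struct {
	Name string
	Tags []string
}

// safetyTags lists assumed tag names that are never filtered out.
var safetyTags = map[string]bool{"pii": true, "sensitive": true, "critical": true}

// isIdent reports whether c may start (first) or continue an unquoted identifier.
func isIdent(c byte, first bool) bool {
	return c == '_' || 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || !first && '0' <= c && c <= '9'
}

// extractIdentifiers scans sql once, collecting lower-cased identifiers and
// reporting whether a star projection (SELECT *, t.*) appears. Comments and
// single-quoted literals (including any embedded JSON) are skipped.
func extractIdentifiers(sql string) (map[string]bool, bool) {
	ids, star, prev := make(map[string]bool), false, ""
	for i := 0; i < len(sql); {
		c := sql[i]
		switch {
		case strings.HasPrefix(sql[i:], "--"): // line comment
			for i < len(sql) && sql[i] != '\n' {
				i++
			}
		case strings.HasPrefix(sql[i:], "/*"): // block comment
			if end := strings.Index(sql[i+2:], "*/"); end >= 0 {
				i += end + 4
			} else {
				i = len(sql)
			}
		case c == '\'' || c == '"': // string literal or quoted identifier
			end := strings.IndexByte(sql[i+1:], c) // escaped quotes are not unescaped here
			if end < 0 {
				end = len(sql) - i - 1
			}
			if c == '"' {
				ids[strings.ToLower(sql[i+1:i+1+end])] = true
			}
			prev, i = string(c), i+end+2
		case isIdent(c, true):
			start := i
			for i < len(sql) && isIdent(sql[i], false) {
				i++
			}
			prev = strings.ToLower(sql[start:i])
			ids[prev] = true
		case c <= ' ': // whitespace
			i++
		default:
			if c == '*' && (prev == "select" || prev == "," || prev == ".") {
				star = true
			}
			prev = string(c)
			i++
		}
	}
	return ids, star
}

// filterColumns keeps referenced or safety-tagged columns and falls back to the
// full list for star projections or when no column name is referenced.
func filterColumns(cols []ColumnInfo, sql string) []ColumnInfo {
	refs, star := extractIdentifiers(sql)
	var kept []ColumnInfo
	matched := 0
	for _, col := range cols {
		keep := refs[strings.ToLower(col.Name)]
		if keep {
			matched++
		}
		for _, t := range col.Tags {
			keep = keep || safetyTags[strings.ToLower(t)]
		}
		if keep {
			kept = append(kept, col)
		}
	}
	if star || matched == 0 { // graceful degradation: return everything
		return cols
	}
	return kept
}

func main() {
	cols := []ColumnInfo{{Name: "order_id"}, {Name: "email", Tags: []string{"PII"}}, {Name: "notes"}}
	for _, col := range filterColumns(cols, `SELECT "order_id" FROM orders -- notes`) {
		fmt.Println(col.Name)
	}
}
```

Running it prints order_id and email: notes appears only inside a comment, while email survives because of its PII tag.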
Curated Query Discovery (#133, #142)
DataHub search results now surface curated query availability. When datasets have saved queries, a curated_query_context block appears with the count — reducing the discovery path from 3 calls to 2.
```json
{
  "curated_query_context": {
    "urn:li:dataset:(...)": {
      "has_curated_queries": true,
      "curated_query_count": 3
    }
  }
}
```
Agent-Driven Curated Query Creation (#141, #144)
Agents can now contribute reusable SQL queries to DataHub through the knowledge pipeline. The new add_curated_query change type on capture_insight and apply_knowledge lets agents suggest query patterns they discover during sessions, which admins can review and apply to the catalog.
- query_sql and query_description fields on SuggestedAction and ApplyChange
- CreateCuratedQuery on DataHubWriter delegates to mcp-datahub v0.8.0's CreateQuery() GraphQL API
- Apply response includes created_query_urns for reference
- Upstream dependency bumped: mcp-datahub v0.7.4 → v0.8.0
Enrichment Metrics and Discovery Analysis (#134, #143)
New admin API endpoints for measuring enrichment effectiveness and agent discovery patterns.
- GET /audit/metrics/enrichment: enrichment rate, mode breakdown (full/summary/reference/none), token usage, and savings estimates
- GET /audit/metrics/discovery: session-level analysis covering discovery-before-query rate, DataHub/Trino usage breakdown, and top discovery tools
- New enrichment_mode column in audit_logs (migration 000012) tracks which enrichment mode was used per tool call
- Both endpoints support start_time/end_time query parameters
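The exact formulas behind these endpoints are not documented here, so the following Go sketch shows only one plausible way to derive the headline numbers from audit rows. AuditRecord's fields other than enrichment_mode, the rule that an empty mode counts as none, and the tool names in main are all assumptions.

```go
package main

import "fmt"

// AuditRecord is a hypothetical projection of an audit_logs row; of these
// fields, only enrichment_mode is named in the release notes.
type AuditRecord struct {
	SessionID      string
	ToolName       string
	EnrichmentMode string // full, summary, reference, none; "" for rows written before migration 000012
}

// enrichmentBreakdown counts calls per mode and returns the share of calls
// that received any enrichment. Treating "" as "none" is an assumption.
func enrichmentBreakdown(records []AuditRecord) (map[string]int, float64) {
	counts := make(map[string]int)
	enriched := 0
	for _, r := range records {
		mode := r.EnrichmentMode
		if mode == "" {
			mode = "none"
		}
		counts[mode]++
		if mode != "none" {
			enriched++
		}
	}
	if len(records) == 0 {
		return counts, 0
	}
	return counts, float64(enriched) / float64(len(records))
}

// discoveryBeforeQueryRate returns the fraction of sessions with at least one
// query call whose first query was preceded by a discovery call. Records must
// be in chronological order; the tool classifiers are caller-supplied.
func discoveryBeforeQueryRate(records []AuditRecord, isDiscovery, isQuery func(string) bool) float64 {
	type session struct{ discovered, queried, discoveredFirst bool }
	sessions := make(map[string]*session)
	for _, r := range records {
		s := sessions[r.SessionID]
		if s == nil {
			s = &session{}
			sessions[r.SessionID] = s
		}
		switch {
		case isDiscovery(r.ToolName):
			s.discovered = true
		case isQuery(r.ToolName) && !s.queried:
			s.queried = true
			s.discoveredFirst = s.discovered
		}
	}
	withQuery, discoveredFirst := 0, 0
	for _, s := range sessions {
		if s.queried {
			withQuery++
			if s.discoveredFirst {
				discoveredFirst++
			}
		}
	}
	if withQuery == 0 {
		return 0
	}
	return float64(discoveredFirst) / float64(withQuery)
}

func main() {
	records := []AuditRecord{
		{"s1", "datahub_search", "full"},
		{"s1", "trino_query", "summary"},
		{"s2", "trino_query", ""},
	}
	counts, rate := enrichmentBreakdown(records)
	isDiscovery := func(t string) bool { return t == "datahub_search" }
	isQuery := func(t string) bool { return t == "trino_query" }
	fmt.Println(counts, rate, discoveryBeforeQueryRate(records, isDiscovery, isQuery))
}
```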
Housekeeping
- Removed CHANGELOG.md and docs/support/changelog.md; GitHub Releases is the single source of truth
Migration Notes
- Database migration 000012 adds enrichment_mode VARCHAR(20) to audit_logs. It runs automatically on startup and is non-breaking (new column with an empty default).
- mcp-datahub v0.8.0 is required for curated query creation. If you pin this dependency separately, update it.
- The new config options need no changes to get the new behavior: search_schema_preview and column_context_filtering default to true, and schema_preview_max_columns defaults to 15. Set search_schema_preview or column_context_filtering to false to disable the corresponding feature.
Closed Issues
- #132 — Filter column context to only columns referenced in query
- #133 — Surface curated query availability in DataHub search results
- #134 — Instrument enrichment metrics for baseline measurement
- #139 — Anticipatory context in search results to eliminate intermediate calls
- #141 — Agent-driven curated query creation via knowledge pipeline
Stats
- 78 files changed, +3,543 / -238
- 5 features across enrichment, knowledge, and observability
Installation
Homebrew (macOS)
```bash
brew install txn2/tap/mcp-data-platform
```
Claude Code CLI
```bash
claude mcp add mcp-data-platform -- mcp-data-platform
```
Docker
```bash
docker pull ghcr.io/txn2/mcp-data-platform:v0.25.0
```
Verification
All release artifacts are signed with Cosign. Verify with:
```bash
cosign verify-blob --bundle mcp-data-platform_0.25.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.25.0_linux_amd64.tar.gz
```