mcp-data-platform-v0.9.2

github-actions released this 30 Jan 07:05
· 417 commits to main since this release
d68f5fa

Highlights

This release introduces multi-table SQL extraction for semantic enrichment, enabling the platform to provide complete business context when LLMs execute complex queries that span multiple data sources (Elasticsearch, Cassandra, PostgreSQL, etc.).

New Features

Multi-Table Semantic Enrichment

When an LLM executes a Trino query that references multiple tables, the semantic layer now identifies and enriches ALL physical tables in the query:

  • Elasticsearch raw_query support - Extracts indices from TABLE(elasticsearch.system.raw_query(...)) including comma-separated multi-index queries
  • JOIN extraction - Identifies all tables in INNER/LEFT/RIGHT/CROSS JOINs
  • CTE filtering - Automatically excludes Common Table Expressions (WITH clauses) from enrichment since they're not physical tables
  • Deduplication - Ensures each table is enriched only once even if referenced multiple times

Example:

WITH es_response AS (
    SELECT result FROM TABLE(elasticsearch.system.raw_query(
        index => 'sales-2024,sales-2025', query => '{...}'
    ))
),
parsed AS (SELECT * FROM es_response)
SELECT * FROM parsed p
JOIN cassandra.prod.locations loc ON p.id = loc.id

Extracted tables:

  • elasticsearch.default.sales-2024
  • elasticsearch.default.sales-2025
  • cassandra.prod.locations

Filtered (CTEs): es_response, parsed

Enriched Response Format

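Query results now include semantic context for every extracted table. The primary table populates semantic_context and column_context; the remaining tables are listed in additional_tables: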

{
  "semantic_context": {
    "description": "Primary table description",
    "owners": ["data-team@example.com"],
    "tags": ["revenue", "pii"],
    "domain": "Sales",
    "quality_score": 0.95
  },
  "column_context": {
    "customer_id": {
      "description": "Unique customer identifier",
      "is_pii": true,
      "glossary_terms": ["Customer ID"]
    }
  },
  "additional_tables": [
    {
      "table": "elasticsearch.default.sales-2025",
      "description": "2025 sales transactions",
      "owners": ["sales-data@example.com"]
    },
    {
      "table": "cassandra.prod.locations",
      "description": "Store location master data",
      "tags": ["master-data"]
    }
  ]
}

Technical Changes

New Files

File                               Description
pkg/middleware/sqlextract.go       SQL table extraction with CTE filtering
pkg/middleware/sqlextract_test.go  Comprehensive test coverage

Modified Files

File                             Changes
pkg/middleware/semantic.go       Integration with multi-table extraction
pkg/middleware/semantic_test.go  Updated tests for multi-table scenarios

Dependencies

Dependency                    Version   Notes
github.com/xwb1989/sqlparser  existing  Used for AST-based table extraction
github.com/txn2/mcp-datahub   v0.4.4    Semantic metadata provider

Breaking Changes

None. The enrichment format is additive: the existing semantic_context structure is preserved, and the new additional_tables array is added only when multiple tables are detected.

Migration Guide

No migration required. The feature activates automatically when:

  1. EnrichTrinoResults is enabled (trino_semantic_enrichment: true in configuration)
  2. A semantic provider (DataHub) is configured
  3. SQL queries reference multiple tables

Configuration

No new configuration options. Existing semantic enrichment settings apply:

semantic:
  provider: datahub
  instance: primary

injection:
  trino_semantic_enrichment: true

Testing

# Run extraction tests
go test -v ./pkg/middleware/... -run TestExtractTablesFromSQL

# Run all middleware tests with race detection
go test -race ./pkg/middleware/...

# Full CI suite
go test -race ./...
golangci-lint run ./...
gosec ./...

Known Limitations

  1. Subquery depth - Deeply nested subqueries may not be fully extracted by the regex fallback
  2. Dynamic SQL - SQL constructed at runtime (e.g., in stored procedures) cannot be analyzed
  3. View resolution - Views are treated as tables; underlying table lineage requires DataHub lineage data

What's Next

  • v0.9.3 - Query pattern detection for smarter context selection
  • v0.10.0 - Column-level lineage propagation across JOINs

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.9.2

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.9.2_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.9.2_linux_amd64.tar.gz