mcp-data-platform-v0.9.2
Highlights
This release introduces multi-table SQL extraction for semantic enrichment, enabling the platform to provide complete business context when LLMs execute complex queries that span multiple data sources (Elasticsearch, Cassandra, PostgreSQL, etc.).
New Features
Multi-Table Semantic Enrichment
When an LLM executes a Trino query that references multiple tables, the semantic layer now identifies and enriches ALL physical tables in the query:
- Elasticsearch raw_query support - Extracts indices from TABLE(elasticsearch.system.raw_query(...)), including comma-separated multi-index queries
- JOIN extraction - Identifies all tables in INNER/LEFT/RIGHT/CROSS JOINs
- CTE filtering - Automatically excludes Common Table Expressions (WITH clauses) from enrichment since they're not physical tables
- Deduplication - Ensures each table is enriched only once even if referenced multiple times
Example:
WITH es_response AS (
  SELECT result FROM TABLE(elasticsearch.system.raw_query(
    index => 'sales-2024,sales-2025', query => '{...}'
  ))
),
parsed AS (SELECT * FROM es_response)
SELECT * FROM parsed p
JOIN cassandra.prod.locations loc ON p.id = loc.id
Extracted tables:
- elasticsearch.default.sales-2024
- elasticsearch.default.sales-2025
- cassandra.prod.locations
Filtered (CTEs): es_response, parsed
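The extraction rules above can be illustrated with a small, regex-only Go sketch. It is not the code in pkg/middleware/sqlextract.go, which these notes describe as AST-based with a regex fallback. The function name extractTables, the three patterns, and the hard-coded elasticsearch.default prefix are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for illustration only; the real extractor is AST-based.
var (
	// Names introduced by WITH or by a comma between CTE definitions: "name AS (".
	cteNameRe = regexp.MustCompile(`(?i)(?:\bWITH\s+|,\s*)([A-Za-z_][A-Za-z0-9_]*)\s+AS\s*\(`)
	// The identifier that follows FROM or any kind of JOIN.
	tableRefRe = regexp.MustCompile(`(?i)\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_.]*)`)
	// The index argument of a raw_query(...) table function call.
	rawQueryIndexRe = regexp.MustCompile(`(?is)raw_query\s*\(.*?index\s*=>\s*'([^']*)'`)
)

// exampleSQL is the query from the release notes.
const exampleSQL = `
WITH es_response AS (
  SELECT result FROM TABLE(elasticsearch.system.raw_query(
    index => 'sales-2024,sales-2025', query => '{...}'
  ))
),
parsed AS (SELECT * FROM es_response)
SELECT * FROM parsed p
JOIN cassandra.prod.locations loc ON p.id = loc.id`

// extractTables returns physical table names in first-seen order,
// skipping CTE names and duplicates.
func extractTables(sql string) []string {
	ctes := map[string]bool{}
	for _, m := range cteNameRe.FindAllStringSubmatch(sql, -1) {
		ctes[strings.ToLower(m[1])] = true
	}

	seen := map[string]bool{}
	var tables []string
	add := func(name string) {
		key := strings.ToLower(name)
		if seen[key] {
			return // deduplicate repeated references
		}
		seen[key] = true
		tables = append(tables, name)
	}

	// Each comma-separated index becomes its own table.
	// Assumption: catalog "elasticsearch" and schema "default", as in the example.
	for _, m := range rawQueryIndexRe.FindAllStringSubmatch(sql, -1) {
		for _, idx := range strings.Split(m[1], ",") {
			if idx = strings.TrimSpace(idx); idx != "" {
				add("elasticsearch.default." + idx)
			}
		}
	}

	// FROM/JOIN targets, excluding the TABLE(...) wrapper and CTE names.
	for _, m := range tableRefRe.FindAllStringSubmatch(sql, -1) {
		name := m[1]
		if strings.EqualFold(name, "TABLE") || ctes[strings.ToLower(name)] {
			continue
		}
		add(name)
	}
	return tables
}

func main() {
	for _, t := range extractTables(exampleSQL) {
		fmt.Println(t)
	}
}
```

Run against the example query, this sketch should yield the same three tables listed under "Extracted tables". A regex approach like this can be misled by keywords inside string literals or by deeply nested subqueries, which is one reason an AST-based parser is the more robust primary path.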
Enriched Response Format
Query results now include semantic context for multiple tables:
{
  "semantic_context": {
    "description": "Primary table description",
    "owners": ["data-team@example.com"],
    "tags": ["revenue", "pii"],
    "domain": "Sales",
    "quality_score": 0.95
  },
  "column_context": {
    "customer_id": {
      "description": "Unique customer identifier",
      "is_pii": true,
      "glossary_terms": ["Customer ID"]
    }
  },
  "additional_tables": [
    {
      "table": "elasticsearch.default.sales-2025",
      "description": "2025 sales transactions",
      "owners": ["sales-data@example.com"]
    },
    {
      "table": "cassandra.prod.locations",
      "description": "Store location master data",
      "tags": ["master-data"]
    }
  ]
}
Technical Changes
New Files
| File | Description |
|---|---|
| pkg/middleware/sqlextract.go | SQL table extraction with CTE filtering |
| pkg/middleware/sqlextract_test.go | Comprehensive test coverage |
Modified Files
| File | Changes |
|---|---|
| pkg/middleware/semantic.go | Integration with multi-table extraction |
| pkg/middleware/semantic_test.go | Updated tests for multi-table scenarios |
Dependencies
| Dependency | Version | Notes |
|---|---|---|
| github.com/xwb1989/sqlparser | existing | Used for AST-based table extraction |
| github.com/txn2/mcp-datahub | v0.4.4 | Semantic metadata provider |
Breaking Changes
None. The enrichment format is additive: the existing semantic_context structure is preserved, and the new additional_tables array is added only when multiple tables are detected.
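For clients that decode the enriched response, the additive change can be handled with an optional field. The sketch below is a minimal illustration based only on the JSON shown under "Enriched Response Format". The type names (Enrichment, SemanticContext, TableContext) and the function decodeEnrichment are hypothetical and are not taken from the project's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TableContext is a hypothetical type mirroring one entry of "additional_tables".
type TableContext struct {
	Table       string   `json:"table"`
	Description string   `json:"description,omitempty"`
	Owners      []string `json:"owners,omitempty"`
	Tags        []string `json:"tags,omitempty"`
}

// SemanticContext is a hypothetical type mirroring "semantic_context".
type SemanticContext struct {
	Description  string   `json:"description"`
	Owners       []string `json:"owners,omitempty"`
	Tags         []string `json:"tags,omitempty"`
	Domain       string   `json:"domain,omitempty"`
	QualityScore float64  `json:"quality_score,omitempty"`
}

// Enrichment leaves AdditionalTables nil when the key is absent,
// so single-table responses decode exactly as before.
type Enrichment struct {
	SemanticContext  SemanticContext `json:"semantic_context"`
	AdditionalTables []TableContext  `json:"additional_tables,omitempty"`
}

// decodeEnrichment parses an enriched response payload.
func decodeEnrichment(data []byte) (Enrichment, error) {
	var e Enrichment
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	single := []byte(`{"semantic_context":{"description":"Primary table description"}}`)
	multi := []byte(`{"semantic_context":{"description":"Primary table description"},
		"additional_tables":[{"table":"cassandra.prod.locations","tags":["master-data"]}]}`)

	for _, payload := range [][]byte{single, multi} {
		e, err := decodeEnrichment(payload)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q: %d additional table(s)\n",
			e.SemanticContext.Description, len(e.AdditionalTables))
	}
}
```

Because encoding/json ignores unknown keys by default, a client built against the older single-table shape would also keep working unchanged when additional_tables appears.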
Migration Guide
No migration required. The feature activates automatically when:
- EnrichTrinoResults: true in configuration
- A semantic provider (DataHub) is configured
- SQL queries reference multiple tables
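The three activation conditions amount to a simple conjunction. The following sketch expresses them as a predicate. The Config struct, its SemanticProvider field, and the function name are hypothetical stand-ins and do not reflect the platform's actual configuration types.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the relevant platform settings.
type Config struct {
	EnrichTrinoResults bool
	SemanticProvider   string // e.g. "datahub"; empty means no provider configured
}

// multiTableEnrichmentActive reports whether all three conditions hold:
// enrichment is enabled, a provider is configured, and the query
// references more than one physical table.
func multiTableEnrichmentActive(cfg Config, tables []string) bool {
	return cfg.EnrichTrinoResults && cfg.SemanticProvider != "" && len(tables) > 1
}

func main() {
	cfg := Config{EnrichTrinoResults: true, SemanticProvider: "datahub"}
	tables := []string{"elasticsearch.default.sales-2024", "cassandra.prod.locations"}
	fmt.Println(multiTableEnrichmentActive(cfg, tables))
}
```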
Configuration
No new configuration options. Existing semantic enrichment settings apply:
semantic:
  provider: datahub
  instance: primary
injection:
  trino_semantic_enrichment: true
Testing
# Run extraction tests
go test -v ./pkg/middleware/... -run TestExtractTablesFromSQL
# Run all middleware tests with race detection
go test -race ./pkg/middleware/...
# Full CI suite
go test -race ./...
golangci-lint run ./...
gosec ./...
Known Limitations
- Subquery depth - Deeply nested subqueries may not be fully extracted by the regex fallback
- Dynamic SQL - SQL constructed at runtime (e.g., in stored procedures) cannot be analyzed
- View resolution - Views are treated as tables; underlying table lineage requires DataHub lineage data
What's Next
- v0.9.3 - Query pattern detection for smarter context selection
- v0.10.0 - Column-level lineage propagation across JOINs
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platform
Claude Code CLI
claude mcp add mcp-data-platform -- mcp-data-platform
Docker
docker pull ghcr.io/txn2/mcp-data-platform:v0.9.2
Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-data-platform_0.9.2_linux_amd64.tar.gz.sigstore.json \
mcp-data-platform_0.9.2_linux_amd64.tar.gz