Skip to content

mcp-data-platform-v0.14.0

Choose a tag to compare

@github-actions github-actions released this 08 Feb 04:35
· 384 commits to main since this release
194b586

Problem

When cross-injection is enabled, every Trino tool call receives ~2KB of semantic metadata from DataHub: owners, tags, glossary terms, column descriptions, quality scores, and deprecation warnings. In a typical AI session querying the same table 5-10 times, this repeats 10-20KB of identical metadata, consuming LLM context tokens with no new information.

Solution

The platform now tracks which tables have been enriched per client session. The first call for a table gets full semantic_context + column_context. Repeat calls within the TTL get reduced content based on the configured dedup mode.

Three Modes

Mode Repeat Call Behavior Token Impact
reference (default) Compact JSON noting full metadata was already sent ~100 tokens
summary Table-level context (owners, tags, quality) without column details ~300-500 tokens
none No enrichment appended 0 tokens

reference mode (default) sends:

{
  "metadata_reference": {
    "tables": ["hive.sales.orders"],
    "note": "Full semantic metadata was provided earlier in this session. Refer to previous responses for column descriptions, tags, owners, and glossary terms."
  }
}

summary mode sends:

{
  "semantic_context": {
    "description": "Customer orders with line items",
    "owners": [{"name": "Data Team", "type": "group"}],
    "tags": ["pii", "financial"],
    "quality_score": 0.92
  },
  "note": "Summary only. Full column metadata was provided earlier in this session."
}

none mode returns the raw tool result with no enrichment.

Configuration

Session dedup is enabled by default when trino_semantic_enrichment: true. No configuration changes are needed to benefit from it.

injection:
  trino_semantic_enrichment: true
  session_dedup:
    enabled: true          # Default: true
    mode: reference        # reference (default), summary, none
    entry_ttl: 5m          # Defaults to semantic.cache.ttl
    session_timeout: 30m   # Defaults to server.streamable.session_timeout

Key Behaviors

  • Enabled by default — no config changes needed
  • Trino-only — DataHub and S3 enrichment is not deduplicated
  • Session isolation — each client session has independent dedup state
  • SQL-awareSELECT * FROM orders JOIN products tracks both tables independently
  • In-memory — state is lost on restart (by design, LLM gets fresh metadata)
  • TTL-based — entries expire after entry_ttl (default: semantic cache TTL), triggering fresh enrichment

Audit Response Size Tracking

Audit logs now include two new fields for visibility into tool response sizes:

Field Description
response_chars Total character count across all content items
response_token_estimate Estimated token count (chars / 4)

Requires migration 000003_response_size (runs automatically on startup when database migrations are enabled).


Code Quality

  • Enterprise linter configuration: ~35 linters enabled including revive, staticcheck, gosec, wrapcheck, errorlint, gocritic, gocognit, nestif, and modernize
  • 438 lint findings resolved across two PRs (#61, #62) — zero remaining issues
  • All test lint exclusions removed — tests held to the same standard as production code
  • Dead code removed: unused OAuth PostgreSQL store (pkg/oauth/postgres/) deleted

PRs Included

  • #59 — Audit response size tracking, session metadata dedup, lint hardening
  • #61 — Resolve all 221 golangci-lint findings with enterprise linter config
  • #62 — Remove all test lint exclusions and fix 217 findings
  • #63 — Document session metadata deduplication feature

Changelog

Others

  • 2f32675: Add session metadata deduplication configuration to platform settings (@cjimti)
  • d8f80f3: Audit response size tracking, session metadata dedup, lint hardening (#59) (@cjimti)
  • 3dcabfc: Document session metadata deduplication mechanism, modes, and configuration for Trino enrichment. (@cjimti)
  • b250e54: Expand documentation with session metadata deduplication configuration, modes, and usage details. (@cjimti)
  • 72d033f: Link session deduplication details to documentation overview. (@cjimti)
  • 46f38c0: Remove all test lint exclusions and fix 217 findings (#62) (@cjimti)
  • d5fdd2c: Resolve all 221 golangci-lint findings with enterprise linter config (#61) (@cjimti)
  • 194b586: Revert "Revert session dedup docs (re-applying via PR)" (#63) (@cjimti)
  • 2174f8b: Revert session dedup docs (re-applying via PR) (@cjimti)

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.14.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.14.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.14.0_linux_amd64.tar.gz