mcp-data-platform-v0.14.0
Problem
When cross-injection is enabled, every Trino tool call receives ~2KB of semantic metadata from DataHub: owners, tags, glossary terms, column descriptions, quality scores, and deprecation warnings. In a typical AI session querying the same table 5-10 times, this repeats 10-20KB of identical metadata, consuming LLM context tokens with no new information.
Solution
The platform now tracks which tables have been enriched per client session. The first call for a table gets full semantic_context + column_context. Repeat calls within the TTL get reduced content based on the configured dedup mode.
Three Modes
| Mode | Repeat Call Behavior | Token Impact |
|---|---|---|
reference (default) |
Compact JSON noting full metadata was already sent | ~100 tokens |
summary |
Table-level context (owners, tags, quality) without column details | ~300-500 tokens |
none |
No enrichment appended | 0 tokens |
reference mode (default) sends:
{
"metadata_reference": {
"tables": ["hive.sales.orders"],
"note": "Full semantic metadata was provided earlier in this session. Refer to previous responses for column descriptions, tags, owners, and glossary terms."
}
}summary mode sends:
{
"semantic_context": {
"description": "Customer orders with line items",
"owners": [{"name": "Data Team", "type": "group"}],
"tags": ["pii", "financial"],
"quality_score": 0.92
},
"note": "Summary only. Full column metadata was provided earlier in this session."
}none mode returns the raw tool result with no enrichment.
Configuration
Session dedup is enabled by default when trino_semantic_enrichment: true. No configuration changes are needed to benefit from it.
injection:
trino_semantic_enrichment: true
session_dedup:
enabled: true # Default: true
mode: reference # reference (default), summary, none
entry_ttl: 5m # Defaults to semantic.cache.ttl
session_timeout: 30m # Defaults to server.streamable.session_timeoutKey Behaviors
- Enabled by default — no config changes needed
- Trino-only — DataHub and S3 enrichment is not deduplicated
- Session isolation — each client session has independent dedup state
- SQL-aware —
SELECT * FROM orders JOIN productstracks both tables independently - In-memory — state is lost on restart (by design, LLM gets fresh metadata)
- TTL-based — entries expire after
entry_ttl(default: semantic cache TTL), triggering fresh enrichment
Audit Response Size Tracking
Audit logs now include two new fields for visibility into tool response sizes:
| Field | Description |
|---|---|
response_chars |
Total character count across all content items |
response_token_estimate |
Estimated token count (chars / 4) |
Requires migration 000003_response_size (runs automatically on startup when database migrations are enabled).
Code Quality
- Enterprise linter configuration: ~35 linters enabled including revive, staticcheck, gosec, wrapcheck, errorlint, gocritic, gocognit, nestif, and modernize
- 438 lint findings resolved across two PRs (#61, #62) — zero remaining issues
- All test lint exclusions removed — tests held to the same standard as production code
- Dead code removed: unused OAuth PostgreSQL store (
pkg/oauth/postgres/) deleted
PRs Included
- #59 — Audit response size tracking, session metadata dedup, lint hardening
- #61 — Resolve all 221 golangci-lint findings with enterprise linter config
- #62 — Remove all test lint exclusions and fix 217 findings
- #63 — Document session metadata deduplication feature
Changelog
Others
- 2f32675: Add session metadata deduplication configuration to platform settings (@cjimti)
- d8f80f3: Audit response size tracking, session metadata dedup, lint hardening (#59) (@cjimti)
- 3dcabfc: Document session metadata deduplication mechanism, modes, and configuration for Trino enrichment. (@cjimti)
- b250e54: Expand documentation with session metadata deduplication configuration, modes, and usage details. (@cjimti)
- 72d033f: Link session deduplication details to documentation overview. (@cjimti)
- 46f38c0: Remove all test lint exclusions and fix 217 findings (#62) (@cjimti)
- d5fdd2c: Resolve all 221 golangci-lint findings with enterprise linter config (#61) (@cjimti)
- 194b586: Revert "Revert session dedup docs (re-applying via PR)" (#63) (@cjimti)
- 2174f8b: Revert session dedup docs (re-applying via PR) (@cjimti)
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platformClaude Code CLI
claude mcp add mcp-data-platform -- mcp-data-platformDocker
docker pull ghcr.io/txn2/mcp-data-platform:v0.14.0Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-data-platform_0.14.0_linux_amd64.tar.gz.sigstore.json \
mcp-data-platform_0.14.0_linux_amd64.tar.gz