Skip to content

mcp-data-platform-v0.24.0

Choose a tag to compare

@github-actions github-actions released this 21 Feb 09:23
· 337 commits to main since this release
9331191

Highlights

v0.24.0 delivers significant token savings for semantic enrichment — reducing repeat-query overhead by up to 90% and eliminating bloat from undocumented columns. Enrichment token cost is now tracked through the audit pipeline for observability.

Features

Omit empty column descriptions from enrichment (#130)

Columns with no meaningful catalog metadata are now filtered from enrichment responses. Previously, every column was included even when the catalog had no description, tags, glossary terms, PII flags, or inherited metadata — common with auto-ingested schemas.

  • New ColumnContext.HasContent() method checks for meaningful metadata before inclusion
  • When all columns lack metadata, a column_context_note replaces the empty column_context map
  • Impact: A 50-column table with 5 documented columns drops from ~3,000 to ~500 enrichment tokens

Compact dedup summary mode (#131)

The "summary" dedup mode previously re-sent the full semantic_context (description, owners, tags, glossary terms, custom properties) on repeat queries — just without column details. Now it sends a compact_context containing only critical safety-relevant fields:

Field Included when
urn Always (if set)
domain Always (if set)
deprecation Only when actively deprecated
quality_score Always (if set)
critical_tags Tags matching: pii, sensitive, quality, restricted, confidential

Description, owners, full tag list, glossary terms, and custom properties are omitted since they were already sent in the full enrichment earlier in the session.

Enrichment token tracking (#131)

Enrichment token cost is now estimated and recorded in every audit event:

  • Token estimation uses a len(chars) / 4 heuristic applied before and after enrichment
  • Full enrichment token counts are stored per-table in the session cache
  • Dedup enrichment measures actual tokens sent vs. what full enrichment would have cost
  • Two new fields in audit events: enrichment_tokens_full and enrichment_tokens_dedup
  • Cumulative counters on SessionEnrichmentCache for runtime diagnostics

Database Migration

Migration 000011 adds two columns to audit_logs:

ALTER TABLE audit_logs ADD COLUMN enrichment_tokens_full INTEGER DEFAULT 0;
ALTER TABLE audit_logs ADD COLUMN enrichment_tokens_dedup INTEGER DEFAULT 0;

This migration runs automatically on startup. Existing rows default to 0.

Backward Compatibility

  • Session persistence is fully backward-compatible. Existing persisted sessions using the old map[string]time.Time format are parsed correctly with TokenCount: 0. New sessions persist as SentTableEntry with both sent_at and token_count.
  • The compact_context key replaces semantic_context in dedup summary responses. Clients consuming the "summary" dedup mode should update to look for compact_context instead.
  • The "reference" and "none" dedup modes are unchanged.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.24.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.24.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.24.0_linux_amd64.tar.gz