
mcp-data-platform-v0.2.0


Released 22 Jan, 03:13, by github-actions (commit c70f701).

MCP server for data exploration and analysis with DataHub as the semantic layer.

mcp-data-platform combines DataHub (required), Trino (optional), and S3 (optional) into a unified platform with cross-injection: query results automatically include business context from DataHub.

For the security architecture rationale, see MCP Defense: A Case Study in AI Security.

Features

Data Integration

  • DataHub - Search datasets, get entity details, explore lineage, browse glossary terms, list domains and data products
  • Trino - Execute SQL queries, explain plans, list catalogs/schemas/tables, describe table schemas
  • S3 - List buckets and objects, get/put objects, generate presigned URLs

Cross-Injection

Query results automatically include business context:

  • Trino → DataHub - Query results include owners, tags, glossary terms, deprecation warnings, quality scores
  • DataHub → Trino - Search results show query availability: whether a dataset can be queried, its row count, and sample SQL
  • S3 → DataHub - S3 operations include semantic context from DataHub
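As a rough illustration of the Trino → DataHub direction, the Go sketch below attaches a metadata record to a query result and surfaces deprecation warnings. The types `QueryResult` and `SemanticContext`, the `MetadataLookup` function, and the table name are hypothetical stand-ins, not mcp-data-platform's actual API; a real implementation would call DataHub rather than read from an in-memory map.

```go
package main

import (
	"fmt"
	"strings"
)

// SemanticContext is a hypothetical subset of the DataHub metadata that
// cross-injection could attach to a Trino result.
type SemanticContext struct {
	Owners          []string
	Tags            []string
	GlossaryTerms   []string
	Deprecated      bool
	DeprecationNote string
}

// QueryResult is a hypothetical Trino result with optional semantic context.
type QueryResult struct {
	Table   string
	Columns []string
	Rows    [][]any
	Context *SemanticContext
}

// MetadataLookup stands in for a DataHub client call keyed by table name.
type MetadataLookup func(table string) (*SemanticContext, bool)

// enrich attaches DataHub context when metadata exists. A lookup miss leaves
// the result unchanged instead of failing the query.
func enrich(res QueryResult, lookup MetadataLookup) QueryResult {
	if ctx, ok := lookup(res.Table); ok {
		res.Context = ctx
	}
	return res
}

// warnings renders deprecation notices that a client should surface.
func warnings(res QueryResult) []string {
	if res.Context == nil || !res.Context.Deprecated {
		return nil
	}
	return []string{fmt.Sprintf("%s is deprecated: %s", res.Table, res.Context.DeprecationNote)}
}

func main() {
	catalog := map[string]*SemanticContext{
		"hive.default.orders": {
			Owners:          []string{"data-eng"},
			Tags:            []string{"pii"},
			Deprecated:      true,
			DeprecationNote: "use hive.default.orders_v2",
		},
	}
	lookup := func(t string) (*SemanticContext, bool) {
		c, ok := catalog[t]
		return c, ok
	}

	res := enrich(QueryResult{Table: "hive.default.orders", Columns: []string{"id"}}, lookup)
	fmt.Println("owners:", strings.Join(res.Context.Owners, ","))
	fmt.Println(warnings(res))
}
```

Because a miss returns the result untouched, enrichment in this sketch only adds context and never blocks a query; whether the platform behaves the same way when DataHub is unreachable is an assumption.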

Authentication

  • OIDC - Keycloak, Auth0, Okta, Azure AD support
  • API Keys - Service account authentication
  • TLS - Configurable TLS for SSE transport

Authorization

  • Personas - Role-based tool filtering with wildcard patterns
  • Default-Deny - Users without an explicit persona have no tool access
  • Tool Filtering - Allow/deny patterns per persona
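The allow/deny semantics can be sketched in Go with `path.Match` globbing, assuming deny patterns take precedence over allow patterns and that a malformed pattern fails closed. `ToolRules` and `toolAllowed` are hypothetical names, and tool names other than `trino_query` and `trino_explain` (such as `datahub_search` or `datahub_delete_tag`) are illustrative only.

```go
package main

import (
	"fmt"
	"path"
)

// ToolRules mirrors the allow/deny lists of a persona definition.
type ToolRules struct {
	Allow []string
	Deny  []string
}

// matchesAny reports whether name matches any glob pattern. path.Match's '*'
// matches any run of non-'/' characters, which suits tool names. A malformed
// pattern is returned as an error so the caller can fail closed.
func matchesAny(patterns []string, name string) (bool, error) {
	for _, p := range patterns {
		ok, err := path.Match(p, name)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// toolAllowed permits a tool only if no deny pattern matches and some allow
// pattern does. A nil persona (none resolved) gets no tools.
func toolAllowed(rules *ToolRules, tool string) bool {
	if rules == nil {
		return false // default-deny
	}
	if denied, err := matchesAny(rules.Deny, tool); err != nil || denied {
		return false
	}
	allowed, err := matchesAny(rules.Allow, tool)
	return err == nil && allowed
}

func main() {
	analyst := &ToolRules{
		Allow: []string{"trino_query", "trino_explain", "datahub_*"},
		Deny:  []string{"*_delete_*"},
	}
	for _, tool := range []string{"trino_query", "datahub_search", "s3_put_object", "datahub_delete_tag"} {
		fmt.Printf("%-20s %v\n", tool, toolAllowed(analyst, tool))
	}
}
```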

Security Model

Implements fail-closed security:

| Scenario | Behavior |
|---|---|
| Missing token | HTTP 401 Unauthorized |
| Invalid/expired token | Authentication error |
| Missing sub or exp claim | Token rejected |
| No persona resolved | Access denied |
| Default persona | Denies all tools |

Data Protection

  • Trino Read-Only Mode - Enforced at query level, blocks INSERT/UPDATE/DELETE/DROP
  • Metadata Sanitization - Control characters stripped, strings truncated
  • Prompt Injection Detection - Common injection patterns detected and logged
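A deliberately naive sketch of two of these protections follows: a keyword check for the write statements read-only mode blocks, and a sanitizer that drops control characters and truncates by rune count. The function names and the length limit are hypothetical. A leading-keyword regex is easy to bypass (for example with a leading SQL comment), so a real guard would parse the statement or rely on Trino's own access control; the release notes do not say which approach mcp-data-platform uses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// writePattern naively matches statements that start with one of the write
// keywords named in the release notes, case-insensitively.
var writePattern = regexp.MustCompile(`(?i)^\s*(INSERT|UPDATE|DELETE|DROP)\b`)

// isWriteStatement reports whether read-only mode should block sql.
func isWriteStatement(sql string) bool {
	return writePattern.MatchString(sql)
}

// sanitizeMetadata removes every control character from s and truncates the
// result to at most maxRunes runes, so multi-byte text is never split.
func sanitizeMetadata(s string, maxRunes int) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsControl(r) {
			return -1 // a negative return drops the rune
		}
		return r
	}, s)
	runes := []rune(cleaned)
	if len(runes) > maxRunes {
		return string(runes[:maxRunes])
	}
	return cleaned
}

func main() {
	for _, q := range []string{"SELECT * FROM orders", "  drop table orders", "DELETE FROM t"} {
		fmt.Printf("%-25q blocked=%v\n", q, isWriteStatement(q))
	}
	fmt.Printf("%q\n", sanitizeMetadata("Owner:\x1b[31m data-eng\x00", 12))
}
```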

Audit & Operations

  • Audit Logging - Track tool calls with user context
  • Cryptographically Secure Request IDs - Generated with crypto/rand

Extensibility

  • Go Library - Import as a library to build custom MCP servers
  • Multi-Provider - Connect multiple instances of each service

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Go Install

go install github.com/txn2/mcp-data-platform/cmd/mcp-data-platform@v0.2.0

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.2.0

Quick Start

stdio Transport (Local)

# platform.yaml
server:
  name: mcp-data-platform
  transport: stdio

toolkits:
  datahub:
    primary:
      url: https://datahub.example.com
      token: ${DATAHUB_TOKEN}

  trino:
    primary:
      host: trino.example.com
      port: 443
      user: ${TRINO_USER}
      password: ${TRINO_PASSWORD}
      ssl: true
      catalog: hive
      schema: default

injection:
  trino_semantic_enrichment: true
  datahub_query_enrichment: true

SSE Transport (Remote/Shared)

server:
  transport: sse
  address: ":8443"
  tls:
    enabled: true
    cert_file: /path/to/cert.pem
    key_file: /path/to/key.pem

auth:
  allow_anonymous: false
  oidc:
    enabled: true
    issuer: "https://keycloak.example.com/realms/platform"
    client_id: "mcp-data-platform"
    audience: "mcp-data-platform"

personas:
  definitions:
    analyst:
      display_name: "Data Analyst"
      roles: ["analyst"]
      tools:
        allow: ["trino_query", "trino_explain", "datahub_*"]
        deny: ["*_delete_*"]
  default_persona: analyst
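Persona resolution from the `roles` list with a `default_persona` fallback might look like the sketch below. It assumes roles come from a token claim, that definitions are checked in a fixed order (a slice, since Go map iteration order is random), and that the fallback applies only when `default_persona` names a defined persona; `Persona` and `resolvePersona` are hypothetical names.

```go
package main

import "fmt"

// Persona is a hypothetical in-memory form of a personas.definitions entry.
type Persona struct {
	Name  string
	Roles []string
}

// resolvePersona returns the first definition sharing a role with the user.
// Otherwise it falls back to defaultName, but only if that persona is
// defined; with no match and no usable default, it resolves nothing.
func resolvePersona(defs []Persona, defaultName string, userRoles []string) (string, bool) {
	has := make(map[string]bool, len(userRoles))
	for _, r := range userRoles {
		has[r] = true
	}
	for _, p := range defs {
		for _, r := range p.Roles {
			if has[r] {
				return p.Name, true
			}
		}
	}
	for _, p := range defs {
		if defaultName != "" && p.Name == defaultName {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	defs := []Persona{{Name: "analyst", Roles: []string{"analyst"}}}
	fmt.Println(resolvePersona(defs, "analyst", []string{"marketing"}))
	fmt.Println(resolvePersona(defs, "", []string{"marketing"}))
}
```

With `default_persona` unset, a user whose roles match no definition gets no persona, which corresponds to the "No persona resolved" row in the Security Model table.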

Documentation

https://mcp-data-platform.txn2.com


Verification

  • go test -race ./... - All tests pass
  • golangci-lint run ./... - 0 issues
  • gosec ./... - 0 security issues
  • Test coverage: 84.6%

Installation

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Signature Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.2.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.2.0_linux_amd64.tar.gz