mcp-data-platform-v0.2.0
MCP server for data exploration and analysis with DataHub as the semantic layer.
mcp-data-platform combines DataHub (required), Trino (optional), and S3 (optional) into a unified platform with cross-injection: query results automatically include business context from DataHub.
For the security architecture rationale, see MCP Defense: A Case Study in AI Security.
Features
Data Integration
- DataHub - Search datasets, get entity details, explore lineage, browse glossary terms, list domains and data products
- Trino - Execute SQL queries, explain plans, list catalogs/schemas/tables, describe table schemas
- S3 - List buckets and objects, get/put objects, generate presigned URLs
Cross-Injection
Query results automatically include business context:
- Trino → DataHub - Query results include owners, tags, glossary terms, deprecation warnings, quality scores
- DataHub → Trino - Search results show query availability (can this be queried? how many rows? sample SQL)
- S3 → DataHub - S3 operations include semantic context from DataHub
Authentication
- OIDC - Keycloak, Auth0, Okta, Azure AD support
- API Keys - Service account authentication
- TLS - Configurable TLS for SSE transport
Authorization
- Personas - Role-based tool filtering with wildcard patterns
- Default-Deny - Users without explicit persona have no tool access
- Tool Filtering - Allow/deny patterns per persona
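The persona rules above can be sketched in a few lines of Go. This is a minimal illustration, not the project's implementation: the `Persona` type, the `ToolAllowed` function, and the use of `path.Match` for wildcards are assumptions made for the example, as is the choice to let deny patterns take precedence over allow patterns. The tool names in `main` are illustrative.

```go
package main

import (
	"fmt"
	"path"
)

// Persona is a hypothetical type for this sketch; the project's real
// persona type may look different.
type Persona struct {
	Allow []string
	Deny  []string
}

// matchesAny reports whether tool matches at least one wildcard pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, tool string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, tool); err == nil && ok {
			return true
		}
	}
	return false
}

// ToolAllowed applies default-deny: a nil persona gets nothing, a deny
// match always wins (an assumption), and otherwise the tool must match
// an allow pattern.
func ToolAllowed(p *Persona, tool string) bool {
	if p == nil {
		return false
	}
	if matchesAny(p.Deny, tool) {
		return false
	}
	return matchesAny(p.Allow, tool)
}

func main() {
	analyst := &Persona{
		Allow: []string{"trino_query", "trino_explain", "datahub_*"},
		Deny:  []string{"*_delete_*"},
	}
	tools := []string{"trino_query", "datahub_search", "datahub_delete_entity", "s3_list_buckets"}
	for _, tool := range tools {
		fmt.Printf("%s allowed=%v\n", tool, ToolAllowed(analyst, tool))
	}
	fmt.Println("no persona allowed:", ToolAllowed(nil, "trino_query"))
}
```

Checking the deny list first keeps the behavior predictable when a broad allow pattern such as `datahub_*` overlaps a narrower deny pattern.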
Security Model
Implements fail-closed security:
| Scenario | Behavior |
|---|---|
| Missing token | HTTP 401 Unauthorized |
| Invalid/expired token | Authentication error |
| Missing sub or exp claim | Token rejected |
| No persona resolved | Access denied |
| Default persona | Denies all tools |
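As a rough illustration of the fail-closed rows above, the following Go sketch rejects claims that lack `sub` or `exp`. It assumes the token signature has already been verified and the claims decoded from JSON; the `ValidateClaims` function and the error values are hypothetical names for this example, not the project's API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical error values for this sketch.
var (
	ErrMissingSub = errors.New("token rejected: missing sub claim")
	ErrMissingExp = errors.New("token rejected: missing exp claim")
	ErrExpired    = errors.New("authentication error: token expired")
)

// ValidateClaims checks already-decoded claims and returns the subject.
// Any missing or malformed required claim is an error (fail closed).
func ValidateClaims(claims map[string]any, nowUnix int64) (string, error) {
	sub, ok := claims["sub"].(string)
	if !ok || sub == "" {
		return "", ErrMissingSub
	}
	// encoding/json decodes JSON numbers into float64.
	exp, ok := claims["exp"].(float64)
	if !ok {
		return "", ErrMissingExp
	}
	// The token is only valid strictly before its expiry time.
	if nowUnix >= int64(exp) {
		return "", ErrExpired
	}
	return sub, nil
}

func main() {
	now := time.Now().Unix()
	valid := map[string]any{"sub": "alice", "exp": float64(now + 3600)}
	noExp := map[string]any{"sub": "alice"}
	for _, c := range []map[string]any{valid, noExp} {
		sub, err := ValidateClaims(c, now)
		fmt.Printf("sub=%q err=%v\n", sub, err)
	}
}
```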
Data Protection
- Trino Read-Only Mode - Enforced at query level, blocks INSERT/UPDATE/DELETE/DROP
- Metadata Sanitization - Control characters stripped, strings truncated
- Prompt Injection Detection - Common injection patterns detected and logged
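The first two protections above can be approximated as follows. This is a deliberately simple sketch under stated assumptions, not the project's code: `IsReadOnly` and `Sanitize` are hypothetical names, and the read-only check is a keyword scan rather than a SQL parser, so it over-blocks (for example, a string literal containing `drop` is rejected) in order to fail closed.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// blocked holds the statement keywords named in the list above.
var blocked = map[string]bool{"INSERT": true, "UPDATE": true, "DELETE": true, "DROP": true}

// IsReadOnly reports whether sql contains none of the blocked keywords.
// Words are runs of letters and underscores, so an identifier such as
// last_update is not mistaken for UPDATE. Empty input is rejected.
func IsReadOnly(sql string) bool {
	words := strings.FieldsFunc(strings.ToUpper(sql), func(r rune) bool {
		return !unicode.IsLetter(r) && r != '_'
	})
	if len(words) == 0 {
		return false
	}
	for _, w := range words {
		if blocked[w] {
			return false
		}
	}
	return true
}

// Sanitize drops control characters and keeps at most maxRunes runes.
func Sanitize(s string, maxRunes int) string {
	var b strings.Builder
	n := 0
	for _, r := range s {
		if unicode.IsControl(r) {
			continue
		}
		if n == maxRunes {
			break
		}
		b.WriteRune(r)
		n++
	}
	return b.String()
}

func main() {
	fmt.Println(IsReadOnly("SELECT last_update FROM orders"))
	fmt.Println(IsReadOnly("/* note */ delete from orders"))
	fmt.Printf("%q\n", Sanitize("owner:\x00 data\tteam", 12))
}
```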
Audit & Operations
- Audit Logging - Track tool calls with user context
- Cryptographically Secure Request IDs - Generated with crypto/rand
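For readers unfamiliar with `crypto/rand`, a request ID generator along these lines might look like the sketch below. The function name `NewRequestID`, the 16-byte length, and the hex encoding are assumptions for illustration; the release notes only state that `crypto/rand` is the source of randomness.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// NewRequestID reads 16 bytes from the operating system's CSPRNG via
// crypto/rand and returns them as a 32-character hex string.
func NewRequestID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	id, err := NewRequestID()
	if err != nil {
		panic(err)
	}
	fmt.Println("request id:", id)
}
```

Unlike `math/rand`, `crypto/rand` output is not predictable from earlier values, which matters when request IDs appear in audit logs.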
Extensibility
- Go Library - Import as a library to build custom MCP servers
- Multi-Provider - Connect multiple instances of each service
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platform
Go Install
go install github.com/txn2/mcp-data-platform/cmd/mcp-data-platform@v0.2.0
Docker
docker pull ghcr.io/txn2/mcp-data-platform:v0.2.0
Quick Start
stdio Transport (Local)
# platform.yaml
server:
  name: mcp-data-platform
  transport: stdio
toolkits:
  datahub:
    primary:
      url: https://datahub.example.com
      token: ${DATAHUB_TOKEN}
  trino:
    primary:
      host: trino.example.com
      port: 443
      user: ${TRINO_USER}
      password: ${TRINO_PASSWORD}
      ssl: true
      catalog: hive
      schema: default
injection:
  trino_semantic_enrichment: true
  datahub_query_enrichment: true
SSE Transport (Remote/Shared)
server:
  transport: sse
  address: ":8443"
  tls:
    enabled: true
    cert_file: /path/to/cert.pem
    key_file: /path/to/key.pem
auth:
  allow_anonymous: false
  oidc:
    enabled: true
    issuer: "https://keycloak.example.com/realms/platform"
    client_id: "mcp-data-platform"
    audience: "mcp-data-platform"
personas:
  definitions:
    analyst:
      display_name: "Data Analyst"
      roles: ["analyst"]
      tools:
        allow: ["trino_query", "trino_explain", "datahub_*"]
        deny: ["*_delete_*"]
  default_persona: analyst
Documentation
https://mcp-data-platform.txn2.com
Verification
- go test -race ./... - All tests pass
- golangci-lint run ./... - 0 issues
- gosec ./... - 0 security issues
- Test coverage: 84.6%
Changelog
Others
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platform
Claude Code CLI
claude mcp add mcp-data-platform -- mcp-data-platform
Docker
docker pull ghcr.io/txn2/mcp-data-platform:v0.2.0
Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-data-platform_0.2.0_linux_amd64.tar.gz.sigstore.json \
mcp-data-platform_0.2.0_linux_amd64.tar.gz