
marklogic-mcp

A Model Context Protocol (MCP) server for MarkLogic 12. Enables AI agents to interrogate, query, and manage MarkLogic using MarkLogic-native capabilities — full-text search, Optic row queries, SPARQL, Flux bulk import/export, TDE schema management, and more.

Features

  • 46 MCP tools across 10 domains: admin, documents, search, schema, eval, SPARQL, Optic (incl. vector search), QuickSight, Flux, and approach advisory
  • 5 MCP resources including a machine-readable problem→solution decision guide
  • 14 MCP prompts for query planning, code generation, import design, and BI integration
  • Two transports: stdio (Claude Desktop / local agents) and HTTP+SSE (remote agents, QuickSight)
  • Read-only by default — writes gated behind ML_READONLY=false, eval gated behind ML_ALLOW_EVAL=true
  • Basic and Digest auth for MarkLogic REST API

How Agents Should Use This Server

Start with the decision guide

Before calling any query or import tool, an agent should read the marklogic://instructions resource. It contains a problem→tool decision table and a set of nine principles (e.g. "discover before you query", "native before eval", "Flux before REST for bulk loads"). This prevents common mistakes like using ml_eval_javascript for bulk import or ml_document_put in a loop.

Use the advisory tools when unsure

Two tools exist specifically to guide tool selection:

| Advisory tool / resource | When to use |
| --- | --- |
| marklogic://instructions resource | Read at session start — machine-readable decision guide |
| ml_suggest_approach | Call with a natural-language task to get ranked tool recommendations with ready-to-use recipe parameters |
| problem_advisor prompt | Call with a goal to get a 6-section structured analysis (classification → native approach → discovery → tool sequence → pitfalls → alternatives) |
| query_approach_advisor prompt | Call when the goal is a query and you need to choose between cts.search, Optic, or a hybrid |

Discover before you query

Never assume a collection, TDE view, or index exists. The standard discovery sequence is:

ml_collections_list → ml_schema_discover → ml_indexes_list → ml_views_list

Run these before writing any query or import plan.
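The sequence above can be sketched as an ordered list of tool calls. The tool names come from this README; the argument shapes (and the "orders" collection) are illustrative assumptions, not the tools' actual schemas.

```typescript
// Hypothetical sketch of the standard discovery sequence as MCP tool calls.
type ToolCall = { tool: string; args: Record<string, unknown> };

const discoverySequence: ToolCall[] = [
  { tool: "ml_collections_list", args: {} },                       // which collections exist?
  { tool: "ml_schema_discover",  args: { collection: "orders" } }, // what fields do documents have?
  { tool: "ml_indexes_list",     args: {} },                       // which range indexes are configured?
  { tool: "ml_views_list",       args: {} },                       // which TDE views can Optic use?
];
// Run these in order; only then choose between ml_search and ml_optic_query.
```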

Optic vs cts.search

| Goal | Use | Prerequisite |
| --- | --- | --- |
| Find documents by content / keyword | ml_search (cts.search) | None — universal index always available |
| Filter by exact field value or date range | ml_search structured_query | Range index recommended (ml_indexes_list) |
| COUNT / SUM / AVG / GROUP BY | ml_optic_query (fromView) | TDE view in Schemas DB (ml_views_list) |
| Join two collections by key | ml_optic_query (join-inner) | TDE views for both collections |
| Full-text filter THEN aggregate (hybrid) | ml_optic_query (fromSearch) | TDE view + cts query |
| Count distinct values / faceted nav | ml_values_query, ml_facets_query | Range or element word index |

Use the query_approach_advisor prompt to get a concrete, filled-in query plan for any of these goals.
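For the exact-value / date-range case, a structured query body might look like the following sketch in MarkLogic's structured-query JSON. The property name, value, and index are invented for illustration — the range index must already exist (check ml_indexes_list).

```typescript
// Illustrative structured query: orders on or after a date.
// Assumes a dateTime range index on the "orderDate" JSON property.
const structuredQuery = {
  query: {
    queries: [
      {
        "range-query": {
          "json-property": "orderDate",
          operator: ">=",
          value: ["2024-01-01T00:00:00Z"],
          type: "xs:dateTime",
        },
      },
    ],
  },
};
// Pass this as the structured_query argument to ml_search.
```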

Multi-model data: Documents + Triples + Vectors

MarkLogic stores all three model types natively. Use data_modeling_advisor for guided design.

Entity-oriented triple pattern (preferred)

Group triples by IRI so that each entity is one document. The document URI equals the entity IRI, and triples are embedded as a sem:triples array inside the document body. This avoids a separate triple store lookup for entity properties and keeps the document and its graph relationships co-located.
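A minimal sketch of that layout follows: one document per entity, document URI equal to the entity IRI, with triples embedded next to ordinary JSON properties. All names and IRIs here are invented for illustration.

```typescript
// One entity-oriented document: properties and graph edges co-located.
const entityDoc = {
  uri: "http://example.org/person/123", // document URI == entity IRI
  content: {
    name: "Ada Example",
    "sem:triples": [
      {
        triple: {
          subject: "http://example.org/person/123",
          predicate: "http://xmlns.com/foaf/0.1/knows",
          object: "http://example.org/person/456",
        },
      },
    ],
  },
};
```

A SPARQL pattern over foaf:knows and a document read of the same URI both resolve to this single document — no separate triple-store lookup for entity properties.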

Importing raw RDF (two-step)

  1. flux_import with subcommand import-rdf-files → loads triples as managed triples (quad store, one quad per document)
  2. flux_reprocess with an SJS transform that groups quads by subject IRI and writes one entity document per subject → produces the entity-oriented layout

Vector search

Store embeddings as a JSON array field. Define a TDE column with scalar: "vec:vector". Query with ml_vector_search — it uses vec:cosine-similarity through the Optic API with no eval required. MarkLogic 12+ only.
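An illustrative TDE template fragment pairing a JSON embedding array with a vec:vector column is sketched below. The schema, view, and column names are invented, and the exact TDE property names should be checked against the MarkLogic 12 TDE documentation.

```typescript
// Hypothetical TDE fragment: expose a JSON "embedding" array as a vector column.
const vectorTde = {
  template: {
    context: "/",
    collections: ["articles"],
    rows: [
      {
        schemaName: "demo",
        viewName: "articles",
        columns: [
          // document URI, for joining results back to the source document
          { name: "uri", scalarType: "string", val: "xdmp:node-uri(.)" },
          // embedding stored as a JSON array field named "embedding"
          { name: "embedding", scalarType: "vec:vector", val: "embedding" },
        ],
      },
    ],
  },
};
// ml_vector_search can then rank rows by cosine similarity against this column.
```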

Bulk loading

Always use flux_import for more than ~10 documents. It handles HTTP URL fetch, ZIP/gzip decompression, parallel batching, and automatic TDE view generation in a single call — 10–100× faster than looping ml_document_put.
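A hypothetical flux_import call for a CSV load over HTTP might look like this. The argument names mirror wording used elsewhere in this README but are not verified against the actual tool schema; import-delimited-files is Flux's subcommand for delimited text, and the URL and collection are invented.

```typescript
// Sketch of a single bulk-load call replacing a loop of ml_document_put.
const fluxImportCall = {
  tool: "flux_import",
  args: {
    subcommand: "import-delimited-files",
    path: "https://example.org/data/orders.csv", // fetched by the MCP server host
    collections: ["orders"],
    generate_tde: true, // auto-create a TDE view; then poll ml_reindex_status
  },
};
```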


Quick Start

Claude Desktop (stdio)

  1. Install and build:

    npm install && npm run build
  2. Configure .env:

    cp .env.example .env
    # Edit with your MarkLogic connection details
  3. Add to Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

    {
      "mcpServers": {
        "marklogic": {
          "command": "node",
          "args": ["/path/to/marklogic-mcp/dist/index.js"],
          "env": {
            "ML_HOST": "your-marklogic-host",
            "ML_PORT": "8000",
            "ML_MANAGEMENT_PORT": "8002",
            "ML_USERNAME": "admin",
            "ML_PASSWORD": "your-password",
            "ML_AUTH_TYPE": "basic",
            "ML_READONLY": "true"
          }
        }
      }
    }

Claude Code (remote HTTP transport)

# Start server (Docker)
ML_HOST=<host> ML_PASSWORD=<pass> MCP_API_KEY=<secret> \
  docker compose -f docker-compose.mcp-only.yml up -d

# Register with Claude Code
claude mcp add --transport http marklogic http://localhost:3000/mcp \
  --header "Authorization: Bearer <secret>"

See docs/claude-code-remote-mcp.md for the full guide.

HTTP/SSE Transport (AWS QuickSight / remote agents)

MCP_TRANSPORT=http MCP_HTTP_PORT=3000 ML_HOST=your-host ML_USERNAME=admin ML_PASSWORD=pass \
  node dist/index.js

Health check: GET http://localhost:3000/health

Docker Compose — full stack (MarkLogic + MCP server)

docker compose up
# MarkLogic at http://localhost:8001 (Admin UI)
# MCP server at http://localhost:3000

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MCP_TRANSPORT | stdio | stdio or http |
| MCP_HTTP_PORT | 3000 | HTTP transport port |
| MCP_API_KEY | (none) | Bearer token for HTTP transport auth |
| ML_HOST | localhost | MarkLogic hostname or IP |
| ML_PORT | 8000 | REST API port |
| ML_MANAGEMENT_PORT | 8002 | Management API port |
| ML_USERNAME | admin | MarkLogic username |
| ML_PASSWORD | admin | MarkLogic password |
| ML_DATABASE | Documents | Default database |
| ML_AUTH_TYPE | digest | digest or basic |
| ML_SSL | false | Enable HTTPS |
| ML_READONLY | true | Block all write operations |
| ML_ALLOW_EVAL | false | Enable /v1/eval (XQuery/SJS execution) |
| LOG_LEVEL | info | debug, info, warn, error |
| LOG_FORMAT | json | json or pretty |

Tools Reference

Approach Advisory

| Tool | Description |
| --- | --- |
| ml_suggest_approach | Analyse a natural-language task and return ranked tool recommendations with ready-to-use recipe parameters. Call this before starting any non-trivial task. |

Admin (8 tools)

| Tool | Description |
| --- | --- |
| ml_cluster_status | Cluster health, version, host info |
| ml_databases_list | List all databases |
| ml_database_properties | Full database configuration |
| ml_database_statistics | Document counts, forest sizes |
| ml_forests_list | Forest status |
| ml_servers_list | App server list |
| ml_server_properties | App server configuration |
| ml_reindex_status | Check whether a database has finished reindexing after TDE installation or index config changes. Returns ready=true when safe to run ml_optic_query or ml_tde_validate. Use after flux_import with generate_tde=true to avoid SQL-TABLEREINDEXING errors. |

Documents (6 tools)

| Tool | Description |
| --- | --- |
| ml_document_get | Retrieve document by URI |
| ml_document_list | List by collection or directory |
| ml_document_sample | Sample random documents from a collection |
| ml_document_put (write) | Create/replace document |
| ml_document_delete (write) | Delete document |
| ml_document_patch (write) | Partial update |

Search (5 tools)

Uses MarkLogic's universal index — no TDE or range index required for word queries.

| Tool | Description |
| --- | --- |
| ml_search | Full-text and structured search with cts.search semantics |
| ml_search_qbe | Query By Example — match by document structure |
| ml_values_query | Lexicon/range index value counts and aggregates |
| ml_geospatial_search | Find documents within a geospatial region — circle, bounding box, or polygon. Requires a geospatial element pair index; confirm with ml_indexes_list first. |
| ml_suggest | Search autocomplete from a partial query string |

Range queries within ml_search require a pre-existing range index. Verify with ml_indexes_list first.

Schema Discovery (6 tools)

| Tool | Description |
| --- | --- |
| ml_schema_discover | Infer field shapes by sampling documents in a collection |
| ml_schema_get_tde | Retrieve TDE templates from the Schemas database |
| ml_tde_validate | Validate a TDE template against sampled documents |
| ml_indexes_list | All configured range, element, and field indexes |
| ml_collections_list | Collections with document counts |
| ml_namespaces_list | XML namespace registry |

Optic (3 tools)

Row-based query engine over TDE views. Use for GROUP BY, aggregations, joins, and vector similarity search. Requires a TDE template in the Schemas database — verify with ml_views_list before calling ml_optic_query.
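An ml_optic_query call for the GROUP BY case might be sketched as follows. Whether the tool accepts a builder-style plan string or a serialized JSON plan is an assumption, and the demo/orders view and its columns are invented.

```typescript
// Hypothetical aggregation call: order counts per region over a TDE view.
const opticCall = {
  tool: "ml_optic_query",
  args: {
    plan: `
      op.fromView('demo', 'orders')
        .groupBy('region', [op.count('orderCount', op.col('orderId'))])
        .orderBy(op.desc('orderCount'))
    `,
  },
};
// Confirm the demo/orders view exists via ml_views_list before calling.
```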

| Tool | Description |
| --- | --- |
| ml_optic_query | Execute a serialised Optic plan (fromView, fromSearch, join, group-by, etc.) |
| ml_vector_search | Find k nearest neighbours via cosine similarity over a TDE vec:vector column. MarkLogic 12+, no eval required. |
| ml_views_list | List all available TDE schema.view pairs with the collections they cover |

Eval (requires ML_ALLOW_EVAL=true)

Use as a last resort — ~10 KB script payload limit, no parallel batching.

| Tool | Description |
| --- | --- |
| ml_eval_xquery | Execute XQuery on the server |
| ml_eval_javascript | Execute Server-Side JavaScript |
| ml_invoke_module | Call a stored SJS/XQuery module |

Graphs / SPARQL (3 tools)

Queries MarkLogic's triple store. Supports three storage patterns: embedded triples (co-located inside the source document as a sem:triples array), named graphs (standalone RDF documents), and hybrid (entity document + named graph for cross-entity relationships).
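A minimal ml_sparql_query call might look like this sketch; the predicate IRI and variable names are invented. A SELECT query of this shape returns { head, results } JSON.

```typescript
// Illustrative SPARQL SELECT over embedded or named-graph triples.
const sparqlCall = {
  tool: "ml_sparql_query",
  args: {
    query: `
      SELECT ?person ?friend
      WHERE { ?person <http://xmlns.com/foaf/0.1/knows> ?friend }
      LIMIT 10
    `,
  },
};
```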

| Tool | Description |
| --- | --- |
| ml_sparql_query | SPARQL 1.1 SELECT/CONSTRUCT/ASK/DESCRIBE. SELECT and ASK return { head, results } JSON. CONSTRUCT and DESCRIBE return raw Turtle text. Supports embedded, named-graph, and hybrid triple patterns. |
| ml_graphs_list | List named graphs. Identifies managed-triple graphs that may be candidates for reprocessing into entity-oriented documents via flux_reprocess. |
| ml_graph_put (write) | Load Turtle, N-Triples, JSON-LD, or RDF/XML into a named graph via PUT/PATCH /v1/graphs. |

QuickSight Integration (4 tools)

| Tool | Description |
| --- | --- |
| ml_aggregate_query | Group-by + metrics → tabular rows for BI consumption |
| ml_timeseries_query | Date-bucketed aggregation (day/week/month/year) |
| ml_export_tabular | Export collection as CSV or JSON rows |
| ml_facets_query | Facet breakdowns for filter controls |

Flux (7 tools)

Flux is the preferred path for all bulk data operations. It runs as a subprocess on the MCP server host.

| Tool | Description |
| --- | --- |
| flux_import | Import from CSV, JSON, Parquet, Avro, JDBC, S3, or HTTP URL |
| flux_export | Export documents to file, S3, or JDBC target |
| flux_copy | Copy documents between databases |
| flux_reprocess | Re-run a transform over an existing collection |
| flux_preview | Preview import without writing to the database |
| flux_help | Get Flux subcommand flags and options |
| flux_status | Check Flux runner availability |

flux_import supports generate_tde: true to auto-create an Optic view from the imported collection in one call.


Resources Reference

| Resource URI | Description |
| --- | --- |
| marklogic://instructions | Problem-first decision guide — maps goals to native MarkLogic capabilities and tools. Read this at session start. |
| marklogic://databases | Live list of all databases in the cluster |
| marklogic://cluster/status | Cluster health and version |
| marklogic://forests | Forest list with status |
| marklogic://documents | Usage note for document access tools |

Prompts Reference

Query Planning

| Prompt | Purpose |
| --- | --- |
| query_approach_advisor | Choose between cts.search, Optic, or a hybrid approach for a query goal. Returns a 6-section plan: classification, approach, prerequisites, query construction, performance notes, pitfalls. |
| problem_advisor | Map any natural-language goal to MarkLogic-native tools. Returns a 6-section analysis: classification, native approach, discovery sequence, tool sequence, pitfalls, alternatives. |
| structured_query_builder | Natural language → MarkLogic structured query JSON |
| optic_query_builder | Requirements + schema/view → Optic API plan (SJS style) |
| sparql_query_builder | Natural language → SPARQL |

Code Generation

| Prompt | Purpose |
| --- | --- |
| xquery_function_generator | Generate XQuery with MarkLogic 12 idioms and namespace handling |
| sjs_module_generator | Generate SJS transforms, REST extensions, or library modules |
| tde_schema_generator | Generate a TDE JSON template from a collection and sample fields |
| rest_extension_generator | Scaffold a MarkLogic REST API extension with HTTP method handlers |

Import Design

| Prompt | Purpose |
| --- | --- |
| data_import_advisor | Choose the right import tool and strategy (always considers Flux first) |
| gdelt_import | Ready-to-run flux_import call for a GDELT 1.0 event export date |

Multi-Model Design

| Prompt | Purpose |
| --- | --- |
| data_modeling_advisor | Design a MarkLogic multi-model schema combining Documents, Triples, and Vectors. Returns an 8-section plan: model selection, document design, triple design (entity-oriented pattern + managed-triples reprocess path), vector/embedding design, TDE schema, import sequence, query plan, pitfalls. |

QuickSight

| Prompt | Purpose |
| --- | --- |
| quicksight_dataset_designer | Design a QuickSight dataset sourced from MarkLogic — discovery, field mapping, aggregation strategy |
| quicksight_dashboard_planner | Plan a QuickSight dashboard from a business question |

Architecture

src/
  server.ts          — factory: createMcpServer() wires tools + resources + prompts
  index.ts           — CLI entry; selects stdio or HTTP transport
  tools/             — one file per domain; registerXxxTools() functions
  resources/         — static + dynamic resources; INSTRUCTIONS_TEXT decision guide
  prompts/           — all prompts; query_approach_advisor and problem_advisor first
  client/            — typed HTTP clients for each MarkLogic API surface
  config/            — dotenv loading and Zod validation
  transport/         — stdio and Express/HTTP transport wrappers
  utils/             — error formatting, digest auth, multipart builder

All write tools check readonly at registration time and are not registered when ML_READONLY=true. Eval tools check allowEval and are not registered when ML_ALLOW_EVAL=false. This means tools are absent from the MCP tool list entirely — they are never silently no-ops.
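The gating can be sketched as follows — a minimal, hypothetical version of the registration logic, not the actual server source. Function and tool names beyond those listed in this README are illustrative.

```typescript
// Registration-time gating: forbidden tools are never registered, so they
// never appear in the MCP tool list and cannot be silent no-ops.
interface GateConfig {
  readonly: boolean;  // ML_READONLY
  allowEval: boolean; // ML_ALLOW_EVAL
}

function registerTools(cfg: GateConfig): string[] {
  const tools = ["ml_search", "ml_document_get"]; // always-on read tools (abridged)
  if (!cfg.readonly) tools.push("ml_document_put", "ml_document_delete");
  if (cfg.allowEval) tools.push("ml_eval_javascript", "ml_eval_xquery");
  return tools;
}

const registered = registerTools({ readonly: true, allowEval: false });
// With the defaults, only the read tools exist — write and eval tools are absent.
```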


Development

npm run dev          # tsx watch — auto-reload on save
npm run build        # TypeScript → dist/
npm run typecheck    # Type check without emitting
npm test             # Vitest (skips gracefully if ML_HOST not set)
npm run inspector    # Launch MCP Inspector UI

AWS QuickSight Integration

QuickSight agents connect via the HTTP transport. Recommended pattern:

  1. Start the MCP server in HTTP mode (ECS task or EC2 accessible from QuickSight)
  2. Agent calls ml_schema_discover and ml_views_list to understand data shape
  3. Agent calls ml_export_tabular or ml_aggregate_query to extract data rows
  4. Agent uses the QuickSight API to create/refresh a SPICE dataset
  5. Use quicksight_dataset_designer prompt for guided step-by-step assistance

Security Notes

  • ML_READONLY=true (default) — write tools (ml_document_put, ml_document_delete, ml_document_patch) are not registered at all
  • ML_ALLOW_EVAL=false (default) — eval tools (ml_eval_javascript, ml_eval_xquery, ml_invoke_module) are not registered
  • MCP_API_KEY — set to require Bearer token auth on the HTTP transport
  • Credentials are read from environment variables only — never hardcoded
  • Digest auth recomputes the challenge per request — no credential caching
  • The Flux runner executes on the MCP server host; http_url must be reachable from that host, not from the user's machine
