An MCP server that lets any MCP client consult a local model served by Ollama.
Part of the pal family: gpal (Gemini), cpal (Claude), ollapal (Ollama). Each wraps its vendor's native API — no compat-layer shims.
- 🔧 Tool use — model autonomously explores the codebase (`list_directory`, `read_file`, `search_project`, read-only `git`)
- 👀 Vision — attach images for multimodal models
- 💭 Thinking — `think=True` (or `"low"`/`"medium"`/`"high"`) on supporting models
- 💬 Stateful sessions — conversation history preserved across calls
- 📦 Model management — list, show, pull (with progress), copy, delete
- 🎯 No hardcoded model names — default resolves from env → loaded models → most-recent-pulled
Requires uv and a running Ollama server.
```sh
git clone https://github.com/tobert/ollapal && cd ollapal
uv tool install .
```

Register ollapal with your MCP client:

```sh
claude mcp add ollapal --scope user -- ollapal
gemini mcp add ollapal --scope user -- ollapal
```

Or add it to your client's JSON config:

```json
{
  "mcpServers": {
    "ollapal": {
      "command": "ollapal",
      "env": { "OLLAMA_HOST": "http://localhost:11434" }
    }
  }
}
```

Your MCP client calls these tools based on your prompts:
"Ask my local model to review src/foo.py" → `consult_ollama(query="…", file_paths=["src/foo.py"])`

"Have gemma pull its weight — review these changes" → `consult_ollama(query="…", model="gemma3:27b")`
```python
# Basic — uses the default model (env → loaded → most-recent-pulled)
consult_ollama(query="Review this function for edge cases")

# Pin a specific model
consult_ollama(query="...", model="gemma3:27b")

# Vision
consult_ollama(query="What does this diagram show?", media_paths=["arch.png"])

# Thinking (on supporting models)
consult_ollama(query="Hard problem", think=True)
consult_ollama(query="Harder problem", think="high")

# Options pass-through (temperature, num_ctx, seed, top_p, stop, …)
consult_ollama(query="…", options={"temperature": 0.2, "num_ctx": 32768})

# Stateless (no session)
consult_ollama_oneshot(query="...", model="gemma3:4b")

# Multi-turn conversation
consult_ollama(query="Explain the auth flow", session_id="review-1")
consult_ollama(query="What about edge cases?", session_id="review-1")
```

Model management:

```python
list_models()                                            # /api/tags
show_model(model="gemma3:27b")                           # /api/show
list_running_models()                                    # /api/ps
pull_model(model="gemma3:27b")                           # /api/pull, streams progress
copy_model(source="gemma3:27b", destination="my-gemma")
delete_model(model="my-gemma")                           # destructive
```

```
MCP Client (Claude Code, Gemini CLI, Cursor, …)
     │
     ▼ MCP
┌─────────┐
│ ollapal │ ──▶ Ollama /api/* ──▶ your local model
└─────────┘
     │
```
ollapal gives the model these tools to explore:
• list_directory • read_file • search_project • git (read-only)
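Conceptually, each exploration tool is a name mapped to a read-only handler that the server invokes when the model emits a tool call. A toy dispatch table along those lines (hypothetical helper names, not ollapal's actual internals):

```python
from pathlib import Path

def list_directory(path: str = ".") -> list[str]:
    """Names of entries in a directory, sorted for stable output."""
    return sorted(p.name for p in Path(path).iterdir())

def read_file(path: str) -> str:
    """Full text contents of a file."""
    return Path(path).read_text()

# The model emits a tool call like {"name": "read_file", "arguments": {...}};
# the server looks up the handler and returns its result as the tool output.
TOOLS = {"list_directory": list_directory, "read_file": read_file}

def dispatch(name: str, arguments: dict):
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

The real server also enforces the sandbox and size limits described below before any handler touches the filesystem.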
If you don't pass model=, ollapal picks one in this order:
1. `OLLAMA_DEFAULT_MODEL` env var
2. First model returned by `/api/ps` (currently loaded in memory)
3. Most-recently-modified model from `/api/tags`
No model names are hardcoded — whatever you've pulled is what ollapal sees.
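The resolution order above can be sketched as a small pure function (hypothetical helper name and data shapes; the real logic lives inside ollapal):

```python
import os

def resolve_default_model(loaded: list, tags: list):
    """Pick a default model: env var -> loaded model -> most-recently-modified tag.

    `loaded` mimics /api/ps entries and `tags` mimics /api/tags entries:
    dicts with a "name" key; tags also carry "modified_at".
    """
    # 1. An explicit pin via environment variable always wins.
    env_model = os.environ.get("OLLAMA_DEFAULT_MODEL")
    if env_model:
        return env_model
    # 2. First model already loaded in memory (/api/ps).
    if loaded:
        return loaded[0]["name"]
    # 3. Most-recently-modified pulled model (/api/tags).
    if tags:
        return max(tags, key=lambda t: t["modified_at"])["name"]
    return None
```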
Same pattern as gpal/cpal. Config file at ~/.config/ollapal/config.toml:
```toml
system_prompts = ["~/.config/ollapal/LOCAL.md"]
system_prompt = "Respond concisely."
include_default_prompt = true
```

CLI flags:

```sh
ollapal --system-prompt ./PROJECT.md --system-prompt ~/LOCAL.md
ollapal --no-default-prompt --system-prompt ./FULL.md
```

| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama base URL |
| `OLLAMA_DEFAULT_MODEL` | — | Pin the default model |
| `OLLAPAL_MAX_TOOL_CALLS` | `100` | Tool-call budget per `consult_ollama` turn |
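For example, to point ollapal at a remote Ollama host, pin a default model, and tighten the tool-call budget (all values here are illustrative):

```shell
export OLLAMA_HOST=http://gpu-box:11434
export OLLAMA_DEFAULT_MODEL=gemma3:27b
export OLLAPAL_MAX_TOOL_CALLS=50
```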
- All file access is sandboxed to the directory where ollapal was started
- Path traversal and symlink attacks are blocked
- Sessions are isolated per `session_id`
- File size limits: 10MB text, 20MB media
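The sandbox check described above amounts to resolving every requested path (which follows symlinks) and refusing anything that lands outside the startup directory. A minimal sketch of that idea, not ollapal's actual code:

```python
from pathlib import Path

SANDBOX_ROOT = Path.cwd().resolve()  # directory where the server was started

def safe_resolve(requested: str) -> Path:
    """Resolve a path and reject escapes from the sandbox root."""
    resolved = (SANDBOX_ROOT / requested).resolve()
    # resolve() follows symlinks, so a symlink pointing outside the
    # sandbox fails this check the same way a ../ traversal does.
    if not resolved.is_relative_to(SANDBOX_ROOT):
        raise PermissionError(f"{requested!r} escapes the sandbox")
    return resolved
```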
| URI | Description |
|---|---|
| `resource://server/info` | Server version, capabilities, system prompt provenance |
| `resource://config/limits` | Safety limits (file sizes, session TTL) |
| `resource://sessions` | Active sessions |
| `resource://session/{session_id}` | Single session details |
| `resource://tools/internal` | Tools the model uses for autonomous exploration |
- Sessions are in-memory — lost on restart, 1h idle TTL.
- Not all local models support tools or vision. Check `show_model(model=…)` for capabilities.
- No batch API — Ollama has no native equivalent. For bulk work, loop `consult_ollama_oneshot`.
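Since Ollama has no batch endpoint, bulk work is just a sequential loop over one-shot consults. A sketch of the pattern, where the `consult` callable stands in for whatever issues the `consult_ollama_oneshot` call in your client:

```python
def consult_batch(queries: list, consult, model: str = "gemma3:4b") -> list:
    """Run independent queries sequentially via a stateless consult callable."""
    results = []
    for q in queries:
        # Each iteration is a fresh one-shot consult: no session, no shared
        # history, so context from one query never leaks into the next.
        results.append(consult(query=q, model=model))
    return results
```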
MIT