# toolcall-proxy

Give any local LLM native OpenAI tool calling. Zero dependencies beyond the Python standard library.
Local models know how to call tools; they just emit tool calls as text (`<tool_call>` tags, JSON blocks, Python-style syntax) instead of the native `tool_calls` array that agent frameworks expect. This proxy sits between your LLM server and your agent, detecting text-format tool calls and converting them on the fly.
```
# What your local model outputs:
"<tool_call>\n{\"name\": \"read_file\", \"arguments\": {\"path\": \"/src/main.py\"}}\n</tool_call>"

# What your agent framework needs:
{"tool_calls": [{"function": {"name": "read_file", "arguments": "{\"path\": \"/src/main.py\"}"}}]}
```

## Install

```bash
pip install toolcall-proxy
```
## Quick start

```bash
# Start your LLM server
llama-server -m model.gguf --port 8099

# Start the proxy
toolcall-proxy --port 8098 --backend http://localhost:8099

# Point your agent at http://localhost:8098/v1
```
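To sanity-check the chain end to end, you can send a tools-enabled request straight to the proxy. This is an illustrative sketch using only the stdlib; the endpoint, model name, and `read_file` tool definition are taken from the examples in this README, so adjust them to your setup.

```python
import json
import urllib.request

# Send a tools-enabled request through the proxy (port 8098 from above).
request = urllib.request.Request(
    "http://localhost:8098/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    data=json.dumps({
        "model": "smol-3b",
        "messages": [{"role": "user", "content": "Read /src/main.py"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read a file from disk",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }).encode(),
)
with urllib.request.urlopen(request) as response:
    message = json.load(response)["choices"][0]["message"]

# If the proxy converted a text-format call, it appears here natively:
print(message.get("tool_calls"))
```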
## Supported formats

Detects and converts 4 tool-call patterns automatically:

| Format | Example |
|---|---|
| `<tool_call>` XML | `<tool_call>{"name": "read_file", "arguments": {...}}</tool_call>` |
| `<function_call>` XML | `<function_call>{"name": "search", "arguments": {...}}</function_call>` |
| JSON code blocks | `` ```json\n{"name": "read_file", "arguments": {...}}\n``` `` |
| Python-style | `read_file({"path": "/tmp/data.txt"})` |
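For intuition, detection for the tag- and fence-style formats can be as simple as a few regexes. The sketch below is illustrative only: it is not the proxy's actual source, and it skips the Python-style format, which needs real parsing rather than a regex.

```python
import json
import re

# Illustrative patterns for three of the four formats; Python-style calls
# like read_file({...}) need AST-level parsing and are omitted here.
PATTERNS = [
    re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL),
    re.compile(r"<function_call>\s*(\{.*?\})\s*</function_call>", re.DOTALL),
    re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL),
]

def extract_tool_calls(text: str) -> list[dict]:
    """Collect every {"name": ..., "arguments": ...} payload found in text."""
    calls = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            try:
                payload = json.loads(match.group(1))
            except json.JSONDecodeError:
                continue  # looked like a call but isn't valid JSON; leave it
            if "name" in payload:
                calls.append(payload)
    return calls
```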
## How it works

- Receives an OpenAI-format chat completion request from your agent
- Forwards it unchanged to your LLM server
- Scans the response for text-format tool calls
- If found: extracts the calls, converts them to a native `tool_calls` array, and strips them from the text content (a sketch of this step follows below)
- Returns the modified response to your agent

If the request has no `tools` parameter, the response passes through unmodified: zero overhead for normal chat.
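The extract-and-rewrite step amounts to moving each parsed call into the message's `tool_calls` field in the shape OpenAI clients expect. Here is a minimal sketch, assuming a detector like `extract_tool_calls` above; the `to_native` name and the `call_` id scheme are illustrative, not the proxy's actual code.

```python
import json
import uuid

def to_native(message: dict, calls: list[dict]) -> dict:
    """Rewrite one response message so text-format calls become native."""
    message["tool_calls"] = [
        {
            "id": f"call_{uuid.uuid4().hex[:8]}",  # clients expect a unique id
            "type": "function",
            "function": {
                "name": call["name"],
                # OpenAI clients expect arguments as a JSON string, not a dict
                "arguments": json.dumps(call.get("arguments", {})),
            },
        }
        for call in calls
    ]
    # The proxy strips only the call markup from the text; setting content
    # to None is the simplest stand-in when the text was all markup.
    message["content"] = None
    return message
```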
## Example: SmolLM3 + Hermes

```yaml
# ~/.hermes/config.yaml
custom_providers:
  local-agent:
    base_url: http://127.0.0.1:8098/v1
    api_key: no-key-needed
    models:
      - smol-3b

model: smol-3b
provider: custom:local-agent
```

```bash
# Terminal 1: LLM
llama-server -m SmolLM3-3B-Q4_K_M.gguf --port 8099 --n-gpu-layers 99

# Terminal 2: Proxy
toolcall-proxy --port 8098 --backend http://localhost:8099

# Terminal 3: Agent
hermes --provider custom:local-agent --model smol-3b
```

## Benchmarks

| Model | Code Quality | Agent Readiness | Notes |
|---|---|---|---|
| SmolLM3-3B | 93.3% | 50.0% | Best local option. Proxy enables tool calling. |
| Phi-4-mini | 90.0% | 16.7% | Doesn't output tool calls at all. |
| Qwen2.5-Coder (7B/14B) | 82-85% | ~16% | MLX quants lack tool instruction following. |
Full results: workswithagents.dev/benchmarks
Cloud models (Claude, GPT-4, DeepSeek) handle tool calling natively. This proxy is for when you need local-only inference: privacy, cost, offline use, or experimentation with small models.
## License

MIT