# toolcall-proxy

Give any local LLM native OpenAI tool calling. Zero dependencies beyond the Python standard library.
Local models know how to call tools; they just emit tool calls as text (`<tool_call>` tags, JSON blocks, Python-style syntax) instead of the native `tool_calls` array that agent frameworks expect. This proxy sits between your LLM server and your agent, detecting text-format tool calls and converting them on the fly.
```
# What your local model outputs:
"<tool_call>\n{\"name\": \"read_file\", \"arguments\": {\"path\": \"/src/main.py\"}}\n</tool_call>"

# What your agent framework needs:
{"tool_calls": [{"function": {"name": "read_file", "arguments": "{\"path\": \"/src/main.py\"}"}}]}
```

## Install

```bash
pip install toolcall-proxy
```
## Quick start

```bash
# Start your LLM server
llama-server -m model.gguf --port 8099

# Start the proxy
toolcall-proxy --port 8098 --backend http://localhost:8099

# Point your agent at http://localhost:8098/v1
```
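To sanity-check the chain end to end, you can send a tools-enabled request straight to the proxy. This is an illustrative sketch using only the stdlib; the endpoint, model name, and `read_file` tool definition are taken from the examples in this README, so adjust them to your setup.

```python
import json
import urllib.request

# Send a tools-enabled request through the proxy (port 8098 from above).
request = urllib.request.Request(
    "http://localhost:8098/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    data=json.dumps({
        "model": "smol-3b",
        "messages": [{"role": "user", "content": "Read /src/main.py"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read a file from disk",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }).encode(),
)
with urllib.request.urlopen(request) as response:
    message = json.load(response)["choices"][0]["message"]

# If the proxy converted a text-format call, it appears here natively:
print(message.get("tool_calls"))
```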
## Supported formats

Detects and converts 4 tool-call patterns automatically:

| Format | Example |
|---|---|
| `<tool_call>` XML | `<tool_call>{"name": "read_file", "arguments": {...}}</tool_call>` |
| `<function_call>` XML | `<function_call>{"name": "search", "arguments": {...}}</function_call>` |
| JSON code blocks | `` ```json\n{"name": "read_file", "arguments": {...}}\n``` `` |
| Python-style | `read_file({"path": "/tmp/data.txt"})` |
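For intuition, detection for the tag- and fence-style formats can be as simple as a few regexes. The sketch below is illustrative only: it is not the proxy's actual source, and it skips the Python-style format, which needs real parsing rather than a regex.

```python
import json
import re

# Illustrative patterns for three of the four formats; Python-style calls
# like read_file({...}) need AST-level parsing and are omitted here.
PATTERNS = [
    re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL),
    re.compile(r"<function_call>\s*(\{.*?\})\s*</function_call>", re.DOTALL),
    re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL),
]

def extract_tool_calls(text: str) -> list[dict]:
    """Collect every {"name": ..., "arguments": ...} payload found in text."""
    calls = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            try:
                payload = json.loads(match.group(1))
            except json.JSONDecodeError:
                continue  # looked like a call but isn't valid JSON; leave it
            if "name" in payload:
                calls.append(payload)
    return calls
```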
## How it works

- Receives an OpenAI-format chat completion request from your agent
- Forwards it unchanged to your LLM server
- Scans the response for text-format tool calls
- If found: extracts the calls, converts them to a native `tool_calls` array, and strips them from the text content (a sketch of this step follows below)
- Returns the modified response to your agent

If the request has no `tools` parameter, the response passes through unmodified: zero overhead for normal chat.
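The extract-and-rewrite step amounts to moving each parsed call into the message's `tool_calls` field in the shape OpenAI clients expect. Here is a minimal sketch, assuming a detector like `extract_tool_calls` above; the `to_native` name and the `call_` id scheme are illustrative, not the proxy's actual code.

```python
import json
import uuid

def to_native(message: dict, calls: list[dict]) -> dict:
    """Rewrite one response message so text-format calls become native."""
    message["tool_calls"] = [
        {
            "id": f"call_{uuid.uuid4().hex[:8]}",  # clients expect a unique id
            "type": "function",
            "function": {
                "name": call["name"],
                # OpenAI clients expect arguments as a JSON string, not a dict
                "arguments": json.dumps(call.get("arguments", {})),
            },
        }
        for call in calls
    ]
    # The proxy strips only the call markup from the text; setting content
    # to None is the simplest stand-in when the text was all markup.
    message["content"] = None
    return message
```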
## Example: SmolLM3 + Hermes

```yaml
# ~/.hermes/config.yaml
custom_providers:
  local-agent:
    base_url: http://127.0.0.1:8098/v1
    api_key: no-key-needed
    models:
      - smol-3b

model: smol-3b
provider: custom:local-agent
```

```bash
# Terminal 1: LLM
llama-server -m SmolLM3-3B-Q4_K_M.gguf --port 8099 --n-gpu-layers 99

# Terminal 2: Proxy
toolcall-proxy --port 8098 --backend http://localhost:8099

# Terminal 3: Agent
hermes --provider custom:local-agent --model smol-3b
```

## Benchmarks

| Model | Code Quality | Agent Readiness | Notes |
|---|---|---|---|
| SmolLM3-3B | 93.3% | 50.0% | Best local option. Proxy enables tool calling. |
| Phi-4-mini | 90.0% | 16.7% | Doesn't output tool calls at all. |
| Qwen2.5-Coder (7B/14B) | 82-85% | ~16% | MLX quants lack tool instruction following. |
Full results: workswithagents.dev/benchmarks
Cloud models (Claude, GPT-4, DeepSeek) handle tool calling natively. This proxy is for when you need local-only inference: privacy, cost, offline use, or experimentation with small models.
## License

MIT