# toolcall-proxy

Give any local LLM native OpenAI tool calling. Zero dependencies beyond Python stdlib.

Local models know how to call tools; they just emit them as text (`<tool_call>` tags, JSON code blocks, Python-style syntax) instead of the native `tool_calls` array that agent frameworks expect. This proxy sits between your LLM server and your agent, detecting text-format tool calls and converting them on the fly.

## Problem

```python
# What your local model outputs:
"<tool_call>\n{\"name\": \"read_file\", \"arguments\": {\"path\": \"/src/main.py\"}}\n</tool_call>"

# What your agent framework needs:
{"tool_calls": [{"function": {"name": "read_file", "arguments": "{\"path\": \"/src/main.py\"}"}}]}
```

## Solution

```bash
pip install toolcall-proxy

# Start your LLM server
llama-server -m model.gguf --port 8099

# Start the proxy
toolcall-proxy --port 8098 --backend http://localhost:8099

# Point your agent at http://localhost:8098/v1
```
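
To sanity-check the setup, you can send a tools-enabled request straight at the proxy. A minimal stdlib-only sketch; the model name, prompt, and tool schema here are placeholders for your own:

```python
import json
import urllib.request

# Placeholder request; adjust the model name and tool schema to your setup.
payload = {
    "model": "smol-3b",
    "messages": [{"role": "user", "content": "Read /src/main.py"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "read_file",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
            },
        },
    }],
}

req = urllib.request.Request(
    "http://localhost:8098/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# With the proxy in front, text-format tool calls arrive as the native array;
# hitting the LLM server directly would typically leave this as None.
print(body["choices"][0]["message"].get("tool_calls"))
```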

## Supported formats

Detects and converts 4 tool-call patterns automatically:

| Format | Example |
|--------|---------|
| `<tool_call>` XML | `<tool_call>{"name": "read_file", "arguments": {...}}</tool_call>` |
| `<function_call>` XML | `<function_call>{"name": "search", "arguments": {...}}</function_call>` |
| JSON code blocks | ```` ```json\n{"name": "read_file", "arguments": {...}}\n``` ```` |
| Python-style | `read_file({"path": "/tmp/data.txt"})` |
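
Detection for formats like these can be done with a handful of patterns. An illustrative sketch, not the proxy's actual regexes:

```python
import re

# Illustrative patterns, one per supported format; the proxy's real
# detection is presumably more robust (whitespace, nesting, multiple calls).
PATTERNS = {
    "tool_call_xml": re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL),
    "function_call_xml": re.compile(r"<function_call>\s*(\{.*?\})\s*</function_call>", re.DOTALL),
    "json_block": re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL),
    # e.g. read_file({"path": "/tmp/data.txt"}) on a line by itself
    "python_style": re.compile(r"^(\w+)\((\{.*\})\)\s*$", re.MULTILINE),
}
```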

## How it works

1. Receives an OpenAI-format chat completion request from your agent
2. Forwards it unchanged to your LLM server
3. Scans the response for text-format tool calls
4. If found: extracts them, converts them to a native `tool_calls` array, and strips them from the text content (sketched below)
5. Returns the modified response to your agent

If the request has no `tools` parameter, the response passes through unmodified: zero overhead for normal chat.
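
A minimal sketch of steps 3 and 4, assuming the `<tool_call>` format; the regex and function name are illustrative, not the proxy's internals:

```python
import json
import re
import uuid

# Matches <tool_call>...</tool_call> blocks anywhere in the text
# (illustrative pattern; the proxy's real detection may differ).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def convert_message(content: str) -> dict:
    """Turn text-format tool calls into a native tool_calls array."""
    tool_calls = []
    for match in TOOL_CALL_RE.finditer(content):
        call = json.loads(match.group(1))
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": call["name"],
                # OpenAI expects arguments as a JSON string, not an object
                "arguments": json.dumps(call.get("arguments", {})),
            },
        })
    # Strip the tool-call text from the visible content
    stripped = TOOL_CALL_RE.sub("", content).strip()
    return {"role": "assistant", "content": stripped or None, "tool_calls": tool_calls}
```

Fed the model output from the Problem section above, this yields an assistant message whose `tool_calls` array carries `read_file` with JSON-encoded arguments and whose `content` is empty.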

## Example: Hermes Agent with SmolLM3

```yaml
# ~/.hermes/config.yaml
custom_providers:
  local-agent:
    base_url: http://127.0.0.1:8098/v1
    api_key: no-key-needed
    models:
      - smol-3b

model: smol-3b
provider: custom:local-agent
```
```bash
# Terminal 1: LLM
llama-server -m SmolLM3-3B-Q4_K_M.gguf --port 8099 --n-gpu-layers 99

# Terminal 2: Proxy
toolcall-proxy --port 8098 --backend http://localhost:8099

# Terminal 3: Agent
hermes --provider custom:local-agent --model smol-3b
```

## Verified models

| Model | Code Quality | Agent Readiness | Notes |
|-------|--------------|-----------------|-------|
| SmolLM3-3B | 93.3% | 50.0% | Best local option. Proxy enables tool calling. |
| Phi-4-mini | 90.0% | 16.7% | Doesn't output tool calls at all. |
| Qwen2.5-Coder (7B/14B) | 82-85% | ~16% | MLX quants lack tool instruction following. |

Benchmarks: workswithagents.dev/benchmarks

## Why not just use a bigger model?

Cloud models (Claude, GPT-4, DeepSeek) handle tool calling natively. This proxy is for when you need local-only inference: privacy, cost, offline use, or experimentation with small models.

## License

MIT
