A Unix-philosophy agentic CLI for Large Language Models. Inspired by simonw/llm, built for composability --- stdin/stdout pipelines, subprocess-based tool and provider extensibility (llm-tool-*, llm-provider-*), and multi-target output (native CLI, WASM, Python).
Scope. llm-rs is a library (with a CLI), not an orchestration framework. Hierarchical workflows compose via specialist tools — small llm-tool-* executables that may internally invoke llm prompt with a narrow agent. See doc/spec/external-tools.md for the protocol, doc/cookbook/specialist-tools.md for a worked example, and doc/research/specialist-tools-vs-sub-agents.md for why llm-rs is not building recursive sub-agent delegation.
# Send a prompt (streams to stdout)
echo "Hello" | llm
# Positional text works too
llm "Explain monads in one sentence" -m claude-sonnet-4-6
# Specify model and system prompt
llm "What is 2+2?" -m gpt-4o -s "Answer only with the number"
# Use Anthropic models
llm "Hello" -m claude-sonnet-4-6
# Disable streaming
llm "Hello" --no-stream
# Show token usage on stderr
llm "Hello" -u
# Skip logging this prompt
llm "Hello" -nBuilt-in tools let the model call functions during a conversation. The CLI manages the chain loop automatically --- it sends tool calls to the executor, feeds results back, and repeats until the model responds with text.
# Enable a built-in tool
llm "What time is it?" -T llm_time
# Multiple tools
llm "What version are you and what time is it?" -T llm_version -T llm_time
# Limit chain iterations (default: 5 for prompt/chat, 10 for agents)
llm "Do something" -T llm_version --chain-limit 3
# Debug mode: show tool calls/results on stderr
llm "What version?" -T llm_version --tools-debug
# Verbose mode: see chain loop iterations (-v summary, -vv full messages)
llm "What time is it?" -T llm_time --verbose
llm "What time is it?" -T llm_time -vv
# List available built-in tools
llm tools listAvailable built-in tools:
llm_version--- returns the CLI versionllm_time--- returns current UTC and local time with timezone
When using tools, the -v/--verbose flag reveals what happens inside the chain loop --- which iteration you're on, what messages are being sent, per-iteration token usage, and tool call/result details.
# Level 1 (-v): iteration summary + tool debug
llm "What time is it?" -T llm_time -v
# stderr output:
# [chain] Iteration 1/5 | 1 message [user]
# [chain] Iteration 1 complete | usage: 10 input, 5 output | 1 tool call(s)
# Tool call: llm_time (id: call_1)
# Arguments: {}
# Tool result: {"utc_time":"...","local_time":"...","timezone":"..."}
# [chain] Iteration 2/5 | 3 messages [user, assistant+tools(1), tool(1)]
# [chain] Iteration 2 complete | usage: 20 input, 10 output | 0 tool call(s)
# Level 2 (-vv): also dumps full message JSON per iteration
llm "What time is it?" -T llm_time -vv
# stderr additionally includes:
# [chain] Messages:
# [
# {"role": "user", "content": "What time is it?"}
# ]--verbose implies --tools-debug --- no need for both flags. Works on both prompt and chat commands.
Any executable on $PATH named llm-tool-* is automatically discovered and usable with -T. External tools can be written in any language.
# List all tools (built-in + external)
llm tools list
# Use an external tool
llm "Make this loud: hello" -T upper -m gpt-4o
# Mix built-in and external tools
llm "What time is it, and shout it" -T llm_time -T shoutWriting an external tool requires two things:
-
Schema: respond to
--schemawith JSON describing the tool:$ llm-tool-upper --schema {"name":"upper","description":"Uppercase text","input_schema":{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}} -
Execution: read arguments JSON from stdin, write result to stdout:
$ echo '{"text":"hello"}' | llm-tool-upper HELLO
Exit 0 means success (stdout = output). Non-zero means error (stderr = error message). Default timeout: 30 seconds.
Specialist tools. An external tool can itself call llm prompt internally with a cheaper model and a narrow system prompt — an opaque "specialist" function from the parent LLM's perspective. This is llm-rs's answer to hierarchical workflows, in place of a recursive sub-agent runtime. Worked example: doc/cookbook/specialist-tools.md. Rationale: doc/research/specialist-tools-vs-sub-agents.md.
Any executable on $PATH named llm-provider-* extends llm-rs with new model providers. External providers can serve models from Ollama, llama.cpp, or any custom backend.
# Models from external providers appear alongside built-in ones
llm models list
# Use a model from an external provider
llm "Hello" -m llama3
# See all providers and tools
llm plugins listWriting an external provider requires metadata flags and a JSON stdin/stdout protocol:
--id--- print the provider name (e.g.ollama)--models--- print JSON array of model metadata--needs-key--- print{"needed":false}or{"needed":true,"env_var":"MY_KEY"}
On invocation, the provider reads a JSON request from stdin and writes either streaming JSONL lines or a single JSON response to stdout. See doc/implementation.md for the full protocol specification.
Continue previous conversations, use multi-turn message input, and chat interactively.
# Continue the most recent conversation
llm -c "And what about 3+3?"
# Continue a specific conversation by ID
llm --cid 01j5a... "Follow up question"
# Load messages from a JSON file
llm --messages conversation.json "What next?"
# Load messages from stdin
echo '[{"role":"user","content":"hi"},{"role":"assistant","content":"hello!"}]' | llm --messages - "Follow up"
# Get JSON output instead of streaming text
llm --json "What is 2+2?"
# Combine: messages input with JSON output
llm --messages history.json --json "Summarize"# Start an interactive chat session
llm chat
# Chat with a specific model and system prompt
llm chat -m claude-sonnet-4-6 -s "You are a helpful assistant"
# Chat with tools enabled
llm chat -T llm_time -T llm_version
# Chat with verbose tool chain output
llm chat -T llm_time -vWhen the model requests multiple tool calls in a single turn, llm-rs dispatches them concurrently by default. Results are returned in the same order the model asked for them.
# Default: parallel dispatch, unlimited concurrency within a single iteration
llm "Check version and time" -T llm_version -T llm_time
# Cap concurrency
llm "Run N tools" -T tool_a -T tool_b --max-parallel-tools 2
# Force sequential dispatch (e.g. to inspect tools one at a time)
llm "Run N tools" -T tool_a -T tool_b --sequential-tools--tools-approve forces sequential dispatch automatically so approval prompts don't interleave on stdin. Flags apply to prompt, chat, and agent run. Agents can set parallel_tools / max_parallel_tools in TOML; CLI flags override.
Agents are TOML files that bundle a system prompt, model, tools, chain limit, options, budget, retry, and parallel-tool config. Global agents live in ~/.config/llm/agents/; project-local agents in ./.llm/agents/ (local shadows global).
llm agent init researcher # Scaffold a local agent template
llm agent init planner --global # Scaffold a global agent
llm agent list # List discovered agents (name, model, source)
llm agent show researcher # Print resolved agent config
llm agent path # Print global and local agent directory paths
# Run an agent
llm agent run researcher "summarize recent changes"
echo "some input" | llm agent run researcher
# CLI flags override agent TOML
llm agent run researcher "hi" -m claude-sonnet-4-6 --chain-limit 3 -v
# Dry-run: resolve model, provider, tools, options, budget, retry, and parallel config without calling the LLM
llm agent run researcher "hi" --dry-run
llm agent run researcher "hi" --dry-run --json
llm agent run researcher "hi" --dry-run -vv # also includes the serialized Prompt payloadExample ~/.config/llm/agents/researcher.toml:
model = "claude-sonnet-4-6"
system_prompt = "You are a careful research assistant."
tools = ["llm_time", "llm_version"]
chain_limit = 10
parallel_tools = true
max_parallel_tools = 4
[options]
temperature = 0.2
[budget]
max_tokens = 50000
[retry]
max_retries = 3
base_delay_ms = 1000Token usage accumulates across chain iterations. Pass -u to print cumulative totals; set [budget] max_tokens in an agent file to stop the chain when the total exceeds the cap. The chain finishes the current turn, emits a [budget] warning, and returns the partial result.
# Show cumulative usage across all chain iterations
llm "Plan a trip" -T llm_time -u
# llm chat prints a session-wide usage summary on exit
llm chat -uTransient HTTP errors (429, 5xx) are retried with exponential backoff and jitter before any response bytes are streamed. Configure per-invocation with --retries or per-agent via [retry].
llm "Hello" --retries 5
llm chat --retries 3
llm agent run researcher "hi" --retries 5 # overrides agent TOMLSet persistent per-model options and model-name aliases in config.toml. CLI -o flags override config defaults per invocation.
# Options
llm options set gpt-4o temperature 0.7
llm options set gpt-4o max_tokens 1000
llm options get gpt-4o
llm options list
llm options clear gpt-4o temperature
# Aliases
llm aliases set fast gpt-4o-mini
llm aliases set claude claude-sonnet-4-6
llm aliases list
llm aliases show fast
llm aliases remove fast
llm aliases path
llm "Hello" -m claude # Uses the alias
llm "Hello" -o temperature 0.9 # Overrides config defaultForce the model to return JSON conforming to a schema. Works with both OpenAI (native response_format) and Anthropic (transparent tool wrapping).
# Schema DSL: simple field definitions
llm "Extract: John is 30" --schema "name str, age int"
# With field descriptions
llm "Extract: John is 30" --schema "name str:The person's name, age int:Their age"
# JSON Schema literal
llm "Extract name" --schema '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}'
# Schema from a file
llm "Extract data" --schema schema.json
# Multiple items: wrap in array
llm "List the planets" --schema "name str, diameter_km int" --schema-multi
# Preview DSL output
llm schemas dsl "name str, age int"Schema DSL types: str (default), int, float, bool.
llm keys set openai # Prompted for key (hidden input)
llm keys set anthropic # Set Anthropic API key
llm keys get openai # Print stored key
llm keys list # List all stored key names
llm keys path # Print path to keys.tomlKeys are resolved in order: --key flag, keys.toml, environment variable (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).
llm models list # List available models (OpenAI + Anthropic)
llm models default # Show current default model
llm models default gpt-4o # Set default modelAvailable models:
- OpenAI:
gpt-4o,gpt-4o-mini - Anthropic:
claude-opus-4-6,claude-sonnet-4-6,claude-haiku-4-5
Every prompt is logged to a JSONL file (one per conversation). Logs are plain text --- inspect them with cat, grep, jq.
llm logs list # List recent conversations
llm logs list --json # JSON output (pipe to jq)
llm logs list -r # Print the most recent response text
llm logs list -m gpt-4o # Filter by model
llm logs list -q "rust" # Full-text search
llm logs list -u # Show token usage
llm logs path # Print logs directory path
llm logs status # Show logging on/off state
llm logs on # Enable logging
llm logs off # Disable loggingLog files live at ~/.local/share/llm/logs/. Each file is a JSONL conversation:
{"type":"conversation","v":1,"id":"01j5a...","model":"gpt-4o","name":"Hello","created":"2026-04-03T12:00:00Z"}
{"type":"response","id":"01j5b...","model":"gpt-4o","prompt":"Hello","response":"Hi!","usage":{"input":5,"output":3},"duration_ms":230,...}llm schemas dsl "name str, age int" # Preview DSL -> JSON Schema
llm schemas list # List schemas used in logs
llm schemas show <id> # Show schema by IDllm plugins list # Show all providers (compiled + external) and external toolsExample output:
Compiled providers:
openai (2 models: gpt-4o, gpt-4o-mini)
anthropic (3 models: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5)
External providers:
ollama (/usr/local/bin/llm-provider-ollama) (3 models: llama3, mistral, phi3)
External tools:
web_search (/usr/local/bin/llm-tool-web-search) — Search the web
upper (/usr/local/bin/llm-tool-upper) — Uppercase text
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime error (I/O failure, storage error) |
| 2 | Configuration error (missing key, unknown model, bad config) |
| 3 | Provider error (API failure, network timeout) |
In addition to the CLI, llm-rs can be used as a library from JavaScript/TypeScript (via WASM) or Python (via native module). Both support OpenAI and Anthropic.
import init, { LlmClient } from '@llm-rs/wasm';
await init();
// Auto-detects provider from model name
const openai = new LlmClient('sk-...', 'gpt-4o');
const claude = new LlmClient('sk-ant-...', 'claude-sonnet-4-6');
// Or use explicit constructors
const client = LlmClient.newAnthropic('sk-ant-...', 'claude-sonnet-4-6');
const custom = LlmClient.newAnthropicWithBaseUrl('sk-ant-...', 'claude-sonnet-4-6', 'https://my-proxy.example.com');
// Non-streaming
const response = await client.prompt('Hello');
// With system prompt
const answer = await client.promptWithSystem('What is 2+2?', 'Answer only with the number');
// Streaming (callback per chunk)
await client.promptStreaming('Tell me a story', (chunk) => {
process.stdout.write(chunk);
});
// With options
const result = await client.promptWithOptions(
'Hello',
null, // system prompt (optional)
'{"temperature": 0.7, "max_tokens": 1000}'
);Build from source:
wasm-pack build crates/llm-wasm --target web # ES module for browsers
wasm-pack build crates/llm-wasm --target bundler # For webpack/rollup (Obsidian plugins)The WASM module is stateless --- no config files, no log storage. HTTP goes through the browser's fetch() API. The host application manages API keys and persistence.
import llm_rs
# Auto-detects provider from model name
client = llm_rs.LlmClient("sk-...", "gpt-4o-mini")
claude = llm_rs.LlmClient("sk-ant-...", "claude-sonnet-4-6")
# Or specify provider explicitly
client = llm_rs.LlmClient("sk-ant-...", "claude-sonnet-4-6", provider="anthropic")
# Non-streaming
response = client.prompt("Hello, world!")
print(response)
# With system prompt
answer = client.prompt("What is 2+2?", system="Answer only with the number")
# Streaming (Python iterator)
for chunk in client.prompt_stream("Tell me a story"):
print(chunk, end="", flush=True)Build from source (requires uv):
cd crates/llm-python
uv venv && uv pip install maturin
uv run maturin develop # Install to current venv
uv run maturin build --release # Build wheel for distributionOptional parameters: provider ("openai" or "anthropic"), base_url for custom API endpoints, log_dir to enable JSONL logging.
Requires Rust 1.85+ (2024 edition).
git clone https://github.com/user/llm-rs
cd llm-rs
cargo install --path crates/llm-cliOr build from the workspace:
cargo build --release -p llm-cli
# Binary is at target/release/llmConfig files live in XDG-standard directories:
~/.config/llm/config.toml # Main configuration
~/.config/llm/keys.toml # API keys (0600 permissions)
~/.local/share/llm/logs/ # Conversation logs (JSONL)
Set LLM_USER_PATH to put everything in one directory (useful for testing or migrating from Python llm).
config.toml:
default_model = "gpt-4o-mini"
logging = true
[aliases]
claude = "claude-sonnet-4-6"
fast = "gpt-4o-mini"keys.toml:
openai = "sk-..."
anthropic = "sk-ant-..."| Variable | Purpose |
|---|---|
OPENAI_API_KEY |
OpenAI API key (fallback if not in keys.toml) |
ANTHROPIC_API_KEY |
Anthropic API key (fallback if not in keys.toml) |
OPENAI_BASE_URL |
Override OpenAI API endpoint (for compatible APIs) |
ANTHROPIC_BASE_URL |
Override Anthropic API endpoint |
LLM_DEFAULT_MODEL |
Override default model |
LLM_USER_PATH |
Override config/data directory (flat layout) |
Seven Rust crates in a Cargo workspace:
crates/
llm-core/ Traits, types, streaming, errors, config, keys, schema DSL, chain loop
llm-openai/ OpenAI Chat API provider (streaming + tools + structured output)
llm-anthropic/ Anthropic Messages API provider (streaming + tools + structured output)
llm-store/ JSONL conversation log storage and queries
llm-cli/ CLI binary (the `llm` command)
llm-wasm/ WASM library for browser/Obsidian (excluded from workspace)
llm-python/ Python native module via PyO3 (excluded from workspace)
Dependency flow: llm-cli, llm-wasm, and llm-python are top-level entry points -> llm-openai + llm-anthropic + llm-store -> llm-core. No cycles.
Key design choices vs the Python original:
- Subprocess extensibility, not in-process plugins. Instead of Python's pluggy-based plugin system, external tools (
llm-tool-*) and providers (llm-provider-*) are standalone executables discovered on$PATH. Any language can implement the JSON stdin/stdout protocol. Compiled-in providers (OpenAI, Anthropic) are feature-gated for a minimal core binary. - JSONL storage. One file per conversation instead of SQLite. Append-only, human-readable, no migrations.
- Async-first. Single
Providertrait using futures streams, no sync/async class duplication. - TOML config. Two files (
config.toml+keys.toml) instead of six scattered JSON/YAML/text files. - Feature-gated providers. Compile only the providers you need:
--features openai,anthropic(both default), or--no-default-featuresfor a minimal binary. - Multi-target. Core crates compile for both native and
wasm32-unknown-unknown. The same provider code runs in the CLI, in a browser, and in a Python module.
See doc/design/architecture.md for design rationale, doc/roadmap.md for the phased roadmap.
cargo test --workspace # 530 tests across core workspace crates| Crate | Tests | What's covered |
|---|---|---|
llm-core |
198 | Types, config, keys, streams, schema DSL, chain loop, ChainEvent, ParallelConfig dispatch, messages, agent config, retry, budget (mock provider) |
llm-openai |
44 | HTTP mocking (wiremock), SSE parsing, tool calls, structured output, multi-turn, HttpError mapping |
llm-anthropic |
50 | HTTP mocking (wiremock), typed SSE, tool_use blocks, transparent schema wrapping, multi-turn, HttpError mapping |
llm-store |
49 | JSONL round-trips, unicode, malformed recovery, listing/queries, message reconstruction |
llm-cli |
189 | Subprocess protocol/discovery/execution, retry wrapper, dry-run rendering (62 unit), CLI integration (127 e2e with assert_cmd) |
Library targets are verified by their build toolchains: wasm-pack build for WASM, maturin develop for Python.
Current version: v0.9. Phases 1–9 complete. See doc/roadmap.md for the full status table and remaining work.
- v0.1 --- CLI, WASM library, Python module; OpenAI + Anthropic providers end-to-end.
- v0.2 --- Tool calling, chain loop, built-in tools, structured output, schema DSL.
- v0.3 --- Multi-turn conversations,
-c/--cid,llm chatREPL, fullllm logs. - v0.4 --- Subprocess extensibility (
llm-tool-*,llm-provider-*),llm plugins,-v/--verbose,-o/--option, aliases. - v0.5 --- Agent config & discovery (
llm agent run/list/show/init/path). - v0.6 --- Budget tracking with cumulative usage and per-chain enforcement.
- v0.7 --- Retry/backoff with exponential delay and jitter for transient HTTP errors.
- v0.8 ---
--dry-runforllm agent run(plain or--json). - v0.9 --- Parallel tool execution within a chain iteration, order-preserving, opt-out with
--sequential-tools.
Next up: Ollama provider, attachments, extract flags. See the Future Work section of the roadmap. Sub-agent delegation and an agent memory system are explicitly parked — llm-rs delegates hierarchical workflows to specialist tools. See the design note for the rationale.
Out of scope: token budget enforcement across nested invocations. Each llm call tracks its own budget; users who need hierarchical budget caps should do shell-level accounting (e.g., sum usage from --json output across a wrapping script).
GPLv3