REPL Language RAGSH
The REPL language is TinyAgents' imperative orchestration surface — the
RLM/CodeAct loop. Where .rag declares graph
topology, .ragsh is an interactive, session-oriented language for inspecting,
scripting, and recursively orchestrating harness and graph runs. It is
explicitly inspired by Recursive Language Models (Zhang, Kraska, Khattab, 2025;
alexzhang13/rlm) and CodeAct-style
agents, where a model writes small programs, inspects their output, calls
sub-models / sub-agents / sub-graphs as functions, and iterates until it has a
final answer.
The core RLM idea this surface ports: context and intermediate state live in a persistent REPL namespace as runtime values, while model calls, recursive sub-calls, and tools are exposed as capability-bound functions inside that namespace — instead of being stuffed into one context window. See Recursion and RLM for the lineage and how this mitigates "context rot."
A non-negotiable rule runs through the whole design: .ragsh never bypasses
the registry, policy, or run limits. It is an orchestration surface, not a
privilege-escalation surface.
Source lives in src/repl/;
the module spec is
docs/modules/repl-language/README.md
and the detailed design (recursion, CodeAct loop, Rhai embedding, events) is
docs/modules/repl-language/design.md.
The .ragsh module is currently at milestone R1 (Documentation and Types).
What that means concretely, reading src/repl/mod.rs and src/repl/types.rs:
- The line grammar, parser, command model, session namespace, and capability policy are implemented and tested.
- Side-effect-free commands (`set`, `get`, `show`, `help`, `quit`) execute fully in-session today.
- Commands that need the live runtime (`load`, `compile`, `run`, `call`) are parsed and policy-checked, then returned as a `ReplOutcome::Planned` record describing the intended action; they are not yet wired to the harness/graph runtime. That wiring (Rhai backend, `model_query`, `tool_call`, `graph_run`, the CodeAct loop, recursive sub-calls with depth tracking) is scheduled for milestones R2–R6 and R7 (Python sandbox backend).
So the recursive sub-model / sub-agent / sub-graph calls, depth tracking, and CodeAct loop described below are the designed behaviour the types and policy are built to enforce; the executing parts that exist today are the parser, session, and the policy gate. The sections marked (designed, R2–R6) document the target the current types are shaped for.
A .ragsh session is line-oriented. Each line is one command:
line = verb ( ws+ arg )* ws*
verb = [a-zA-Z][a-zA-Z0-9_-]*
arg = quoted | bare
quoted = '"' ( <any> | '\\' <any> )* '"' // \\ \" \n \t escapes
bare = ( <non-whitespace> )+
The first token is the command verb, matched case-insensitively against the
verb table. Subsequent tokens are positional arguments. For the call verb, the
remainder of the line after the capability name is parsed as a single JSON
value, so multi-token JSON objects and arrays are accepted verbatim.
parse_command(line) returns a [ReplCommand], or TinyAgentsError::Parse for
empty input, an unknown verb, a missing required argument, an unterminated
quoted string, or invalid JSON passed to call.
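To make the grammar concrete, here is a minimal std-only sketch of a tokenizer for one line. It is not the code in src/repl/: `tokenize` and `LineError` are hypothetical names, verb-table lookup is left out, and the special JSON-remainder handling for `call` is omitted.

```rust
/// Hypothetical error type for this sketch; the real parser reports
/// `TinyAgentsError::Parse` instead.
#[derive(Debug, PartialEq)]
pub enum LineError {
    Empty,
    UnterminatedQuote,
}

/// Split one line into a lowercased verb plus positional arguments.
/// Quoted arguments honor the \\ \" \n \t escapes from the grammar.
pub fn tokenize(line: &str) -> Result<(String, Vec<String>), LineError> {
    let mut tokens: Vec<String> = Vec::new();
    let mut chars = line.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip separators
        } else if c == '"' {
            chars.next(); // consume the opening quote
            let mut buf = String::new();
            loop {
                match chars.next() {
                    None => return Err(LineError::UnterminatedQuote),
                    Some('"') => break,
                    Some('\\') => match chars.next() {
                        None => return Err(LineError::UnterminatedQuote),
                        Some('n') => buf.push('\n'),
                        Some('t') => buf.push('\t'),
                        Some(other) => buf.push(other), // covers \\ and \"
                    },
                    Some(other) => buf.push(other),
                }
            }
            tokens.push(buf);
        } else {
            // bare = one or more non-whitespace characters
            let mut buf = String::new();
            while let Some(&b) = chars.peek() {
                if b.is_whitespace() {
                    break;
                }
                buf.push(b);
                chars.next();
            }
            tokens.push(buf);
        }
    }
    let mut iter = tokens.into_iter();
    let verb = iter.next().ok_or(LineError::Empty)?;
    // Verbs are matched case-insensitively, so normalize here.
    Ok((verb.to_ascii_lowercase(), iter.collect()))
}
```

A real parser would then look the verb up in the verb table and check the required argument count for that verb.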
| Verb | Signature | Capability gate | Status today |
|---|---|---|---|
| `help` | `help` (also `?`) | none | executes |
| `quit` | `quit` (also `exit`, `q`) | none | executes |
| `set` | `set <key> <value>` | none | executes |
| `get` | `get <key>` | none | executes |
| `show` | `show vars\|graphs\|status` | none | executes |
| `load` | `load <path>` | `"load"` | policy-checked → `Planned` |
| `compile` | `compile <name>` | `"compile"` | policy-checked → `Planned` |
| `run` | `run <graph> <input>` | `"run"` | policy-checked → `Planned` |
| `call` | `call <capability> <json>` | the named capability | policy-checked → `Planned` |
The capability gate is a [CapabilityPolicy] — an allowlist of names. It
denies by default: a fresh ReplSession allows nothing. Grant access with
CapabilityPolicy::allow("run") or CapabilityPolicy::from_list(["run", "my_tool"]). Any gated command whose capability is not on the allowlist returns
TinyAgentsError::Capability before it would touch the runtime. This is the
single choke point that keeps a .ragsh session — including one driven by a
model — from invoking anything the host has not explicitly permitted.
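The deny-by-default behaviour can be sketched with a plain set of names. This is an illustration only: `AllowList`, `Denied`, and `check` are hypothetical stand-ins, not the real `CapabilityPolicy` API, and the real gate returns `TinyAgentsError::Capability`.

```rust
use std::collections::HashSet;

/// Illustrative stand-in for the allowlist described above.
/// `Default` yields an empty set, so a fresh policy allows nothing.
#[derive(Debug, Default)]
pub struct AllowList {
    allowed: HashSet<String>,
}

/// Hypothetical error carrying the refused capability name.
#[derive(Debug, PartialEq)]
pub struct Denied(pub String);

impl AllowList {
    /// Build a policy from an explicit list of permitted names.
    pub fn from_list<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        AllowList {
            allowed: names.into_iter().map(Into::into).collect(),
        }
    }

    /// Gate a capability; anything not explicitly listed is refused.
    pub fn check(&self, capability: &str) -> Result<(), Denied> {
        if self.allowed.contains(capability) {
            Ok(())
        } else {
            Err(Denied(capability.to_string()))
        }
    }
}
```

Because the check happens before any dispatch, a denied command never reaches the code that would perform the action.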
A [ReplSession] holds three things: a variable namespace, a capability policy,
and a command history.
ReplSession {
variables: HashMap<String, serde_json::Value>, // persists across commands
policy: CapabilityPolicy, // deny-by-default allowlist
history: Vec<ReplCommand>, // every command, in order
}
- `set <key> <value>` stores a string; `ReplSession::set(key, value)` stores any `serde_json::Value` for richer data.
- `get <key>` returns the value (or `null`); `show vars` dumps the whole namespace; `show status` reports namespace size, history length, and allowlist size.
- Every command is appended to `history` before it executes, so a session is fully replayable.
The namespace persisting across commands is the same idea as RLM's persistent locals: a model can stash an intermediate result in a variable on one line and consume it on the next, instead of re-deriving it from a giant prompt.
use tinyagents::repl::{ReplSession, CapabilityPolicy, ReplOutcome};
let policy = CapabilityPolicy::from_list(["my_tool"]);
let mut session = ReplSession::new().with_policy(policy);
session.set("x", serde_json::json!(42));
assert_eq!(session.get("x"), Some(&serde_json::json!(42)));
ReplSession::execute(cmd) returns a [ReplOutcome]:
- `Message(String)`: human-readable output from a side-effect-free command.
- `Value(serde_json::Value)`: a value read from the namespace.
- `Planned { action, detail }`: a policy-checked command was recorded but live harness/graph execution is deferred (R2–R6). The `detail` carries the structured parameters of the intended call.
- `Quit`: the session was asked to terminate.
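A host loop would typically branch on these four variants. The sketch below mirrors the shape of the enum with a local `Outcome` type; the field types (plain strings standing in for `serde_json::Value`) and the `handle` function are assumptions made for illustration, not the crate's definitions.

```rust
/// Local mirror of the four outcome variants, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Message(String),
    Value(String), // JSON text stands in for serde_json::Value here
    Planned { action: String, detail: String },
    Quit,
}

/// Return the line to display and whether the session should keep running.
pub fn handle(outcome: &Outcome) -> (String, bool) {
    match outcome {
        Outcome::Message(text) => (text.clone(), true),
        Outcome::Value(json) => (json.clone(), true),
        // Planned means "recorded, not executed": surface it, do not act on it.
        Outcome::Planned { action, detail } => {
            (format!("planned {}: {}", action, detail), true)
        }
        Outcome::Quit => (String::from("bye"), false),
    }
}
```

Keeping `Planned` as its own variant lets a caller tell "recorded but not run" apart from ordinary output without parsing message text.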
The point of the REPL is recursion: a session (often itself driven by a model)
can call sub-models, sub-agents, and sub-graphs as functions, and those
child runs are first-class observable runs whose events, usage, and cost roll up
to the parent. The design exposes a small, stable set of capability functions
(per design.md):
| Function | Lowers to | Use when |
|---|---|---|
| `model_query` | `ModelRegistry` → `ChatModel::invoke` | one provider-neutral model call |
| `model_query_batched` | bounded-concurrency model calls | many calls, order preserved |
| `agent_query` | `AgentHarness::run` | a sub-task needing model–tool iteration |
| `graph_run` | `CompiledGraph::run` / `resume` | a sub-task with explicit topology/interrupts |
| `tool_call` | `ToolRegistry` + schema validation | call one registered tool |
| `graph_define` / `graph_validate` / `graph_compile` / `graph_diff` / `graph_register` | the `.rag` compiler | draft/validate/compile/register a generated graph |
| `emit` / `answer` / `show_vars` | event sink / session control | tracing and finishing the loop |
Every one of these is a host capability, not a script-native side effect.
Recursion is bounded by ReplPolicy, which fails closed on:
- `max_depth`: recursion depth for sub-model / sub-agent / sub-graph calls,
- `max_model_calls`, `max_tool_calls`, `max_graph_calls`,
- `max_operations`, `max_iterations`, `max_script_bytes`, `max_output_bytes`,
- `max_concurrency`, `timeout`, and `generated_graphs_require_review`.
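"Fails closed" here means a limit is checked before the counter is charged, so an exhausted budget refuses the call rather than letting it through. The sketch below shows that pattern for two of the limits. `Limits`, `Budget`, and `LimitError` are hypothetical names, and `ReplPolicy` itself is a designed (R2–R6) type whose real shape may differ.

```rust
/// Hypothetical error naming which limit was hit.
#[derive(Debug, PartialEq)]
pub enum LimitError {
    Depth { max: u32 },
    ModelCalls { max: u32 },
}

/// Two of the limits listed above, as plain fields.
#[derive(Debug, Clone)]
pub struct Limits {
    pub max_depth: u32,
    pub max_model_calls: u32,
}

/// Running counters checked against `Limits`.
#[derive(Debug)]
pub struct Budget {
    limits: Limits,
    depth: u32,
    model_calls: u32,
}

impl Budget {
    pub fn new(limits: Limits) -> Self {
        Budget { limits, depth: 0, model_calls: 0 }
    }

    /// Charge one model call, refusing once the cap is reached.
    pub fn charge_model_call(&mut self) -> Result<(), LimitError> {
        if self.model_calls >= self.limits.max_model_calls {
            return Err(LimitError::ModelCalls { max: self.limits.max_model_calls });
        }
        self.model_calls += 1;
        Ok(())
    }

    /// Enter a child call one level deeper, or fail at max_depth.
    pub fn enter_child(&mut self) -> Result<(), LimitError> {
        if self.depth >= self.limits.max_depth {
            return Err(LimitError::Depth { max: self.limits.max_depth });
        }
        self.depth += 1;
        Ok(())
    }

    /// Leave a child call; saturating so an unbalanced exit cannot underflow.
    pub fn exit_child(&mut self) {
        self.depth = self.depth.saturating_sub(1);
    }

    pub fn depth(&self) -> u32 {
        self.depth
    }
}
```

Checking before incrementing means a limit of zero disables the capability entirely, which matches the deny-by-default stance of the rest of the surface.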
Child harness/graph events preserve the root_run_id, parent_run_id, cell id,
node id (when inside a graph), recursion depth, and capability name — so a deep
recursive trajectory remains a single inspectable tree. This is the same
depth-tracking discipline the graph subgraph and
sub-agent surfaces enforce.
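One way to picture the "single inspectable tree" is that every child event copies its parent's root id and records the parent's own id. The following sketch assumes numeric run ids and a flat event list; `RunEvent`, its fields' types, and `path_to_root` are hypothetical, and cell and node ids are omitted for brevity.

```rust
/// Illustrative event record; only the linkage fields are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct RunEvent {
    pub run_id: u64,
    pub root_run_id: u64,
    pub parent_run_id: Option<u64>,
    pub depth: u32,
    pub capability: String,
}

impl RunEvent {
    /// A root run is its own root and has no parent.
    pub fn root(run_id: u64, capability: &str) -> Self {
        RunEvent {
            run_id,
            root_run_id: run_id,
            parent_run_id: None,
            depth: 0,
            capability: capability.to_string(),
        }
    }

    /// A child inherits the root id and sits one level deeper.
    pub fn child(&self, run_id: u64, capability: &str) -> Self {
        RunEvent {
            run_id,
            root_run_id: self.root_run_id,
            parent_run_id: Some(self.run_id),
            depth: self.depth + 1,
            capability: capability.to_string(),
        }
    }
}

/// Follow parent links from `leaf` and return the ids from root to leaf.
pub fn path_to_root(events: &[RunEvent], leaf: u64) -> Vec<u64> {
    let mut path = Vec::new();
    let mut cursor = Some(leaf);
    while let Some(id) = cursor {
        match events.iter().find(|e| e.run_id == id) {
            Some(event) => {
                path.push(id);
                cursor = event.parent_run_id;
            }
            None => break, // unknown id: stop rather than guess
        }
    }
    path.reverse();
    path
}
```

With the root id on every event, a trace viewer can group an entire recursive trajectory with one filter and rebuild the tree from parent links alone.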
This is worth restating because it is the design's spine. A .ragsh session —
even a fully model-driven one — can only:
- call registered models, tools, agents, and graphs (capability functions resolve names through the registries; unregistered names error),
- do so when the policy allowlist permits it (deny-by-default `CapabilityPolicy` today; the richer `ReplPolicy` limits in the design),
- within bounded operation counts, output size, call counts, recursion depth, concurrency, and timeout.
It has no direct filesystem, network, environment-variable, or process
access, and it cannot install model-generated graph topology directly. A
model-authored graph must pass through the .rag
compiler and policy checks — graph_define → graph_validate → graph_compile
→ (review) → graph_register — exactly as a human-authored blueprint does. That
is how an agent can define and run its own graph without acquiring arbitrary
topology mutation or host-code execution.
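The ordering constraint on generated graphs can be read as a small state machine in which stages may not be skipped. The sketch below is an interpretation, not the compiler's API: `Stage`, `OutOfOrder`, and `advance` are hypothetical, and treating the parenthesised review step as skippable only when `generated_graphs_require_review` is off is an assumption.

```rust
/// Lifecycle stages of a model-authored graph, in pipeline order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    Defined,
    Validated,
    Compiled,
    Reviewed,
    Registered,
}

/// Hypothetical error for an illegal transition.
#[derive(Debug, PartialEq)]
pub struct OutOfOrder {
    pub from: Stage,
    pub to: Stage,
}

/// Advance one stage; any transition not listed is rejected.
pub fn advance(from: Stage, to: Stage, require_review: bool) -> Result<Stage, OutOfOrder> {
    let allowed = match (from, to) {
        (Stage::Defined, Stage::Validated) => true,
        (Stage::Validated, Stage::Compiled) => true,
        (Stage::Compiled, Stage::Reviewed) => true,
        (Stage::Reviewed, Stage::Registered) => true,
        // Assumption: review may be skipped only when policy does not demand it.
        (Stage::Compiled, Stage::Registered) => !require_review,
        _ => false,
    };
    if allowed {
        Ok(to)
    } else {
        Err(OutOfOrder { from, to })
    }
}
```

Listing the legal transitions explicitly, with a catch-all rejection, keeps the pipeline fail-closed: a new stage added later is unreachable until someone opts it in.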
A model-driven REPL agent follows this lifecycle (from design.md):
- Create a `ReplSession` and load the `context`, `state`, `messages`, `history`, and `run` variables.
- Build a model request describing the available REPL functions, then invoke the model through the harness.
- Extract fenced `ragsh` blocks from the assistant message.
- Execute each block in the session; capture stdout, changed variables, call records, events, and errors.
- Append a compact execution result as the next user message.
- Repeat until `answer(...)` is called or limits are reached; then persist events, usage, cost, and the final answer.
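The extraction step in this lifecycle can be sketched with a line scanner. This is a guess at the behaviour rather than the designed implementation: `extract_ragsh_blocks` is a hypothetical name, fences are assumed to sit on their own lines, and an unterminated block is dropped instead of executed.

```rust
/// Collect the bodies of fenced blocks tagged `ragsh` from an assistant message.
pub fn extract_ragsh_blocks(message: &str) -> Vec<String> {
    // Build the fence marker at runtime so this sample stays fence-safe.
    let fence = "`".repeat(3);
    let open = format!("{}ragsh", fence);

    let mut blocks: Vec<String> = Vec::new();
    let mut current: Option<Vec<&str>> = None;

    for line in message.lines() {
        let trimmed = line.trim();
        if let Some(mut body) = current.take() {
            if trimmed == fence {
                // Closing fence: the block is complete.
                blocks.push(body.join("\n"));
            } else {
                body.push(line);
                current = Some(body);
            }
        } else if trimmed == open {
            // Opening fence with the ragsh tag starts a new block.
            current = Some(Vec::new());
        }
        // Lines outside a ragsh block, including other fenced languages, are ignored.
    }
    blocks
}
```

Each returned string can then be split into lines and fed to the session one command at a time, with the captured results summarised into the next user message.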
When this loop runs inside a graph node (kind repl_agent), the graph still
owns node routing, checkpointing, interrupts, recursion depth, and failure
policy — so you get graph → REPL → (sub-model / sub-agent / sub-graph) recursion
with one consistent observability and policy story.
The design recommends Rhai as the first in-process REPL backend: a
Rust-native, sandboxed-by-default embedded scripting language whose host API lets
TinyAgents register exactly the capability functions a script may use, with
Engine::set_max_operations to fail closed on runaway scripts. Python is
documented as a future out-of-process compatibility sandbox (R7) for
RLM-style workflows where the sandbox boundary must be explicit. Neither backend
changes the rule that every capability is registered, typed, and policy-checked
at the Rust boundary.
- Expressive Language (.rag): the declarative blueprint format `.ragsh` drafts, compiles, and registers.
- Graph Runtime: the durable runtime `run` / `graph_run` drive.
- Harness: model calls, sub-agents, and the CodeAct host loop.
- Registry: the capability catalog the policy gates resolve against.
- Recursion and RLM: the RLM execution model and lineage.