REPL Language RAGSH

REPL Language (`.ragsh`)

The REPL language is TinyAgents' imperative orchestration surface — the RLM/CodeAct loop. Where .rag declares graph topology, .ragsh is an interactive, session-oriented language for inspecting, scripting, and recursively orchestrating harness and graph runs. It is explicitly inspired by Recursive Language Models (Zhang, Kraska, Khattab, 2025; alexzhang13/rlm) and CodeAct-style agents, where a model writes small programs, inspects their output, calls sub-models / sub-agents / sub-graphs as functions, and iterates until it has a final answer.

The core RLM idea this surface ports: context and intermediate state live in a persistent REPL namespace as runtime values, while model calls, recursive sub-calls, and tools are exposed as capability-bound functions inside that namespace — instead of being stuffed into one context window. See Recursion and RLM for the lineage and how this mitigates "context rot."

A non-negotiable rule runs through the whole design: .ragsh never bypasses the registry, policy, or run limits. It is an orchestration surface, not a privilege-escalation surface.

Source lives in src/repl/; the module spec is docs/modules/repl-language/README.md and the detailed design (recursion, CodeAct loop, Rhai embedding, events) is docs/modules/repl-language/design.md.

Milestone status (honest)

The .ragsh module is currently at milestone R1 (Documentation and Types). What that means concretely, reading src/repl/mod.rs and src/repl/types.rs:

- The line grammar, parser, command model, session namespace, and capability policy are implemented and tested.
- Side-effect-free commands (set, get, show, help, quit) execute fully in-session today.
- Commands that need the live runtime (load, compile, run, call) are parsed and policy-checked, then returned as a ReplOutcome::Planned record describing the intended action. They are not yet wired to the harness/graph runtime. That wiring (Rhai backend, model_query, tool_call, graph_run, the CodeAct loop, recursive sub-calls with depth tracking) is scheduled for milestones R2–R6; the Python sandbox backend follows in R7.

So the recursive sub-model / sub-agent / sub-graph calls, depth tracking, and CodeAct loop described below are the designed behaviour the types and policy are built to enforce; the executing parts that exist today are the parser, session, and the policy gate. The sections marked (designed, R2–R6) document the target the current types are shaped for.

Line grammar

A .ragsh session is line-oriented. Each line is one command:

line   = verb ( ws+ arg )* ws*
verb   = [a-zA-Z][a-zA-Z0-9_-]*
arg    = quoted | bare
quoted = '"' ( <any except '"' or '\\'> | '\\' <any> )* '"'      // escapes: \\  \"  \n  \t
bare   = ( <non-whitespace> )+

The first token is the command verb, matched case-insensitively against the verb table. Subsequent tokens are positional arguments. For the call verb, the remainder of the line after the capability name is parsed as a single JSON value, so multi-token JSON objects and arrays are accepted verbatim.

parse_command(line) returns a [ReplCommand], or TinyAgentsError::Parse for empty input, an unknown verb, a missing required argument, an unterminated quoted string, or invalid JSON passed to call.
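To make the grammar above concrete, here is a minimal tokenizer sketch using only the standard library. It is not the crate's `parse_command`: the name `tokenize`, the `String` error type, and the choice to lowercase the verb in place are assumptions made for illustration, and it skips the verb table and the JSON handling for `call`.

```rust
/// Hypothetical tokenizer for one `.ragsh` line (illustration only).
/// Splits on whitespace, honours double-quoted arguments with the
/// \\  \"  \n  \t escapes, and lowercases the first token (the verb).
fn tokenize(line: &str) -> Result<Vec<String>, String> {
    let mut tokens: Vec<String> = Vec::new();
    let mut chars = line.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip separators between tokens
        } else if c == '"' {
            chars.next(); // consume the opening quote
            let mut buf = String::new();
            let mut closed = false;
            while let Some(c) = chars.next() {
                match c {
                    '"' => {
                        closed = true;
                        break;
                    }
                    '\\' => match chars.next() {
                        Some('n') => buf.push('\n'),
                        Some('t') => buf.push('\t'),
                        Some(other) => buf.push(other), // covers \\ and \"
                        None => return Err("unterminated escape".to_string()),
                    },
                    other => buf.push(other),
                }
            }
            if !closed {
                return Err("unterminated quoted string".to_string());
            }
            tokens.push(buf);
        } else {
            // Bare argument: everything up to the next whitespace.
            let mut buf = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_whitespace() {
                    break;
                }
                buf.push(c);
                chars.next();
            }
            tokens.push(buf);
        }
    }

    if tokens.is_empty() {
        return Err("empty input".to_string());
    }
    // Verbs are matched case-insensitively, so normalise the first token.
    tokens[0] = tokens[0].to_ascii_lowercase();
    Ok(tokens)
}
```

Calling `tokenize` on `SET greeting "hello world"` yields three tokens, with the verb normalised to `set` and the quoted argument kept as a single token. A real parser would then look the verb up in the verb table and check required arguments.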

Capability-bound commands

| Verb | Signature | Capability gate | Status today |
| --- | --- | --- | --- |
| `help` | `help` (also `?`) | none | executes |
| `quit` | `quit` (also `exit`, `q`) | none | executes |
| `set` | `set <key> <value>` | none | executes |
| `get` | `get <key>` | none | executes |
| `show` | `show vars\|graphs\|status` | none | executes |
| `load` | `load <path>` | `"load"` | policy-checked → `Planned` |
| `compile` | `compile <name>` | `"compile"` | policy-checked → `Planned` |
| `run` | `run <graph> <input>` | `"run"` | policy-checked → `Planned` |
| `call` | `call <capability> <json>` | the named capability | policy-checked → `Planned` |

The capability gate is a [CapabilityPolicy] — an allowlist of names. It denies by default: a fresh ReplSession allows nothing. Grant access with CapabilityPolicy::allow("run") or CapabilityPolicy::from_list(["run", "my_tool"]). Any gated command whose capability is not on the allowlist returns TinyAgentsError::Capability before it would touch the runtime. This is the single choke point that keeps a .ragsh session — including one driven by a model — from invoking anything the host has not explicitly permitted.
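The deny-by-default behaviour can be illustrated with a small standalone allowlist. This is a sketch of the idea, not the crate's `CapabilityPolicy`: the type `AllowList`, its `check` method, and the `String` error are hypothetical stand-ins.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a deny-by-default capability allowlist.
/// `Default` gives an empty set, so a fresh value permits nothing.
#[derive(Default)]
struct AllowList {
    allowed: HashSet<String>,
}

impl AllowList {
    /// Build an allowlist from any iterable of names.
    fn from_list<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            allowed: names.into_iter().map(Into::into).collect(),
        }
    }

    /// Gate a capability by name. The error is returned before any
    /// runtime work would start, which is what makes this a choke point.
    fn check(&self, capability: &str) -> Result<(), String> {
        if self.allowed.contains(capability) {
            Ok(())
        } else {
            Err(format!("capability '{}' is not on the allowlist", capability))
        }
    }
}
```

With `AllowList::default()`, every `check` fails; with `AllowList::from_list(["run", "my_tool"])`, only those two names pass. Keeping the gate as a single function that runs before dispatch is what lets one allowlist cover both human-typed and model-generated commands.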

Persistent session namespace

A [ReplSession] holds three things: a variable namespace, a capability policy, and a command history.

ReplSession {
  variables: HashMap<String, serde_json::Value>,  // persists across commands
  policy:    CapabilityPolicy,                     // deny-by-default allowlist
  history:   Vec<ReplCommand>,                     // every command, in order
}

- set <key> <value> stores a string; ReplSession::set(key, value) stores any serde_json::Value for richer data.
- get <key> returns the value (or null); show vars dumps the whole namespace; show status reports namespace size, history length, and allowlist size.
- Every command is appended to history before it executes, so a session is fully replayable.

The namespace persisting across commands is the same idea as RLM's persistent locals: a model can stash an intermediate result in a variable on one line and consume it on the next, instead of re-deriving it from a giant prompt.

use tinyagents::repl::{CapabilityPolicy, ReplSession};

// Grant exactly one capability; everything else stays denied.
let policy = CapabilityPolicy::from_list(["my_tool"]);
let mut session = ReplSession::new().with_policy(policy);

// Store a JSON value in the namespace, then read it back by reference.
session.set("x", serde_json::json!(42));
assert_eq!(session.get("x"), Some(&serde_json::json!(42)));

Command outcomes

ReplSession::execute(cmd) returns a [ReplOutcome]:

- Message(String): human-readable output from a side-effect-free command.
- Value(serde_json::Value): a value read from the namespace.
- Planned { action, detail }: a policy-checked command was recorded but live harness/graph execution is deferred (R2–R6). The detail carries the structured parameters of the intended call.
- Quit: the session was asked to terminate.
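A host driving a session typically matches on the outcome to decide what to print and whether to continue. The sketch below mirrors the four variants with a local enum so it needs no external crates. The enum name `Outcome`, the helper `render`, and the use of `String` for the value and for the `action`/`detail` fields are assumptions; the real field types are not stated here.

```rust
/// Local mirror of the four outcome variants (illustration only).
/// `String` stands in for `serde_json::Value` and for the field types.
#[derive(Debug, PartialEq)]
enum Outcome {
    Message(String),
    Value(String),
    Planned { action: String, detail: String },
    Quit,
}

/// Hypothetical driver step: returns the text to display and a flag
/// saying whether the session loop should keep reading commands.
fn render(outcome: &Outcome) -> (String, bool) {
    match outcome {
        Outcome::Message(text) => (text.clone(), true),
        Outcome::Value(value) => (value.clone(), true),
        // A planned action was policy-checked but not executed.
        Outcome::Planned { action, detail } => {
            (format!("planned {}: {}", action, detail), true)
        }
        Outcome::Quit => (String::new(), false),
    }
}
```

Because `Planned` is a distinct variant rather than a message string, a caller can tell "allowed but deferred" apart from "executed" without parsing text.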

Recursive sub-calls and depth tracking (designed, R2–R6)

The point of the REPL is recursion: a session (often itself driven by a model) can call sub-models, sub-agents, and sub-graphs as functions, and those child runs are first-class observable runs whose events, usage, and cost roll up to the parent. The design exposes a small, stable set of capability functions (per design.md):

| Function | Lowers to | Use when |
| --- | --- | --- |
| `model_query` | `ModelRegistry` → `ChatModel::invoke` | one provider-neutral model call |
| `model_query_batched` | bounded-concurrency model calls | many calls, order preserved |
| `agent_query` | `AgentHarness::run` | a sub-task needing model–tool iteration |
| `graph_run` | `CompiledGraph::run` / `resume` | a sub-task with explicit topology/interrupts |
| `tool_call` | `ToolRegistry` + schema validation | call one registered tool |
| `graph_define` / `graph_validate` / `graph_compile` / `graph_diff` / `graph_register` | the `.rag` compiler | draft/validate/compile/register a generated graph |
| `emit` / `answer` / `show_vars` | event sink / session control | tracing and finishing the loop |

Every one of these is a host capability, not a script-native side effect. Recursion is bounded by ReplPolicy, which fails closed on:

- max_depth: recursion depth for sub-model / sub-agent / sub-graph calls,
- max_model_calls, max_tool_calls, max_graph_calls,
- max_operations, max_iterations, max_script_bytes, max_output_bytes,
- max_concurrency, timeout, and generated_graphs_require_review.
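"Fails closed" means a limit overrun refuses the call rather than letting it through. A minimal sketch for two of the limits above follows. The types `Limits` and `Budget`, the method `charge_model_call`, and the convention that the root call has depth 0 are assumptions; this is not the designed `ReplPolicy`.

```rust
/// Hypothetical subset of the limits listed above.
struct Limits {
    max_depth: u32,
    max_model_calls: u32,
}

/// Tracks how much of the budget a session has spent so far.
struct Budget {
    limits: Limits,
    model_calls: u32,
}

impl Budget {
    fn new(limits: Limits) -> Self {
        Self { limits, model_calls: 0 }
    }

    /// Charge one model call made at `depth` (assumed 0 at the root).
    /// Any overrun returns an error and leaves the counter untouched,
    /// so the caller never proceeds past a limit.
    fn charge_model_call(&mut self, depth: u32) -> Result<(), String> {
        if depth > self.limits.max_depth {
            return Err(format!("max_depth {} exceeded", self.limits.max_depth));
        }
        if self.model_calls >= self.limits.max_model_calls {
            return Err(format!(
                "max_model_calls {} exceeded",
                self.limits.max_model_calls
            ));
        }
        self.model_calls += 1;
        Ok(())
    }
}
```

Checking before incrementing keeps the counter accurate after a refusal, which matters if the host reports usage alongside the error.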

Child harness/graph events preserve the root_run_id, parent_run_id, cell id, node id (when inside a graph), recursion depth, and capability name — so a deep recursive trajectory remains a single inspectable tree. This is the same depth-tracking discipline the graph subgraph and sub-agent surfaces enforce.
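One way to picture the lineage fields is a record that each child derives from its parent. The sketch below covers only the run ids, depth, and capability name; cell id and node id are omitted. `RunLineage`, its constructors, and the field types are hypothetical, and the root is assumed to sit at depth 0.

```rust
/// Hypothetical lineage record; field names follow the prose above,
/// the types are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
struct RunLineage {
    root_run_id: String,
    parent_run_id: Option<String>,
    run_id: String,
    depth: u32,
    capability: Option<String>,
}

impl RunLineage {
    /// The top-level run: it is its own root and has no parent.
    fn root(run_id: &str) -> Self {
        Self {
            root_run_id: run_id.to_string(),
            parent_run_id: None,
            run_id: run_id.to_string(),
            depth: 0,
            capability: None,
        }
    }

    /// Derive a child run started through `capability`. The root id is
    /// copied unchanged, so every descendant points at the same tree.
    fn child(&self, run_id: &str, capability: &str) -> Self {
        Self {
            root_run_id: self.root_run_id.clone(),
            parent_run_id: Some(self.run_id.clone()),
            run_id: run_id.to_string(),
            depth: self.depth + 1,
            capability: Some(capability.to_string()),
        }
    }
}
```

A run created as `root("r0").child("r1", "agent_query").child("r2", "model_query")` still carries `r0` as its root, `r1` as its parent, and depth 2, which is enough to rebuild the tree from a flat event log.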

It never bypasses the registry, policy, or limits

This is worth restating because it is the design's spine. A .ragsh session — even a fully model-driven one — can only:

- call registered models, tools, agents, and graphs (capability functions resolve names through the registries; unregistered names error),
- do so when the policy allowlist permits it (deny-by-default CapabilityPolicy today; the richer ReplPolicy limits in the design),
- within bounded operation counts, output size, call counts, recursion depth, concurrency, and timeout.

It has no direct filesystem, network, environment-variable, or process access, and it cannot install model-generated graph topology directly. A model-authored graph must pass through the .rag compiler and policy checks — graph_define → graph_validate → graph_compile → (review) → graph_register — exactly as a human-authored blueprint does. That is how an agent can define and run its own graph without acquiring arbitrary topology mutation or host-code execution.
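The ordering constraint on a model-authored graph can be sketched as a tiny state machine that rejects skipped steps. The `Stage` enum and `advance` function are hypothetical, and the sketch assumes review is always required, whereas the design makes it a policy setting (`generated_graphs_require_review`).

```rust
/// Hypothetical lifecycle stages for a generated graph, named after
/// the capability functions in the pipeline above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Defined,
    Validated,
    Compiled,
    Reviewed,
    Registered,
}

/// Move to `requested` only if it is the immediate next stage.
/// Skipping a step (for example Defined -> Registered) is an error.
fn advance(current: Stage, requested: Stage) -> Result<Stage, String> {
    let expected = match current {
        Stage::Defined => Stage::Validated,
        Stage::Validated => Stage::Compiled,
        Stage::Compiled => Stage::Reviewed,
        Stage::Reviewed => Stage::Registered,
        Stage::Registered => return Err("graph is already registered".to_string()),
    };
    if requested == expected {
        Ok(requested)
    } else {
        Err(format!("cannot go from {:?} to {:?}", current, requested))
    }
}
```

Under these assumptions a graph that was only defined cannot be registered directly; it has to pass each intermediate stage first, which mirrors the rule that generated topology takes the same path as a human-authored blueprint.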

The CodeAct loop (designed, R6)

A model-driven REPL agent follows this lifecycle (from design.md):

1. Create a ReplSession and load the context, state, messages, history, and run variables.
2. Build a model request describing the available REPL functions, then invoke the model through the harness.
3. Extract fenced ragsh blocks from the assistant message.
4. Execute each block in the session; capture stdout, changed variables, call records, events, and errors.
5. Append a compact execution result as the next user message.
6. Repeat until answer(...) is called or limits are reached; then persist events, usage, cost, and the final answer.
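The block-extraction step in the lifecycle above can be sketched with plain line scanning. The function `extract_ragsh_blocks` is hypothetical, and its handling of edge cases (an unclosed block is dropped, fences for other languages are ignored) is an assumption rather than documented behaviour. The fence marker is built at runtime only so the example does not contain a literal fence.

```rust
/// Hypothetical helper: collect the bodies of fenced blocks whose
/// opening fence is tagged `ragsh`, in the order they appear.
fn extract_ragsh_blocks(message: &str) -> Vec<String> {
    let fence = "`".repeat(3); // three backticks
    let open = format!("{}ragsh", fence);

    let mut blocks = Vec::new();
    // `Some(lines)` while inside a ragsh block, `None` otherwise.
    let mut current: Option<Vec<&str>> = None;

    for line in message.lines() {
        let trimmed = line.trim();
        match current.take() {
            None => {
                if trimmed == open.as_str() {
                    current = Some(Vec::new());
                }
            }
            Some(mut body) => {
                if trimmed == fence.as_str() {
                    // Closing fence: the block is complete.
                    blocks.push(body.join("\n"));
                } else {
                    body.push(line);
                    current = Some(body);
                }
            }
        }
    }
    // A block still open at end of input is discarded.
    blocks
}
```

Each returned string is one block; under the line grammar, a host would then split it into lines and run each line as a separate command.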

When this loop runs inside a graph node (kind repl_agent), the graph still owns node routing, checkpointing, interrupts, recursion depth, and failure policy — so you get graph → REPL → (sub-model / sub-agent / sub-graph) recursion with one consistent observability and policy story.

Backend direction

The design recommends Rhai as the first in-process REPL backend: a Rust-native, sandboxed-by-default embedded scripting language whose host API lets TinyAgents register exactly the capability functions a script may use, with Engine::set_max_operations to fail closed on runaway scripts. Python is documented as a future out-of-process compatibility sandbox (R7) for RLM-style workflows where the sandbox boundary must be explicit. Neither backend changes the rule that every capability is registered, typed, and policy-checked at the Rust boundary.
