

Steven Enamakel edited this page Jun 29, 2026 · 1 revision

REPL Language (.ragsh)

The REPL language is TinyAgents' imperative orchestration surface — the RLM/CodeAct loop. Where .rag declares graph topology, .ragsh is an interactive, session-oriented language for inspecting, scripting, and recursively orchestrating harness and graph runs. It is explicitly inspired by Recursive Language Models (Zhang, Kraska, Khattab, 2025; alexzhang13/rlm) and CodeAct-style agents, where a model writes small programs, inspects their output, calls sub-models / sub-agents / sub-graphs as functions, and iterates until it has a final answer.

The core RLM idea this surface ports: context and intermediate state live in a persistent REPL namespace as runtime values, while model calls, recursive sub-calls, and tools are exposed as capability-bound functions inside that namespace — instead of being stuffed into one context window. See Recursion and RLM for the lineage and how this mitigates "context rot."

A non-negotiable rule runs through the whole design: .ragsh never bypasses the registry, policy, or run limits. It is an orchestration surface, not a privilege-escalation surface.

Source lives in src/repl/; the module spec is docs/modules/repl-language/README.md and the detailed design (recursion, CodeAct loop, Rhai embedding, events) is docs/modules/repl-language/design.md.

Milestone status (honest)

The .ragsh module is currently at milestone R1 (Documentation and Types). Concretely, based on src/repl/mod.rs and src/repl/types.rs:

  • The line grammar, parser, command model, session namespace, and capability policy are implemented and tested.
  • Side-effect-free commands (set, get, show, help, quit) execute fully in-session today.
  • Commands that need the live runtime (load, compile, run, call) are parsed and policy-checked, then returned as a ReplOutcome::Planned record describing the intended action. They are not yet wired to the harness/graph runtime. That wiring (the Rhai backend, model_query, tool_call, graph_run, the CodeAct loop, and recursive sub-calls with depth tracking) is scheduled for milestones R2–R6, with the Python sandbox backend following in R7.

So the recursive sub-model / sub-agent / sub-graph calls, depth tracking, and CodeAct loop described below are designed behaviour that the types and policy are built to enforce; the parts that actually execute today are the parser, the session, and the policy gate. The sections marked (designed, R2–R6) document the target that the current types are shaped for.

Line grammar

A .ragsh session is line-oriented. Each line is one command:

line   = verb ( ws+ arg )* ws*
verb   = [a-zA-Z][a-zA-Z0-9_-]*
arg    = quoted | bare
quoted = '"' ( <any except '"' or '\\'> | '\\' <any> )* '"'      // \\  \"  \n  \t escapes
bare   = ( <non-whitespace> )+

The first token is the command verb, matched case-insensitively against the verb table. Subsequent tokens are positional arguments. For the call verb, the remainder of the line after the capability name is parsed as a single JSON value, so multi-token JSON objects and arrays are accepted verbatim.

parse_command(line) returns a [ReplCommand], or TinyAgentsError::Parse for empty input, an unknown verb, a missing required argument, an unterminated quoted string, or invalid JSON passed to call.
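To make the grammar concrete, here is a minimal, std-only tokenizer sketch. It is hypothetical and not the code in src/repl/: the names tokenize, split_call, and ParseError are invented for illustration, and unlike the real parse_command it does not check the verb table, required arguments, or JSON validity, and it returns plain token strings rather than a ReplCommand.

```rust
/// Hypothetical error type; the real parser reports TinyAgentsError::Parse.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    UnterminatedQuote,
}

/// Split one .ragsh line into tokens: a lower-cased verb followed by
/// positional args. Quoted args honour the \\ \" \n \t escapes.
pub fn tokenize(line: &str) -> Result<Vec<String>, ParseError> {
    let mut tokens = Vec::new();
    let mut chars = line.chars().peekable();
    loop {
        // Skip whitespace between tokens.
        while chars.peek().map_or(false, |c| c.is_whitespace()) {
            chars.next();
        }
        let first = match chars.peek() {
            Some(&c) => c,
            None => break,
        };
        let mut token = String::new();
        if first == '"' {
            chars.next(); // consume the opening quote
            let mut closed = false;
            while let Some(c) = chars.next() {
                match c {
                    '"' => {
                        closed = true;
                        break;
                    }
                    '\\' => match chars.next() {
                        Some('n') => token.push('\n'),
                        Some('t') => token.push('\t'),
                        // Any other escaped char (including \\ and \") is kept literally.
                        Some(other) => token.push(other),
                        None => return Err(ParseError::UnterminatedQuote),
                    },
                    other => token.push(other),
                }
            }
            if !closed {
                return Err(ParseError::UnterminatedQuote);
            }
        } else {
            // Bare argument: everything up to the next whitespace.
            while let Some(&c) = chars.peek() {
                if c.is_whitespace() {
                    break;
                }
                token.push(c);
                chars.next();
            }
        }
        tokens.push(token);
    }
    if tokens.is_empty() {
        return Err(ParseError::Empty);
    }
    // Verbs are matched case-insensitively, so normalise the first token.
    tokens[0] = tokens[0].to_ascii_lowercase();
    Ok(tokens)
}

/// For `call`, keep everything after the capability name verbatim so that
/// multi-token JSON such as {"a": [1, 2]} survives intact (not validated here).
pub fn split_call(line: &str) -> Option<(&str, &str)> {
    let (verb, rest) = line.trim().split_once(char::is_whitespace)?;
    if !verb.eq_ignore_ascii_case("call") {
        return None;
    }
    let (capability, json) = rest.trim_start().split_once(char::is_whitespace)?;
    Some((capability, json.trim()))
}
```

Keeping the remainder of a call line as one raw slice is what lets multi-token JSON reach a JSON parser intact; a whitespace tokenizer alone would split {"a": [1, 2]} into several arguments.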

Capability-bound commands

| Verb | Signature | Capability gate | Status today |
|---|---|---|---|
| help | help (also ?) | none | executes |
| quit | quit (also exit, q) | none | executes |
| set | set <key> <value> | none | executes |
| get | get <key> | none | executes |
| show | show vars\|graphs\|status | none | executes |
| load | load <path> | "load" | policy-checked → Planned |
| compile | compile <name> | "compile" | policy-checked → Planned |
| run | run <graph> <input> | "run" | policy-checked → Planned |
| call | call <capability> <json> | the named capability | policy-checked → Planned |

The capability gate is a [CapabilityPolicy] — an allowlist of names. It denies by default: a fresh ReplSession allows nothing. Grant access with CapabilityPolicy::allow("run") or CapabilityPolicy::from_list(["run", "my_tool"]). Any gated command whose capability is not on the allowlist returns TinyAgentsError::Capability before it would touch the runtime. This is the single choke point that keeps a .ragsh session — including one driven by a model — from invoking anything the host has not explicitly permitted.
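A deny-by-default allowlist needs very little machinery. The sketch below assumes a hypothetical Policy type standing in for CapabilityPolicy (grant is an invented builder method, not the crate's allow constructor); it only shows why an empty set denies everything and where the single check sits.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for CapabilityPolicy: a deny-by-default allowlist.
#[derive(Debug, Default, Clone)]
pub struct Policy {
    allowed: HashSet<String>,
}

/// Returned when a gated command names a capability that was never granted.
#[derive(Debug, PartialEq)]
pub struct CapabilityDenied(pub String);

impl Policy {
    /// An empty policy denies everything, like a fresh session.
    pub fn deny_all() -> Self {
        Self::default()
    }

    /// Build an allowlist from any list of names.
    pub fn from_list<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            allowed: names.into_iter().map(|name| name.into()).collect(),
        }
    }

    /// Builder-style grant of one more capability (hypothetical helper).
    pub fn grant(mut self, name: &str) -> Self {
        self.allowed.insert(name.to_string());
        self
    }

    /// The single choke point: gated commands call this before doing anything.
    pub fn check(&self, capability: &str) -> Result<(), CapabilityDenied> {
        if self.allowed.contains(capability) {
            Ok(())
        } else {
            Err(CapabilityDenied(capability.to_string()))
        }
    }
}
```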

Persistent session namespace

A [ReplSession] holds three things: a variable namespace, a capability policy, and a command history.

struct ReplSession {
    variables: HashMap<String, serde_json::Value>, // persists across commands
    policy: CapabilityPolicy,                      // deny-by-default allowlist
    history: Vec<ReplCommand>,                     // every command, in order
}

  • set <key> <value> stores a string; ReplSession::set(key, value) stores any serde_json::Value for richer data.
  • get <key> returns the value (or null); show vars dumps the whole namespace; show status reports namespace size, history length, and allowlist size.
  • Every command is appended to history before it executes, so a session is fully replayable.

The namespace persisting across commands is the same idea as RLM's persistent locals: a model can stash an intermediate result in a variable on one line and consume it on the next, instead of re-deriving it from a giant prompt.

use serde_json::json;
use tinyagents::repl::{CapabilityPolicy, ReplSession};

fn main() {
    // Deny-by-default: only "my_tool" is on the allowlist.
    let policy = CapabilityPolicy::from_list(["my_tool"]);
    let mut session = ReplSession::new().with_policy(policy);

    // Stash a structured value in the persistent namespace, then read it back.
    session.set("x", json!(42));
    assert_eq!(session.get("x"), Some(&json!(42)));
}

Command outcomes

ReplSession::execute(cmd) returns a [ReplOutcome]:

  • Message(String) — human-readable output from a side-effect-free command.
  • Value(serde_json::Value) — a value read from the namespace.
  • Planned { action, detail } — a policy-checked command was recorded but live harness/graph execution is deferred (R2–R6). The detail carries the structured parameters of the intended call.
  • Quit — the session was asked to terminate.
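The following sketch shows how execute could map commands onto these outcomes, assuming simplified, hypothetical Command / Outcome / Session types: values are Strings rather than serde_json::Value, a missing key yields None instead of null, and the Planned detail is a formatted string rather than structured parameters. It records each command in history before running it, checks the allowlist for the gated run command, and returns a Planned record instead of executing, mirroring the R1 behaviour described above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified mirror of ReplCommand (only four verbs shown).
#[derive(Debug, Clone, PartialEq)]
pub enum Command {
    Set(String, String),
    Get(String),
    Run { graph: String, input: String },
    Quit,
}

/// Hypothetical, simplified mirror of ReplOutcome.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Message(String),
    Value(Option<String>),
    Planned { action: String, detail: String },
    Quit,
}

#[derive(Default)]
pub struct Session {
    variables: HashMap<String, String>,
    allowed: HashSet<String>,
    history: Vec<Command>,
}

impl Session {
    pub fn allow(&mut self, capability: &str) {
        self.allowed.insert(capability.to_string());
    }

    pub fn history(&self) -> &[Command] {
        &self.history
    }

    pub fn execute(&mut self, cmd: Command) -> Result<Outcome, String> {
        // Record first, so in this sketch even denied commands are replayable.
        self.history.push(cmd.clone());
        match cmd {
            Command::Set(k, v) => {
                self.variables.insert(k.clone(), v);
                Ok(Outcome::Message(format!("set {k}")))
            }
            Command::Get(k) => Ok(Outcome::Value(self.variables.get(&k).cloned())),
            Command::Run { graph, input } => {
                // Gated: check the allowlist before any runtime would be touched.
                if !self.allowed.contains("run") {
                    return Err("capability `run` denied".to_string());
                }
                // R1 behaviour: describe the action instead of executing it.
                Ok(Outcome::Planned {
                    action: "run".to_string(),
                    detail: format!("graph={graph} input={input}"),
                })
            }
            Command::Quit => Ok(Outcome::Quit),
        }
    }
}
```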

Recursive sub-calls and depth tracking (designed, R2–R6)

The point of the REPL is recursion: a session (often itself driven by a model) can call sub-models, sub-agents, and sub-graphs as functions, and those child runs are first-class observable runs whose events, usage, and cost roll up to the parent. The design exposes a small, stable set of capability functions (per design.md):

| Function | Lowers to | Use when |
|---|---|---|
| model_query | ModelRegistry → ChatModel::invoke | one provider-neutral model call |
| model_query_batched | bounded-concurrency model calls | many calls, order preserved |
| agent_query | AgentHarness::run | a sub-task needing model–tool iteration |
| graph_run | CompiledGraph::run / resume | a sub-task with explicit topology/interrupts |
| tool_call | ToolRegistry + schema validation | call one registered tool |
| graph_define / graph_validate / graph_compile / graph_diff / graph_register | the .rag compiler | draft/validate/compile/register a generated graph |
| emit / answer / show_vars | event sink / session control | tracing and finishing the loop |
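A minimal sketch of name-based resolution: the host registers each capability function, and a script can reach only what is in the table. CapabilityTable, register, and invoke are hypothetical names, and every function here takes and returns a string, whereas the designed functions are typed and also pass through the policy and limits.

```rust
use std::collections::HashMap;

/// Hypothetical capability signature: string in, string (or error) out.
type CapabilityFn = Box<dyn Fn(&str) -> Result<String, String>>;

#[derive(Default)]
pub struct CapabilityTable {
    functions: HashMap<String, CapabilityFn>,
}

impl CapabilityTable {
    /// The host, not the script, decides which functions exist.
    pub fn register(&mut self, name: &str, f: impl Fn(&str) -> Result<String, String> + 'static) {
        self.functions.insert(name.to_string(), Box::new(f));
    }

    /// Scripts reach functions only by name through this lookup;
    /// an unregistered name is an error, never a fallback.
    pub fn invoke(&self, name: &str, arg: &str) -> Result<String, String> {
        match self.functions.get(name) {
            Some(f) => f(arg),
            None => Err(format!("unknown capability `{name}`")),
        }
    }
}
```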

Every one of these is a host capability, not a script-native side effect. Recursion is bounded by ReplPolicy, which fails closed on:

  • max_depth — recursion depth for sub-model / sub-agent / sub-graph calls,
  • max_model_calls, max_tool_calls, max_graph_calls,
  • max_operations, max_iterations, max_script_bytes, max_output_bytes,
  • max_concurrency, timeout, and generated_graphs_require_review.
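Fail-closed limits amount to checking a counter before a call and refusing the call once the counter has reached its limit. The sketch assumes a hypothetical Limits struct holding three of the listed fields, a Budget intended to be shared between a root run and its children, and the convention that the root run sits at depth 0; the real ReplPolicy covers the full list above.

```rust
/// Hypothetical subset of the designed ReplPolicy limits.
#[derive(Debug, Clone)]
pub struct Limits {
    pub max_depth: u32,
    pub max_model_calls: u32,
    pub max_tool_calls: u32,
}

#[derive(Debug, PartialEq)]
pub enum LimitExceeded {
    Depth(u32),
    ModelCalls,
    ToolCalls,
}

/// Intended to be shared by a root run and its children so usage rolls up.
#[derive(Debug, Default)]
pub struct Budget {
    model_calls: u32,
    tool_calls: u32,
}

impl Budget {
    /// Charge one model call, refusing it if the limit is already reached.
    pub fn charge_model(&mut self, limits: &Limits) -> Result<(), LimitExceeded> {
        if self.model_calls >= limits.max_model_calls {
            return Err(LimitExceeded::ModelCalls);
        }
        self.model_calls += 1;
        Ok(())
    }

    /// Charge one tool call, refusing it if the limit is already reached.
    pub fn charge_tool(&mut self, limits: &Limits) -> Result<(), LimitExceeded> {
        if self.tool_calls >= limits.max_tool_calls {
            return Err(LimitExceeded::ToolCalls);
        }
        self.tool_calls += 1;
        Ok(())
    }
}

/// Depth of a sub-model / sub-agent / sub-graph call made at `depth`.
/// The root is depth 0; a child deeper than max_depth is refused.
pub fn child_depth(depth: u32, limits: &Limits) -> Result<u32, LimitExceeded> {
    let next = depth + 1;
    if next > limits.max_depth {
        Err(LimitExceeded::Depth(next))
    } else {
        Ok(next)
    }
}
```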

Child harness/graph events preserve the root_run_id, parent_run_id, cell id, node id (when inside a graph), recursion depth, and capability name — so a deep recursive trajectory remains a single inspectable tree. This is the same depth-tracking discipline the graph subgraph and sub-agent surfaces enforce.
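The lineage fields can be modelled as a context that each child derives from its parent. RunContext and its methods are hypothetical, and the sketch omits the cell id and node id for brevity; the point is that root_run_id is copied unchanged, parent_run_id points one level up, and depth increases by one.

```rust
/// Hypothetical lineage record attached to every event.
#[derive(Debug, Clone, PartialEq)]
pub struct RunContext {
    pub root_run_id: String,
    pub parent_run_id: Option<String>,
    pub run_id: String,
    pub depth: u32,
    pub capability: Option<String>,
}

impl RunContext {
    /// The top-level run is its own root and has no parent.
    pub fn root(run_id: &str) -> Self {
        Self {
            root_run_id: run_id.to_string(),
            parent_run_id: None,
            run_id: run_id.to_string(),
            depth: 0,
            capability: None,
        }
    }

    /// A child keeps the root id, points at its parent, and is one level deeper.
    pub fn child(&self, run_id: &str, capability: &str) -> Self {
        Self {
            root_run_id: self.root_run_id.clone(),
            parent_run_id: Some(self.run_id.clone()),
            run_id: run_id.to_string(),
            depth: self.depth + 1,
            capability: Some(capability.to_string()),
        }
    }
}
```

Because every event carries such a record, an observer can rebuild the whole tree from a flat event stream by following parent_run_id.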

It never bypasses the registry, policy, or limits

This is worth restating because it is the design's spine. A .ragsh session — even a fully model-driven one — can only:

  • call registered models, tools, agents, and graphs (capability functions resolve names through the registries; unregistered names error),
  • do so when the policy allowlist permits it (deny-by-default CapabilityPolicy today; the richer ReplPolicy limits in the design),
  • within bounded operation counts, output size, call counts, recursion depth, concurrency, and timeout.

It has no direct filesystem, network, environment-variable, or process access, and it cannot install model-generated graph topology directly. A model-authored graph must pass through the .rag compiler and policy checks (graph_define → graph_validate → graph_compile → (review) → graph_register) exactly as a human-authored blueprint does. That is how an agent can define and run its own graph without acquiring arbitrary topology mutation or host-code execution.
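The ordering requirement can be enforced as a small state machine. This is a hypothetical sketch: Stage and GeneratedGraph are invented names, each stage is only a marker (no .rag compilation happens), and require_review stands in for the generated_graphs_require_review setting listed under ReplPolicy.

```rust
/// Hypothetical lifecycle stages of a model-authored graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Defined,
    Validated,
    Compiled,
    Reviewed,
    Registered,
}

#[derive(Debug)]
pub struct GeneratedGraph {
    pub source: String,
    stage: Stage,
}

impl GeneratedGraph {
    pub fn define(source: &str) -> Self {
        Self { source: source.to_string(), stage: Stage::Defined }
    }

    pub fn stage(&self) -> Stage {
        self.stage
    }

    /// Move forward exactly one step; skipping a stage is an error.
    pub fn advance(&mut self, to: Stage, require_review: bool) -> Result<(), String> {
        let expected = match self.stage {
            Stage::Defined => Stage::Validated,
            Stage::Validated => Stage::Compiled,
            Stage::Compiled if require_review => Stage::Reviewed,
            Stage::Compiled | Stage::Reviewed => Stage::Registered,
            Stage::Registered => return Err("already registered".to_string()),
        };
        if to != expected {
            return Err(format!("cannot go from {:?} to {:?}", self.stage, to));
        }
        self.stage = to;
        Ok(())
    }
}
```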

The CodeAct loop (designed, R6)

A model-driven REPL agent follows this lifecycle (from design.md):

  1. Create a ReplSession and load the context, state, messages, history, and run variables.
  2. Build a model request describing the available REPL functions, then invoke the model through the harness.
  3. Extract fenced ragsh blocks from the assistant message.
  4. Execute each block in the session; capture stdout, changed variables, call records, events, and errors.
  5. Append a compact execution result as the next user message.
  6. Repeat until answer(...) is called or limits are reached; then persist events, usage, cost, and the final answer.
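Steps 3 to 6 can be sketched with the model and the block executor injected as closures. extract_ragsh_blocks and codeact_loop are hypothetical names; execution results are reduced to "ok" or an error string, and event, usage, and cost persistence are left out. A block executor returns Ok(Some(answer)) to signal that answer(...) was called.

```rust
/// Collect the bodies of fenced ragsh blocks from an assistant message.
/// An unterminated block at the end of the message is ignored.
pub fn extract_ragsh_blocks(message: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut body: Vec<&str> = Vec::new();
    let mut inside = false;
    for line in message.lines() {
        let fence = line.trim();
        if !inside && fence == "```ragsh" {
            inside = true;
            body.clear();
        } else if inside && fence == "```" {
            inside = false;
            blocks.push(body.join("\n"));
        } else if inside {
            body.push(line);
        }
    }
    blocks
}

/// Ask the model, run each ragsh block, feed a compact result back, and stop
/// on answer(...) or when the iteration limit is reached (fail closed).
pub fn codeact_loop(
    mut ask_model: impl FnMut(&str) -> String,
    mut run_block: impl FnMut(&str) -> Result<Option<String>, String>,
    max_iterations: usize,
) -> Result<String, String> {
    let mut feedback = String::from("start");
    for _ in 0..max_iterations {
        let reply = ask_model(&feedback);
        let mut results = Vec::new();
        for block in extract_ragsh_blocks(&reply) {
            match run_block(&block) {
                Ok(Some(answer)) => return Ok(answer), // answer(...) ends the loop
                Ok(None) => results.push("ok".to_string()),
                Err(e) => results.push(format!("error: {e}")),
            }
        }
        // The compact execution result becomes the next user message.
        feedback = results.join("\n");
    }
    Err(format!("no answer after {max_iterations} iterations"))
}
```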

When this loop runs inside a graph node (kind repl_agent), the graph still owns node routing, checkpointing, interrupts, recursion depth, and failure policy — so you get graph → REPL → (sub-model / sub-agent / sub-graph) recursion with one consistent observability and policy story.

Backend direction

The design recommends Rhai as the first in-process REPL backend: a Rust-native, sandboxed-by-default embedded scripting language whose host API lets TinyAgents register exactly the capability functions a script may use, with Engine::set_max_operations to fail closed on runaway scripts. Python is documented as a future out-of-process compatibility sandbox (R7) for RLM-style workflows where the sandbox boundary must be explicit. Neither backend changes the rule that every capability is registered, typed, and policy-checked at the Rust boundary.

See also

  • Expressive Language (.rag) — the declarative blueprint format .ragsh drafts, compiles, and registers.
  • Graph Runtime — the durable runtime run / graph_run drive.
  • Harness — model calls, sub-agents, and the CodeAct host loop.
  • Registry — the capability catalog the policy gates resolve against.
  • Recursion and RLM — the RLM execution model and lineage.

