# REPL Language (`.ragsh`)

The REPL language is TinyAgents' **imperative orchestration surface**: the RLM/CodeAct loop. Where [`.rag`](Expressive-Language-RAG) declares graph topology, `.ragsh` is an interactive, session-oriented language for inspecting, scripting, and **recursively orchestrating** harness and graph runs. It is explicitly inspired by Recursive Language Models (Zhang, Kraska, Khattab, 2025; [`alexzhang13/rlm`](https://github.com/alexzhang13/rlm)) and CodeAct-style agents, where a model writes small programs, inspects their output, calls sub-models / sub-agents / sub-graphs as functions, and iterates until it has a final answer.

The core RLM idea this surface ports: **context and intermediate state live in a persistent REPL namespace as runtime values**, while model calls, recursive sub-calls, and tools are exposed as capability-bound functions inside that namespace, instead of being stuffed into one context window. See [Recursion and RLM](Recursion-and-RLM) for the lineage and how this mitigates "context rot."

A non-negotiable rule runs through the whole design: **`.ragsh` never bypasses the registry, policy, or run limits.** It is an orchestration surface, not a privilege-escalation surface.

Source lives in [`src/repl/`](https://github.com/tinyhumansai/tinyagents/tree/main/src/repl); the module spec is [`docs/modules/repl-language/README.md`](https://github.com/tinyhumansai/tinyagents/blob/main/docs/modules/repl-language/README.md) and the detailed design (recursion, CodeAct loop, Rhai embedding, events) is [`docs/modules/repl-language/design.md`](https://github.com/tinyhumansai/tinyagents/blob/main/docs/modules/repl-language/design.md).

## Milestone status (honest)

The `.ragsh` module is currently at **milestone R1 (Documentation and Types)**. What that means concretely, reading `src/repl/mod.rs` and `src/repl/types.rs`:

- The **line grammar, parser, command model, session namespace, and capability policy are implemented** and tested.
- **Side-effect-free commands** (`set`, `get`, `show`, `help`, `quit`) execute fully in-session today.
- Commands that need the live runtime (`load`, `compile`, `run`, `call`) are **parsed and policy-checked**, then returned as a `ReplOutcome::Planned` record describing the intended action; they are **not yet wired** to the harness/graph runtime. That wiring (Rhai backend, `model_query`, `tool_call`, `graph_run`, the CodeAct loop, recursive sub-calls with depth tracking) is scheduled for milestones **R2–R6** and **R7** (Python sandbox backend).

So the recursive sub-model / sub-agent / sub-graph calls, depth tracking, and CodeAct loop described below are the **designed** behaviour the types and policy are built to enforce; the executing parts that exist today are the parser, session, and the policy gate. The sections marked *(designed, R2–R6)* document the target the current types are shaped for.

## Line grammar

A `.ragsh` session is line-oriented. Each line is one command:

```text
line   = verb ( ws+ arg )* ws*
verb   = [a-zA-Z][a-zA-Z0-9_-]*
arg    = quoted | bare
quoted = '"' ( | '\\' )* '"'   // \\ \" \n \t escapes
bare   = ( )+
```

The first token is the command verb, matched **case-insensitively** against the verb table. Subsequent tokens are positional arguments. For the `call` verb, the **remainder of the line** after the capability name is parsed as a single JSON value, so multi-token JSON objects and arrays are accepted verbatim.

`parse_command(line)` returns a [`ReplCommand`], or `TinyAgentsError::Parse` for empty input, an unknown verb, a missing required argument, an unterminated quoted string, or invalid JSON passed to `call`.
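The tokenization rules above (whitespace-separated tokens, double-quoted arguments with `\\ \" \n \t` escapes, case-insensitive verb) can be illustrated with a small standalone sketch. This is not the crate's `parse_command`: the `Line` struct and `tokenize` function are hypothetical names, errors are plain strings rather than `TinyAgentsError::Parse`, and the sketch neither looks up the verb table nor handles the JSON remainder of `call`.

```rust
/// Hypothetical result type: lowercased verb plus positional arguments.
#[derive(Debug, PartialEq)]
struct Line {
    verb: String,
    args: Vec<String>,
}

/// Split one line into tokens. Quoted tokens may contain whitespace and
/// the escapes \\ \" \n \t; bare tokens run until the next whitespace.
fn tokenize(line: &str) -> Result<Line, String> {
    let mut tokens: Vec<String> = Vec::new();
    let mut chars = line.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip separators
        } else if c == '"' {
            chars.next(); // consume the opening quote
            let mut buf = String::new();
            let mut closed = false;
            while let Some(ch) = chars.next() {
                match ch {
                    '"' => {
                        closed = true;
                        break;
                    }
                    '\\' => match chars.next() {
                        Some('\\') => buf.push('\\'),
                        Some('"') => buf.push('"'),
                        Some('n') => buf.push('\n'),
                        Some('t') => buf.push('\t'),
                        other => return Err(format!("bad escape: {:?}", other)),
                    },
                    _ => buf.push(ch),
                }
            }
            if !closed {
                return Err("unterminated quoted string".to_string());
            }
            tokens.push(buf);
        } else {
            let mut buf = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                buf.push(ch);
                chars.next();
            }
            tokens.push(buf);
        }
    }

    // The first token is the verb; an empty line is an error.
    let mut iter = tokens.into_iter();
    let verb = iter.next().ok_or_else(|| "empty input".to_string())?;
    Ok(Line {
        verb: verb.to_ascii_lowercase(),
        args: iter.collect(),
    })
}

fn main() {
    let line = tokenize(r#"SET greeting "hello \"world\"""#).unwrap();
    println!("{:?}", line);
    assert_eq!(line.verb, "set");
    assert_eq!(line.args, vec!["greeting", "hello \"world\""]);
}
```

Lowercasing the verb once at parse time is one simple way to get case-insensitive matching; the real parser may normalize differently.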
## Capability-bound commands

| Verb      | Signature                      | Capability gate | Status today |
|-----------|--------------------------------|-----------------|--------------|
| `help`    | `help` (also `?`)              | none            | executes |
| `quit`    | `quit` (also `exit`, `q`)      | none            | executes |
| `set`     | `set <key> <value>`            | none            | executes |
| `get`     | `get <key>`                    | none            | executes |
| `show`    | `show vars\|graphs\|status`    | none            | executes |
| `load`    | `load `                        | `"load"`        | policy-checked → `Planned` |
| `compile` | `compile `                     | `"compile"`     | policy-checked → `Planned` |
| `run`     | `run `                         | `"run"`         | policy-checked → `Planned` |
| `call`    | `call <capability> <json>`     | the named capability | policy-checked → `Planned` |

The capability gate is a [`CapabilityPolicy`], an allowlist of names. It **denies by default**: a fresh `ReplSession` allows nothing. Grant access with `CapabilityPolicy::allow("run")` or `CapabilityPolicy::from_list(["run", "my_tool"])`. Any gated command whose capability is not on the allowlist returns `TinyAgentsError::Capability` *before* it would touch the runtime.

This is the single choke point that keeps a `.ragsh` session (including one driven by a model) from invoking anything the host has not explicitly permitted.

## Persistent session namespace

A [`ReplSession`] holds three things: a variable namespace, a capability policy, and a command history.

```text
ReplSession {
    variables: HashMap<String, serde_json::Value>, // persists across commands
    policy:    CapabilityPolicy,                   // deny-by-default allowlist
    history:   Vec,                                // every command, in order
}
```

- `set <key> <value>` stores a string; `ReplSession::set(key, value)` stores any `serde_json::Value` for richer data.
- `get <key>` returns the value (or `null`); `show vars` dumps the whole namespace; `show status` reports namespace size, history length, and allowlist size.
- Every command is appended to `history` before it executes, so a session is fully replayable.
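The deny-by-default gate and the record-then-execute history described above can be sketched with the standard library alone. The `SketchPolicy` and `SketchSession` types below are hypothetical stand-ins, not the crate's `CapabilityPolicy` or `ReplSession`; values are plain strings instead of `serde_json::Value`, and errors are strings instead of `TinyAgentsError::Capability`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical allowlist: an empty set denies every gated command.
#[derive(Default)]
struct SketchPolicy {
    allowed: HashSet<String>,
}

impl SketchPolicy {
    fn from_list<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        SketchPolicy {
            allowed: names.into_iter().map(Into::into).collect(),
        }
    }

    /// Fail closed: anything not explicitly listed is rejected.
    fn check(&self, capability: &str) -> Result<(), String> {
        if self.allowed.contains(capability) {
            Ok(())
        } else {
            Err(format!("capability denied: {capability}"))
        }
    }
}

/// Hypothetical session: namespace, policy, and ordered history.
#[derive(Default)]
struct SketchSession {
    variables: HashMap<String, String>,
    policy: SketchPolicy,
    history: Vec<String>,
}

impl SketchSession {
    /// Record the command first, then execute it; gated verbs consult
    /// the policy before producing a "planned" message.
    fn execute(&mut self, verb: &str, args: &[&str]) -> Result<String, String> {
        self.history.push(format!("{} {}", verb, args.join(" ")));
        match (verb, args) {
            ("set", [key, value]) => {
                self.variables.insert(key.to_string(), value.to_string());
                Ok(format!("{key} set"))
            }
            ("get", [key]) => Ok(self
                .variables
                .get(*key)
                .cloned()
                .unwrap_or_else(|| "null".to_string())),
            ("run", [target]) => {
                self.policy.check("run")?;
                Ok(format!("planned: run {target}"))
            }
            _ => Err(format!("unsupported in this sketch: {verb}")),
        }
    }
}

fn main() {
    let mut session = SketchSession::default();
    // A fresh session allows nothing, so `run` is rejected.
    assert!(session.execute("run", &["demo"]).is_err());

    session.policy = SketchPolicy::from_list(["run"]);
    assert_eq!(session.execute("run", &["demo"]).unwrap(), "planned: run demo");

    session.execute("set", &["x", "42"]).unwrap();
    assert_eq!(session.execute("get", &["x"]).unwrap(), "42");

    // The denied command was recorded too, so the session can be replayed.
    assert_eq!(session.history.len(), 4);
}
```

Pushing to the history before the policy check means even rejected commands appear in a replay, which matches the "appended before it executes" rule stated above.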
The namespace persisting across commands is the same idea as RLM's persistent locals: a model can stash an intermediate result in a variable on one line and consume it on the next, instead of re-deriving it from a giant prompt.

```rust
use tinyagents::repl::{ReplSession, CapabilityPolicy, ReplOutcome};

// Allow a single capability; everything else stays denied.
let policy = CapabilityPolicy::from_list(["my_tool"]);
let mut session = ReplSession::new().with_policy(policy);

// Values stored in the namespace persist across commands.
session.set("x", serde_json::json!(42));
assert_eq!(session.get("x"), Some(&serde_json::json!(42)));
```

### Command outcomes

`ReplSession::execute(cmd)` returns a [`ReplOutcome`]:

- `Message(String)`: human-readable output from a side-effect-free command.
- `Value(serde_json::Value)`: a value read from the namespace.
- `Planned { action, detail }`: a policy-checked command was recorded but live harness/graph execution is deferred (R2–R6). The `detail` carries the structured parameters of the intended call.
- `Quit`: the session was asked to terminate.

## Recursive sub-calls and depth tracking *(designed, R2–R6)*

The point of the REPL is recursion: a session (often itself driven by a model) can call **sub-models, sub-agents, and sub-graphs as functions**, and those child runs are first-class observable runs whose events, usage, and cost roll up to the parent.
The design exposes a small, stable set of capability functions (per `design.md`):

| Function             | Lowers to                                  | Use when |
|----------------------|--------------------------------------------|----------|
| `model_query`        | `ModelRegistry` → `ChatModel::invoke`      | one provider-neutral model call |
| `model_query_batched`| bounded-concurrency model calls            | many calls, order preserved |
| `agent_query`        | `AgentHarness::run`                        | a sub-task needing model–tool iteration |
| `graph_run`          | `CompiledGraph::run` / `resume`            | a sub-task with explicit topology/interrupts |
| `tool_call`          | `ToolRegistry` + schema validation         | call one registered tool |
| `graph_define` / `graph_validate` / `graph_compile` / `graph_diff` / `graph_register` | the [`.rag`](Expressive-Language-RAG) compiler | draft/validate/compile/register a generated graph |
| `emit` / `answer` / `show_vars` | event sink / session control    | tracing and finishing the loop |

Every one of these is a **host capability**, not a script-native side effect. Recursion is bounded by `ReplPolicy`, which fails closed on:

- `max_depth`: recursion depth for sub-model / sub-agent / sub-graph calls,
- `max_model_calls`, `max_tool_calls`, `max_graph_calls`,
- `max_operations`, `max_iterations`, `max_script_bytes`, `max_output_bytes`,
- `max_concurrency`, `timeout`, and `generated_graphs_require_review`.

Child harness/graph events preserve the `root_run_id`, `parent_run_id`, cell id, node id (when inside a graph), recursion depth, and capability name, so a deep recursive trajectory remains a single inspectable tree. This is the same depth-tracking discipline the [graph subgraph](Graph-Runtime) and [sub-agent](Harness) surfaces enforce.

## It never bypasses the registry, policy, or limits

This is worth restating because it is the design's spine.
A `.ragsh` session (even a fully model-driven one) can only:

- call **registered** models, tools, agents, and graphs (capability functions resolve names through the registries; unregistered names error),
- do so when the **policy** allowlist permits it (deny-by-default `CapabilityPolicy` today; the richer `ReplPolicy` limits in the design),
- within **bounded** operation counts, output size, call counts, recursion depth, concurrency, and timeout.

It has **no** direct filesystem, network, environment-variable, or process access, and it cannot install model-generated graph topology directly. A model-authored graph must pass through the [`.rag`](Expressive-Language-RAG) compiler and policy checks (`graph_define` → `graph_validate` → `graph_compile` → (review) → `graph_register`) exactly as a human-authored blueprint does. That is how an agent can define and run *its own* graph without acquiring arbitrary topology mutation or host-code execution.

## The CodeAct loop *(designed, R6)*

A model-driven REPL agent follows this lifecycle (from `design.md`):

1. Create a `ReplSession` and load the `context`, `state`, `messages`, `history`, and `run` variables.
2. Build a model request describing the available REPL functions, then invoke the model through the [harness](Harness).
3. Extract fenced `ragsh` blocks from the assistant message.
4. Execute each block in the session; capture stdout, changed variables, call records, events, and errors.
5. Append a compact execution result as the next user message.
6. Repeat until `answer(...)` is called or limits are reached; then persist events, usage, cost, and the final answer.

When this loop runs **inside a graph node** (`kind repl_agent`), the graph still owns node routing, checkpointing, interrupts, recursion depth, and failure policy, so you get graph → REPL → (sub-model / sub-agent / sub-graph) recursion with one consistent observability and policy story.
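Step 3 of the lifecycle above, extracting fenced `ragsh` blocks from an assistant message, might look roughly like the following standalone sketch. The function name `extract_ragsh_blocks` is hypothetical, and the matching rules are assumptions (an opening fence line of exactly three backticks followed by `ragsh`, a closing line of exactly three backticks); the designed implementation may be stricter or more lenient. The fence marker is built with `repeat` only so the example does not contain a literal fence inside its own code block.

```rust
/// Collect the bodies of fenced blocks tagged `ragsh`, in order of
/// appearance. Blocks with other tags are ignored, and an unterminated
/// block is dropped rather than executed.
fn extract_ragsh_blocks(message: &str) -> Vec<String> {
    let fence = "`".repeat(3);
    let open = format!("{fence}ragsh");

    let mut blocks = Vec::new();
    // `Some(lines)` while inside a ragsh block, `None` otherwise.
    let mut current: Option<Vec<&str>> = None;

    for line in message.lines() {
        let trimmed = line.trim();
        if let Some(mut body) = current.take() {
            if trimmed == fence {
                // Closing fence: the block is complete.
                blocks.push(body.join("\n"));
            } else {
                body.push(line);
                current = Some(body);
            }
        } else if trimmed == open {
            current = Some(Vec::new());
        }
    }
    blocks
}

fn main() {
    let fence = "`".repeat(3);
    let message = format!(
        "Let me check.\n{fence}ragsh\nset x 1\nget x\n{fence}\nDone.\n{fence}text\nignored\n{fence}\n"
    );
    let blocks = extract_ragsh_blocks(&message);
    assert_eq!(blocks, vec!["set x 1\nget x".to_string()]);
    println!("{} block(s) extracted", blocks.len());
}
```

Dropping an unterminated block is a deliberately conservative choice in this sketch: a truncated model response never reaches the session as a partial script.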
## Backend direction

The design recommends **Rhai** as the first in-process REPL backend: a Rust-native, sandboxed-by-default embedded scripting language whose host API lets TinyAgents register exactly the capability functions a script may use, with `Engine::set_max_operations` to fail closed on runaway scripts. **Python** is documented as a future *out-of-process* compatibility sandbox (`R7`) for RLM-style workflows where the sandbox boundary must be explicit. Neither backend changes the rule that every capability is registered, typed, and policy-checked at the Rust boundary.

## See also

- [Expressive Language (.rag)](Expressive-Language-RAG): the declarative blueprint format `.ragsh` drafts, compiles, and registers.
- [Graph Runtime](Graph-Runtime): the durable runtime `run` / `graph_run` drive.
- [Harness](Harness): model calls, sub-agents, and the CodeAct host loop.
- [Registry](Registry): the capability catalog the policy gates resolve against.
- [Recursion and RLM](Recursion-and-RLM): the RLM execution model and lineage.