llm-runtime

llm-runtime is a runtime layer for application-owned LLM workflows. It wraps provider invocation with one package boundary for tool orchestration, MCP integration, and skill loading.

This package is designed for harnesses that want a stable per-call API without pushing provider-specific details, built-in tool contracts, MCP wiring, and skill discovery into application code.

It was extracted from Agent World as a standalone dependency for any application that needs provider calls with tool calling, MCP, and agent-skill support to build its own agent orchestration or harness.

Installation

npm install llm-runtime

The published package targets Node.js 18 and later and exposes a single root entrypoint.

What This Package Owns

  • Provider dispatch for generate(...) and stream(...)
  • Generic host-agnostic turn-loop orchestration through runTurnLoop(...)
  • Intrinsic turn-loop safety limits, stop semantics, trace summaries, and lifecycle hooks
  • Built-in tools such as file access, shell execution, and skill loading
  • MCP tool discovery and execution
  • Skill discovery from configured skill roots
  • Stable environment-level registries for MCP servers and skills
  • Cleanup boundaries for explicit environments and convenience-path caches

Public API

  • createLLMEnvironment(...)
  • disposeLLMEnvironment(...)
  • disposeLLMRuntimeCaches()
  • generate(...)
  • stream(...)
  • runTurnLoop(...)

The package is per-call first. You can call generate(...) or stream(...) directly, or inject an explicit environment when your harness wants stable provider, MCP, and skill dependencies.

Cleanup

Use the public cleanup APIs when the runtime owns MCP clients or cached tool-discovery state:

  • disposeLLMEnvironment(environment) shuts down the environment MCP registry only when that registry was created by the runtime.
  • disposeLLMRuntimeCaches() shuts down cached convenience-path MCP registries and clears cached provider, MCP, and skill discovery state.

Ownership is split deliberately:

  • The runtime owns cleanup for environments created for runtime use and for the convenience-path caches it creates internally.
  • The harness still owns temporary workspaces, transcript persistence, any caller-injected registries, and any other non-runtime resources attached to its application.
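A minimal shutdown-ordering sketch under these rules. The dispose functions below are local stand-ins so the snippet is self-contained; in a real harness you would import `disposeLLMEnvironment` and `disposeLLMRuntimeCaches` from `llm-runtime`:

```typescript
// Stand-ins for the real disposeLLMEnvironment / disposeLLMRuntimeCaches so
// this sketch runs on its own; in a harness, import them from 'llm-runtime'.
const disposed: string[] = [];
const disposeLLMEnvironment = async (env: { name: string }) => { disposed.push(env.name); };
const disposeLLMRuntimeCaches = async () => { disposed.push('caches'); };

// Shutdown ordering sketch: release the explicit environment first, then the
// convenience-path caches, then harness-owned resources the runtime never sees.
async function shutdownHarness(environment: { name: string }): Promise<void> {
  try {
    await disposeLLMEnvironment(environment); // runtime-created MCP registry
  } finally {
    await disposeLLMRuntimeCaches();          // cached convenience-path state
    // ...then close harness-owned workspaces, transcripts, injected registries
  }
}

shutdownHarness({ name: 'main' }).then(() => console.log(disposed));
```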

Mental Model

The main rule is simple:

  • Stable harness state belongs in environment
  • Request-specific state stays per call

Put This In environment

  • Provider configuration store
  • MCP registry or MCP config
  • Skill registry or skill roots
  • Default reasoningEffort
  • Default toolPermission

Keep This Per Call

  • provider
  • model
  • messages
  • workingDirectory
  • reasoningEffort
  • toolPermission
  • abortSignal

If a value should change from one request or UI action to the next, it usually should not live in the environment.
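As an illustrative sketch of that split, the object shapes below mirror the two lists above but are assumptions, not the package's exact types:

```typescript
// Stable, harness-lifetime configuration: built once, reused for every call.
// Field names mirror the "Put This In environment" list; exact shapes are
// illustrative assumptions.
const environmentConfig = {
  providers: { openai: { apiKey: 'YOUR_API_KEY' } },
  skillRoots: ['/app/skills'],
  defaults: { reasoningEffort: 'medium', toolPermission: 'auto' },
};

// Request-specific inputs: rebuilt for every call or UI action.
function buildCallOptions(userText: string, signal?: AbortSignal) {
  return {
    provider: 'openai',
    model: 'gpt-5',
    messages: [{ role: 'user', content: userText }],
    workingDirectory: '/workspace',
    abortSignal: signal,
  };
}

const call = buildCallOptions('Summarize the workspace.');
console.log(call.messages[0].content);
```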

Tool Model

llm-runtime merges several tool sources into one callable surface.

Built-In Tools

The minimal runtime core does not require any built-in operational tools; the built-ins listed below are optional, package-owned convenience capabilities exposed from the same package surface.

The package currently reserves these built-in names:

  • shell_cmd
  • load_skill
  • human_intervention_request
  • web_fetch
  • read_file
  • write_file
  • list_files
  • grep

Built-ins are package-owned and reserved. Application code can disable or narrow them, but should not redefine them.

Extra Tools

Extra tools are application-specific additions such as lookup_customer or create_ticket. They are additive only and cannot override reserved built-in names.

MCP Tools

MCP tools come from configured external servers. The runtime discovers them, namespaces them, and merges them into the same resolved tool set as built-ins and extra tools.
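The merge rule can be sketched as follows. The `server.tool` namespacing format used here is an illustrative assumption, not the package's documented scheme; only the reserved built-in names come from the list above:

```typescript
// Reserved built-in names from the list above.
const RESERVED_BUILT_INS = new Set([
  'shell_cmd', 'load_skill', 'human_intervention_request', 'web_fetch',
  'read_file', 'write_file', 'list_files', 'grep',
]);

// Merge extra tools and MCP tools into one resolved set. Extra tools are
// additive only and may not redefine reserved built-ins; MCP tools get a
// server-name prefix (separator here is an illustrative assumption).
function resolveToolNames(
  builtIns: string[],
  extraTools: string[],
  mcpTools: { server: string; tool: string }[],
): string[] {
  const resolved = new Set(builtIns);
  for (const name of extraTools) {
    if (RESERVED_BUILT_INS.has(name)) {
      throw new Error(`extra tool may not redefine reserved built-in: ${name}`);
    }
    resolved.add(name);
  }
  for (const { server, tool } of mcpTools) {
    resolved.add(`${server}.${tool}`);
  }
  return [...resolved];
}

const tools = resolveToolNames(
  ['read_file'],
  ['lookup_customer'],
  [{ server: 'docs', tool: 'search' }],
);
console.log(tools); // ['read_file', 'lookup_customer', 'docs.search']
```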

Skills

Skills are reusable instruction assets discovered from skill roots and loaded through load_skill. Skills are not executable tools; they add instruction context for the model.

generate(...) vs stream(...)

Both APIs share the same runtime model:

  • same provider config shape
  • same tool orchestration
  • same MCP and skill semantics

The difference is output delivery:

  • generate(...) returns the final result
  • stream(...) emits chunks and still returns the final result at the end
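The streaming contract can be sketched with a local stand-in iterator. The `{ text }` chunk shape is an assumption for illustration; only `stream(...)` itself is the real API:

```typescript
// Stand-in for the chunk stream that stream(...) emits; the { text } chunk
// shape is an illustrative assumption, not the package's documented type.
async function* fakeStream(): AsyncGenerator<{ text: string }> {
  yield { text: 'Hello, ' };
  yield { text: 'world.' };
}

// Accumulate streamed chunks, then use the final text, mirroring how
// stream(...) emits chunks and still returns the final result at the end.
async function consume(chunks: AsyncGenerator<{ text: string }>): Promise<string> {
  let finalText = '';
  for await (const chunk of chunks) {
    finalText += chunk.text; // a real harness would render each chunk here
  }
  return finalText;
}

consume(fakeStream()).then((text) => console.log(text)); // prints "Hello, world."
```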

runTurnLoop(...)

runTurnLoop(...) is the package-owned iterative loop for harnesses that want the package to manage repeated model turns without taking ownership of harness state, persistence, or tool policy.

Use it when your harness needs more control than a single generate(...) or stream(...) call, but still wants one package boundary for:

  • repeated model invocation
  • empty-text retry handling
  • optional plain-text tool-intent normalization
  • hard iteration, tool-round, repeated-call, and wall-clock safety bounds
  • structured trace summaries and lifecycle hooks

The split of responsibilities is deliberate:

  • The package owns loop repetition, hard-stop safety checks, response normalization, trace collection, and lifecycle hook ordering.
  • The harness owns state shape, message construction, tool execution, persistence, replay, and business-specific completion policy.

Safety And Stop Reasons

runTurnLoop(...) applies intrinsic package defaults for:

  • maxIterations
  • maxConsecutiveToolTurns
  • maxWallTimeMs
  • repeated identical tool-call suppression through repeatedToolCallGuard

Terminal reasons are stable string literals suitable for harness branching:

  • text_response
  • tool_calls_response
  • empty_text_stop
  • rejected_text_response
  • unhandled_response
  • max_iterations_exceeded
  • max_tool_rounds_exceeded
  • timeout
  • repeated_tool_call_stopped

The final result keeps state, response, and reason, and also includes:

  • steps
  • toolCalls
  • classifications
  • retries
  • stop
  • elapsedMs

If the loop times out before any model response is available, result.response is null and result.stop carries the timeout detail.
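Because the terminal reasons are stable string literals, a harness can branch on them directly. The grouping below is one possible harness policy, not something the package prescribes:

```typescript
// Terminal reasons from the list above, as a union for harness branching.
type TurnLoopReason =
  | 'text_response' | 'tool_calls_response' | 'empty_text_stop'
  | 'rejected_text_response' | 'unhandled_response'
  | 'max_iterations_exceeded' | 'max_tool_rounds_exceeded'
  | 'timeout' | 'repeated_tool_call_stopped';

// Example policy: treat the hard safety bounds as stops worth surfacing to an
// operator, and everything else as a normal (if imperfect) completion.
function isSafetyStop(reason: TurnLoopReason): boolean {
  return (
    reason === 'max_iterations_exceeded' ||
    reason === 'max_tool_rounds_exceeded' ||
    reason === 'timeout' ||
    reason === 'repeated_tool_call_stopped'
  );
}

console.log(isSafetyStop('timeout'));        // true
console.log(isSafetyStop('text_response'));  // false
```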

Lifecycle Hooks

Use these additive hooks for tracing and metrics:

  • onIterationStart(...)
  • onModelResponse(...)
  • onClassification(...)
  • onStop(...)

They do not replace onTextResponse(...), onToolCallsResponse(...), or the other branch callbacks that still own state updates.
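A tracing sketch built on these hooks. Only the hook names come from the list above; the argument shapes and the simulated invocation order are illustrative assumptions:

```typescript
// Collect simple counters through the additive hooks. The hook argument
// shapes here are assumptions; only the hook names come from the docs.
const metrics = { iterations: 0, responses: 0, stops: [] as string[] };

const hooks = {
  onIterationStart: () => { metrics.iterations += 1; },
  onModelResponse: () => { metrics.responses += 1; },
  onStop: ({ reason }: { reason: string }) => { metrics.stops.push(reason); },
};

// Simulate two iterations ending in a text response, in the order a loop
// like runTurnLoop(...) would plausibly invoke the hooks.
hooks.onIterationStart();
hooks.onModelResponse();
hooks.onIterationStart();
hooks.onModelResponse();
hooks.onStop({ reason: 'text_response' });

console.log(metrics.iterations, metrics.stops);
```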

Turn-Loop Hardening

For tool-capable turns, runTurnLoop(...) can reject intent-only narration such as "I will check the file" when the harness still requires action evidence.

Use these hooks when your harness needs hardening against models that are weak at tool use:

  • requiresActionEvidence(...) tells the package whether a non-empty text reply still needs proof of action before it can be accepted as final.
  • classifyTextResponse(...) lets the harness override package defaults and explicitly classify replies as verified_final_response, intent_only_narration, or non_progressing.
  • onRejectedTextResponse(...) lets the harness persist rejected narration or other non-progressing text before retrying or stopping.
  • rejectedTextRetryLimit bounds how many rejected text retries the package should allow before returning rejected_text_response instead of false success.

The package also exports reusable recovery helpers:

  • DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION
  • DEFAULT_NON_PROGRESSING_TEXT_RECOVERY_INSTRUCTION
  • DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION

These are default exported strings, not mutable runtime settings. A harness should treat them as convenient starting points and override the effective recovery text by returning its own transientInstruction from onRejectedTextResponse(...), by returning a custom assessment from classifyTextResponse(...), or by supplying its own validation-recovery instruction after parsing a validation artifact.

Tool validation failures return durable JSON artifacts instead of opaque error strings. Use parseToolValidationFailureArtifact(...) when the harness wants to detect a validation failure from a tool result and prompt the model to emit a corrected tool call.

Synthetic Tool Calls

When parsePlainTextToolIntent(...) converts a text response into a tool-call response, you can opt in to synthetic marking with markSyntheticToolCalls: true.

When enabled:

  • generated tool_calls entries include synthetic: true
  • mirrored assistant-message tool calls include the same marker
  • result.toolCalls summaries expose the normalized call source and synthetic status

When disabled, plain-text normalization still works, but the public tool-call surface is unchanged.

The boundary remains the same: the package can classify and reject narration, but the harness still owns the policy for when a reply is truly verified and how bounded recovery should be persisted.
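A small sketch of inspecting synthetic markers on the tool-call summaries. The summary shape below is an illustrative assumption based on the fields described above:

```typescript
// Assumed summary shape: name, normalized call source, and synthetic marker,
// mirroring the fields described for result.toolCalls above.
type ToolCallSummary = { name: string; source: string; synthetic?: boolean };

// Count calls that were normalized from plain text rather than emitted by the
// model natively, e.g. for metrics on how often normalization fires.
function countSynthetic(toolCalls: ToolCallSummary[]): number {
  return toolCalls.filter((call) => call.synthetic === true).length;
}

const summaries: ToolCallSummary[] = [
  { name: 'read_file', source: 'model', synthetic: false },
  { name: 'read_file', source: 'plain_text_intent', synthetic: true },
];

console.log(countSynthetic(summaries)); // 1
```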

You can provide either:

  • modelRequest when the package should call generate(...) or stream(...) for you
  • callModel when the harness wants to control model invocation directly

Minimal shape:

import { runTurnLoop, type LLMChatMessage } from 'llm-runtime';

type ChatState = {
  messages: LLMChatMessage[];
  finalText: string;
};

const result = await runTurnLoop({
  initialState: {
    messages: [{ role: 'user', content: 'Find the token and use tools if needed.' }],
    finalText: '',
  },
  emptyTextRetryLimit: 0,
  modelRequest: {
    provider: 'openai',
    model: 'gpt-5',
    builtIns: {
      read_file: true,
    },
  },
  buildMessages: async ({ state, transientInstruction }) => {
    if (!transientInstruction) {
      return state.messages;
    }

    return [
      ...state.messages,
      { role: 'system', content: transientInstruction },
    ];
  },
  onToolCallsResponse: async ({ state, response }) => {
    const nextMessages = [...state.messages, response.assistantMessage];

    for (const toolCall of response.tool_calls ?? []) {
      const toolResult = await executeTool(toolCall); // harness-owned tool execution
      nextMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(toolResult),
      });
    }

    return {
      state: {
        ...state,
        messages: nextMessages,
      },
      next: {
        control: 'continue',
      },
    };
  },
  onTextResponse: async ({ state, responseText, response }) => ({
    state: {
      ...state,
      messages: [...state.messages, response.assistantMessage],
      finalText: responseText,
    },
  }),
});

console.log(result.state.finalText);

Hardening-oriented shape:

import {
  DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION,
  DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION,
  parseToolValidationFailureArtifact,
  runTurnLoop,
} from 'llm-runtime';

const result = await runTurnLoop({
  initialState,
  emptyTextRetryLimit: 0,
  rejectedTextRetryLimit: 1,
  requiresActionEvidence: ({ state }) => state.awaitingVerifiedAction,
  buildMessages: async ({ state, transientInstruction }) => {
    if (!transientInstruction) {
      return state.messages;
    }

    return [...state.messages, { role: 'system', content: transientInstruction }];
  },
  onRejectedTextResponse: async ({ state, responseText, classification }) => ({
    state: {
      ...state,
      rejected: [...state.rejected, { classification, responseText }],
    },
    next: {
      control: 'continue',
      transientInstruction: DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION,
    },
  }),
  onToolCallsResponse: async ({ state, response }) => {
    const nextMessages = [...state.messages, response.assistantMessage];

    for (const toolCall of response.tool_calls ?? []) {
      const toolResult = await executeTool(toolCall); // harness-owned tool execution
      const content = JSON.stringify(toolResult);
      const validationArtifact = parseToolValidationFailureArtifact(content);

      nextMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content,
      });

      if (validationArtifact) {
        return {
          state: {
            ...state,
            messages: nextMessages,
          },
          next: {
            control: 'continue',
            transientInstruction: DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION,
          },
        };
      }
    }

    return {
      state: {
        ...state,
        messages: nextMessages,
      },
      next: {
        control: 'continue',
      },
    };
  },
  onTextResponse: async ({ state, response, responseText }) => ({
    state: {
      ...state,
      messages: [...state.messages, response.assistantMessage],
      finalText: responseText,
      awaitingVerifiedAction: false,
    },
  }),
});

Example

import { createLLMEnvironment, generate } from 'llm-runtime';

const environment = createLLMEnvironment({
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
  skillRoots: ['/app/skills', '/workspace/.codex/skills'],
  defaults: {
    reasoningEffort: 'medium',
    toolPermission: 'auto',
  },
  mcpConfig: {
    servers: {
      docs: {
        command: 'node',
        args: ['docs-server.js'],
        transport: 'stdio',
      },
    },
  },
});

const response = await generate({
  environment,
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    {
      role: 'user',
      content: 'Summarize the workspace and use tools when needed.',
    },
  ],
  workingDirectory: process.cwd(),
  builtIns: {
    read_file: true,
    list_files: true,
    load_skill: true,
  },
});

console.log(response.content);

Harness Guidance

Recommended integration pattern:

  1. Create one stable environment for the harness.
  2. Pass request-specific inputs per call.
  3. Inspect environment.skillRegistry and environment.mcpRegistry when you need to debug discovered skills or MCP servers.
  4. Update skill roots when the harness-level skill search path changes.
  5. Do not rebuild the environment just because request-local values like messages or workingDirectory changed.

Example registry inspection pattern:

import { createLLMEnvironment } from 'llm-runtime';

const environment = createLLMEnvironment();

const skills = await environment.skillRegistry.listSkills();
const servers = environment.mcpRegistry.listServers();

console.table(skills.map((skill) => ({
  skillId: skill.skillId,
  title: skill.title,
})));

console.table(servers.map((server) => ({
  name: server.name,
  transport: server.config.transport,
})));

Local Development

  • npm run build compiles the package into dist/
  • npm run check runs TypeScript without emitting files
  • npm test runs the Vitest suite in tests/llm
  • npm run test:watch runs the Vitest suite in watch mode
  • npm run test:e2e runs the showcase script in tests/e2e/llm-package-showcase.ts
  • npm run test:e2e:dry-run validates the showcase wiring without live provider calls
  • npm run test:e2e:turn-loop runs the runTurnLoop(...) showcase script in tests/e2e/llm-turn-loop-showcase.ts
  • npm run test:e2e:turn-loop:dry-run validates the turn-loop showcase wiring without live provider calls
  • npm run test:e2e:hardening runs deterministic end-to-end hardening coverage for narrated intent recovery and validation-failure correction without a live provider

Use npm run test:e2e:hardening for package-level regression coverage of turn-loop hardening. Use the showcase runners when you want to validate live provider integration and real tool-calling behavior.

The real showcase runners expect a repo-local .env file when using npm run test:e2e or npm run test:e2e:turn-loop.
