llm-runtime is a runtime layer for application-owned LLM workflows. It wraps provider invocation with one package boundary for tool orchestration, MCP integration, and skill loading.
This package is designed for harnesses that want a stable per-call API without pushing provider-specific details, built-in tool contracts, MCP wiring, and skill discovery into application code.
It was extracted from the Agent World as a standalone dependency for any application that needs provider calls with tool calling, MCP, and agent-skill support as the foundation for its own agent orchestration or harness.
```sh
npm install llm-runtime
```

The published package targets Node.js 18 and later and exposes a single root entrypoint.
- Provider dispatch for `generate(...)` and `stream(...)`
- Generic host-agnostic turn-loop orchestration through `runTurnLoop(...)`
- Intrinsic turn-loop safety limits, stop semantics, trace summaries, and lifecycle hooks
- Built-in tools such as file access, shell execution, and skill loading
- MCP tool discovery and execution
- Skill discovery from configured skill roots
- Stable environment-level registries for MCP servers and skills
- Cleanup boundaries for explicit environments and convenience-path caches
The root entrypoint exports:

- `createLLMEnvironment(...)`
- `disposeLLMEnvironment(...)`
- `disposeLLMRuntimeCaches()`
- `generate(...)`
- `stream(...)`
- `runTurnLoop(...)`
The package is per-call first. You can call `generate(...)` or `stream(...)` directly, or inject an explicit environment when your harness wants stable provider, MCP, and skill dependencies.
Use the public cleanup APIs when the runtime owns MCP clients or cached tool-discovery state:
- `disposeLLMEnvironment(environment)` shuts down the environment MCP registry only when that registry was created by the runtime.
- `disposeLLMRuntimeCaches()` shuts down cached convenience-path MCP registries and clears cached provider, MCP, and skill discovery state.
Ownership is split deliberately:
- The runtime owns cleanup for environments created for runtime use and for the convenience-path caches it creates internally.
- The harness still owns temporary workspaces, transcript persistence, any caller-injected registries, and any other non-runtime resources attached to its application.
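Under this ownership split, a harness shutdown path might look like the sketch below. The two dispose signatures mirror the cleanup APIs described above, but they are injected here so the sketch stays self-contained; `closeHarnessResources` is a hypothetical harness-owned function, and environment-before-caches is one possible ordering, not a documented requirement:

```typescript
// Sketch of a harness shutdown path under the ownership split above.
// The dispose signatures mirror llm-runtime's cleanup APIs; they are injected
// so this sketch stays self-contained. closeHarnessResources is hypothetical.
type DisposeEnvironment = (environment: unknown) => Promise<void>;
type DisposeRuntimeCaches = () => Promise<void>;

async function shutdownHarness(options: {
  environment: unknown;
  disposeLLMEnvironment: DisposeEnvironment;
  disposeLLMRuntimeCaches: DisposeRuntimeCaches;
  closeHarnessResources: () => Promise<void>;
}): Promise<string[]> {
  const order: string[] = [];
  // Runtime-owned cleanup: the environment's runtime-created MCP registry...
  await options.disposeLLMEnvironment(options.environment);
  order.push('environment');
  // ...and the convenience-path caches the runtime created internally.
  await options.disposeLLMRuntimeCaches();
  order.push('caches');
  // Harness-owned cleanup stays outside the runtime: temporary workspaces,
  // transcript persistence, caller-injected registries, and so on.
  await options.closeHarnessResources();
  order.push('harness');
  return order;
}
```

Returning the order is only for demonstration; a real harness would simply await the calls in sequence.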
The main rule is simple:

- Stable harness state belongs in `environment`.
- Request-specific state stays per call.

Environment-owned values include:

- Provider configuration store
- MCP registry or MCP config
- Skill registry or skill roots
- Default `reasoningEffort`
- Default `toolPermission`

Per-call values include:

- `provider`
- `model`
- `messages`
- `workingDirectory`
- `reasoningEffort`
- `toolPermission`
- `abortSignal`
If a value should change from one request or UI action to the next, it usually should not live in the environment.
llm-runtime merges several tool sources into one callable surface.
The minimal runtime core does not require any built-in operational tools. The built-ins below are optional package-owned convenience capabilities exposed from the same package surface.
The package currently reserves these built-in names:
- `shell_cmd`
- `load_skill`
- `human_intervention_request`
- `web_fetch`
- `read_file`
- `write_file`
- `list_files`
- `grep`
Built-ins are package-owned and reserved. Application code can disable or narrow them, but should not redefine them.
Extra tools are application-specific additions such as `lookup_customer` or `create_ticket`. They are additive only and cannot override reserved built-in names.
MCP tools come from configured external servers. The runtime discovers them, namespaces them, and merges them into the same resolved tool set as built-ins and extra tools.
Skills are reusable instruction assets discovered from skill roots and loaded through `load_skill`. Skills are not executable tools; they add instruction context for the model.
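The merge semantics above can be sketched as a pure name-resolution function. The reserved-name list comes from this README; the `mcp__server__tool` namespacing format is an assumption for illustration, not the package's actual scheme:

```typescript
// Sketch of the tool-source merge described above. RESERVED_BUILT_INS comes
// from the reserved list in this README; the MCP namespacing format shown here
// is illustrative, not the package's actual scheme.
const RESERVED_BUILT_INS = new Set([
  'shell_cmd', 'load_skill', 'human_intervention_request', 'web_fetch',
  'read_file', 'write_file', 'list_files', 'grep',
]);

function mergeToolNames(options: {
  enabledBuiltIns: string[];
  extraTools: string[];
  mcpTools: { server: string; tool: string }[];
}): string[] {
  const merged = new Set<string>(options.enabledBuiltIns);
  for (const name of options.extraTools) {
    // Extra tools are additive only: reserved built-in names cannot be redefined.
    if (RESERVED_BUILT_INS.has(name)) {
      throw new Error(`extra tool "${name}" collides with a reserved built-in`);
    }
    merged.add(name);
  }
  for (const { server, tool } of options.mcpTools) {
    // MCP tools are namespaced per server before merging (illustrative format).
    merged.add(`mcp__${server}__${tool}`);
  }
  return [...merged];
}
```

Skills are deliberately absent from this merge: they flow in as instruction context through `load_skill`, not as callable names.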
Both APIs share the same runtime model:
- same provider config shape
- same tool orchestration
- same MCP and skill semantics
The difference is output delivery:
- `generate(...)` returns the final result
- `stream(...)` emits chunks and still returns the final result at the end
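A stream consumer under this model can be sketched as below. The chunk shape (`{ delta: string }`) and the async-iterable surface are assumptions for illustration; the real `stream(...)` handle's shape is defined by the package:

```typescript
// Sketch of consuming a streamed call. The chunk shape ({ delta: string }) and
// the async-iterable surface are assumed for illustration; stream(...) still
// resolves to the same final result that generate(...) returns.
type TextChunk = { delta: string };

async function collectStreamedText(
  chunks: AsyncIterable<TextChunk>,
  onChunk: (delta: string) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    onChunk(chunk.delta); // deliver incrementally, e.g. to a UI
    text += chunk.delta;  // and still assemble the final text at the end
  }
  return text;
}
```

With the real API, the same loop would run over the package's stream handle instead of a local iterable.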
runTurnLoop(...) is the package-owned iterative loop for harnesses that want the package to manage repeated model turns without taking ownership of harness state, persistence, or tool policy.
Use it when your harness needs more control than a single `generate(...)` or `stream(...)` call, but still wants one package boundary for:
- repeated model invocation
- empty-text retry handling
- optional plain-text tool-intent normalization
- hard iteration, tool-round, repeated-call, and wall-clock safety bounds
- structured trace summaries and lifecycle hooks
The split of responsibilities is deliberate:
- The package owns loop repetition, hard-stop safety checks, response normalization, trace collection, and lifecycle hook ordering.
- The harness owns state shape, message construction, tool execution, persistence, replay, and business-specific completion policy.
`runTurnLoop(...)` applies intrinsic package defaults for:

- `maxIterations`
- `maxConsecutiveToolTurns`
- `maxWallTimeMs`
- repeated identical tool-call suppression through `repeatedToolCallGuard`
Terminal reasons are stable string literals suitable for harness branching:

- `text_response`
- `tool_calls_response`
- `empty_text_stop`
- `rejected_text_response`
- `unhandled_response`
- `max_iterations_exceeded`
- `max_tool_rounds_exceeded`
- `timeout`
- `repeated_tool_call_stopped`
The final result keeps `state`, `response`, and `reason`, and also includes:

- `steps`
- `toolCalls`
- `classifications`
- `retries`
- `stop`
- `elapsedMs`
If the loop times out before any model response is available, `result.response` is `null` and `result.stop` carries the timeout detail.
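Because terminal reasons are stable string literals, a harness can branch exhaustively on them. The sketch below hardcodes the union from the list above; the severity mapping is an illustrative harness policy, not package behavior:

```typescript
// The stable terminal-reason literals from the list above, as a union type.
type TurnLoopReason =
  | 'text_response'
  | 'tool_calls_response'
  | 'empty_text_stop'
  | 'rejected_text_response'
  | 'unhandled_response'
  | 'max_iterations_exceeded'
  | 'max_tool_rounds_exceeded'
  | 'timeout'
  | 'repeated_tool_call_stopped';

// Illustrative harness policy: map each reason to a handling bucket.
// An exhaustive switch lets the compiler flag any reason left unhandled.
function classifyStop(reason: TurnLoopReason): 'success' | 'retryable' | 'hard_stop' {
  switch (reason) {
    case 'text_response':
    case 'tool_calls_response':
      return 'success';
    case 'empty_text_stop':
    case 'rejected_text_response':
      return 'retryable';
    case 'unhandled_response':
    case 'max_iterations_exceeded':
    case 'max_tool_rounds_exceeded':
    case 'timeout':
    case 'repeated_tool_call_stopped':
      return 'hard_stop';
  }
}
```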
Use these additive hooks for tracing and metrics:

- `onIterationStart(...)`
- `onModelResponse(...)`
- `onClassification(...)`
- `onStop(...)`
They do not replace `onTextResponse(...)`, `onToolCallsResponse(...)`, or the other branch callbacks that still own state updates.
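A minimal metrics collector over these hooks might look like the following sketch. The payload fields (`iteration`, `classification`, `reason`) are assumptions about the callback shapes, used only for illustration:

```typescript
// Sketch of an additive metrics collector for the tracing hooks above.
// The payload fields (iteration, classification, reason) are assumed shapes.
function createLoopMetrics() {
  const metrics = {
    iterations: 0,
    classifications: [] as string[],
    stopReason: '',
  };
  return {
    metrics,
    hooks: {
      onIterationStart: ({ iteration }: { iteration: number }) => {
        metrics.iterations = iteration + 1; // assuming a zero-based iteration index
      },
      onClassification: ({ classification }: { classification: string }) => {
        metrics.classifications.push(classification);
      },
      onStop: ({ reason }: { reason: string }) => {
        metrics.stopReason = reason;
      },
    },
  };
}
```

The `hooks` object would be spread into the `runTurnLoop(...)` options alongside the branch callbacks that own state updates.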
For tool-capable turns, `runTurnLoop(...)` can reject intent-only narration such as "I will check the file" when the harness still requires action evidence.
Use these hooks when your harness needs hardening against weak tool users:
- `requiresActionEvidence(...)` tells the package whether a non-empty text reply still needs proof of action before it can be accepted as final.
- `classifyTextResponse(...)` lets the harness override package defaults and explicitly classify replies as `verified_final_response`, `intent_only_narration`, or `non_progressing`.
- `onRejectedTextResponse(...)` lets the harness persist rejected narration or other non-progressing text before retrying or stopping.
- `rejectedTextRetryLimit` bounds how many rejected text retries the package allows before returning `rejected_text_response` instead of false success.
The package also exports reusable recovery helpers:

- `DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION`
- `DEFAULT_NON_PROGRESSING_TEXT_RECOVERY_INSTRUCTION`
- `DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION`
These are exported default strings, not mutable runtime settings. A harness should treat them as convenient starting points and override the effective recovery text by returning its own `transientInstruction` from `onRejectedTextResponse(...)`, by returning a custom assessment from `classifyTextResponse(...)`, or by supplying its own validation-recovery instruction after parsing a validation artifact.
Tool validation failures return durable JSON artifacts instead of opaque error strings. Use `parseToolValidationFailureArtifact(...)` when the harness wants to detect a validation failure from a tool result and prompt the model to emit a corrected tool call.
When `parsePlainTextToolIntent(...)` converts a text response into a tool-call response, you can opt in to synthetic marking with `markSyntheticToolCalls: true`.
When enabled:
- generated `tool_calls` entries include `synthetic: true`
- mirrored assistant-message tool calls include the same marker
- `result.toolCalls` summaries expose the normalized call source and synthetic status
When disabled, plain-text normalization still works, but the public tool-call surface is unchanged.
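With marking enabled, a harness can audit which calls originated from plain-text normalization. In the sketch below, the summary shape (`{ name, source, synthetic }`) is assumed for illustration; the actual `result.toolCalls` summary fields are defined by the package:

```typescript
// Sketch of splitting tool-call summaries into synthetic (plain-text-normalized)
// and native model-emitted calls. The summary shape is assumed for illustration.
type ToolCallSummary = { name: string; source: string; synthetic?: boolean };

function splitSyntheticCalls(summaries: ToolCallSummary[]) {
  return {
    // Calls synthesized from plain-text intent, marked by markSyntheticToolCalls.
    synthetic: summaries.filter((s) => s.synthetic === true),
    // Calls the model emitted as structured tool calls directly.
    native: summaries.filter((s) => s.synthetic !== true),
  };
}
```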
The boundary remains the same: the package can classify and reject narration, but the harness still owns the policy for when a reply is truly verified and how bounded recovery should be persisted.
You can provide either:

- `modelRequest` when the package should call `generate(...)` or `stream(...)` for you
- `callModel` when the harness wants to control model invocation directly
Minimal shape:

```ts
import { runTurnLoop, type LLMChatMessage } from 'llm-runtime';

type ChatState = {
  messages: LLMChatMessage[];
  finalText: string;
};

const result = await runTurnLoop({
  initialState: {
    messages: [{ role: 'user', content: 'Find the token and use tools if needed.' }],
    finalText: '',
  },
  emptyTextRetryLimit: 0,
  modelRequest: {
    provider: 'openai',
    model: 'gpt-5',
    builtIns: {
      read_file: true,
    },
  },
  buildMessages: async ({ state, transientInstruction }) => {
    if (!transientInstruction) {
      return state.messages;
    }
    return [
      ...state.messages,
      { role: 'system', content: transientInstruction },
    ];
  },
  onToolCallsResponse: async ({ state, response }) => {
    const nextMessages = [...state.messages, response.assistantMessage];
    for (const toolCall of response.tool_calls ?? []) {
      // executeTool(...) is harness-owned tool execution, not part of llm-runtime.
      const toolResult = await executeTool(toolCall);
      nextMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(toolResult),
      });
    }
    return {
      state: {
        ...state,
        messages: nextMessages,
      },
      next: {
        control: 'continue',
      },
    };
  },
  onTextResponse: async ({ state, responseText, response }) => ({
    state: {
      ...state,
      messages: [...state.messages, response.assistantMessage],
      finalText: responseText,
    },
  }),
});

console.log(result.state.finalText);
```

Hardening-oriented shape:
```ts
import {
  DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION,
  DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION,
  parseToolValidationFailureArtifact,
  runTurnLoop,
} from 'llm-runtime';

// initialState, executeTool, and the state shape (messages, rejected,
// finalText, awaitingVerifiedAction) are harness-owned.
const result = await runTurnLoop({
  initialState,
  emptyTextRetryLimit: 0,
  rejectedTextRetryLimit: 1,
  requiresActionEvidence: ({ state }) => state.awaitingVerifiedAction,
  buildMessages: async ({ state, transientInstruction }) => {
    if (!transientInstruction) {
      return state.messages;
    }
    return [...state.messages, { role: 'system', content: transientInstruction }];
  },
  onRejectedTextResponse: async ({ state, responseText, classification }) => ({
    state: {
      ...state,
      rejected: [...state.rejected, { classification, responseText }],
    },
    next: {
      control: 'continue',
      transientInstruction: DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION,
    },
  }),
  onToolCallsResponse: async ({ state, response }) => {
    const nextMessages = [...state.messages, response.assistantMessage];
    for (const toolCall of response.tool_calls ?? []) {
      const toolResult = await executeTool(toolCall);
      const content = JSON.stringify(toolResult);
      const validationArtifact = parseToolValidationFailureArtifact(content);
      nextMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content,
      });
      if (validationArtifact) {
        return {
          state: {
            ...state,
            messages: nextMessages,
          },
          next: {
            control: 'continue',
            transientInstruction: DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION,
          },
        };
      }
    }
    return {
      state: {
        ...state,
        messages: nextMessages,
      },
      next: {
        control: 'continue',
      },
    };
  },
  onTextResponse: async ({ state, response, responseText }) => ({
    state: {
      ...state,
      messages: [...state.messages, response.assistantMessage],
      finalText: responseText,
      awaitingVerifiedAction: false,
    },
  }),
});
```

Explicit environment example:

```ts
import { createLLMEnvironment, generate } from 'llm-runtime';

const environment = createLLMEnvironment({
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
  skillRoots: ['/app/skills', '/workspace/.codex/skills'],
  defaults: {
    reasoningEffort: 'medium',
    toolPermission: 'auto',
  },
  mcpConfig: {
    servers: {
      docs: {
        command: 'node',
        args: ['docs-server.js'],
        transport: 'stdio',
      },
    },
  },
});

const response = await generate({
  environment,
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    {
      role: 'user',
      content: 'Summarize the workspace and use tools when needed.',
    },
  ],
  workingDirectory: process.cwd(),
  builtIns: {
    read_file: true,
    list_files: true,
    load_skill: true,
  },
});

console.log(response.content);
```

Recommended integration pattern:
- Create one stable `environment` for the harness.
- Pass request-specific inputs per call.
- Inspect `environment.skillRegistry` and `environment.mcpRegistry` when you need to debug discovered skills or MCP servers.
- Update skill roots when the harness-level skill search path changes.
- Do not rebuild the environment just because request-local values like `messages` or `workingDirectory` changed.
Example registry inspection pattern:
```ts
import { createLLMEnvironment } from 'llm-runtime';

const environment = createLLMEnvironment();

const skills = await environment.skillRegistry.listSkills();
const servers = environment.mcpRegistry.listServers();

console.table(skills.map((skill) => ({
  skillId: skill.skillId,
  title: skill.title,
})));

console.table(servers.map((server) => ({
  name: server.name,
  transport: server.config.transport,
})));
```

Package scripts:

- `npm run build` compiles the package into `dist/`
- `npm run check` runs TypeScript without emitting files
- `npm test` runs the Vitest suite in `tests/llm`
- `npm run test:watch` runs the Vitest suite in watch mode
- `npm run test:e2e` runs the showcase script in `tests/e2e/llm-package-showcase.ts`
- `npm run test:e2e:dry-run` validates the showcase wiring without live provider calls
- `npm run test:e2e:turn-loop` runs the `runTurnLoop(...)` showcase script in `tests/e2e/llm-turn-loop-showcase.ts`
- `npm run test:e2e:turn-loop:dry-run` validates the turn-loop showcase wiring without live provider calls
- `npm run test:e2e:hardening` runs deterministic end-to-end hardening coverage for narrated intent recovery and validation-failure correction without a live provider
Use `npm run test:e2e:hardening` for package-level regression coverage of turn-loop hardening. Use the showcase runners when you want to validate live provider integration and real tool-calling behavior.
The real showcase runners expect a repo-local `.env` file when using `npm run test:e2e` or `npm run test:e2e:turn-loop`.