llm-runtime

llm-runtime is a runtime layer for application-owned LLM workflows. It wraps provider invocation with one package boundary for tool orchestration, MCP integration, and skill loading.

This package is designed for harnesses that want a stable per-call API without pushing provider-specific details, built-in tool contracts, MCP wiring, and skill discovery into application code.

It was extracted from Agent World as a standalone dependency for any application that needs provider calls with tool calling, MCP, and agent-skill support to build its own agent orchestration or harness.

Installation

npm install llm-runtime

The published package targets Node.js 18 and later and exposes a single root entrypoint.

What This Package Owns

  • Provider dispatch for generate(...) and stream(...)
  • Generic host-agnostic turn-loop orchestration through runTurnLoop(...)
  • Intrinsic turn-loop safety limits, stop semantics, trace summaries, and lifecycle hooks
  • Built-in tools such as file access, shell execution, and skill loading
  • MCP tool discovery and execution
  • Skill discovery from configured skill roots
  • Stable environment-level registries for MCP servers and skills
  • Cleanup boundaries for explicit environments and convenience-path caches

Public API

  • createLLMEnvironment(...)
  • disposeLLMEnvironment(...)
  • disposeLLMRuntimeCaches()
  • generate(...)
  • stream(...)
  • runTurnLoop(...)

The package is per-call first. You can call generate(...) or stream(...) directly, or inject an explicit environment when your harness wants stable provider, MCP, and skill dependencies.

Cleanup

Use the public cleanup APIs when the runtime owns MCP clients or cached tool-discovery state:

  • disposeLLMEnvironment(environment) shuts down the environment MCP registry only when that registry was created by the runtime.
  • disposeLLMRuntimeCaches() shuts down cached convenience-path MCP registries and clears cached provider, MCP, and skill discovery state.

Ownership is split deliberately:

  • The runtime owns cleanup for environments created for runtime use and for the convenience-path caches it creates internally.
  • The harness still owns temporary workspaces, transcript persistence, any caller-injected registries, and any other non-runtime resources attached to its application.
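A minimal shutdown-ordering sketch under these rules. The dispose functions below are local stand-ins so the snippet is self-contained; in a real harness you would import `disposeLLMEnvironment` and `disposeLLMRuntimeCaches` from `llm-runtime`:

```typescript
// Stand-ins for the real disposeLLMEnvironment / disposeLLMRuntimeCaches so
// this sketch runs on its own; in a harness, import them from 'llm-runtime'.
const disposed: string[] = [];
const disposeLLMEnvironment = async (env: { name: string }) => { disposed.push(env.name); };
const disposeLLMRuntimeCaches = async () => { disposed.push('caches'); };

// Shutdown ordering sketch: release the explicit environment first, then the
// convenience-path caches, then harness-owned resources the runtime never sees.
async function shutdownHarness(environment: { name: string }): Promise<void> {
  try {
    await disposeLLMEnvironment(environment); // runtime-created MCP registry
  } finally {
    await disposeLLMRuntimeCaches();          // cached convenience-path state
    // ...then close harness-owned workspaces, transcripts, injected registries
  }
}

shutdownHarness({ name: 'main' }).then(() => console.log(disposed));
```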

Mental Model

The main rule is simple:

  • Stable harness state belongs in environment
  • Request-specific state stays per call

Put This In environment

  • Provider configuration store
  • MCP registry or MCP config
  • Skill registry or skill roots
  • Default reasoningEffort
  • Default toolPermission

Keep This Per Call

  • provider
  • model
  • messages
  • workingDirectory
  • reasoningEffort
  • toolPermission
  • abortSignal

If a value should change from one request or UI action to the next, it usually should not live in the environment.
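As an illustrative sketch of that split, the object shapes below mirror the two lists above but are assumptions, not the package's exact types:

```typescript
// Stable, harness-lifetime configuration: built once, reused for every call.
// Field names mirror the "Put This In environment" list; exact shapes are
// illustrative assumptions.
const environmentConfig = {
  providers: { openai: { apiKey: 'YOUR_API_KEY' } },
  skillRoots: ['/app/skills'],
  defaults: { reasoningEffort: 'medium', toolPermission: 'auto' },
};

// Request-specific inputs: rebuilt for every call or UI action.
function buildCallOptions(userText: string, signal?: AbortSignal) {
  return {
    provider: 'openai',
    model: 'gpt-5',
    messages: [{ role: 'user', content: userText }],
    workingDirectory: '/workspace',
    abortSignal: signal,
  };
}

const call = buildCallOptions('Summarize the workspace.');
console.log(call.messages[0].content);
```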

Tool Model

llm-runtime merges several tool sources into one callable surface.

Built-In Tools

The minimal runtime core does not require any built-in operational tools; the built-ins listed below are optional, package-owned convenience capabilities exposed from the same package surface.

The package currently reserves these built-in names:

  • shell_cmd
  • load_skill
  • human_intervention_request
  • web_fetch
  • read_file
  • write_file
  • list_files
  • grep

Built-ins are package-owned and reserved. Application code can disable or narrow them, but should not redefine them.

Extra Tools

Extra tools are application-specific additions such as lookup_customer or create_ticket. They are additive only and cannot override reserved built-in names.

MCP Tools

MCP tools come from configured external servers. The runtime discovers them, namespaces them, and merges them into the same resolved tool set as built-ins and extra tools.
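The merge rule can be sketched as follows. The `server.tool` namespacing format used here is an illustrative assumption, not the package's documented scheme; only the reserved built-in names come from the list above:

```typescript
// Reserved built-in names from the list above.
const RESERVED_BUILT_INS = new Set([
  'shell_cmd', 'load_skill', 'human_intervention_request', 'web_fetch',
  'read_file', 'write_file', 'list_files', 'grep',
]);

// Merge extra tools and MCP tools into one resolved set. Extra tools are
// additive only and may not redefine reserved built-ins; MCP tools get a
// server-name prefix (separator here is an illustrative assumption).
function resolveToolNames(
  builtIns: string[],
  extraTools: string[],
  mcpTools: { server: string; tool: string }[],
): string[] {
  const resolved = new Set(builtIns);
  for (const name of extraTools) {
    if (RESERVED_BUILT_INS.has(name)) {
      throw new Error(`extra tool may not redefine reserved built-in: ${name}`);
    }
    resolved.add(name);
  }
  for (const { server, tool } of mcpTools) {
    resolved.add(`${server}.${tool}`);
  }
  return [...resolved];
}

const tools = resolveToolNames(
  ['read_file'],
  ['lookup_customer'],
  [{ server: 'docs', tool: 'search' }],
);
console.log(tools); // ['read_file', 'lookup_customer', 'docs.search']
```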

Skills

Skills are reusable instruction assets discovered from skill roots and loaded through load_skill. Skills are not executable tools; they add instruction context for the model.

generate(...) vs stream(...)

Both APIs share the same runtime model:

  • same provider config shape
  • same tool orchestration
  • same MCP and skill semantics

The difference is output delivery:

  • generate(...) returns the final result
  • stream(...) emits chunks and still returns the final result at the end
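The streaming contract can be sketched with a local stand-in iterator. The `{ text }` chunk shape is an assumption for illustration; only `stream(...)` itself is the real API:

```typescript
// Stand-in for the chunk stream that stream(...) emits; the { text } chunk
// shape is an illustrative assumption, not the package's documented type.
async function* fakeStream(): AsyncGenerator<{ text: string }> {
  yield { text: 'Hello, ' };
  yield { text: 'world.' };
}

// Accumulate streamed chunks, then use the final text, mirroring how
// stream(...) emits chunks and still returns the final result at the end.
async function consume(chunks: AsyncGenerator<{ text: string }>): Promise<string> {
  let finalText = '';
  for await (const chunk of chunks) {
    finalText += chunk.text; // a real harness would render each chunk here
  }
  return finalText;
}

consume(fakeStream()).then((text) => console.log(text)); // prints "Hello, world."
```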

runTurnLoop(...)

runTurnLoop(...) is the package-owned iterative loop for harnesses that want the package to manage repeated model turns without taking ownership of harness state, persistence, or tool policy.

Use it when your harness needs more control than a single generate(...) or stream(...) call, but still wants one package boundary for:

  • repeated model invocation
  • empty-text retry handling
  • optional plain-text tool-intent normalization
  • hard iteration, tool-round, repeated-call, and wall-clock safety bounds
  • structured trace summaries and lifecycle hooks

The split of responsibilities is deliberate:

  • The package owns loop repetition, hard-stop safety checks, response normalization, trace collection, and lifecycle hook ordering.
  • The harness owns state shape, message construction, tool execution, persistence, replay, and business-specific completion policy.

Safety And Stop Reasons

runTurnLoop(...) applies intrinsic package defaults for:

  • maxIterations
  • maxConsecutiveToolTurns
  • maxWallTimeMs
  • repeated identical tool-call suppression through repeatedToolCallGuard

Terminal reasons are stable string literals suitable for harness branching:

  • text_response
  • tool_calls_response
  • empty_text_stop
  • rejected_text_response
  • unhandled_response
  • max_iterations_exceeded
  • max_tool_rounds_exceeded
  • timeout
  • repeated_tool_call_stopped

The final result keeps state, response, and reason, and also includes:

  • steps
  • toolCalls
  • classifications
  • retries
  • stop
  • elapsedMs

If the loop times out before any model response is available, result.response is null and result.stop carries the timeout detail.
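Because the terminal reasons are stable string literals, a harness can branch on them directly. The grouping below is one possible harness policy, not something the package prescribes:

```typescript
// Terminal reasons from the list above, as a union for harness branching.
type TurnLoopReason =
  | 'text_response' | 'tool_calls_response' | 'empty_text_stop'
  | 'rejected_text_response' | 'unhandled_response'
  | 'max_iterations_exceeded' | 'max_tool_rounds_exceeded'
  | 'timeout' | 'repeated_tool_call_stopped';

// Example policy: treat the hard safety bounds as stops worth surfacing to an
// operator, and everything else as a normal (if imperfect) completion.
function isSafetyStop(reason: TurnLoopReason): boolean {
  return (
    reason === 'max_iterations_exceeded' ||
    reason === 'max_tool_rounds_exceeded' ||
    reason === 'timeout' ||
    reason === 'repeated_tool_call_stopped'
  );
}

console.log(isSafetyStop('timeout'));        // true
console.log(isSafetyStop('text_response'));  // false
```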

Lifecycle Hooks

Use these additive hooks for tracing and metrics:

  • onIterationStart(...)
  • onModelResponse(...)
  • onClassification(...)
  • onStop(...)

They do not replace onTextResponse(...), onToolCallsResponse(...), or the other branch callbacks that still own state updates.
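A tracing sketch built on these hooks. Only the hook names come from the list above; the argument shapes and the simulated invocation order are illustrative assumptions:

```typescript
// Collect simple counters through the additive hooks. The hook argument
// shapes here are assumptions; only the hook names come from the docs.
const metrics = { iterations: 0, responses: 0, stops: [] as string[] };

const hooks = {
  onIterationStart: () => { metrics.iterations += 1; },
  onModelResponse: () => { metrics.responses += 1; },
  onStop: ({ reason }: { reason: string }) => { metrics.stops.push(reason); },
};

// Simulate two iterations ending in a text response, in the order a loop
// like runTurnLoop(...) would plausibly invoke the hooks.
hooks.onIterationStart();
hooks.onModelResponse();
hooks.onIterationStart();
hooks.onModelResponse();
hooks.onStop({ reason: 'text_response' });

console.log(metrics.iterations, metrics.stops);
```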

Turn-Loop Hardening

For tool-capable turns, runTurnLoop(...) can reject intent-only narration such as "I will check the file" when the harness still requires action evidence.

Use these hooks when your harness needs hardening against models that are weak at tool use:

  • requiresActionEvidence(...) tells the package whether a non-empty text reply still needs proof of action before it can be accepted as final.
  • classifyTextResponse(...) lets the harness override package defaults and explicitly classify replies as verified_final_response, intent_only_narration, or non_progressing.
  • onRejectedTextResponse(...) lets the harness persist rejected narration or other non-progressing text before retrying or stopping.
  • rejectedTextRetryLimit bounds how many rejected text retries the package should allow before returning rejected_text_response instead of false success.

The package also exports reusable recovery helpers:

  • DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION
  • DEFAULT_NON_PROGRESSING_TEXT_RECOVERY_INSTRUCTION
  • DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION

These are default exported strings, not mutable runtime settings. A harness should treat them as convenient starting points and override the effective recovery text by returning its own transientInstruction from onRejectedTextResponse(...), by returning a custom assessment from classifyTextResponse(...), or by supplying its own validation-recovery instruction after parsing a validation artifact.

Tool validation failures return durable JSON artifacts instead of opaque error strings. Use parseToolValidationFailureArtifact(...) when the harness wants to detect a validation failure from a tool result and prompt the model to emit a corrected tool call.

Synthetic Tool Calls

When parsePlainTextToolIntent(...) converts a text response into a tool-call response, you can opt in to synthetic marking with markSyntheticToolCalls: true.

When enabled:

  • generated tool_calls entries include synthetic: true
  • mirrored assistant-message tool calls include the same marker
  • result.toolCalls summaries expose the normalized call source and synthetic status

When disabled, plain-text normalization still works, but the public tool-call surface is unchanged.

The boundary remains the same: the package can classify and reject narration, but the harness still owns the policy for when a reply is truly verified and how bounded recovery should be persisted.
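A small sketch of inspecting synthetic markers on the tool-call summaries. The summary shape below is an illustrative assumption based on the fields described above:

```typescript
// Assumed summary shape: name, normalized call source, and synthetic marker,
// mirroring the fields described for result.toolCalls above.
type ToolCallSummary = { name: string; source: string; synthetic?: boolean };

// Count calls that were normalized from plain text rather than emitted by the
// model natively, e.g. for metrics on how often normalization fires.
function countSynthetic(toolCalls: ToolCallSummary[]): number {
  return toolCalls.filter((call) => call.synthetic === true).length;
}

const summaries: ToolCallSummary[] = [
  { name: 'read_file', source: 'model', synthetic: false },
  { name: 'read_file', source: 'plain_text_intent', synthetic: true },
];

console.log(countSynthetic(summaries)); // 1
```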

You can provide either:

  • modelRequest when the package should call generate(...) or stream(...) for you
  • callModel when the harness wants to control model invocation directly

Minimal shape:

import { runTurnLoop, type LLMChatMessage } from 'llm-runtime';

type ChatState = {
  messages: LLMChatMessage[];
  finalText: string;
};

const result = await runTurnLoop({
  initialState: {
    messages: [{ role: 'user', content: 'Find the token and use tools if needed.' }],
    finalText: '',
  },
  emptyTextRetryLimit: 0,
  modelRequest: {
    provider: 'openai',
    model: 'gpt-5',
    builtIns: {
      read_file: true,
    },
  },
  buildMessages: async ({ state, transientInstruction }) => {
    if (!transientInstruction) {
      return state.messages;
    }

    return [
      ...state.messages,
      { role: 'system', content: transientInstruction },
    ];
  },
  onToolCallsResponse: async ({ state, response }) => {
    const nextMessages = [...state.messages, response.assistantMessage];

    for (const toolCall of response.tool_calls ?? []) {
      const toolResult = await executeTool(toolCall); // harness-owned tool execution
      nextMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(toolResult),
      });
    }

    return {
      state: {
        ...state,
        messages: nextMessages,
      },
      next: {
        control: 'continue',
      },
    };
  },
  onTextResponse: async ({ state, responseText, response }) => ({
    state: {
      ...state,
      messages: [...state.messages, response.assistantMessage],
      finalText: responseText,
    },
  }),
});

console.log(result.state.finalText);

Hardening-oriented shape:

import {
  DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION,
  DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION,
  parseToolValidationFailureArtifact,
  runTurnLoop,
} from 'llm-runtime';

const result = await runTurnLoop({
  initialState,
  emptyTextRetryLimit: 0,
  rejectedTextRetryLimit: 1,
  requiresActionEvidence: ({ state }) => state.awaitingVerifiedAction,
  buildMessages: async ({ state, transientInstruction }) => {
    if (!transientInstruction) {
      return state.messages;
    }

    return [...state.messages, { role: 'system', content: transientInstruction }];
  },
  onRejectedTextResponse: async ({ state, responseText, classification }) => ({
    state: {
      ...state,
      rejected: [...state.rejected, { classification, responseText }],
    },
    next: {
      control: 'continue',
      transientInstruction: DEFAULT_INTENT_ONLY_NARRATION_RECOVERY_INSTRUCTION,
    },
  }),
  onToolCallsResponse: async ({ state, response }) => {
    const nextMessages = [...state.messages, response.assistantMessage];

    for (const toolCall of response.tool_calls ?? []) {
      const toolResult = await executeTool(toolCall); // harness-owned tool execution
      const content = JSON.stringify(toolResult);
      const validationArtifact = parseToolValidationFailureArtifact(content);

      nextMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content,
      });

      if (validationArtifact) {
        return {
          state: {
            ...state,
            messages: nextMessages,
          },
          next: {
            control: 'continue',
            transientInstruction: DEFAULT_TOOL_VALIDATION_RECOVERY_INSTRUCTION,
          },
        };
      }
    }

    return {
      state: {
        ...state,
        messages: nextMessages,
      },
      next: {
        control: 'continue',
      },
    };
  },
  onTextResponse: async ({ state, response, responseText }) => ({
    state: {
      ...state,
      messages: [...state.messages, response.assistantMessage],
      finalText: responseText,
      awaitingVerifiedAction: false,
    },
  }),
});

Example

import { createLLMEnvironment, generate } from 'llm-runtime';

const environment = createLLMEnvironment({
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY!,
    },
  },
  skillRoots: ['/app/skills', '/workspace/.codex/skills'],
  defaults: {
    reasoningEffort: 'medium',
    toolPermission: 'auto',
  },
  mcpConfig: {
    servers: {
      docs: {
        command: 'node',
        args: ['docs-server.js'],
        transport: 'stdio',
      },
    },
  },
});

const response = await generate({
  environment,
  provider: 'openai',
  model: 'gpt-5',
  messages: [
    {
      role: 'user',
      content: 'Summarize the workspace and use tools when needed.',
    },
  ],
  workingDirectory: process.cwd(),
  builtIns: {
    read_file: true,
    list_files: true,
    load_skill: true,
  },
});

console.log(response.content);

Harness Guidance

Recommended integration pattern:

  1. Create one stable environment for the harness.
  2. Pass request-specific inputs per call.
  3. Inspect environment.skillRegistry and environment.mcpRegistry when you need to debug discovered skills or MCP servers.
  4. Update skill roots when the harness-level skill search path changes.
  5. Do not rebuild the environment just because request-local values like messages or workingDirectory changed.

Example registry inspection pattern:

import { createLLMEnvironment } from 'llm-runtime';

const environment = createLLMEnvironment();

const skills = await environment.skillRegistry.listSkills();
const servers = environment.mcpRegistry.listServers();

console.table(skills.map((skill) => ({
  skillId: skill.skillId,
  title: skill.title,
})));

console.table(servers.map((server) => ({
  name: server.name,
  transport: server.config.transport,
})));

Local Development

  • npm run build compiles the package into dist/
  • npm run check runs TypeScript without emitting files
  • npm test runs the Vitest suite in tests/llm
  • npm run test:watch runs the Vitest suite in watch mode
  • npm run test:e2e runs the showcase script in tests/e2e/llm-package-showcase.ts
  • npm run test:e2e:dry-run validates the showcase wiring without live provider calls
  • npm run test:e2e:turn-loop runs the runTurnLoop(...) showcase script in tests/e2e/llm-turn-loop-showcase.ts
  • npm run test:e2e:turn-loop:dry-run validates the turn-loop showcase wiring without live provider calls
  • npm run test:e2e:hardening runs deterministic end-to-end hardening coverage for narrated intent recovery and validation-failure correction without a live provider

Use npm run test:e2e:hardening for package-level regression coverage of turn-loop hardening. Use the showcase runners when you want to validate live provider integration and real tool-calling behavior.

The real showcase runners expect a repo-local .env file when using npm run test:e2e or npm run test:e2e:turn-loop.
