Tool Use: decouple execution and formalize function calls and responses #159

@michaelwasserman

Background

The initial design integrates JS tool execution within the prompt() call itself. The browser process operates as a hidden agent, looping on (model prediction → tool execution → response observation) until a final text response is ready.

While an agentic closed loop prioritizes API simplicity, it detracts from the design goals of offering granular control and direct Language Model interaction through this lower-level API.

Motivation

We'd like to realign initial tool use integrations with Prompt API objectives:

  • Provide essential Language Model tool types, i.e. Function Declarations (FD), Function Calls (FC), and Function Responses (FR).
    • These can be used by API clients to inspect, reconstruct, and test session history.
  • Offer fine-grained client control over tool execution loops used for agentic integrations.
    • This enables clients to define the looping patterns, error handling, limits, etc.
  • Align with patterns established by major LLM APIs (OpenAI, Gemini, Claude).
    • This empowers clients to use the Prompt API more interchangeably with server-based APIs.

Proposal

We propose a shift from an agentic, closed-loop model to an API-centric, open-loop model for tool execution in the Prompt API. The prompt() method should return a structured Function Call object to the client, which then manages the execution and Function Response feedback loop, maximizing developer control and observability. Function Declarations also need not provide execute functions for now.

The API should initially provide more granular functionality, as ease of use can be more readily provided by client libraries and future API enhancements.

| Design Principle | Closed-Loop (Original) | Open-Loop (Proposed) |
| --- | --- | --- |
| Developer Control | Limited; tool execution is encapsulated; interventions are ad hoc in tool bodies. | Full; the client controls logic, guardrails, and error handling. |
| Debuggability | Poor; the trajectory of tool calls is hidden; clients define bespoke call and response representations. | Excellent; each tool call and response is a discrete step handled by the client. |
| Industry Alignment | Divergent; forces custom agent code. | Aligned; improves portability of agentic code across client-side and server-side models. |

Specific Changes

To enable the open-loop model, Function Call and Function Response must be formalized as first-class elements in the API message flow:

Riffing on @FrankLi-MSFT's prototype and @jingyun19's design, a declaration would look like:

// The declaration for a tool that a language model can invoke.
dictionary LanguageModelToolDeclaration {
  required DOMString name;
  required DOMString description;
  // JSON schema for the input parameters.
  required object inputSchema;
  // Maybe add an optional/informational responseSchema per #137?
};

dictionary LanguageModelCreateCoreOptions {
  // ... existing members ...
  // Tools that the language model can use.
  sequence<LanguageModelToolDeclaration> tools;
};
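
For illustration, a minimal sketch of registering a declaration-only tool at session creation might look like the following. The `getWeather` tool, its schema, and the exact option shapes are hypothetical, and it assumes `expectedOutputs` is extended to accept `tool-call`:

// Sketch only: a declaration-only tool (no execute function) registered at create time.
// "getWeather" and its inputSchema are hypothetical examples.
const session = await LanguageModel.create({
  tools: [{
    name: "getWeather",
    description: "Returns the current weather for a given city.",
    inputSchema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  }],
  // Assumes expectedOutputs can include "tool-call" so the model may emit Function Calls.
  expectedOutputs: [{ type: "text" }, { type: "tool-call" }],
});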

Then, tool calls and responses are made accessible to the API client. This definitely needs some workshopping!

// Represents a tool call requested by the language model.
dictionary LanguageModelToolCall {
  // Unique identifier for this tool call within the session.
  required DOMString callID;
  required DOMString name;
  // An object fitting the JSON input schema for the tool.
  object input;
};

// Represents the response from executing a tool call.
dictionary LanguageModelToolResponse {
  // Matches the callID from the corresponding tool call.
  required DOMString callID;
  // Response from tool execution. (using an object, rather than a string, per #138)
  // Maybe this should be a sequence of LanguageModelMessageContent?
  required object response;
  // Maybe add explicit success/error signals?
};
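
As a rough illustration, a tool call surfaced to the client and the response the client echoes back might look like this (field values are hypothetical):

// Illustrative shapes only; values are hypothetical.
const toolCall = { callID: "call-1", name: "getWeather", input: { city: "Tokyo" } };
const toolResponse = { callID: "call-1", response: { temperatureC: 21, conditions: "clear" } };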

enum LanguageModelMessageType { "text", "image", "audio", "tool-call", "tool-response" };

typedef (
  ImageBitmapSource
  or AudioBuffer
  or HTMLAudioElement
  or BufferSource
  or DOMString
  or LanguageModelToolCall  // NEW
  or LanguageModelToolResponse  // NEW
) LanguageModelMessageValue;

// NEW: When `expectedOutputs` has non-text (e.g. `tool-call`), this now yields a content dictionary sequence.
Promise<DOMString or sequence<LanguageModelMessageContent>> prompt(
    LanguageModelPrompt input,
    optional LanguageModelPromptOptions options = {}
  );

// NEW: When `expectedOutputs` has non-text (e.g. `tool-call`), stream chunks are now content dictionaries.
ReadableStream promptStreaming(
  LanguageModelPrompt input,
  optional LanguageModelPromptOptions options = {}
);
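
Putting these pieces together, a client-managed loop might look roughly like the sketch below. It assumes prompt() returns a content dictionary sequence when the model requests tool calls and a plain string otherwise, and that Function Responses are fed back as `tool-response` content in a follow-up prompt; the message shape and the `executeTool` helper are illustrative, not settled API.

// Sketch of the proposed open loop; message shapes and executeTool are illustrative.
let result = await session.prompt("What's the weather in Tokyo?");

// A string result means the model finished with text; a sequence may contain tool calls.
while (Array.isArray(result)) {
  const toolResponses = [];
  for (const content of result) {
    if (content.type !== "tool-call") continue;
    const call = content.value; // LanguageModelToolCall
    const output = await executeTool(call.name, call.input); // client-defined execution
    toolResponses.push({
      type: "tool-response",
      value: { callID: call.callID, response: output },
    });
  }
  if (toolResponses.length === 0) break; // no tool calls requested; nothing to feed back
  // Send the Function Responses back; the model may request more tools or finish with text.
  result = await session.prompt([{ role: "user", content: toolResponses }]);
}

console.log(result); // final text (or remaining content) once no more tool calls are requested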

We'll need to resolve a number of finer points, e.g.

  • Behavior when supplying responseConstraint (maybe those requests won't yield tool calls?).
  • Behavior when tools are declared but expectedOutputs does not contain tool-call (should the model still yield tool calls, or not?).
  • Whether LanguageModelMessageValue should be split to represent model inputs and outputs.
  • Behavior when responses are not input immediately after tool calls (do warnings suffice, is prompt queueing a footgun?).
  • Ability to specify tool call frequency or specific occurrences amid constrained responses.
  • Probably a lot more!

In any case, the design aims to provide the necessary low-level building blocks for developers to construct robust, observable, and fully customized tool use workflows. This proposal is still flexible, and we're seeking design feedback!

Much credit for these explorations should be given to @FrankLi-MSFT and @jingyun19.
