Goal
Revamp the initial tool use design so that JS tool execution is integrated within the prompt() call itself, and spell out detailed API behaviors.
Background & Motivation
For tool use, we already aligned on an open-loop tool use design which lets clients handle a tool call themselves. It would be convenient if the API could sometimes handle execution automatically, so we'd like to propose an API that's compatible with the open-loop design, letting clients switch between the two modes depending on the use case.
The initial design omitted many details of the API shape and behavior; this issue tries to address them.
That said, there are still many use cases where the open-loop API behavior is preferred. Here are some examples:
Proposal
API Spec
Everything added for the open-loop API spec can be reused; i.e., closed-loop mode still supports appending tool-call and tool-response objects (more on this below).
Extra spec for automatic execution mode:
```webidl
dictionary LanguageModelCreateCoreOptions {
  // ..... existing fields

  // Each tool now also carries an `execute` function.
  sequence<LanguageModelToolDeclaration> tools;
  AutomaticToolUseConfig tool_use_config;
};

dictionary AutomaticToolUseConfig {
  boolean enabled;
  // Maximum number of tool executions within a single prompt() /
  // promptStreaming() call. When the limit is reached, throw an error:
  // "Max number of tool calls reached". Batched/parallel tool executions
  // each count toward the limit.
  unsigned long max_tool_calls;
};
```
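To make the shape concrete, here is a minimal usage sketch, assuming the existing LanguageModel.create() entry point and a tool declaration shape carried over from the open-loop design. The tool name, schema, and endpoint below are hypothetical:

```js
// Sketch of creating a session with automatic (closed-loop) tool use.
const session = await LanguageModel.create({
  tools: [{
    name: "get_weather", // hypothetical tool
    description: "Returns the current weather for a city",
    inputSchema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
    // In closed-loop mode, the API calls execute() itself during prompt().
    async execute({ city }) {
      const res = await fetch(`https://weather.example/api?city=${encodeURIComponent(city)}`);
      return res.text();
    },
  }],
  tool_use_config: { enabled: true, max_tool_calls: 5 },
});

// The tool loop runs inside prompt(); the promise resolves with plain text.
const answer = await session.prompt("What's the weather in Tokyo?");
```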
The results of prompt() and promptStreaming() will always be DOMString text: prompt() resolves to a DOMString, and promptStreaming() yields DOMString chunks.
Various API Behaviors
Tool Execution
- Sequential tool execution is always blocking; i.e., the model waits for all tool results before running the next decode.
- Batched tool calls run in parallel; i.e., if the model decodes “tool-call-1” and “tool-call-2”, their execute functions run in parallel, and the planner loop waits for all of the Promises to resolve before starting the next decode (sketched below).
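A rough sketch of the intended batching semantics, assuming a hypothetical internal planner step that receives all tool calls decoded in one step:

```js
// Hypothetical planner-loop step: run one decoded batch of tool calls
// concurrently, and block the loop until every Promise settles.
async function runToolBatch(toolCalls, toolsByName) {
  // All execute() functions start immediately and run in parallel.
  const results = await Promise.all(
    toolCalls.map((call) => toolsByName.get(call.name).execute(call.arguments))
  );
  // Only after every result is in does the next decode start.
  return results;
}
```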
Append & Generate
- Appending tool calls and tool responses is still supported, but "tool-call" and "tool-response" objects must be appended together in pairs.
- At prompt() / promptStreaming() time, the API throws an error if a "tool-call" in the arguments is not followed by its "tool-response". It's fine to prompt() with [tool-call, tool-response]; the model will continue decoding, and any future tool calls are handled automatically by the API (sketched below).
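A sketch of the pairing rule, using the "tool-call" / "tool-response" content types from the open-loop design; the value fields and roles here are assumptions:

```js
// OK: every tool-call is immediately paired with its tool-response.
await session.append([
  {
    role: "assistant",
    content: [{ type: "tool-call", value: { id: "call-1", name: "get_weather", arguments: { city: "Tokyo" } } }],
  },
  {
    role: "user",
    content: [{ type: "tool-response", value: { id: "call-1", response: "22°C, sunny" } }],
  },
]);

// Any tool calls the model makes from here on run automatically.
const answer = await session.prompt("Summarize the weather.");

// NOT OK: appending a tool-call with no matching tool-response would
// make the next prompt() / promptStreaming() throw.
```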
Streaming vs. Unary
- Consider this example model generation sequence: [response1, tool1, tool2, response2]
- For prompt(), the returned promise does not resolve until response2 has been generated.
- For promptStreaming(), the client first receives chunks of response1 from the ReadableStream, then waits while the tools execute, and finally receives chunks of response2 (sketched below).
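For the sequence above, consuming the stream might look like this (ReadableStream is async-iterable in supporting browsers):

```js
// Chunks of response1 arrive first; the stream then goes quiet while
// tool1 and tool2 execute; chunks of response2 arrive last.
const stream = session.promptStreaming("Plan my day around the weather.");
for await (const chunk of stream) {
  console.log(chunk);
}
// The stream closes only once response2 is fully generated.
```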
Tool Error Handling
- Issue: if a tool implementation throws, should the planner loop stop (leaving the session unusable because it is in an undefined state), or continue with the error message?
- Proposal: continue the planner loop, using error.message as the tool output (sketched below).
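A sketch of the proposed policy, as it might look inside a hypothetical planner-loop step:

```js
// A throwing tool does not kill the session; its error message is fed
// back to the model as the tool's output, and the loop continues.
let output;
try {
  output = await tool.execute(call.arguments);
} catch (err) {
  output = err.message; // continue decoding with the error text as the result
}
```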
Constraint Decoding
- Issue: it's unclear whether the constraint applies to every decoding step in the planner loop or only to the first step, which makes it very easy to misuse.
- Proposal: do not support setting `responseConstraint` and `prefix` when using closed-loop mode; suggest the open-loop API to callers who need them.
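For concreteness, a hypothetical example of the combination this proposal would reject, using the existing responseConstraint option name:

```js
// With automatic tool use enabled, a constraint is rejected up front
// rather than being silently applied to only one decode step.
await session.prompt("What's the weather in Tokyo?", {
  responseConstraint: { type: "object", properties: { temp: { type: "number" } } },
}); // expected to throw when tool_use_config.enabled is true
```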
Other Aspects
Observability
- Issue: Closed loop API needs to provide some observability. E.g, after-the-fact traces.
- Proposal: expose a new `session.history()` function which returns `Promise<sequence<LanguageModelMessageContent>>`. It returns all messages for all roles (incl. initial prompts) appended & generated in the session so far (usage sketched below).
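A usage sketch of the proposed accessor:

```js
// After-the-fact trace of everything appended & generated in the
// session, including initial prompts, tool calls, and tool responses.
const history = await session.history();
for (const item of history) {
  console.log(item); // each entry: one appended or generated message part
}
```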
Much credit to @FrankLi-MSFT's initial closed-loop implementation design + prototyping + alignment on the open-loop design.