Support cross-frame tool enumeration and composability while handling name collisions #160
Description
@Starborn @johannhof @domfarolino @bwalderman @victorhuangwq @48031j @khushalsagar
In a scenario where subframes can register and expose tools to other (parent) frames (discussed in #57), a tool enumeration may combine registrations from multiple modelContext objects, at least one per frame. While the API requires tool names to be unique within a single modelContext, it can't (and shouldn't) require uniqueness across frames.
Therefore, when tools are enumerated through listTools() or other mechanisms, they should be provided in a way that allows tools to be disambiguated even if a name collision occurs for a tool registration across modelContexts.
Just prefixing the frame origin is not sufficient, as there may be multiple frames with the same origin, as well as opaque origins that don't have an inherent string representation.
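To make the failure mode concrete, here is a minimal sketch of origin-based prefixing. The `FrameTool` shape and both helper functions are hypothetical and not part of any proposed API. The only fact relied on is that an opaque origin serializes to the string `"null"`, so every sandboxed frame would end up with the same prefix.

```typescript
// Hypothetical shape: a tool as seen by a frame that enumerates across frames.
interface FrameTool {
  frameOrigin: string; // serialized origin; opaque origins serialize to "null"
  name: string;
}

// Naive disambiguation: prefix each tool name with its frame's origin.
function prefixByOrigin(tools: FrameTool[]): string[] {
  return tools.map((t) => `${t.frameOrigin}/${t.name}`);
}

// Return every name that appears more than once.
function findCollisions(names: string[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const n of names) {
    if (seen.has(n)) dupes.add(n);
    seen.add(n);
  }
  return Array.from(dupes);
}

const tools: FrameTool[] = [
  { frameOrigin: "https://widgets.example", name: "search" },
  { frameOrigin: "https://widgets.example", name: "search" }, // second frame, same origin
  { frameOrigin: "null", name: "search" }, // sandboxed iframe (opaque origin)
  { frameOrigin: "null", name: "search" }, // another sandboxed iframe
];

// Both the same-origin pair and the opaque-origin pair still collide.
const collisions = findCollisions(prefixByOrigin(tools));
console.log(collisions);
```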
This is related to #51, as the primary use case is in-page agents that want to use tools across frames. Browser agents also have this problem, but they can solve it by assigning internal document identifiers in ways that are opaque to developers.
Some high-level approaches that come to mind:
1. `listTools()` is scoped to a single `ModelContext`, and it is up to the caller to disambiguate which tool comes from which source once it gets a list of tools, and compose the tool registries appropriately. (This assumes a mechanism for a parent frame to get access to a `ModelContext` from an embedded frame, and vice versa).
2. Developer can provide a name for a particular instance of `modelContext` which is used as a tool name prefix. (This may still result in name collisions that would need to get sorted though.)
3. Browser is responsible for assigning a unique tool name prefix for each `modelContext`.
4. `listTools()` returns not a flat list, but a hierarchical object representing how tool registrations map onto the document's frame tree.
The approach we want depends on a few factors, including the eventual API shapes for enumeration and composition of tool registries. I am leaning towards some combination of approaches 1 and 2 because it provides the most control and composability for developers.
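As a rough illustration of what combining the first two approaches could look like on the caller's side, here is a sketch under stated assumptions. `ToolSource`, `ToolDescriptor`, `ComposedTool`, and `composeRegistries` are all hypothetical names invented for this example. A real `listTools()` may well be asynchronous and return a different shape, and how a parent frame obtains a context from an embedded frame is not covered here.

```typescript
// Hypothetical minimal tool description; the real shape may differ.
interface ToolDescriptor {
  name: string;
  description?: string;
}

// Hypothetical stand-in for a handle to one frame's ModelContext.
interface ToolSource {
  label: string; // developer-provided name, used as the prefix
  listTools(): ToolDescriptor[]; // scoped to this one context
}

interface ComposedTool {
  qualifiedName: string;
  source: ToolSource;
  tool: ToolDescriptor;
}

// The caller composes per-context listings into one registry and resolves
// duplicate labels itself by appending "#2", "#3", ... to the prefix.
function composeRegistries(sources: ToolSource[]): Map<string, ComposedTool> {
  const registry = new Map<string, ComposedTool>();
  const usedPrefixes = new Set<string>();
  for (const source of sources) {
    let prefix = source.label;
    let n = 1;
    while (usedPrefixes.has(prefix)) {
      n += 1;
      prefix = `${source.label}#${n}`;
    }
    usedPrefixes.add(prefix);
    for (const tool of source.listTools()) {
      const qualifiedName = `${prefix}.${tool.name}`;
      if (registry.has(qualifiedName)) {
        // Possible if labels or tool names themselves contain "."; surface it.
        throw new Error(`Unresolved name collision: ${qualifiedName}`);
      }
      registry.set(qualifiedName, { qualifiedName, source, tool });
    }
  }
  return registry;
}

// Two frames chose the same label and the same tool name.
const cartA: ToolSource = { label: "cart", listTools: () => [{ name: "addItem" }] };
const cartB: ToolSource = { label: "cart", listTools: () => [{ name: "addItem" }] };
const search: ToolSource = { label: "search", listTools: () => [{ name: "query" }] };

const registry = composeRegistries([cartA, cartB, search]);
const names = Array.from(registry.keys());
console.log(names); // ["cart.addItem", "cart#2.addItem", "search.query"]
```

Because each registry entry keeps a reference to its `source`, the caller can route an invocation back to the right context regardless of how the qualified name was derived.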