Support cross-frame tool enumeration and composability while handling name collisions #160

@markafoltz

@Starborn @johannhof @domfarolino @bwalderman @victorhuangwq @48031j @khushalsagar

In a scenario where subframes can register and expose tools to other (parent) frames (discussed in #57), a tool enumeration may combine registrations from multiple modelContext objects, at least one per frame. While the API requires tool names to be unique within a single modelContext, it can't (and shouldn't) require uniqueness across frames.

Therefore, when tools are enumerated through listTools() or other mechanisms, they should be provided in a way that allows tools to be disambiguated even if a name collision occurs for a tool registration across modelContexts.

Just prefixing the frame origin is not sufficient, as there may be multiple frames with the same origin, as well as opaque origins that don't have an inherent string representation.
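To make the problem concrete, here is a minimal sketch of the naive origin-prefix scheme. The `FrameTool` shape and both helper functions are hypothetical and exist only for illustration. It relies on the fact that opaque origins serialize to the string "null", so sandboxed frames cannot be told apart either.

```typescript
// Hypothetical shape of a tool registration as seen by an enumerating frame.
interface FrameTool {
  frameOrigin: string; // serialized origin; opaque origins serialize to "null"
  name: string;
}

// Naive scheme: qualify each tool name with the origin of its frame.
function originQualifiedName(tool: FrameTool): string {
  return `${tool.frameOrigin}/${tool.name}`;
}

// Returns every qualified name that more than one registration maps to.
function findCollisions(tools: FrameTool[]): string[] {
  const counts = new Map<string, number>();
  for (const tool of tools) {
    const key = originQualifiedName(tool);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([key]) => key);
}

const tools: FrameTool[] = [
  { frameOrigin: "https://widgets.example", name: "search" }, // first iframe
  { frameOrigin: "https://widgets.example", name: "search" }, // second iframe, same origin
  { frameOrigin: "null", name: "search" }, // sandboxed iframe (opaque origin)
  { frameOrigin: "null", name: "search" }, // another sandboxed iframe
];

// Both the same-origin pair and the opaque-origin pair still collide.
const collisions = findCollisions(tools);
console.log(collisions);
```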

This is related to #51, as the primary use case is in-page agents that want to use tools across frames. Browser agents also have this problem, but they can solve it by assigning internal document identifiers in ways that are opaque to developers.

Some high-level approaches come to mind:

  1. listTools() is scoped to a single ModelContext, and it is up to the caller to track which tool comes from which source once it gets a list of tools, and to compose the tool registries appropriately. (This assumes a mechanism for a parent frame to get access to a ModelContext from an embedded frame, and vice versa.)
  2. The developer can provide a name for a particular instance of modelContext, which is used as a tool name prefix. (This may still result in name collisions that would need to get sorted out.)
  3. The browser is responsible for assigning a unique tool name prefix for each modelContext.
  4. listTools() returns not a flat list, but a hierarchical object representing how tool registrations map onto the document's frame tree.
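As a rough illustration of approach 4, the hierarchical result could look something like the sketch below. `ToolDescriptor`, `ToolTreeNode`, and `flattenToolTree` are hypothetical names, not proposed API. The point is only that a tool's position in the frame tree gives a caller enough information to disambiguate identical names.

```typescript
// Hypothetical types; none of these are proposed API.
interface ToolDescriptor {
  name: string;
  description: string;
}

interface ToolTreeNode {
  tools: ToolDescriptor[]; // tools registered on this frame's context
  children: ToolTreeNode[]; // child frames, in document order
}

interface LocatedTool {
  path: number[]; // child indices from the root frame; [] is the root itself
  tool: ToolDescriptor;
}

// Flattens the tree while keeping each tool's position in the frame tree.
function flattenToolTree(node: ToolTreeNode, path: number[] = []): LocatedTool[] {
  const result: LocatedTool[] = node.tools.map((tool) => ({ path, tool }));
  node.children.forEach((child, index) => {
    for (const located of flattenToolTree(child, path.concat(index))) {
      result.push(located);
    }
  });
  return result;
}

const tree: ToolTreeNode = {
  tools: [{ name: "search", description: "Search the host page" }],
  children: [
    { tools: [{ name: "search", description: "Search widget A" }], children: [] },
    { tools: [{ name: "search", description: "Search widget B" }], children: [] },
  ],
};

// Three tools share the name "search" but have distinct paths: [], [0], [1].
const located = flattenToolTree(tree);
console.log(located.map((entry) => `${entry.path.join(".")}:${entry.tool.name}`));
```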

The approach we want depends on a few factors, including the eventual API shapes for enumeration and composition of tool registries. I am leaning towards some combination of 1. and 2. because it provides the most control and composability for developers.
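A minimal sketch of what combining 1 and 2 might look like from the caller's side, assuming each frame's context can be reached somehow and enumerated on its own. `ToolSource` is a stand-in for whatever per-frame handle the API ends up exposing, `label` is the developer-provided name, and the `#n` suffix is just one possible way to sort out duplicate labels. The real listTools() may well be asynchronous; it is synchronous here to keep the example short.

```typescript
interface ToolDescriptor {
  name: string;
  description: string;
}

// Hypothetical stand-in for a per-frame registry; not the real ModelContext API.
interface ToolSource {
  label: string; // developer-chosen name for this context (approach 2)
  listTools(): ToolDescriptor[]; // scoped to a single context (approach 1)
}

interface ComposedTool {
  qualifiedName: string;
  source: ToolSource;
  tool: ToolDescriptor;
}

// Merges per-context lists into one registry keyed by a unique qualified name.
function composeTools(sources: ToolSource[]): Map<string, ComposedTool> {
  const usedPrefixes = new Set<string>();
  const composed = new Map<string, ComposedTool>();
  for (const source of sources) {
    // Labels can collide too, so append a counter until the prefix is unused.
    let prefix = source.label;
    for (let n = 2; usedPrefixes.has(prefix); n++) {
      prefix = `${source.label}#${n}`;
    }
    usedPrefixes.add(prefix);
    for (const tool of source.listTools()) {
      const qualifiedName = `${prefix}.${tool.name}`;
      composed.set(qualifiedName, { qualifiedName, source, tool });
    }
  }
  return composed;
}

// Helper that builds a source exposing a single "search" tool.
const makeSource = (label: string, description: string): ToolSource => ({
  label,
  listTools: () => [{ name: "search", description }],
});

const composed = composeTools([
  makeSource("cart", "Search the first cart frame"),
  makeSource("cart", "Search the second cart frame"),
  makeSource("maps", "Search the map frame"),
]);
console.log(Array.from(composed.keys()));
```

Because each entry keeps a reference to its source, the caller can route an invocation back to the right frame regardless of how the qualified name was formed.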
