@bwalderman bwalderman commented Aug 7, 2025

This is a first pass at unifying Microsoft's Web Model Context and Google Script Tools explainers.

Summary of changes

  • Merged in explainer content from Script Tools API
    • Background and motivation
    • Goals and non-goals
    • Example use cases
    • Prior art
    • Images
  • Updated new Prior Art section with acknowledgement of pre-existing WebMCP from the MCP-B project
  • Split out API proposal into a separate document to keep size down
    • Currently contains Web Model Context API design and Stamp database walkthrough.

Follow-ups

The proposal.md still contains just Web Model Context API stuff, more or less verbatim. Still need to converge on a proposed API design that takes Script Tools, WebMCP (MCP-B), and other prior art into account. Keeping that out of this PR to avoid it becoming too large and since the API is still being discussed.

+@khushalsagar

@bwalderman bwalderman requested a review from khushalsagar August 7, 2025 22:06
@bwalderman bwalderman force-pushed the brwalder/merge-explainer-1 branch from ce1d0f8 to 6748764 Compare August 7, 2025 22:09
@khushalsagar khushalsagar requested a review from bokand August 7, 2025 22:16
@anssiko anssiko changed the title from "Merge Script Tools API and WebMC explainers [Part I]" to "Merge Script Tools API and WebMCP explainers [Part I]" Aug 8, 2025

@anssiko anssiko left a comment

Signing off for the general direction with a non-blocking suggestion to include acknowledgments.

@leotlee leotlee self-requested a review August 8, 2025 18:10
@bwalderman bwalderman requested review from leotlee and sushraja-msft and removed request for leotlee August 8, 2025 18:11
Co-authored-by: Anssi Kostiainen <anssi.kostiainen@gmail.com>
docs/proposal.md Outdated

### Recommendation

A **hybrid** approach of both of the examples above is recommended as this would make it easy for web developers to get started adding tools to their page, while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()`, then the browser's default behavior for tool calls will occur: the `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error.
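The dispatch order in the paragraph above can be sketched roughly as follows. This is only an illustration of the described control flow, not the proposal's actual implementation: `ToolCallEvent`, `dispatchToolCall`, and the shape of the error result are hypothetical names chosen for this sketch, and `respondWith()` is assumed to live on the event object.

```javascript
// Hypothetical stand-in for the event the browser would dispatch.
class ToolCallEvent extends Event {
  constructor(name, args) {
    super("toolcall", { cancelable: true });
    this.name = name;
    this.args = args;
    this.response = undefined;
  }

  // Records the page's own response (assumed shape, not a confirmed API).
  respondWith(response) {
    this.response = response;
  }
}

// Sketch of the browser-side default behavior described above.
function dispatchToolCall(target, tools, name, args) {
  // 1. Dispatch "toolcall" before any execute function runs.
  const event = new ToolCallEvent(name, args);
  target.dispatchEvent(event);

  // 2. If the handler called preventDefault(), use its response.
  if (event.defaultPrevented) {
    return event.response;
  }

  // 3. Otherwise fall back to the registered tool's execute function.
  const tool = tools.find((t) => t.name === name);
  if (!tool) {
    // 4. Unknown tool name: respond with an error.
    return { error: `Unknown tool: ${name}` };
  }
  return tool.execute(args);
}
```

A page that never listens for `"toolcall"` gets only steps 3 and 4, which is what keeps the simple `execute`-based path easy to adopt.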

We need a deeper discussion for the API but curious as I'm reading this. I'm not following if the recommendation is for the json manifest based declaration or the provideContext API based (which includes the execute function).


I filed #8 as I'm not convinced we need the declarative form at all.

If we do decide we want this, an alternative approach is to include the declarative part only as a listing for informational/indexing purposes but keep only the procedural API for actually calling the functions. That is, an agent will still need to load the page to use the tool even with the declarative form, so we should force registration and calling to happen in the same way.

We could avoid having a separate, differently shaped, API between the two forms by always requiring agent.provideContext even for declaratively provided tools. If we're worried about duplication between the declarative/procedural forms we could keep a parsed version of the declarative registration available as an object, e.g.

// manifest.json: 
{
    "tools": [
        {
            "name": "add-todo",
            "description": "Add a new todo item to the list",
            ...
        }
    ]
}
// js
window.agent.provideContext({
  tools: [
    window.agent.manifestTools['add-todo'],
  ]
});


Agree that we need a deeper discussion around the API. Keeping this doc as is for this PR so we can address this later. At the moment, proposal.md is just a temporary home for stuff that was moved out of explainer.md. I fully expect the API will change dramatically from what's written here, especially since the direction of prior art and the Script Tools API is leaning closer to something like a single defineTool call per tool.


@bokand bokand left a comment


Sorry for the delay.

Still have to finish my pass but sending out a few collected nits - will finish tomorrow morning.

docs/proposal.md Outdated

When an agent that is connected to the page sends a tool call, the JavaScript callback is invoked, where the page can handle the tool call and respond to the agent. Simple applications can handle tool calls entirely in page script, but more complex applications may choose to delegate computationally heavy operations to workers and respond to the agent asynchronously.

Handling tool calls in the main thread with the option of delegating to workers serves a few purposes:

Have you considered delegating to a real MCP server as well? This is one option we've considered, since "duplicated effort between this and MCP" was a commonly heard concern.


That's not something we considered while writing our original proposal, but worth investigating further.


- Allows additional context discovery mechanisms without rendering a page.

**Disadvantages:**

Another disadvantage compared to the imperative form is that these cannot be context-dependent - i.e. the imperative form allows you to reset the set of available tools by calling provideContext. The declarative form is effectively "always available".
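To make the difference concrete, here is a minimal sketch of the "reset" behavior, assuming each `provideContext` call replaces the previously registered tools. `FakeAgent` is a hypothetical stand-in for `window.agent`, and the `clear-completed` tool is invented for illustration; neither comes from the proposal.

```javascript
// Hypothetical stand-in for window.agent, only to illustrate the semantics.
class FakeAgent {
  #tools = new Map();

  // Assumption: each call replaces the whole tool set rather than merging.
  provideContext({ tools = [] }) {
    this.#tools = new Map(tools.map((tool) => [tool.name, tool]));
  }

  listToolNames() {
    return [...this.#tools.keys()];
  }
}

const agent = new FakeAgent();
const addTodo = {
  name: "add-todo",
  description: "Add a new todo item to the list",
  execute: () => {},
};
// Invented second tool that only makes sense once items exist.
const clearCompleted = {
  name: "clear-completed",
  description: "Remove completed todo items",
  execute: () => {},
};

agent.provideContext({ tools: [addTodo] }); // empty list view
agent.provideContext({ tools: [addTodo, clearCompleted] }); // items exist
agent.provideContext({ tools: [] }); // e.g. user navigated to settings
```

A manifest entry has no equivalent of the last call: whatever it lists stays advertised regardless of page state.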

content: [
{
type: "text",
text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`,

We'll need to specify how this structured output looks. WDYT about including something like the recently added outputSchema in MCP?

No need to decide here but filed #9 to discuss.

return {
content: [
{
type: "text",

Have y'all considered non-text output? Thinking through examples this seems like it'd be very useful but I'm not sure yet how it'd look.


@bwalderman bwalderman Aug 13, 2025


I was thinking each content item could be either a string or a Blob URI for images and audio. A Blob URI with a supported mimeType is returned to the browser, and that's exposed to the agent in a format that the agent expects (i.e. base64 embedded in JSON).
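As a rough sketch of that conversion step, assuming the browser has already resolved the Blob URI to a `Blob`: the function name `toAgentContent` and the output field names (`type`, `mimeType`, `data`) are hypothetical and only illustrate one possible "base64 embedded in JSON" shape.

```javascript
// Converts one tool result item into a JSON-friendly content item.
// Assumption: `item` is either a string or a Blob already resolved
// from the Blob URI that the page returned.
async function toAgentContent(item) {
  if (typeof item === "string") {
    return { type: "text", text: item };
  }

  // Read the raw bytes and encode them as base64.
  const bytes = new Uint8Array(await item.arrayBuffer());
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }

  return { type: "blob", mimeType: item.type, data: btoa(binary) };
}
```

Whether unsupported MIME types should be rejected at this step or earlier is left open here, as it is in the comment.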


#### Use a worker

To improve the user experience and make it possible for the stamp application to handle a large number of tool calls without tying up the document's main thread, the web developer may choose to move the tool handling into a dedicated worker script. Handling tool calls in a worker keeps the UI responsive, and makes it possible to handle potentially long-running operations. For example, if the user asks an AI agent to add a list of hundreds of stamps from an external source such as a spreadsheet, this will result in hundreds of tool calls.

Is this something that could happen at the application level? Tools are already async so an author could just postMessage the real work over to the worker already. Registering tools directly in a worker seems like maybe a small ergonomics improvement for that use case.

One interesting but maybe scary idea was tools in a service worker which could allow tool usage without a browsing context.
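For reference, the application-level delegation mentioned above might look something like this. It is a sketch under assumptions: `callWorker`, `makeAddStampTool`, the `add-stamp` tool name, and the `{ id, payload }` / `{ id, result }` message shapes are all invented for illustration, and `port` can be a `Worker` or anything else exposing `postMessage` and `addEventListener`.

```javascript
let nextCallId = 0;

// Sends one request to a worker-like port and resolves with the matching reply.
function callWorker(port, payload) {
  const id = nextCallId++;
  return new Promise((resolve) => {
    const onMessage = (event) => {
      if (event.data.id !== id) return; // reply belongs to another call
      port.removeEventListener("message", onMessage);
      resolve(event.data.result);
    };
    port.addEventListener("message", onMessage);
    port.postMessage({ id, payload });
  });
}

// Hypothetical tool whose execute() stays on the main thread but
// forwards the real work to the worker and awaits its reply.
function makeAddStampTool(port) {
  return {
    name: "add-stamp",
    execute: (args) => callWorker(port, args),
  };
}
```

The per-call `id` is what lets hundreds of tool calls be in flight at once without replies getting mixed up.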


Requiring tool registration to happen only in a top-level browsing context makes it easy to reason about which agent sees the tools (it's the agent in the sidebar next to the tab). It also avoids potentially conflicting tool registrations from other frames and workers, and lets the browser enforce that the script registering the tools comes from the same origin. That was the main reason for not considering tool registration directly in workers.

An alternative to consider: we could define an object which holds the tools (like the AutomationDelegate from the Script Tools proposal). The object can only be created by a document in a top-level frame, but it can be transferred to workers via postMessage. So, web devs who know they want all of their tool handling to happen in a worker can just transfer the object to the worker and register the tools there. This allows the better ergonomics but still enforces the idea of one set of tools per tab.

@bwalderman bwalderman merged commit 58b2d5e into main Aug 13, 2025
1 check passed
@bwalderman bwalderman deleted the brwalder/merge-explainer-1 branch August 19, 2025 21:06