Merge Script Tools API and WebMCP explainers [Part I] #3

bwalderman · 2025-08-07T22:06:07Z

This is a first pass at unifying Microsoft's Web Model Context and Google Script Tools explainers.

Summary of changes

Merged in explainer content from Script Tools API
- Background and motivation
- Goals and non-goals
- Example use cases
- Prior art
- Images
Updated new Prior Art section with acknowledgement of pre-existing WebMCP from the MCP-B project
Split out API proposal into a separate document to keep size down
- Currently contains Web Model Context API design and Stamp database walkthrough.

Follow-ups

The proposal.md still contains just Web Model Context API stuff, more or less verbatim. Still need to converge on a proposed API design that takes Script Tools, WebMCP (MCP-B), and other prior art into account. Keeping that out of this PR to avoid it becoming too large and since the API is still being discussed.

+@khushalsagar

anssiko

Signing off for the general direction with a non-blocking suggestion to include acknowledgments.

docs/proposal.md

Co-authored-by: Anssi Kostiainen <anssi.kostiainen@gmail.com>

docs/explainer.md

docs/proposal.md

khushalsagar · 2025-08-11T21:54:40Z

docs/proposal.md

+
+### Recommendation
+
+A **hybrid** approach of both of the examples above is recommended as this would make it easy for web developers to get started adding tools to their page, while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()`, then the browser's default behavior for tool calls occurs: the `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error.


We need a deeper discussion about the API, but I'm curious as I read this: I'm not following whether the recommendation is for the JSON manifest-based declaration or for the provideContext-based API (which includes the execute function).

I filed #8 as I'm not convinced we need the declarative form at all.

If we do decide we want this, an alternative approach is to include the declarative part only as a listing for informational/indexing purposes, but keep only the procedural API for actually calling the functions. That is, an agent will still need to load the page to use the tool even with the declarative form, so we could force registration and calling to happen in the same way.

We could avoid having a separate, differently shaped, API between the two forms by always requiring agent.provideContext even for declaratively provided tools. If we're worried about duplication between the declarative/procedural forms we could keep a parsed version of the declarative registration available as an object, e.g.

// manifest.json
{ "tools": [ { "name": "add-todo", "description": "Add a new todo item to the list", ... } ] }

// js
window.agent.provideContext({
  tools: [
    window.agent.manifestTools['add-todo'],
  ]
});

Agree that we need a deeper discussion around the API. Keeping this doc as is for this PR so we can address this later. At the moment, proposal.md is just a temporary home for stuff that was moved out of explainer.md. I fully expect the API will change dramatically from what's written here, especially since the direction of prior art and the Script Tools API is leaning closer to something like a single defineTool call per tool.
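To make the hybrid recommendation concrete, here is a minimal sketch of the dispatch order it describes. A cancelable `"toolcall"` event fires first. If a listener calls `preventDefault()`, its `respondWith()` value is returned. Otherwise the registered tool's `execute` runs, or an error is returned for an unknown tool name. `ToolCallEvent`, `ToolHost`, and `dispatchToolCall()` are hypothetical names used for illustration, and the error result shape is an assumption. None of this is a settled API.

```javascript
// Hypothetical sketch: ToolCallEvent, respondWith() and ToolHost are
// illustrative names, not a shipped or agreed-upon browser API.

class ToolCallEvent extends Event {
  constructor(toolName, args) {
    // cancelable: true so that preventDefault() takes effect.
    super("toolcall", { cancelable: true });
    this.toolName = toolName;
    this.args = args;
    this.response = undefined;
  }

  // Lets the page supply its own response (a value or a promise).
  respondWith(response) {
    this.response = response;
  }
}

class ToolHost extends EventTarget {
  constructor() {
    super();
    this.tools = new Map();
  }

  provideContext({ tools }) {
    this.tools = new Map(tools.map((tool) => [tool.name, tool]));
  }

  // Roughly what the browser would do for each incoming tool call.
  async dispatchToolCall(toolName, args) {
    const event = new ToolCallEvent(toolName, args);
    // dispatchEvent() returns false if a listener called preventDefault().
    if (!this.dispatchEvent(event)) {
      return await event.response;
    }
    // Default behavior: run the registered tool's execute function.
    const tool = this.tools.get(toolName);
    if (!tool) {
      return { isError: true, content: [{ type: "text", text: `Unknown tool: ${toolName}` }] };
    }
    return await tool.execute(args);
  }
}
```

In this sketch, a page that only registers tools never needs a listener. A page that wants to intercept specific calls adds a `"toolcall"` listener and cancels only those events.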

bokand

Sorry for the delay.

Still have to finish my pass but sending out a few collected nits - will finish tomorrow morning.

docs/proposal.md

bokand · 2025-08-11T21:57:58Z

docs/proposal.md

+
+When an agent that is connected to the page sends a tool call, the JavaScript callback is invoked, where the page can handle the tool call and respond to the agent. Simple applications can handle tool calls entirely in page script, but more complex applications may choose to delegate computationally heavy operations to workers and respond to the agent asynchronously.
+
+Handling tool calls in the main thread with the option of delegating to workers serves a few purposes:


Have you considered delegating to a real MCP server as well? This is one option we've considered, since "duplicated effort between this and MCP" was a commonly heard concern.

That's not something we considered while writing our original proposal, but worth investigating further.
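One way to explore delegation is a page tool whose `execute` forwards the call to an existing MCP server as a JSON-RPC 2.0 `tools/call` request (`params: { name, arguments }`). The result already uses MCP's `{ content: [...] }` shape, so it can be passed back unchanged. This is a minimal sketch under stated assumptions. `makeMcpBackedTool()` and the injected `sendJsonRpc` transport are hypothetical. The sketch also omits the MCP `initialize` handshake, session handling, and any real network transport.

```javascript
// Hypothetical sketch: makeMcpBackedTool() and sendJsonRpc are illustrative.
// sendJsonRpc stands in for a real transport; the MCP initialize handshake
// and session management are intentionally omitted.

let nextRequestId = 1;

function makeMcpBackedTool(name, description, sendJsonRpc) {
  return {
    name,
    description,
    async execute(args) {
      // MCP tool invocations are JSON-RPC 2.0 requests with method "tools/call".
      const request = {
        jsonrpc: "2.0",
        id: nextRequestId++,
        method: "tools/call",
        params: { name, arguments: args },
      };
      const response = await sendJsonRpc(request);
      if (response.error) {
        // Protocol-level error: surface it to the agent as a tool error.
        return { isError: true, content: [{ type: "text", text: response.error.message }] };
      }
      // A tools/call result already has the { content: [...] } shape.
      return response.result;
    },
  };
}
```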

docs/proposal.md

bokand · 2025-08-12T12:35:22Z

docs/proposal.md

+
+- Allows additional context to be provided, and different discovery mechanisms to be used, without rendering a page.
+
+**Disadvantages:**


Another disadvantage compared to the imperative form is that these cannot be context-dependent - i.e. the imperative form allows you to reset the set of available tools by calling provideContext. The declarative form is effectively "always available".
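The context-dependence point can be illustrated with a small sketch in which each `provideContext()` call replaces, rather than merges, the current tool set. A page can therefore expose different tools per view, which a static manifest cannot express. Several pieces are hypothetical: the `agentContext` object, the `listTools()` helper, the view functions, and the replace-on-call semantics.

```javascript
// Hypothetical sketch: agentContext, listTools() and the view functions are
// illustrative; replace-on-call semantics are an assumption.

const agentContext = {
  tools: new Map(),
  provideContext({ tools }) {
    // Replace, rather than merge, the previously provided tools.
    this.tools = new Map(tools.map((t) => [t.name, t]));
  },
  listTools() {
    return [...this.tools.keys()];
  },
};

const searchTool = { name: "search-stamps", execute: async () => ({ content: [] }) };
const editTool = { name: "edit-stamp", execute: async () => ({ content: [] }) };

// Hypothetical view changes in a single-page app.
function showListView() {
  agentContext.provideContext({ tools: [searchTool] });
}

function showDetailView() {
  // Editing only makes sense while a single stamp is open.
  agentContext.provideContext({ tools: [searchTool, editTool] });
}
```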

docs/proposal.md

bokand · 2025-08-12T13:02:37Z

docs/proposal.md

+
+### Recommendation
+
+A **hybrid** approach of both of the examples above is recommended as this would make it easy for web developers to get started adding tools to their page, while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()`, then the browser's default behavior for tool calls occurs: the `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error.


I filed #8 as I'm not convinced we need the declarative form at all.

If we do decide we want this, an alternative approach is to include the declarative part only as a listing for informational/indexing purposes, but keep only the procedural API for actually calling the functions. That is, an agent will still need to load the page to use the tool even with the declarative form, so we could force registration and calling to happen in the same way.

We could avoid having a separate, differently shaped, API between the two forms by always requiring agent.provideContext even for declaratively provided tools. If we're worried about duplication between the declarative/procedural forms we could keep a parsed version of the declarative registration available as an object, e.g.

// manifest.json
{ "tools": [ { "name": "add-todo", "description": "Add a new todo item to the list", ... } ] }

// js
window.agent.provideContext({
  tools: [
    window.agent.manifestTools['add-todo'],
  ]
});
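A minimal sketch of the single-registration-path idea follows. The browser parses the manifest into a `manifestTools` lookup, and the page still calls `provideContext()`, reusing the parsed metadata. The comment's example passes the manifest entry on its own and leaves open where `execute` comes from. This sketch assumes the page attaches `execute` by spreading the parsed entry. `parseManifestTools()`, `createAgent()`, the `registered` map, and the `inputSchema` field are hypothetical.

```javascript
// Hypothetical sketch: parseManifestTools(), createAgent() and `registered`
// are illustrative; only the manifestTools/provideContext shape comes from
// the comment above. Attaching execute via spread is an assumption.

const manifest = {
  tools: [
    {
      name: "add-todo",
      description: "Add a new todo item to the list",
      inputSchema: {
        type: "object",
        properties: { text: { type: "string" } },
        required: ["text"],
      },
    },
  ],
};

// Roughly what the browser might expose as window.agent.manifestTools.
function parseManifestTools({ tools }) {
  return Object.fromEntries(tools.map((tool) => [tool.name, tool]));
}

function createAgent(manifest) {
  const registered = new Map();
  return {
    manifestTools: parseManifestTools(manifest),
    registered,
    // The single registration path for declarative and procedural tools alike.
    provideContext({ tools }) {
      registered.clear();
      for (const tool of tools) registered.set(tool.name, tool);
    },
  };
}

// Usage: reuse the manifest metadata and attach the page's execute function.
const agent = createAgent(manifest);
const todos = [];
agent.provideContext({
  tools: [
    {
      ...agent.manifestTools["add-todo"],
      async execute({ text }) {
        todos.push(text);
        return { content: [{ type: "text", text: `Added "${text}"` }] };
      },
    },
  ],
});
```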

bokand · 2025-08-12T13:13:31Z

docs/proposal.md

+        content: [
+            {
+                type: "text",
+                text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`,


We'll need to specify what this structured output looks like. WDYT about including something like the recently added outputSchema in MCP?

No need to decide here but filed #9 to discuss.
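For discussion in #9, here is a sketch of the stamp tool declared MCP-style. The tool has an `outputSchema` and returns a `structuredContent` object alongside the human-readable text, mirroring the fields MCP added for structured tool output. The `hasRequiredFields()` helper is a hypothetical stand-in that checks only required keys and primitive types. It is not a JSON Schema validator, and nothing here is a proposed WebMCP shape.

```javascript
// Hypothetical sketch: outputSchema/structuredContent mirror MCP's field names;
// hasRequiredFields() is a toy check, not real JSON Schema validation.

const stamps = [];

const addStampTool = {
  name: "add-stamp",
  description: "Add a new stamp to the collection",
  inputSchema: {
    type: "object",
    properties: { name: { type: "string" }, year: { type: "number" } },
    required: ["name"],
  },
  outputSchema: {
    type: "object",
    properties: { added: { type: "string" }, total: { type: "number" } },
    required: ["added", "total"],
  },
  async execute({ name, year }) {
    stamps.push({ name, year });
    return {
      // Text for the model, plus a machine-readable object matching outputSchema.
      content: [{ type: "text", text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.` }],
      structuredContent: { added: name, total: stamps.length },
    };
  },
};

// Only checks that required keys exist with the declared primitive type.
function hasRequiredFields(schema, value) {
  return schema.required.every(
    (key) => key in value && typeof value[key] === schema.properties[key].type
  );
}
```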

bokand · 2025-08-12T13:14:21Z

docs/proposal.md

+    return {
+        content: [
+            {
+                type: "text",


Have y'all considered non-text output? Thinking through examples, this seems like it'd be very useful, but I'm not sure yet how it'd look.

I was thinking each content item could be either a string or a Blob URI for images and audio. A Blob URI with a supported mimeType is returned to the browser, and that's exposed to the agent in a format that the agent expects (i.e. base64 embedded in JSON).
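The conversion step described above could look roughly like the sketch below. A tool returns plain strings and binary data, and the browser turns them into MCP-style content items: `{ type: "text", text }` for strings and `{ type: "image" | "audio", data, mimeType }` with base64 `data` for binary data. To stay self-contained, the sketch takes `Blob` objects directly instead of Blob URLs. `toAgentContent()` and the `SUPPORTED` MIME-type list are hypothetical.

```javascript
// Hypothetical sketch: toAgentContent() and SUPPORTED are illustrative.
// Takes Blob objects directly rather than Blob URLs.

const SUPPORTED = {
  image: ["image/png", "image/jpeg"],
  audio: ["audio/wav", "audio/mpeg"],
};

function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

async function toAgentContent(items) {
  const content = [];
  for (const item of items) {
    if (typeof item === "string") {
      content.push({ type: "text", text: item });
      continue;
    }
    // Map the Blob's MIME type to an MCP content type, rejecting anything else.
    const kind = Object.keys(SUPPORTED).find((k) => SUPPORTED[k].includes(item.type));
    if (!kind) throw new TypeError(`Unsupported MIME type: ${item.type}`);
    const bytes = new Uint8Array(await item.arrayBuffer());
    content.push({ type: kind, data: bytesToBase64(bytes), mimeType: item.type });
  }
  return content;
}
```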

bokand · 2025-08-12T13:16:23Z

docs/proposal.md

+
+#### Use a worker
+
+To improve the user experience and make it possible for the stamp application to handle a large number of tool calls without tying up the document's main thread, the web developer may choose to move the tool handling into a dedicated worker script. Handling tool calls in a worker keeps the UI responsive, and makes it possible to handle potentially long-running operations. For example, if the user asks an AI agent to add a list of hundreds of stamps from an external source such as a spreadsheet, this will result in hundreds of tool calls.


Is this something that could happen at the application level? Tools are already async so an author could just postMessage the real work over to the worker already. Registering tools directly in a worker seems like maybe a small ergonomics improvement for that use case.

One interesting but maybe scary idea was tools in a service worker which could allow tool usage without a browsing context.

Requiring tool registration to happen only in a top-level browsing context makes it easy to reason about which agent sees the tools (it's the agent in the sidebar next to the tab). It also avoids potentially conflicting tool registrations from other frames and workers, and lets the browser enforce that the script registering the tools comes from the same origin. That was the main reason for not considering tool registration directly in workers.

An alternative to consider: we could define an object which holds the tools (like the AutomationDelegate from the Script Tools proposal). The object can only be created by a document in a top-level frame, but it can be transferred to workers via postMessage. So web devs who know they want all of their tool handling to happen in a worker can just transfer the object to the worker and register the tools there. This allows the better ergonomics but still enforces the idea of one set of tools per tab.
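The application-level approach from the first comment in this thread can be sketched as follows. Tools stay registered in the top-level document, and each tool's async `execute` posts the work to a worker, then resolves when the worker replies. A per-call id lets replies be matched even with hundreds of calls in flight. So that the sketch runs anywhere, a `MessageChannel` stands in for the `Worker`, and the "worker" is simulated in the same script. `createWorkerBackedExecute()` and `startFakeWorker()` are hypothetical names.

```javascript
// Hypothetical sketch: a MessageChannel stands in for a Worker; in a page,
// `port` would be `new Worker(...)` and the worker side its own script.

function createWorkerBackedExecute(port) {
  let nextId = 0;
  const pending = new Map();

  port.addEventListener("message", (event) => {
    const { id, result } = event.data;
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(result);
    }
  });
  // MessagePort needs start() when using addEventListener; Worker has no start().
  port.start?.();

  // Each call gets an id so replies can be matched to requests.
  return (args) =>
    new Promise((resolve) => {
      const id = nextId++;
      pending.set(id, resolve);
      port.postMessage({ id, args });
    });
}

// Simulated worker side: does the "heavy" work and replies with the same id.
function startFakeWorker(port) {
  const stamps = [];
  port.addEventListener("message", (event) => {
    const { id, args } = event.data;
    stamps.push(args.name);
    port.postMessage({
      id,
      result: { content: [{ type: "text", text: `Added "${args.name}" (${stamps.length} total)` }] },
    });
  });
  port.start?.();
}
```

With this shape, the page-facing tool's `execute` can be the returned function, while registration and the one-set-of-tools-per-tab model stay on the main thread.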

docs/proposal.md

bwalderman requested a review from khushalsagar August 7, 2025 22:06

bwalderman added 2 commits August 7, 2025 15:07

Merge Script Tools API and WebMC explainers.

9de1d71

Rename

6748764

bwalderman force-pushed the brwalder/merge-explainer-1 branch from ce1d0f8 to 6748764 August 7, 2025 22:09

khushalsagar requested a review from bokand August 7, 2025 22:16

anssiko changed the title ~~Merge Script Tools API and WebMC explainers [Part I]~~ Merge Script Tools API and WebMCP explainers [Part I] Aug 8, 2025

anssiko mentioned this pull request Aug 8, 2025

Bikeshed naming for the overall API #2

Closed

anssiko approved these changes Aug 8, 2025

docs/proposal.md Outdated

leotlee self-requested a review August 8, 2025 18:10

bwalderman requested review from leotlee and sushraja-msft and removed request for leotlee August 8, 2025 18:11

Update docs/proposal.md

1e018e7

Co-authored-by: Anssi Kostiainen <anssi.kostiainen@gmail.com>

khushalsagar approved these changes Aug 11, 2025

bokand reviewed Aug 11, 2025

anssiko reviewed Aug 12, 2025

docs/proposal.md Outdated

bokand approved these changes Aug 12, 2025

jasonjmcghee reviewed Aug 13, 2025

docs/proposal.md Outdated

Address feedback.

446069b

bwalderman merged commit 58b2d5e into main Aug 13, 2025
1 check passed

bwalderman deleted the brwalder/merge-explainer-1 branch August 19, 2025 21:06




6 participants
