From 9de1d71c0f22697c42897152a62b1c8bc87370d2 Mon Sep 17 00:00:00 2001
From: Brandon Walderman
Date: Thu, 7 Aug 2025 14:55:46 -0700
Subject: [PATCH 1/4] Merge Script Tools API and WebMCP explainers.

---
 content/explainer_mcp.svg | 1 +
 content/explainer_st.svg | 1 +
 docs/proposal.md | 258 ++++++++++++++++
 docs/webmcp.md | 609 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 869 insertions(+)
 create mode 100644 content/explainer_mcp.svg
 create mode 100644 content/explainer_st.svg
 create mode 100644 docs/proposal.md
 create mode 100644 docs/webmcp.md

diff --git a/content/explainer_mcp.svg b/content/explainer_mcp.svg
new file mode 100644
index 0000000..406802b
--- /dev/null
+++ b/content/explainer_mcp.svg
@@ -0,0 +1 @@
+
diff --git a/content/explainer_st.svg b/content/explainer_st.svg
new file mode 100644
index 0000000..f441f35
--- /dev/null
+++ b/content/explainer_st.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/proposal.md b/docs/proposal.md
new file mode 100644
index 0000000..0998b12
--- /dev/null
+++ b/docs/proposal.md
@@ -0,0 +1,258 @@
+# WebMCP API Proposal

## Definitions

- **Model context provider**: A single top-level browsing context navigated to a page that uses the WebMCP API to provide context (i.e. tools) to agents.
- **Agent**: An application that uses the provided context. This may be something like an AI assistant integrated into the browser, or possibly a native/desktop application.

## Understanding WebMCP

Only a top-level browsing context, such as a browser tab, can be a model context provider. A page calls the WebMCP API's methods to register tools with the browser. An agent requires some information from the tool in order to use it. A simple, common subset emerges from [existing AI integration APIs](explainer.md#prior-art):

* A natural language description of the tool / function
* For each parameter:
  * A natural language description of the parameter
  * The expected type (e.g. Number, String, Enum, etc.)
  * Any restrictions on the parameter (e.g. integers greater than 0)
* A JS callback function that implements the tool and returns a result

When an agent that is connected to the page sends a tool call, the JavaScript callback is invoked, allowing the page to handle the tool call and respond to the agent. Simple applications can handle tool calls entirely in page script, but more complex applications may choose to delegate computationally heavy operations to workers and respond to the agent asynchronously.

Handling tool calls on the main thread, with the option of delegating to workers, serves a few purposes:

- Ensures tool calls run sequentially, one at a time.
- The page can update UI to reflect state changes performed by tools.
- Handling tool calls in page script may be sufficient for simple applications.

## Benefits of this design

- **Familiar language/tools**: Lets a web developer implement their tools in JavaScript.
- **Code reuse**: A web developer may only need to make minimal changes to expose existing functionality as tools if their page already has an appropriate JavaScript function.
- **Local tool call handling**: Enables web developers to integrate their pages with AI-based agents by working with, but not solely relying on, techniques like Model Context Protocol that require a separate server and authentication. A web developer may only need to maintain one codebase for their frontend UI and agent integration, improving maintainability and quality-of-life for the developer.
Local handling also potentially reduces network calls and enhances privacy/security.
- **Fine-grained permissions**: Tool calls are mediated through the browser, so the user has the opportunity to review the requesting client apps and provide consent.
- **Developer involvement**: Encourages developer involvement in the agentic web, which is required for a thriving web. Reduces the need for solutions like UI automation where the developer is not involved, improving privacy, reducing site expenses, and providing a better customer experience.
- **Seamless integration**: Since tool calls are handled locally in a real browser, the agent can interleave these calls with human input when necessary (e.g. for consent, auth flows, dialogs, etc.).
- **Accessibility**: Bringing tools to webpages may help users with accessibility needs by allowing them to complete the same job-to-be-done via agentic or conversational interfaces instead of relying on the accessibility tree, which many websites have not implemented.

## Limitations of this design

- **Browsing context required**: Since tool calls are handled in JavaScript, a browsing context (i.e. a browser tab or a webview) must be opened. There is currently no support for agents or assistive tools to call tools "headlessly" without visible browser UI. This is a future consideration which is discussed further below.
- **UI synchronization**: For a satisfactory end user experience, web developers need to ensure their UI is updated to reflect the current app state, regardless of whether the state updates came from human interaction or from a tool call.
- **Complexity overhead**: In cases where the site UI is very complex, developers will likely need to do some refactoring or add JavaScript that keeps app and UI state in sync and produces appropriate tool outputs.
- **Tool discoverability**: There is no built-in mechanism for client applications to discover which sites provide callable tools without visiting or querying them directly. Search engines or directories of some kind may play a role in helping client applications determine whether a site has relevant tools for the task it is trying to perform.

## API

The `window.agent` interface is introduced to represent an abstract AI agent that is connected to the page and uses the page's context. The `agent` object has a single method `provideContext` that's used to update the context (currently just tools) available to the agent. The method takes an object with a `tools` property which is a list of tool descriptors. Tool descriptors look as shown in the example below, which aligns with the Prompt API's [tool use](https://github.com/webmachinelearning/prompt-api#tool-use) specification and with other libraries like the MCP SDK:

```js
// Declare tool schema and implementation functions.
window.agent.provideContext({
  tools: [
    {
      name: "add-todo",
      description: "Add a new todo item to the list",
      inputSchema: {
        type: "object",
        properties: {
          text: { type: "string", description: "The text of the todo item" }
        },
        required: ["text"]
      },
      async execute({ text }) {
        // Add todo item and update UI.
        return /* structured content response */;
      }
    }
  ]
});
```

The `provideContext` method can be called multiple times. Subsequent calls clear any pre-existing tools and other context before registering the new ones. This is useful for single-page web apps that frequently change UI state and could benefit from presenting different tools depending on which state the UI is currently in, as sketched below.
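For example, a single-page app could re-register a different toolset each time its view changes. A minimal sketch, assuming hypothetical tool descriptor arrays (`listViewTools`, `checkoutViewTools`) and an app-specific `onViewChange` hook:

```js
// Hypothetical descriptors, each shaped like the "add-todo" example above.
const toolsByView = {
  list: listViewTools,
  checkout: checkoutViewTools,
};

// App-specific hook, e.g. called from the SPA's router.
function onViewChange(view) {
  if ("agent" in window) {
    // provideContext() replaces any previously registered tools.
    window.agent.provideContext({ tools: toolsByView[view] ?? [] });
  }
}
```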
**Advantages:**

- Aligns with existing APIs.
- Simple for web developers to use.
- Enforces a single function per tool.

**Disadvantages:**

- Must navigate to the page and run JavaScript for the agent to discover tools.

If WebMCP gains traction in the web developer community, it will become important for agents to have a way to discover which sites have tools that are relevant to a user's request. Discovery is a topic that may warrant its own explainer, but suffice it to say, it may be beneficial for agents to have a way to know what capabilities a page offers without having to navigate to the web site first. As an example, a future iteration of this feature could introduce declarative tool definitions that are placed in an app manifest so that agents would only need to fetch the manifest with a simple HTTP GET request. Agents will of course still need to navigate to the site to actually use its tools, but a manifest makes it far less costly to discover these tools and reason about their relevance to the user's task.

To make such a scenario easier, it would be beneficial to support an alternate means of tool call execution: one that separates the tool definition and schema (which may exist in an external manifest file) from the implementation function.

One way to do this is to handle tool calls as events, as shown below:

```json
// 1. manifest.json: Define tools declaratively. Exact syntax TBD.

{
  // .. other manifest fields ..
  "tools": [
    {
      "name": "add-todo",
      "description": "Add a new todo item to the list",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": { "type": "string", "description": "The text of the todo item" }
        },
        "required": ["text"]
      }
    }
  ]
}
```

```js
// 2. script.js: Handle tool calls as events.

window.agent.addEventListener('toolcall', async e => {
  if (e.name === "add-todo") {
    // Add todo item and update UI.
    e.respondWith(/* structured content response */);
    return;
  } // etc...
});
```

Tool calls are handled as events. Since event handler functions can't respond to the agent by returning a value directly, the `'toolcall'` event object has a `respondWith()` method that needs to be called to signal completion and respond to the agent. This is modeled on the existing service worker `'fetch'` event.

**Advantages:**

- Allows alternate discovery mechanisms that don't require rendering a page.

**Disadvantages:**

- Slightly harder to keep definition and implementation in sync.
- Potentially large switch-case in event handler.

### Recommendation

A **hybrid** approach combining both of the examples above is recommended, as this would make it easy for web developers to get started adding tools to their page while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ the tool's `execute` function runs. The event handler can take over the tool call by calling the event's `preventDefault()` method and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()`, the browser's default behavior for tool calls occurs: the `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error.
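A minimal sketch of this hybrid dispatch, assuming the `"add-todo"` tool and event shape from the examples above; `needsCustomHandling` is a hypothetical app-specific condition:

```js
window.agent.addEventListener('toolcall', async e => {
  // Hypothetical condition under which the page wants to override the
  // registered implementation for this particular call.
  const needsCustomHandling = false;

  if (e.name === "add-todo" && needsCustomHandling) {
    // Take over the call; the tool's registered execute function will not run.
    e.preventDefault();
    e.respondWith(/* structured content response */);
    return;
  }
  // Without preventDefault(), the browser performs the default behavior:
  // it invokes the matching tool's execute function, or responds to the
  // agent with an error if no tool has the requested name.
});
```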
## Example of WebMCP API usage

Consider a web application like an example Historical Stamp Database. The complete source is available in the [example/](./example/index.html) folder alongside this explainer.

Screenshot of Historical Stamp Database

The page shows the stamps currently in the database and has a form to add a new stamp to the database. The author of this app is interested in leveraging the WebMCP API to enable agentic scenarios like:

- Importing multiple stamps from outside data sources
- Back-filling missing images
- Populating/correcting descriptions with deep research
- Adding information to descriptions about rarity
- Allowing end users to engage in a conversational interface about the stamps on the site and use that information in agentic flows

Using the WebMCP API, the author can add just a few simple tools to the page for adding, updating, and retrieving stamps. With these relatively simple tools, an AI agent would have the ability to perform complex tasks like the ones illustrated above on behalf of the user.

The example below walks through adding one such tool, the "add-stamp" tool, using the WebMCP API, so that AI agents can update the stamp collection.

The webpage today is designed with a visual UX in mind. It uses simple JavaScript with a `'submit'` event handler that reads the form fields, adds the new record, and refreshes the UI:

```js
document.getElementById('addStampForm').addEventListener('submit', (event) => {
  event.preventDefault();

  const stampName = document.getElementById('stampName').value;
  const stampDescription = document.getElementById('stampDescription').value;
  const stampYear = document.getElementById('stampYear').value;
  const stampImageUrl = document.getElementById('stampImageUrl').value;

  addStamp(stampName, stampDescription, stampYear, stampImageUrl);
});
```

To facilitate code reuse, the developer has already extracted the code to add a stamp and refresh the UI into a helper function `addStamp()`:

```js
function addStamp(stampName, stampDescription, stampYear, stampImageUrl) {
  // Add the new stamp to the collection
  stamps.push({
    name: stampName,
    description: stampDescription,
    year: stampYear,
    imageUrl: stampImageUrl || null
  });

  // Confirm addition and update the collection
  document.getElementById('confirmationMessage').textContent = `Stamp "${stampName}" added successfully!`;
  renderStamps();
}
```

To let AI agents use this functionality, the author defines the available tools. The `agent` property on the `Window` is checked to ensure the browser supports WebMCP. If supported, the `provideContext()` method is called, passing in an array of tools with a single item: a definition for the new "Add Stamp" tool. The tool accepts as parameters the same set of fields that are present in the HTML form, since this tool and the form should be functionally equivalent.

```js
if ("agent" in window) {
  window.agent.provideContext({
    tools: [
      {
        name: "add-stamp",
        description: "Add a new stamp to the collection",
        inputSchema: {
          type: "object",
          properties: {
            name: { type: "string", description: "The name of the stamp" },
            description: { type: "string", description: "A brief description of the stamp" },
            year: { type: "number", description: "The year the stamp was issued" },
            imageUrl: { type: "string", description: "An optional image URL for the stamp" }
          },
          required: ["name", "description", "year"]
        },
        async execute({ name, description, year, imageUrl }) {
          // TODO
        }
      }
    ]
  });
}
```

Now the author needs to implement the tool.
The tool needs to update the stamp database and refresh the UI to reflect the change to the database. Since the code to do this is already available in the `addStamp()` function written earlier, the tool implementation is very simple and just needs to call this helper when an "add-stamp" tool call is received. After calling the helper, the tool needs to signal completion and should also provide some sort of feedback to the client application that requested the tool call. It returns a text message indicating the stamp was added:

```js
async execute({ name, description, year, imageUrl }) {
  addStamp(name, description, year, imageUrl);

  return {
    content: [
      {
        type: "text",
        text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`,
      },
    ]
  };
}
```

### Future improvements to this example

#### Use a worker

To improve the user experience and make it possible for the stamp application to handle a large number of tool calls without tying up the document's main thread, the web developer may choose to move the tool handling into a dedicated worker script. Handling tool calls in a worker keeps the UI responsive and makes it possible to handle potentially long-running operations. For example, if the user asks an AI agent to add a list of hundreds of stamps from an external source such as a spreadsheet, this will result in hundreds of tool calls that a worker could process without blocking the UI.

#### Adaptive UI

The author may also wish to change the on-page user experience when a client is connected. For example, if the user is interacting with the page primarily through an AI agent or assistive tool, then the author might choose to disable or hide the HTML form input and use more of the available space to show the stamp collection.

## Other API alternatives considered

### Web App Manifest, other manifest-based or declarative approaches

We considered declaring tools statically in a site's Web App Manifest. Declaring tools solely in the Web App Manifest limits WebMCP to PWAs, which could impact adoption since users would need to install a site as an app for tools to be available.

Another type of manifest could be proposed, but this approach also means that only a fixed set of static tools is available that can't be updated dynamically based on application state, which seems like an important ability for web developers. Since manifests can't execute code, it also means manifests are additional work for the developer, since they will still need to implement the tool somewhere.

Our recommended approach above allows for the possibility of declarative tools in the future while giving web developers as much control as possible by defining tools in script.

### Handling tool calls in worker threads

Handling tool calls on the main thread raises performance concerns, especially if an agent requests a large number of tool calls in sequence, and/or the tools are computationally expensive. A design alternative that required tool calls to be handled in workers was considered instead.

One proposal was to expose the WebMCP API only in service workers and let the service worker post messages to individual client windows/tabs as needed in order to update UI. This would have complicated the architecture and required web developers to add a service worker.
This would also have required a session concept to help the service worker differentiate between agents that are connected to different windows and dispatch requests from a particular agent to the correct window.

For long-running, batched, or expensive tool calls, we expect web developers will dynamically update their UI when these are taking place to temporarily cede control to the agent (e.g. disable or remove human form inputs, indicate via UI that an agent is in control), and take advantage of dedicated workers as needed to offload expensive operations. This can be achieved with existing dedicated or shared workers.
\ No newline at end of file
diff --git a/docs/webmcp.md b/docs/webmcp.md
new file mode 100644
index 0000000..d33d75d
--- /dev/null
+++ b/docs/webmcp.md
@@ -0,0 +1,609 @@
+# WebMCP

_Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows._

## TL;DR

We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.

For the technical details of the proposal, code examples, API shape, etc. see [proposal.md](proposal.md).

## Terminology Used

###### Agent
An autonomous assistant that can understand a user's goals and take actions on the user's behalf to achieve them. Today,
these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based
chat interfaces.

###### Browser's Agent
An autonomous assistant as described above but provided by or through the browser. This could be an agent built directly
into the browser or hosted by it, for example, via an extension or plug-in.

###### AI Platform
Providers of agentic assistants such as OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini.

###### Backend Integration
A form of API integration between an AI platform and a third-party service in which the AI platform can talk directly to
the service's backend servers without a UI or running code in the client. For example, the AI platform communicating with
an MCP server provided by the service.

###### Actuation
An agent interacting with a web page by simulating user input such as clicking, scrolling, typing, etc.

## Background and Motivation

The web platform's ubiquity and popularity have made it the world's gateway to information and capabilities. Its ability to support complex, interactive applications beyond static content has empowered developers to build rich user experiences and applications. These user experiences rely on visual layouts, mouse and touch interactions, and visual cues to communicate functionality and state.

As AI agents become more prevalent, the potential for even greater user value is within reach.
AI platforms such as Copilot, ChatGPT, Claude, and Gemini are increasingly able to interact with external services to perform actions such as checking local weather, finding flight and hotel information, and providing driving directions. These functions are provided by external services that extend the AI model’s capabilities. These extensions, or “tools”, can be used by an AI to provide domain-specific functionality that the AI cannot achieve on its own. Existing tools integrate with each AI platform via bespoke “integrations” - each service registers itself with the chosen platform(s) and the platform communicates with the service via an API (MCP, OpenAPI, etc). In this document, we call this style of tool a “backend integration”; users make use of the tools/services by chatting with an AI, and the AI platform communicates with the service on the user's behalf.

Many of the challenges faced by assistive technologies also apply to AI agents, which struggle to navigate existing human-first interfaces when agent-first "tools" are not available. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable.

The web needs web developer involvement to thrive. What if web developers could easily provide their site's capabilities to the agentic web to engage with their users? We propose WebMCP, a JavaScript API that allows developers to define tools for their webpage. These tools allow for code reuse with frontend code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface. Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone.

AI agents can integrate in the backend via protocols like MCP in order to fulfill a user's task. For a web developer to expose their site's functionality this way, they need to write a server, usually in Python or NodeJS, instead of frontend JS, which may be more familiar.

There are several advantages to using the web to connect agents to services:

* **Businesses near-universally already offer their services via the web.**

  WebMCP allows them to leverage their existing business logic and UI, providing a quick, simple, and incremental
  way to integrate with agents. They don't have to re-architect their product to fit the API shape of a given agent.
  This is especially true when the logic is already heavily client-side.


* **Enables visually rich, cooperative interplay between a user, web page, and agent with shared context.**

  Users often start with a vague goal which is refined over time. Consider a user browsing for a high-value purchase.
  The user may prefer to start their journey on a specific page, ask their agent to perform some of the more tedious
  actions ("find me some options for a dress that's appropriate for a summer wedding, preferably red or orange, short
  or no sleeves and no embellishments"), and then take back over to browse among the agent-selected options.

* **Allows authors to serve humans and agents from one source**

  The human-use web is not going away. Integrating agents into it prevents fragmentation of their service and allows
  them to keep ownership of their interface, branding and connection with their users.

WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI.
With WebMCP, agent-service interaction takes place _via app-controlled UI_, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page and they execute on the client. Page content and actuation remain available to the agent (and the user) but the agent also has access to tools which it can use to achieve its goal more directly.

The expected flow using browser agents and script tools:

![A diagram showing an agent communicating with a third-party service via script tools running in a live web page](../content/explainer_st.svg)

In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If
a UI is required it must be provided by the agent itself or somehow connected to an existing UI manually:

![A diagram showing an agent communicating with a third-party service directly via MCP](../content/explainer_mcp.svg)

## Goals

- **Enable human-in-the-loop workflows**: Support cooperative scenarios where users work alongside AI agents or assistive technologies, delegating tasks while maintaining visibility and control over the web page(s).
- **Simplify AI agent integration**: Enable AI agents to be more reliable and helpful by interacting with web sites through well-defined JavaScript tools instead of through UI actuation.
- **Minimize developer burden**: Any task that a user can accomplish through a page's UI can be made into a tool by re-using much of the page's existing JavaScript code.
- **Improve accessibility**: Provide a standardized way for assistive technologies to access web application functionality beyond what's available through traditional accessibility trees, which are not widely implemented.

## Non-Goals

- **Headless browsing scenarios**: While it may be possible to use this API for headless or server-to-server interactions where no human is present to observe progress, this is not a current goal. Headless scenarios create many questions, like the launching of browsers and profile considerations.
- **Autonomous agent workflows**: The API is not intended for fully autonomous agents operating without human oversight, or where a browser UI is not required. This task is likely better suited to existing protocols like [A2A](https://a2aproject.github.io/A2A/latest/).
- **Replacement of backend integrations**: WebMCP works with existing protocols like MCP and is not a replacement for them.
- **Replace human interfaces**: The human web interface remains primary; agent tools augment rather than replace user interaction.
- **Enable / influence discoverability of sites to agents**

## Use Cases

The use cases for script tools are ones in which the user is collaborating with the agent, rather than completely
delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated.

### Example - Creative

_Jen wants to create an invitation to her upcoming yard sale so she uses her browser to navigate to
`http://easely.example`, her favorite graphic design platform. However, she's rather new to it and sometimes struggles
to find all the functionality needed for her task in the app's extensive menus. She creates a "yard sale flyer" design
and opens up a "templates" panel to look for a premade design she likes. There are so many templates and she's not sure
There's so many templates and she's not sure +which to choose from so she asks her browser agent for help._ + +**Jen**: Show me templates that are spring themed and that prominently feature the date and time. They should be on a +white background so I don't have to print in color. + +_The current document has registered a script tool that the agent notices may be relevant to this query:_ + +```js +/** + * Filters the list of templates based on a description. + * + * description - A visual description of the types of templates to show, in natural language (English). + */ + filterTemplates(description) +``` + +_The agent invokes the tool: `filterTemplate("sprint themed, date and time displayed prominently, white background")`. +The UI updates to show a filtered list matching this description._ + +**Agent**: Ok, the remaining templates should now match your description. + +_Jen picks a template and gets to work._ + +_The agent notices a new tool was registered when the design was loaded:_ + +```js +/** + * Makes changes to the current design based on instructions. Possible actions include modifications to text + * and font; insertion, deletion, transformation of images; placement and scale of elements. The instructions + * should be limited a single task. Here are some examples: + + * editDesign("Change the title's font color to red"); + * editDesign("Rotate each picture in the background a bit to give the design a less symmetrical feel"); + * editDesign("Add a text field at the bottom of the design that reads 'example text'"); + * + * instructions - A description of how the design should be changed, in natural language (English). + */ + editDesign(instructions) +``` + +_With all the context of Jen's prompts, page state, and this editDesign tool, the agent is able to make helpful +suggestions on next steps:_ + +**Agent**: Would you like me to make the time/date font larger? + +**Jen**: Sure. Could you also swap out the clipart for something more yard-sale themed? + +**Agent**: Sure, let me do that for you. + +**Jen**: Please fill in the time and place using my home address. The time should be in my e-mail in a message from my +husband. + +**Agent**: Ok, I've found it - I'll fill in the flyer with Aug 5-8, 2025 from 10am-3pm | 123 Queen Street West. + +_Jen is almost happy with the current design but think the heading could be better_ + +**Jen**: Help me come up with a more attention grabbing headline for the call to action and title. + +**Agent**: Of course! Here are some more attention-grabbing headlines for your yard sale flyer, broken down by title and +call to action: + +To Create Excitement: + * Yard Sale Extravaganza! + * The Ultimate Clear-Out Sale + * Mega Garage & Yard Sale + +... + +**Jen**: Lets use "Yard Sale Extravaganza!" as the title. Create copies of this page with each of the call to action +suggestions. + +_The agent takes this action using a sequence of tool calls which might look something like:_ + +* `EditDesign("Change the title text to 'Yard Sale Extravaganza!'")` +* `EditDesign("Change the call-to-action text to 'The hunt is on!'")` +* `AddPage("DUPLICATE")` +* `EditDesign("Change the call-to-action text to 'Ready, set, shop!'")` +* `AddPage("DUPLICATE")` +* `EditDesign("Change the call-to-action text to 'Come for the bargains, stay for the cookies'")` + +_Jen now has 3 versions of the same yard sale flyer. Easely implements these script tools using AI-based techinques on +their backend to allow a natural language interface. 
Additionally, the UI presents these changes to Jen as an easily
reversible batch of "uncommitted" changes, allowing her to easily review the agent's actions and make changes or undo as
necessary. While the site could also implement a chat interface to expose this functionality with their own agent, the
browser's agent provides a seamless journey by using tools across multiple sites/services. For example, pulling up
information from the user's email service._

**Agent**: Done! I've created three variations of the original design, each with a unique call to action.

_Jen is now happy with these flyers. Normally she'd print to PDF and then take the file to a print shop. However, Easely
has a new print service that Jen doesn't know about and doesn't notice in the UI. The agent, however, knows the page has
an `orderPrints` tool:_

```js
/**
 * Orders the current design for printing and shipping to the user.
 *
 * copies - A number between 0 and 1000 indicating how many copies of the design to print. Required.
 * page_size - The paper type to use. Available options are [Legal, Letter, A4, A5]. Default is "Letter".
 * page_finish - What kind of paper finish to use. Available options are [Regular, Glossy Photo, Matte Photo].
 *               Default is "Regular"
 */
orderPrints(copies, page_size, page_finish);
```

_The agent understands the user's intent and so surfaces a small chip in its UI:_

**Agent**: `[Order prints]`

_Jen is delighted she saved a trip to the store and clicks the button._

**Agent**: How many copies would you like? I'll request 8.5x11 sized regular paper but there are other options available.

**Jen**: Please print 10 copies.

**Agent**: Done! The order is ready for your review.

_The site navigates to the checkout page where Jen can review the order and click the checkout button to confirm._


### Example - Shopping

Shopping often includes exploration, browsing, and reviewing. The agent can help and remove the tedious parts, but the
user wants to be involved.

_Maya has a few favorite clothing vendors but doesn't get formal clothes very often so she opens her AI assistant._

**Maya**: Suggest a few formal or semi-formal dress places where I can shop for my friend’s wedding. Please make sure
they are eco-friendly and not too expensive.

**Agent**: Sure, here are a few suggestions:

1. Elara - uses innovative recycled and plant-based fabrics to minimize waste for a modern, chic wardrobe that allows
   you to embrace the latest trends, responsibly. Price Range: Dresses typically \$90 - \$220 CAD

2. Linden Harbour - creates wonderfully crafted womenswear designed to last a lifetime. Carries everything from classic
shirt dresses to elegant knitwear, meticulously made from the finest organic cotton, linen, and responsibly sourced
wool. Price Range: Dresses typically \$250 - \$550 CAD

3. Wildebloom - Flowing dresses brought to life through artisanal collaborations, using natural plant dyes and panels of
   upcycled vintage textiles. Price Range: Dresses typically \$180 - \$450 CAD

**Maya**: Let's take a look at Wildebloom.

_The agent now opens Maya's web browser, which it is integrated with, i.e. the agent can observe and control the
browser. It navigates to the dresses page on `http://wildebloom.example/shop`._

**Agent**: Ok, here are the dresses that Wildebloom carries.

_Maya is immediately overwhelmed. There are so many options!
Moreover, when she looks at filters she sees they're
quite limited, with only colour and size as options._

**Maya**: Show me only dresses available in my size, and also show only the ones that would be appropriate for a
cocktail-attire wedding.

_The agent notices the dresses page registers several tools:_

```js
/*
 * Returns an array of product listings containing an id, detailed description, price, and photo of each
 * product
 *
 * size - optional - a number between 2 and 14 to filter the results by EU dress size
 * color - optional - a color from [Red, Blue, Green, Yellow, Black, White] to filter dresses by
 */
getDresses(size, color)

/*
 * Displays the given products to the user
 *
 * product_ids - An array of numbers each of which is a product id returned from getDresses
 */
showDresses(product_ids)
```

_The agent calls `getDresses(6)` and receives a JSON object:_

```json
{
  "products": [
    {
      "id": 1021,
      "description": "A short sleeve long dress with full length button placket...",
      "price": "€180",
      "image": "img_1024.png"
    },
    {
      "id": 4320,
      "description": "A straight midi dress in organic cotton...",
      "price": "€140",
      "image": "img_4320.png"
    },
    ...
  ]
}
```

> [!Note]
> How to pass images and other non-textual data is something we should improve.
> Issue #10

_The agent can now process this list, fetching each image, and using the user's criteria to filter the list. When
completed it makes another call, this time to `showDresses([4320, 8492, 5532, ...])`. This call updates the UI on the
page to show only the requested dresses._

_This is still too many dresses so Maya finds an old photo of herself in a summer dress that she really likes and shares
it with her agent._

**Maya**: Are there any dresses similar to the dress worn in this photo? Try to match the colour and style, but continue
to show me dresses appropriate for cocktail-attire.

_The agent uses this image to identify several new parameters, including the colour, the fit, and the neckline, and
narrows down the list to just a few dresses. Maya finds and clicks on a dress she likes._

_Notice that the user did not give their size, but the agent knows this from personalization and may even translate the stored
size into EU units to use it with this site._

### Example - Code Review

Some services are very domain specific and/or provide a lot of functionality. A real world example is the Chromium code
review tool: Gerrit. See [CL#5142508](crrev.com/c/5142508). Gerrit has many features but they're not obvious just by
looking at the UI (you can press the '?' key to show a shortcut guide). In order to add a comment to a line, the user
must know to press the 'c' key. The user can suggest edits but has to open a comment to do so. Results from test runs
are available but are hidden in a generically-named "Checks" tab.

Agents are typically trained on everyday usage so they may do a poor job on more specialized, complex interfaces. However,
such sites could provide the agent with tools which serve as both a shortcut and a user manual for the agent.

_John is a software developer and opens a code review sent from his colleague. He notices there are two red bots
indicating test failures on this patch._

**John**: Why are the Mac and Android bots failing?

_The site includes some relevant tools:_

```js
/**
 * Returns the status of each bot run in a try run job. Use the returned bot_name in getTryRunFailureSnippet
 * to get more details about a run.
+ */ +getTryRunStatuses(); + +/** + * If a bot failed, returns the TAIL snippet of the log containing the error message describing the failure. + * + * bot_name - The name of the bot, as returned from getTryRunStatuses, to get failure snippet from + */ +getTryRunFaliureSnippet(bot_name) +``` + +_The agent calls `getTryRunStatuses()` to find the names of the failing bots. It returns:_ + +```json +[ + { + "bot_name": "mac-x86-rel", + "status": "PASSED" + }, + { + "bot_name": "mac-x64-rel", + "status": "FAIL" + }, + { + "bot_name": "android-15-rel", + "status": "PASSED" + }, + ... +] +``` + +**Agent**: I see that the Mac x86 Android 10 bots are failing. Let me get more details... + +The agent now calls `getTryRunFailureSnippet` on each of the failing bots and receives a snippet of the failing log of +each._ + +**Agent**: + +The Mac bot is failing to start with an "Out of Space" error. This is probaby just an ifrastructure issue. + +The Android bot is failing while linking with a missing symbol `gfx::DisplayCompositor`. + +_John realizes his colleague forgot to add the Android implementation of this class to the build file._ + +**John**: I think BUILD.gn is missing `display_compositor_android.cc`, please add a suggested edit adding it in the +Android section. + +_The agent has access to BUILD.gn from the main UI as well as in DOM. It notices it also has access to a relevant tool +which is uses to add the change:_ + +```js +/* + * Adds a suggested edit to the review + * + * filename - the name of the file in which to make the edit + * patch - the edit diff in unidiff format + */ +addSuggestedEdit(filename, patch) +``` + +**Agent**: Ok, I've added `display_compositor_android.cc` to the BUILD.gn file. + +_The UI displays the suggested diff with an option for the user to accept, modify, or reject the change. John accepts +the change._ + +_Reading the rest of the review, John notices a small issue repeated across multiple files._ + +**John**: Add a polite comment to the review that we should use "PointF" rather than "Point" for input coordinates since +the latter can cause unintended rounding. Then add suggested edits changing all instances where Point was added to +PointF. + +_The agent automates the repetitive task of making all the simple changes. The UI provides John with a visual way to +quickly review the agent's actions and accept/modify/reject them._ + +## Assumptions + +* For many sites wanting to integrate with agents quickly - augmenting their existing UI with script tools will be + easier vs. backend integration +* Agents will perform quicker and more successfully with specific tools compared to using a human interface. +* Users might use an agent for a direct action query (e.g. “create a 30 minute meeting with Pat at 3:00pm”), complex + cross-site queries (e.g. “Find the 5 highest rated restaurants in Toronto, pin them in my Map, and book a table at + each one over the next 5 weeks”) and everything in between. + +## Prior Art + +### Model Context Protocol (MCP) + +MCP is a protocol for applications to interface with an AI model. Developed by Anthropic, MCP is supported by Claude +Desktop and Open AI's Agents SDK as well as a growing ecosystem of clients and servers. + +In MCP, an application can expose tools, resources, and more to an AI-enabled application by implementing an MCP server. +The server can be implemented in various languages, as long as it conforms to the protocol. 
For example, here’s an
implementation of a tool using the Python SDK from the MCP quickstart guide:

```python
@mcp.tool()
async def get_alerts(state: str) -> str:
    """Get weather alerts for a US state.

    Args:
        state: Two-letter US state code (e.g. CA, NY)
    """
    url = f"{NWS_API_BASE}/alerts/active/area/{state}"
    data = await make_nws_request(url)

    if not data or "features" not in data:
        return "Unable to fetch alerts or no alerts found."

    if not data["features"]:
        return "No active alerts for this state."

    alerts = [format_alert(feature) for feature in data["features"]]
    return "\n---\n".join(alerts)
```

A client application implements a matching MCP client which takes a user’s query, communicates with one or more MCP
servers to enumerate their capabilities, and constructs a prompt to the AI platform, passing along any server-provided
tools or data.

The MCP protocol defines how this client-server communication happens. For example, a client can ask the server to list
all tools, which might return a response like this:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather information for a location",
        "inputSchema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name or zip code"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "nextCursor": "next-page-cursor"
  }
}
```

Unlike OpenAPI, MCP is transport-agnostic. It comes with two built-in transports: stdio, which uses the system's standard
input/output and is well suited for local communication between apps, and Server-Sent Events (SSE), which uses HTTP requests
for remote communication.

### WebMCP (MCP-B)

[MCP-B](https://mcp-b.ai/), or Model Context Protocol for the Browser, is an open source project found on GitHub [here](https://github.com/MiguelsPizza/WebMCP) and has much the same motivation and solution as described in this proposal. MCP-B's underlying protocol, also named WebMCP, extends MCP with tab transports that allow in-page communication between a website's MCP server and any client in the same tab. It also extends MCP with extension transports that use Chromium's runtime messaging to make a website's MCP server available to other extension components within the browser (background, sidebar, popup), and to other external MCP clients running on the same machine. MCP-B enables tools from different sites to work together, and allows sites' tools to be cached so that they are discoverable even if the browser isn't currently navigated to the site.

### OpenAPI

OpenAPI is a standard for describing HTTP-based APIs. Here’s an example in YAML (from the ChatGPT Actions guide):

```yaml
openapi: 3.1.0
info:
  title: NWS Weather API
  description: Access to weather data including forecasts, alerts, and observations.
  version: 1.0.0
servers:
  - url: https://api.weather.gov
    description: Main API Server
paths:
  /points/{latitude},{longitude}:
    get:
      operationId: getPointData
      summary: Get forecast grid endpoints for a specific location
      parameters:
        - name: latitude
          in: path
          required: true
          schema:
            type: number
            format: float
          description: Latitude of the point
        - name: longitude
          in: path
          required: true
          schema:
            type: number
            format: float
          description: Longitude of the point
      responses:
        '200':
          description: Successfully retrieved grid endpoints
          content:
            application/json:
              schema:
                type: object
                properties:
                  properties:
                    type: object
                    properties:
                      forecast:
                        type: string
                        format: uri
                      forecastHourly:
                        type: string
                        format: uri
                      forecastGridData:
                        type: string
                        format: uri
```

A subset of the OpenAPI specification is used for function-calling / tool use for various AI platforms, such as ChatGPT
Actions and Gemini Function Calling. A user or developer on the AI platform would provide the platform with the OpenAPI
schema for an API they wish to provide as a “tool”. The AI is trained to understand this schema and is able to select
the tool and output a “call” to it, providing the correct arguments. Typically, some code external to the AI itself
would be responsible for making the API call and passing the returned result back to the AI’s conversation context to
reply to the user’s query.

### Agent2Agent Protocol

The Agent2Agent Protocol is another protocol for communication between agents. While similar in structure to MCP (client
/ server concepts that communicate via JSON-RPC), A2A attempts to solve a different problem. MCP (and OpenAPI) are
generally about exposing traditional capabilities to AI models (i.e. “tools”), whereas A2A is a protocol for connecting AI
agents to each other. It provides some additional features to make common tasks in this scenario more streamlined, such
as: capability advertisement, long-running and multi-turn interactions, and multimodal input/output.

## Open topics

### Security considerations

There are security considerations that will need to be accounted for, especially if the WebMCP API is used by semi-autonomous systems like LLM-based agents. Engagement from the community is welcome.

### Model poisoning

Explorations should be made on the potential implications of allowing web developers to create tools in their front-end code for use in AI agents and LLMs. For example, vulnerabilities such as an agent being able to access content the user would not typically be able to see will need to be investigated.

### Cross-Origin Isolation

Client applications would have access to many different web sites that expose tools. Consider an LLM-based agent. It is possible and even likely that data output from one application's tools could find its way into the input parameters for a second application's tool. There are legitimate reasons for the user to want to send data across origins to achieve complex tasks. Care should be taken to indicate to the user which web applications are being invoked and with what data so that the user can intervene.

### Permissions

A trust boundary is crossed both when a web site first registers tools via WebMCP, and when a new client agent wants to use these tools. When a web site registers tools, it exposes information about itself and the services it provides to the host environment (i.e. the browser).
When agents send tool calls, the site receives untrusted input in the parameters, and the outputs in turn may contain sensitive user information. The browser should prompt the user at both points to grant permission and also provide a means to see what information is being sent to and from the site when a tool is called. To streamline workflows, browsers may give users the choice to always allow tool calls for a specific web app and client app pair.

### Model Context Protocol (MCP) without WebMCP

MCP has quickly garnered wide interest from the developer community, with hundreds of MCP servers being created. WebMCP is designed to work well with MCP, so that developers can reuse many MCP concepts in their front-end website using JavaScript. We originally planned to propose an explainer very tightly aligned with MCP, providing all the same concepts supported by MCP at the time of writing, including tools, resources, and prompts. Since MCP is still actively changing, matching its exact capabilities would be an ongoing effort. Aligning the WebMCP API tightly with MCP would also make it more difficult to tailor WebMCP for non-LLM scenarios like OS and accessibility assistant integrations. Keeping the WebMCP API as agnostic as possible increases the chance of it being useful to a broader range of potential clients.

We expect some web developers will continue to prefer standalone MCP instead of WebMCP if they want to have an always-on MCP server running that does not require page navigation in a full browser process. For example, server-to-server scenarios such as fully autonomous agents will likely benefit more from MCP servers. WebMCP is best suited for local browser workflows with a human in the loop.

The WebMCP API still maps nicely to MCP, and exposing WebMCP tools to external applications via an MCP server is still a useful scenario that a browser implementation may wish to enable.

### Existing web automation techniques (DOM, accessibility tree)

One of the scenarios we want to enable is making the web more accessible to general-purpose AI-based agents. In the absence of alternatives like MCP servers to accomplish their goals, these general-purpose agents often rely on observing the browser state through a combination of screenshots and DOM and accessibility tree snapshots, and then interact with the page by simulating human user input. We believe that WebMCP will give these tools an alternative means to interact with the web that gives the web developer more control over whether and how an AI-based agent interacts with their site.

The proposed API will not conflict with these existing automation techniques. If an agent or assistive tool finds that the task it is trying to accomplish is not achievable through the WebMCP tools that the page provides, then it can fall back to general-purpose browser automation to try to accomplish its task.

## Future explorations

### Progressive web apps (PWA)

PWAs should also be able to use the WebMCP API as described in this proposal. There are potential advantages to installing a site as a PWA. In the current proposal, tools are only discoverable once a page has been navigated to and only persist for the lifetime of the page. A PWA with an app manifest could declare tools that are available "offline", that is, even when the PWA is not currently running. The host system would then be able to launch the PWA and navigate to the appropriate page when a tool call is requested.
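As an illustration only, such a declaration might reuse the manifest `tools` shape sketched in [proposal.md](proposal.md); the exact syntax is TBD, and the surrounding fields here are assumptions:

```json
{
  "name": "My Todo App",
  "start_url": "/",
  "tools": [
    {
      "name": "add-todo",
      "description": "Add a new todo item to the list",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": { "type": "string", "description": "The text of the todo item" }
        },
        "required": ["text"]
      }
    }
  ]
}
```

With something like this, the host system could list the tool while the PWA is not running, and launch the app only when the tool is actually called.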
+ +### Background model context providers + +Some tools that a web app may want to provide for agents and assistive technologies may not require any web UI. For example, a web developer building a "To Do" application may want to expose a tool that adds an item to the user's todo list without showing a browser window. The web developer may be content to just show a notification that the todo item was added. + +For scenarios like this, it may be helpful to combine tool call handling with something like the ['launch'](https://github.com/WICG/web-app-launch/blob/main/sw_launch_event.md) event. A client application might attach a tool call to a "launch" request which is handled entirely in a service worker without spawning a browser window. \ No newline at end of file From 6748764c34ac54f7e790696b12114dfde74f12c5 Mon Sep 17 00:00:00 2001 From: Brandon Walderman Date: Thu, 7 Aug 2025 15:06:39 -0700 Subject: [PATCH 2/4] Rename --- docs/explainer.md | 710 +++++++++++++++++++++++++++++++--------------- docs/webmcp.md | 609 --------------------------------------- 2 files changed, 487 insertions(+), 832 deletions(-) delete mode 100644 docs/webmcp.md diff --git a/docs/explainer.md b/docs/explainer.md index 48e0c5a..d33d75d 100644 --- a/docs/explainer.md +++ b/docs/explainer.md @@ -1,345 +1,609 @@ -# Web Model Context API +# WebMCP -_Enabling web apps to provide context and tools that can be accessed from other apps to create complex workflows._ +_Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows._ -## TL:DR; +## TL;DR -We propose a new JavaScript interface available on web pages that allows web developers to define "tools"; JS functions annotated with natural language descriptions and a schema describing their usage and input parameters that can be leveraged by other apps. In a web browser, webpages implementing Web Model Context can easily be treated as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers, giving web developers the ability to provide functionality from their site to AI agents with improved auth and state when a webpage is visible. These tools can also be leveraged by other technologies as well like assistive ones. +We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control. -## The problem +For the technical details of the proposal, code examples, API shape, etc. see [proposal.md](proposal.md). + +## Terminology Used + +###### Agent +An autonomous assistant that can understand a user's goals and take actions on the user's behalf to achieve them. Today, +these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based +chat interfaces. + +###### Browser's Agent +An autonomous assistant as described above but provided by or through the browser. 
This could be an agent built directly +into the browser or hosted by it, for example, via an extension or plug-in. + +###### AI Platform +Providers of agentic assistants such as OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini. + +###### Backend Integration +A form of API integration between an AI platform and a third-party service in which the AI platform can talk directly to +the service's backend servers without a UI or running code in the client. For example, the AI platform communicating with +an MCP server provided by the service. + +###### Actuation +An agent interacting with a web page by simulating user input such as clicking, scrolling, typing, etc. + +## Background and Motivation The web platform's ubiquity and popularity have made it the world's gateway to information and capabilities. Its ability to support complex, interactive applications beyond static content, has empowered developers to build rich user experiences and applications. These user experiences rely on visual layouts, mouse and touch interactions, and visual cues to communicate functionality and state. -As AI agents become more prevalent, the potential for even greater user value is within reach. Yet, much of the challenges faced by assistive technologies also apply to AI agents that struggle to navigate these existing human-first interfaces. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable. +As AI agents become more prevalent, the potential for even greater user value is within reach. AI platforms such as Copilot, ChatGPT, Claude, and Gemini are increasingly able to interact with external services to perform actions such as checking local weather, finding flight and hotel information, and providing driving directions. These functions are provided by external services that extend the AI model’s capabilities. These extensions, or “tools”, can be used by an AI to provide domain-specific functionality that the AI cannot achieve on its own. Existing tools integrate with each AI platform via bespoke “integrations” - each service registers itself with the chosen platform(s) and the platform communicates with the service via an API (MCP, OpenAPI, etc). In this document, we call this style of tool a “backend integration”; users make use of the tools/services by chatting with an AI, the AI platform communicates with the service on the user's behalf. + +Much of the challenges faced by assistive technologies also apply to AI agents that struggle to navigate existing human-first interfaces when agent-first "tools" are not available. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable. + +The web needs web developer involvement to thrive. What if web developers could easily provide their site's capabilities to the agentic web to engage with their users? We propose WebMCP, a JavaScript API that allows developers to define tools for their webpage. These tools allow for code reuse with frontend code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface. Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone. + +AI agents can integrate in the backend via protocols like MCP in order to fulfill a user's task. 
+
+There are several advantages to using the web to connect agents to services:
+
+* **Businesses near-universally already offer their services via the web.**
+
+  WebMCP allows them to leverage their existing business logic and UI, providing a quick, simple, and incremental
+  way to integrate with agents. They don't have to re-architect their product to fit the API shape of a given agent.
+  This is especially true when the logic is already heavily client-side.
+
-The web needs web developer involvement to thrive. What if web developers could easily provide their sites capabilities to the agentic web to engage with their users? Model Context Protocol (MCP) is one popular approach, yet the protocol requires backend Python or NodeJS, requires a server, and makes human-in-the-loop flows requiring auth and state management tricky.

+* **Enables visually rich, cooperative interplay between a user, web page, and agent with shared context.**

-We propose Web Model Context, a JavaScript API that allows developers to define tools for their webpage. These tools allow for code reuse with front end code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface.

+  Users often start with a vague goal which is refined over time. Consider a user browsing for a high-value purchase.
+  The user may prefer to start their journey on a specific page, ask their agent to perform some of the more tedious
+  actions ("find me some options for a dress that's appropriate for a summer wedding, preferably red or orange, short
+  or no sleeves and no embellishments"), and then take back over to browse among the agent-selected options.

-Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone.

+* **Allows authors to serve humans and agents from one source**

-## Goals of the Web Model Context API

+  The human-use web is not going away. Integrating agents into it prevents fragmentation of their service and allows
+  them to keep ownership of their interface, branding and connection with their users.

-Primary goals of the Web Model Context API:

+WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI. With WebMCP, agent-service interaction takes place _via app-controlled UI_, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page and they execute on the client. Page content and actuation remain available to the agent (and the user) but the agent also has access to tools which it can use to achieve its goal more directly.

-- **Enable human-in-the-loop workflows**: Support scenarios where users work directly through delegating tasks to AI agents or assistive technologies while maintaining much of the visibility and control over the web page(s).
-- **Simplify AI agent integration**: Allow AI agents to interact with web sites through well-defined tools defined in JavaScript rather than screen parsing, automation, or writing back-end code.
+The expected flow using browser agents and Script Tools:
+
+![A diagram showing an agent communicating with a third-party service via script tools running in a live web page](../content/explainer_st.svg)
+
+In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If
+a UI is required it must be provided by the agent itself or somehow connected to an existing UI manually:
+
+![A diagram showing an agent communicating with a third-party service directly via MCP](../content/explainer_mcp.svg)
+
+## Goals
+
+- **Enable human-in-the-loop workflows**: Support cooperative scenarios where users work directly through delegating tasks to AI agents or assistive technologies while maintaining visibility and control over the web page(s).
+- **Simplify AI agent integration**: Enable AI agents to be more reliable and helpful by interacting with web sites through well-defined JavaScript tools instead of through UI actuation.
- **Minimize developer burden**: Any task that a user can accomplish through a page's UI can be made into a tool by re-using much of the page's existing JavaScript code.
- **Improve accessibility**: Provide a standardized way for assistive technologies to access web application functionality beyond what's available through traditional accessibility trees, which are not widely implemented.

+## Non-Goals

- **Headless browsing scenarios**: While it may be possible to use this API for headless or server-to-server interactions where no human is present to observe progress, this is not a current goal. Headless scenarios create many questions like the launching of browsers and profile considerations.
- **Autonomous agent workflows**: The API is not intended for fully autonomous agents operating without human oversight, or where a browser UI is not required. This task is likely better suited to existing protocols like [A2A](https://a2aproject.github.io/A2A/latest/).
+- **Replacement of backend integrations**: WebMCP works with existing protocols like MCP and is not a replacement of existing protocols.
+- **Replace human interfaces**: The human web interface remains primary; agent tools augment rather than replace user interaction.
+- **Enable / influence discoverability of sites to agents**: How agents find sites with relevant tools is out of scope here; search engines or directories may eventually play that role.
+
+## Use Cases
+
+The use cases for script tools are ones in which the user is collaborating with the agent, rather than completely
+delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated.
+
+### Example - Creative
+
+_Jen wants to create an invitation to her upcoming yard sale so she uses her browser to navigate to
+`http://easely.example`, her favorite graphic design platform. However, she's rather new to it and sometimes struggles
+to find all the functionality needed for her task in the app's extensive menus. She creates a "yard sale flyer" design
+and opens up a "templates" panel to look for a premade design she likes. There are so many templates and she's not sure
+which to choose, so she asks her browser agent for help._
+
+**Jen**: Show me templates that are spring themed and that prominently feature the date and time. They should be on a
+white background so I don't have to print in color.
+
+_The current document has registered a script tool that the agent notices may be relevant to this query:_
+
+```js
+/**
+ * Filters the list of templates based on a description.
+ *
+ * description - A visual description of the types of templates to show, in natural language (English).
+ */
+ filterTemplates(description)
+```
+
+_The agent invokes the tool: `filterTemplates("spring themed, date and time displayed prominently, white background")`.
+The UI updates to show a filtered list matching this description._
+
+**Agent**: Ok, the remaining templates should now match your description.
+
+_Jen picks a template and gets to work._

-### Definitions

+_The agent notices a new tool was registered when the design was loaded:_

-- **Model context provider**: A single top-level browsing context navigated to a page that uses the Web Model Context API to provide context (i.e. tools) to agents.
-- **Agent**: An application that uses the provided context. This may be something like an AI assistant integrated into the browser, or possibly a native/desktop application.

+```js
+/**
+ * Makes changes to the current design based on instructions. Possible actions include modifications to text
+ * and font; insertion, deletion, transformation of images; placement and scale of elements. The instructions
+ * should be limited to a single task. Here are some examples:
+ *
+ * editDesign("Change the title's font color to red");
+ * editDesign("Rotate each picture in the background a bit to give the design a less symmetrical feel");
+ * editDesign("Add a text field at the bottom of the design that reads 'example text'");
+ *
+ * instructions - A description of how the design should be changed, in natural language (English).
+ */
+ editDesign(instructions)
+```

-### Understanding Web Model Context

+_With all the context of Jen's prompts, page state, and this editDesign tool, the agent is able to make helpful
+suggestions on next steps:_

-While the Web Model Context API can define tools for other uses, the API loosely aligns with MCP and tools defined with Web Model Context can be treated as MCP tools by MCP clients.

+**Agent**: Would you like me to make the time/date font larger?

-Only a top-level browsing context, such as a browser tab, can be a model context provider. A page calls the Web Model Context API's `provideContext()` method to register model context with the browser, which are JavaScript methods with descriptions an AI agent can invoke as tools. When an agent that is connected to the page sends a tool call, a JavaScript callback is invoked, where the page can handle the tool call and respond to the agent. Simple applications can handle tool calls entirely in page script, but more complex applications may choose to delegate computationally heavy operations to workers and respond to the agent asynchronously.

+**Jen**: Sure. Could you also swap out the clipart for something more yard-sale themed?

+**Agent**: Sure, let me do that for you.

+**Jen**: Please fill in the time and place using my home address. The time should be in my e-mail in a message from my
+husband.
+**Agent**: Ok, I've found it - I'll fill in the flyer with Aug 5-8, 2025 from 10am-3pm | 123 Queen Street West.

-Handling tool calls in the main thread with the option of workers serves a few purposes:

+_Jen is almost happy with the current design but thinks the heading could be better_

-- Ensures tool calls run one at a time and sequentially.
-- The page can update UI to reflect state changes performed by tools.
-- Handling tool calls in page script may be sufficient for simple applications.

+**Jen**: Help me come up with a more attention-grabbing headline for the call to action and title.

-### Benefits of this design

+**Agent**: Of course! Here are some more attention-grabbing headlines for your yard sale flyer, broken down by title and
+call to action:

-- **Familiar language/tools**: Lets a web developer implement their tools in JavaScript.
-- **Code reuse**: A web developer may only need to make minimal changes to expose existing functionality as tools if their page already has an appropriate JavaScript function.
-- **Local tool call handling**: Enables web developers to integrate their pages with AI-based agents by working with, but not solely relying on, techniques like Model Context Protocol that require a separate server and authentication. A web developer may only need to maintain one codebase for their frontend UI and agent integration, improving maintainability and quality-of-life for the developer. Local handling also potentially reduces network calls and enhances privacy/security.
-- **Fine-grained permissions**: Tool calls are mediated through the browser, so the user has the opportunity to review the requesting client apps and provide consent.
-- **Developer involvement**: Encourages developer involvement in the agentic web, required for a thriving web. Reduces the need for solutions like UI automation where the developer is not involved, improving privacy, reducing site expenses, and a better customer experience.
-- **Seamless integration**: Since tool calls are handled locally on a real browser, the agent can interleave these calls with human input when necessary (e.g. for consent, auth flows, dialogs, etc.).
-- **Accessibility**: Bringing tools to webpages may help users with accessibility needs by allowing them to complete the same job-to-be-done via agentic or conversational interfaces instead of relying on the accessibility tree, which many websites have not implemented.

+To Create Excitement:
+ * Yard Sale Extravaganza!
+ * The Ultimate Clear-Out Sale
+ * Mega Garage & Yard Sale

-### Limitations of this design

+...

-- **Browsing context required**: Since tool calls are handled in JavaScript, a browsing context (i.e. a browser tab or a webview) must be opened. There is currently no support for agents or assistive tools to call tools "headlessly" without visible browser UI. This is a future consideration which is discussed further below.
-- **UI synchronization**: For a satisfactory end user experience, web developers need to ensure their UI is updated to reflect the current app state, regardless of whether the state updates came from human interaction or from a tool call.
-- **Complexity overhead**: In cases where the site UI is very complex, developers will likely need to do some refactoring or add JavaScript that handles app and UI state with appropriate outputs.
-- **Tool discoverability**: There is no built-in mechanism for client applications to discover which sites provide callable tools without visiting or querying them directly. Search engines, or directories of some kind may play a role in helping client applications determine whether a site has relevant tools for the task it is trying to perform.

+**Jen**: Let's use "Yard Sale Extravaganza!" as the title. Create copies of this page with each of the call to action
+suggestions.
-### API +_The agent takes this action using a sequence of tool calls which might look something like:_ -The `window.agent` interface is introduced to represent an abstract AI agent that is connected to the page and uses the page's context. The `agent` object has a single method `provideContext` that's used to update the context (currently just tools) available to the agent. The method takes an object with a `tools` property which is a list of tool descriptors. The tool descriptors look as shown in this example below, which aligns with the Prompt API's [tool use](https://github.com/webmachinelearning/prompt-api#tool-use) specification, and other libraries like the MCP SDK: +* `EditDesign("Change the title text to 'Yard Sale Extravaganza!'")` +* `EditDesign("Change the call-to-action text to 'The hunt is on!'")` +* `AddPage("DUPLICATE")` +* `EditDesign("Change the call-to-action text to 'Ready, set, shop!'")` +* `AddPage("DUPLICATE")` +* `EditDesign("Change the call-to-action text to 'Come for the bargains, stay for the cookies'")` + +_Jen now has 3 versions of the same yard sale flyer. Easely implements these script tools using AI-based techinques on +their backend to allow a natural language interface. Additionally, the UI presents these changes to Jen as an easily +reversible batch of "uncommitted" changes, allowing her to easily review the agent's actions and make changes or undo as +necessary. While the site could also implement a chat interface to expose this functionality with their own agent, the +browser's agent provides a seamless journey by using tools across multiple sites/services. For example, pulling up +information from the user's email service._ + +**Agent**: Done! I've created three variations of the original design, each with a unique call to action. + +_Jen is now happy with these flyers. Normally she'd print to PDF and then take the file to a print shop. However, Easely +has a new print service that Jen doesn't know about and doesn't notice in the UI. However, the agent knows the page has +an `orderPrints` tool: ```js -// Declare tool schema and implementation functions. -window.agent.provideContext({ - tools: [ - { - name: "add-todo", - description: "Add a new todo item to the list", - inputSchema: { - type: "object", - properties: { - text: { type: "string", description: "The text of the todo item" } - }, - required: ["text"] - }, - async execute({ text }) => { - // Add todo item and update UI. - return /* structured content response */ - } - } - ] -}); +/** + * Orders the current design for printing and shiping to the user. + * + * copies - A number between 0 and 1000 indicating how many copies of the design to print. Required. + * page_size - The paper type to use. Available options are [Legal, Letter, A4, A5]. Default is "Letter". + * page_finish - What kind of paper finish to use. Available options are [Regular, Glosys Photo, Matte Photo]. + * Default is "Regular" + */ +orderPrints(copies, page_size, page_finish); ``` -The `provideContext` method can be called multiple times. Subsequent calls clear any pre-existing tools and other context before registering the new ones. This is useful for single-page web apps that frequently change UI state and could benefit from presenting different tools depending on which state the UI is currently in. +_The agent understands the user's intent and so surfaces a small chip in it's UI:_ -**Advantages:** +**Agent**: `` -- Aligns with existing APIs. -- Simple for web developers to use. -- Enforces a single function per tool. 
_Jen is delighted she saved a trip to the store and clicks the button_.

-**Disadvantages:**

**Agent**: How many copies would you like? I'll request 8.5x11 sized regular paper but there are other options available.

-- Must navigate to the page and run JavaScript for agent to discover tools.

**Jen**: Please print 10 copies.

-If Web Model Context gains traction in the web developer community, it will become important for agents to have a way to discover which sites have tools that are relevant to a user's request. Discovery is a topic that may warrant its own explainer, but suffice to say, it may be beneficial for agents to have a way to know what capabilities a page offers without having to navigate to the web site first. As an example, a future iteration of this feature could introduce declarative tools definitions that are placed in an app manifest so that agents would only need to fetch the manifest with a simple HTTP GET request. Agents will of course still need to navigate to the site to actually use its tools, but a manifest makes it far less costly to discover these tools and reason about their relevance to the user's task.

**Agent**: Done! The order is ready for your review.

-To make such a scenario easier, it would be beneficial to support an alternate means of tool call execution; one that separates the tool definition and schema (which may exist in an external manifest file) from the implementation function.

_The site navigates to the checkout page where Jen can review the order and click the checkout button to confirm._

-One way to do this is to handle tool calls as events, as shown below:

### Example - Shopping

```json
// 1. manifest.json: Define tools declaratively. Exact syntax TBD.
{
  // .. other manifest fields ..
  "tools": [
    {
      "name": "add-todo",
      "description": "Add a new todo item to the list",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": { "type": "string", "description": "The text of the todo item" }
        },
        "required": ["text"]
      }
    }
  ]
}
```

Shopping often includes exploration, browsing, and reviewing. The agent can help and remove the tedious parts but the
user wants to be involved.

```js
// 2. script.js: Handle tool calls as events.

window.agent.addEventListener('toolcall', async e => {
  if (e.name === "add-todo") {
    // Add todo item and update UI.
    e.respondWith(/* structured content response */);
    return;
  } // etc...
});
```

_Maya has a few favorite clothing vendors but doesn't get formal clothes very often so she opens her AI assistant_

-Tool calls are handled as events. Since event handler functions can't respond to the agent by returning a value directly, the `'toolcall'` event object has a `respondWith()` method that needs to be called to signal completion and respond to the agent. This is based on the existing service worker `'fetch'` event.

**Maya**: Suggest a few formal or semi-formal dress places where I can shop for my friend’s wedding. Please make sure
they are eco-friendly and not too expensive.

-**Advantages:**

-- Allows different discovery mechanisms without rendering a page.

**Agent**: Sure, here are a few suggestions:

1. Elara - uses innovative recycled and plant-based fabrics to minimize waste for a modern, chic wardrobe that allows
   you to embrace the latest trends, responsibly. Price Range: Dresses typically \$90 - \$220 CAD

-**Disadvantages:**

-- Slightly harder to keep definition and implementation in sync.
-- Potentially large switch-case in event handler.

2. Linden Harbour - creates wonderfully crafted womenswear designed to last a lifetime. Carries everything from classic
shirt dresses to elegant knitwear, meticulously made from the finest organic cotton, linen, and responsibly sourced
wool. Price Range: Dresses typically \$250 - \$550 CAD

3. Wildebloom - Flowing dresses brought to life through artisanal collaborations, using natural plant dyes and panels of
   upcycled vintage textiles. Price Range: Dresses typically \$180 - \$450 CAD

**Maya**: Let's take a look at Wildebloom.

_The agent now opens Maya's web browser, with which it is integrated; i.e., the agent can observe and control the
browser. It navigates to the dresses page on `http://wildebloom.example/shop`_

**Agent**: Ok, here are the dresses that Wildebloom carries.

_Maya is immediately overwhelmed. There are so many options! Moreover, when she looks at filters she sees they're
quite limited with only colour and size as options._

**Maya**: Show me only dresses available in my size, and also show only the ones that would be appropriate for a
cocktail-attire wedding.
+_The agent notices the dresses page registers several tools:_
+
+```js
+/*
+ * Returns an array of product listings containing an id, detailed description, price, and photo of each
+ * product
+ *
+ * size - optional - a number between 2 and 14 to filter the results by EU dress size
+ * color - optional - a color from [Red, Blue, Green, Yellow, Black, White] to filter dresses by
+ */
+getDresses(size, color)
+
+/*
+ * Displays the given products to the user
+ *
+ * product_ids - An array of numbers each of which is a product id returned from getDresses
+ */
+showDresses(product_ids)
+```
+
+_The agent calls `getDresses(6)` and receives a JSON object:_
+
+```json
+{
+  "products": [
+    {
+      "id": 1021,
+      "description": "A short sleeve long dress with full length button placket...",
+      "price": "€180",
+      "image": "img_1024.png"
+    },
+    {
+      "id": 4320,
+      "description": "A straight midi dress in organic cotton...",
+      "price": "€140",
+      "image": "img_4320.png"
+    },
+    ...
+  ]
+}
+```
+
+> [!NOTE]
+> How to pass images and other non-textual data is something we should improve.
+> Issue #10
+
+_The agent can now process this list, fetching each image and using the user's criteria to filter the list. Once
+complete, it makes another call, this time to `showDresses([4320, 8492, 5532, ...])`. This call updates the UI on the
+page to show only the requested dresses._
+
+_This is still too many dresses, so Maya finds an old photo of herself in a summer dress that she really likes and shares
+it with her agent._
+
+**Maya**: Are there any dresses similar to the dress worn in this photo? Try to match the colour and style, but continue
+to show me dresses appropriate for cocktail-attire.
+
+_The agent uses this image to identify several new parameters including the colour, the fit, and the neckline, and
+narrows down the list to just a few dresses. Maya finds and clicks on a dress she likes._
+
+_Notice that the user did not give their size, but the agent knows this from personalization and may even translate the stored
+size into EU units to use it with this site._
+
+### Example - Code Review
+
+Some services are very domain specific and/or provide a lot of functionality. A real world example is the Chromium code
+review tool: Gerrit. See [CL#5142508](https://crrev.com/c/5142508). Gerrit has many features but they're not obvious just by
+looking at the UI (you can press the '?' key to show a shortcut guide).
In order to add a comment to a line, the user
must know to press the 'c' key. The user can suggest edits but has to open a comment to do so. Results from test runs
are available but are hidden in a generically-named "Checks" tab.

Agents are typically trained on everyday usage so may do a poor job on more specialized, complex interfaces. However,
such sites could provide the agent with tools which serve as both a shortcut and a user manual for the agent.

-#### Recommendation

_John is a software developer and opens a code review sent from his colleague. He notices there are two red bots
indicating test failures on this patch._

-A **hybrid** approach of both of the examples above is recommended as this would make it easy for web developers to get started adding tools to their page, while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()` then the browser's default behavior for tool calls will occur: the `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error.

**John**: Why are the Mac and Android bots failing?

-#### Other API alternatives considered

_The site includes some relevant tools:_

-An earlier version of this explainer that was not published considered supporting multiple agents connecting to a single page and using a Session object to represent a connection between a single agent and the page. The page would register a handler to be notified when a new agent connected and would be able to see basic info about the agent and provide different tools to different agents if needed:

```js
/**
 * Returns the status of each bot run in a try run job. Use the returned bot_name in getTryRunFailureSnippet
 * to get more details about a run.
 */
getTryRunStatuses();

/**
 * If a bot failed, returns the TAIL snippet of the log containing the error message describing the failure.
 *
 * bot_name - The name of the bot, as returned from getTryRunStatuses, to get failure snippet from
 */
getTryRunFailureSnippet(bot_name)
```

```js
window.registerContextProvider({ name: "Example App" }, session => {
  console.log("Agent name: " + session.clientInfo.name);

  session.provideContext({
    tools: [ /.../ ]
  });
});
```

_The agent calls `getTryRunStatuses()` to find the names of the failing bots. It returns:_

```json
[
  {
    "bot_name": "mac-x86-rel",
    "status": "PASSED"
  },
  {
    "bot_name": "mac-x64-rel",
    "status": "FAIL"
  },
  {
    "bot_name": "android-15-rel",
    "status": "PASSED"
  },
  ...
]
```

-This approach was abandoned as supporting multiple agents at a time introduced unneeded complexity. Exposing information about the client agent was also of limited use to the page since there was no way for the page to verify the client's identity.

**Agent**: I see that the Mac x64 and one of the Android bots are failing. Let me get more details...

-## Example of Web Model Context API usage

-Consider a web application like an example Historical Stamp Database. The complete source is available in the [example/](./example/index.html) folder alongside this explainer.
_The agent now calls `getTryRunFailureSnippet` on each of the failing bots and receives a snippet of the failing log of
each._

Screenshot of Historical Stamp Database

**Agent**:

The page shows the stamps currently in the database and has a form to add a new stamp to the database. The author of this app is interested in leveraging the Web Model Context API to enable agentic scenarios like:

The Mac bot is failing to start with an "Out of Space" error. This is probably just an infrastructure issue.

- Importing multiple stamps from outside data sources
- Back-filling missing images
- Populating/correcting descriptions with deep research
- Adding information to descriptions about rarity
- Allowing end users to engage in a conversational interface about the stamps on the site and use that information in agentic flows

The Android bot is failing while linking with a missing symbol `gfx::DisplayCompositor`.

Using the Web Model Context API, the author can add just a few simple tools to the page for adding, updating, and retrieving stamps. With these relatively simple tools, an AI agent would have the ability to perform complex tasks like the ones illustrated above on behalf of the user.

_John realizes his colleague forgot to add the Android implementation of this class to the build file._

The example below walks through adding one such tool, the "add-stamp" tool, using the Web Model Context API, so that AI agents can update the stamp collection.

**John**: I think BUILD.gn is missing `display_compositor_android.cc`, please add a suggested edit adding it in the
Android section.

The webpage today is designed with a visual UX in mind. It uses simple JavaScript with a `'submit'` event handler that reads the form fields, adds the new record, and refreshes the UI:

_The agent has access to BUILD.gn from the main UI as well as in the DOM. It notices it also has access to a relevant tool
which it uses to add the change:_

```js
document.getElementById('addStampForm').addEventListener('submit', (event) => {
  event.preventDefault();

  const stampName = document.getElementById('stampName').value;
  const stampDescription = document.getElementById('stampDescription').value;
  const stampYear = document.getElementById('stampYear').value;
  const stampImageUrl = document.getElementById('stampImageUrl').value;

  addStamp(stampName, stampDescription, stampYear, stampImageUrl);
});
```

```js
/*
 * Adds a suggested edit to the review
 *
 * filename - the name of the file in which to make the edit
 * patch - the edit diff in unidiff format
 */
addSuggestedEdit(filename, patch)
```

To facilitate code reuse, the developer has already extracted the code to add a stamp and refresh the UI into a helper function `addStamp()`:

**Agent**: Ok, I've added `display_compositor_android.cc` to the BUILD.gn file.

```js
function addStamp(stampName, stampDescription, stampYear, stampImageUrl) {
  // Add the new stamp to the collection
  stamps.push({
    name: stampName,
    description: stampDescription,
    year: stampYear,
    imageUrl: stampImageUrl || null
  });

  // Confirm addition and update the collection
  document.getElementById('confirmationMessage').textContent = `Stamp "${stampName}" added successfully!`;
  renderStamps();
}
```

_The UI displays the suggested diff with an option for the user to accept, modify, or reject the change._
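
_The call might have looked something like the following; the file contents and diff hunk here are illustrative, not
taken from the actual review:_

```js
// Hypothetical example of the agent's tool call; the patch argument is a unidiff string.
addSuggestedEdit("BUILD.gn", `@@ -210,6 +210,7 @@
 if (is_android) {
   sources += [
     "display_compositor.cc",
+    "display_compositor_android.cc",
   ]
 }`);
```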
_John accepts the change._

To let AI agents use this functionality, the author defines the available tools. The `agent` property on the `Window` is checked to ensure the browser supports Web Model Context. If supported, the `provideContext()` method is called, passing in an array of tools with a single item, a definition for the new "Add Stamp" tool. The tool accepts as parameters the same set of fields that are present in the HTML form, since this tool and the form should be functionally equivalent.

_Reading the rest of the review, John notices a small issue repeated across multiple files._

```js
if ("agent" in window) {
  window.agent.provideContext({
    tools: [
      {
        name: "add-stamp",
        description: "Add a new stamp to the collection",
        inputSchema: {
          type: "object",
          properties: {
            name: { type: "string", description: "The name of the stamp" },
            description: { type: "string", description: "A brief description of the stamp" },
            year: { type: "number", description: "The year the stamp was issued" },
            imageUrl: { type: "string", description: "An optional image URL for the stamp" }
          },
          required: ["name", "description", "year"]
        },
        async execute({ name, description, year, imageUrl }) {
          // TODO
        }
      }
    ]
  });
}
```

**John**: Add a polite comment to the review that we should use "PointF" rather than "Point" for input coordinates since
the latter can cause unintended rounding. Then add suggested edits changing all instances where Point was added to
PointF.

Now the author needs to implement the tool. The tool needs to update the stamp database, and refresh the UI to reflect the change to the database. Since the code to do this is already available in the `addStamp()` function written earlier, the tool implementation is very simple and just needs to call this helper when an "add-stamp" tool call is received. After calling the helper, the tool needs to signal completion and should also provide some sort of feedback to the client application that requested the tool call. It returns a text message indicating the stamp was added:

_The agent automates the repetitive task of making all the simple changes. The UI provides John with a visual way to
quickly review the agent's actions and accept/modify/reject them._

```js
async execute({ name, description, year, imageUrl }) {
  addStamp(name, description, year, imageUrl);

  return {
    content: [
      {
        type: "text",
        text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`,
      },
    ]
  };
}
```

-### Future improvements to this example

## Assumptions

-#### Use a worker

* For many sites wanting to integrate with agents quickly, augmenting their existing UI with script tools will be
  easier than a backend integration.

-To improve the user experience and make it possible for the stamp application to handle a large number of tool calls without tying up the document's main thread, the web developer may choose to move the tool handling into a dedicated worker script. Handling tool calls in a worker keeps the UI responsive, and makes it possible to handle potentially long-running operations. For example, if the user asks an AI agent to add a list of hundreds of stamps from an external source such as a spreadsheet, this will result in hundreds of tool calls.

* Agents will perform more quickly and successfully with specific tools than by actuating a human interface.

-#### Adaptive UI

* Users might use an agent for a direct action query (e.g. “create a 30 minute meeting with Pat at 3:00pm”), complex
  cross-site queries (e.g. “Find the 5 highest rated restaurants in Toronto, pin them in my Map, and book a table at
  each one over the next 5 weeks”) and everything in between.

-The author may also wish to change the on-page user experience when a client is connected. For example, if the user is interacting with the page primarily through an AI agent or assistive tool, then the author might choose to disable or hide the HTML form input and use more of the available space to show the stamp collection.
-```js -async execute({ name, description, year, imageUrl }) { - addStamp(name, description, year, imageUrl); - - return { - content: [ - { - type: "text", - text: `Stamp "${name}" added successfully! The collection now contains ${stamps.length} stamps.`, - }, - ] - }; -} -``` -### Future improvements to this example +## Prior Art -#### Use a worker +### Model Context Protocol (MCP) -To improve the user experience and make it possible for the stamp application to handle a large number of tool calls without tying up the document's main thread, the web developer may choose to move the tool handling into a dedicated worker script. Handling tool calls in a worker keeps the UI responsive, and makes it possible to handle potentially long-running operations. For example, if the user asks an AI agent to add a list of hundreds of stamps from an external source such as a spreadsheet, this will result in hundreds of tool calls. +MCP is a protocol for applications to interface with an AI model. Developed by Anthropic, MCP is supported by Claude +Desktop and Open AI's Agents SDK as well as a growing ecosystem of clients and servers. -#### Adaptive UI +In MCP, an application can expose tools, resources, and more to an AI-enabled application by implementing an MCP server. +The server can be implemented in various languages, as long as it conforms to the protocol. For example, here’s an +implementation of a tool using the Python SDK from the MCP quickstart guide: -The author may also wish to change the on-page user experience when a client is connected. For example, if the user is interacting with the page primarily through an AI agent or assistive tool, then the author might choose to disable or hide the HTML form input and use more of the available space to show the stamp collection. +```python +@mcp.tool() +async def get_alerts(state: str) -> str: + """Get weather alerts for a US state. -## Open topics for Web Model Context + Args: + state: Two-letter US state code (e.g. CA, NY) + """ + url = f"{NWS_API_BASE}/alerts/active/area/{state}" + data = await make_nws_request(url) -### Security considerations + if not data or "features" not in data: + return "Unable to fetch alerts or no alerts found." -There are security considerations that will need to be accounted for, especially if the Web Model Context API is used by semi-autonomous systems like LLM-based agents. Engagement from the community is welcome. + if not data["features"]: + return "No active alerts for this state." -### Model poisoning + alerts = [format_alert(feature) for feature in data["features"]] + return "\n---\n".join(alerts) +``` -Explorations should be made on the potential implications of allowing web developers to create tools in their front-end code for use in AI agents and LLMs. For example, vulnerabilities like being able to access content the user would not typically be able to see will need to be investigated. +A client application implements a matching MCP client which takes a user’s query, communicates with one or more MCP +servers to enumerate their capabilities, and constructs a prompt to the AI platform, passing along any server-provided +tools or data. -### Cross-Origin Isolation +The MCP protocol defines how this client-server communication happens. For example, a client can ask the server to list +all tools which might return a response like this: -Client applications would have access to many different web sites that expose tools. Consider an LLM-based agent. 
It is possible and even likely that data output from one application's tools could find its way into the input parameters for a second application's tool. There are legitimate reasons for the user to want to send data across origins to achieve complex tasks. Care should be taken to indicate to the user which web applications are being invoked and with what data so that the user can intervene. +```json +{ + "jsonrpc": "2.0", + "id": 1, + "result": { + "tools": [ + { + "name": "get_weather", + "description": "Get current weather information for a location", + "inputSchema": { + "type": "object", + "properties": { + "location": { + "type": "string", + "description": "City name or zip code" + } + }, + "required": ["location"] + } + } + ], + "nextCursor": "next-page-cursor" + } +} +``` -### Permissions +Unlike OpenAPI, MCP is transport-agnostic. It comes with two built in transports: stdio which uses the systems standard +input/output, well suited for local communication between apps, and Server-Sent Events (SSE) which uses HTTP commands +for remote execution. + +### WebMCP (MCP-B) + +[MCP-B](https://mcp-b.ai/), or Model Context Protocol for the Browser, is an open source project found on GitHub [here](https://github.com/MiguelsPizza/WebMCP) and has much the same motivation and solution as described in this proposal. MCP-B's underlying protocol, also named WebMCP, extends MCP with tab transports that allow in-page communicate between a website's MCP server and any client in the same tab. It also extends MCP with extension transports that use Chromium's runtime messaging to make a website's MCP server available to other extension components within the browser (background, sidebar, popup), and to other external MCP clients running on the same machine. MCP-B enables tools from different sites to work together, and for sites to cache tools so that they are discoverable even if the browser isn't currently navigated to the site. + +### OpenAPI + +OpenAPI is a standard for describing HTTP based APIs. Here’s an example in YAML (from the ChatGPT Actions guide): + +```yaml +openapi: 3.1.0 +info: + title: NWS Weather API + description: Access to weather data including forecasts, alerts, and observations. + version: 1.0.0 +servers: + - url: https://api.weather.gov + description: Main API Server +paths: + /points/{latitude},{longitude}: + get: + operationId: getPointData + summary: Get forecast grid endpoints for a specific location + parameters: + - name: latitude + in: path + required: true + schema: + type: number + format: float + description: Latitude of the point + - name: longitude + in: path + required: true + schema: + type: number + format: float + description: Longitude of the point + responses: + '200': + description: Successfully retrieved grid endpoints + content: + application/json: + schema: + type: object + properties: + properties: + type: object + properties: + forecast: + type: string + format: uri + forecastHourly: + type: string + format: uri + forecastGridData: + type: string + format: uri +``` -A trust boundary is crossed both when a web site first registers as a model context provider, and when a new client agent wants to use this context (e.g. by calling a tool). When a web site registers tools, it exposes information about itself and the services it provides to the host environment (i.e. the browser). When agents send tool calls, the site receives untrusted input in the parameters and the outputs in turn may contain sensitive user information. 
The browser should prompt the user at both points to grant permission and also provide a means to see what information is being sent to and from the site when a tool is called. To streamline workflows, browsers may give users the choice to always allow tool calls for a specific web app and client app pair. +A subset of the OpenAPI specification is used for function-calling / tool use for various AI platforms, such as ChatGPT +Actions and Gemini Function Calling. A user or developer on the AI platform would provide the platform with the OpenAPI +schema for an API they wish to provide as a “tool”. The AI is trained to understand this schema and is able to select +the tool and output a “call” to it, providing the correct arguments. Typically, some code external to the AI itself +would be responsible for making the API call and passing the returned result back to the AI’s conversation context to +reply to the user’s query. -## Other Alternatives considered +### Agent2Agent Protocol -### Web App Manifest, other Manifest, or declarative +The Agent2Agent Protocol is another protocol for communication between agents. While similar in structure to MCP (client +/ server concepts that communicate via JSON-RPC), A2A attempts to solve a different problem. MCP (and OpenAPI) are +generally about exposing traditional capabilities to AI models (i.e. “tools”), A2A is a protocol for connecting AI +agents to each other. It provides some additional features to make common tasks in this scenario more streamlined, such +as: capability advertisement, long running and multi-turn interactions, and multimodal input/output. -We considered declaring tools statically in a site's Web App Manifest. Declaring tools solely in the Web App Manifest limits Web Model Context to PWAs which could impact adoption since users would need to install a site as an app for tools to be available. +## Open topics -Another type of manifest could be proposed but using this approach also means that only a fixed set of static tools are available and can't be updated dynamically based on application state, which seems like an important ability for web developers. Since manifests can't execute code, it also means manifests are additional work for the developer since they will need to still implement the tool somewhere. +### Security considerations -Our recommended approach above allows for the possibility of declarative tools in the future while giving web developers as much control as possible by defining tools in script. +There are security considerations that will need to be accounted for, especially if the WebMCP API is used by semi-autonomous systems like LLM-based agents. Engagement from the community is welcome. -### Handling tool calls in worker threads +### Model poisoning -Handling tool calls on the main thread raises performance concerns, especially if an agent requests a large amount of tool calls in sequence, and/or the tools are computationally expensive. A design alternative that required tool calls to be handled in workers was considered instead. +Explorations should be made on the potential implications of allowing web developers to create tools in their front-end code for use in AI agents and LLMs. For example, vulnerabilities like being able to access content the user would not typically be able to see will need to be investigated. -One proposal was to expose the Web Model Context API only in service workers and let the service worker post messages to individual client windows/tabs as needed in order to update UI. 
This would have complicated the architecture and required web developers to add a service worker. This would also have required the Session concept described earlier to help the service worker differentiate between agents that are connected to different windows and dispatch requests from a particular agent to the correct window. +### Cross-Origin Isolation + +Client applications would have access to many different web sites that expose tools. Consider an LLM-based agent. It is possible and even likely that data output from one application's tools could find its way into the input parameters for a second application's tool. There are legitimate reasons for the user to want to send data across origins to achieve complex tasks. Care should be taken to indicate to the user which web applications are being invoked and with what data so that the user can intervene. -For long-running, batched, or expensive tool calls, we expect web developers will dynamically update their UI when these are taking place to temporarily cede control to the agent (e.g. disable or remove human form inputs, indicate via UI that an agent is in control), and take advantage of dedicated workers as needed to offload expensive operations. This can be achieved with existing dedicated or shared workers. +### Permissions + +A trust boundary is crossed both when a web site first registers tools via WebMCP, and when a new client agent wants to use these tools. When a web site registers tools, it exposes information about itself and the services it provides to the host environment (i.e. the browser). When agents send tool calls, the site receives untrusted input in the parameters and the outputs in turn may contain sensitive user information. The browser should prompt the user at both points to grant permission and also provide a means to see what information is being sent to and from the site when a tool is called. To streamline workflows, browsers may give users the choice to always allow tool calls for a specific web app and client app pair. -### Model Context Protocol (MCP) without Web Model Context +### Model Context Protocol (MCP) without WebMCP -MCP has quickly garnered wide interest from the developer community, with hundreds of MCP servers being created. Web Model Context API is designed to work well with MCP, so that developers can reuse many of the MCP topics with their front-end website using JavaScript. We originally planned to propose an explainer very tightly aligned with MCP, providing all the same concepts supported by MCP at the time of writing, including tools, resources, and prompts. Since MCP is still actively changing, matching its exact capabilities would be an ongoing effort. Aligning the Web Model Context API tightly with MCP would also make it more difficult to tailor Web Model Context for non-LLM scenarios like OS and accessibility assistant integrations. Keeping the Web Model Context API as agnostic as possible increases the chance of it being useful to a broader range of potential clients. +MCP has quickly garnered wide interest from the developer community, with hundreds of MCP servers being created. WebMCP is designed to work well with MCP, so that developers can reuse many of the MCP topics with their front-end website using JavaScript. We originally planned to propose an explainer very tightly aligned with MCP, providing all the same concepts supported by MCP at the time of writing, including tools, resources, and prompts. 
Since MCP is still actively changing, matching its exact capabilities would be an ongoing effort. Aligning the WebMCP API tightly with MCP would also make it more difficult to tailor WebMCP for non-LLM scenarios like OS and accessibility assistant integrations. Keeping the WebMCP API as agnostic as possible increases the chance of it being useful to a broader range of potential clients.

We expect some web developers will continue to prefer standalone MCP instead of WebMCP if they want to have an always-on MCP server running that does not require page navigation in a full browser process. For example, server-to-server scenarios such as fully autonomous agents will likely benefit more from MCP servers. WebMCP is best suited for local browser workflows with a human in the loop.

The WebMCP API still maps nicely to MCP, and exposing WebMCP tools to external applications via an MCP server is still a useful scenario that a browser implementation may wish to enable.

### Existing web automation techniques (DOM, accessibility tree)

One of the scenarios we want to enable is making the web more accessible to general-purpose AI-based agents. In the absence of alternatives like MCP servers to accomplish their goals, these general-purpose agents often rely on observing the browser state through a combination of screenshots, and DOM and accessibility tree snapshots, and then interact with the page by simulating human user input. We believe that WebMCP will give these tools an alternative means to interact with the web that gives the web developer more control over whether and how an AI-based agent interacts with their site.

The proposed API will not conflict with these existing automation techniques. If an agent or assistive tool finds that the task it is trying to accomplish is not achievable through the WebMCP tools that the page provides, then it can fall back to general-purpose browser automation to try to accomplish its task.
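
For illustration, a rough and purely hypothetical sketch of this agent-side fallback logic; none of the helper functions below exist in this proposal or in any current API:

```js
// Hypothetical agent-side pseudocode: prefer page-provided WebMCP tools,
// and fall back to general-purpose actuation when no registered tool fits.
async function performTask(page, task) {
  const tools = await page.getProvidedTools();      // hypothetical agent API
  const tool = pickRelevantTool(tools, task);       // hypothetical, e.g. an LLM-ranked match
  if (tool) {
    const args = deriveArguments(tool.inputSchema, task); // hypothetical
    return tool.execute(args);
  }
  return actuate(page, task); // hypothetical: simulate clicks, typing, scrolling
}
```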
## Future explorations ### Progressive web apps (PWA) -PWAs should also be able to use the Web Model Context API as described in this proposal. There are potential advantages to installing a site as a PWA. In the current proposal, tools are only discoverable once a page has been navigated to and only persist for the lifetime of the page. A PWA with an app manifest could declare tools that are available "offline", that is, even when the PWA is not currently running. The host system would then be able to launch the PWA and navigate to the appropriate page when a tool call is requested. +PWAs should also be able to use the WebMCP API as described in this proposal. There are potential advantages to installing a site as a PWA. In the current proposal, tools are only discoverable once a page has been navigated to and only persist for the lifetime of the page. A PWA with an app manifest could declare tools that are available "offline", that is, even when the PWA is not currently running. The host system would then be able to launch the PWA and navigate to the appropriate page when a tool call is requested. ### Background model context providers -Some tools that a web app may want to provide for agents and assistive technologies may not require any web UI. For example, a web developer building a "To Do" application may want to expose a tool that adds an item to the user's todo list without showing a browser window. The web developer may be content to just show a notification that the todo item was added. -For scenarios like this, it may be helpful to combine tool call handling with something like the ['launch'](https://github.com/WICG/web-app-launch/blob/main/sw_launch_event.md) event. A client application might attach a tool call to a "launch" request which is handled entirely in a service worker without spawning a browser window. +Some tools that a web app may want to provide for agents and assistive technologies may not require any web UI. For example, a web developer building a "To Do" application may want to expose a tool that adds an item to the user's todo list without showing a browser window. The web developer may be content to just show a notification that the todo item was added. + +For scenarios like this, it may be helpful to combine tool call handling with something like the ['launch'](https://github.com/WICG/web-app-launch/blob/main/sw_launch_event.md) event. A client application might attach a tool call to a "launch" request which is handled entirely in a service worker without spawning a browser window. \ No newline at end of file diff --git a/docs/webmcp.md b/docs/webmcp.md deleted file mode 100644 index d33d75d..0000000 --- a/docs/webmcp.md +++ /dev/null @@ -1,609 +0,0 @@ -# WebMCP - -_Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows._ - -## TL;DR - -We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. 
WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control. - -For the technical details of the proposal, code examples, API shape, etc. see [proposal.md](proposal.md). - -## Terminology Used - -###### Agent -An autonomous assistant that can understand a user's goals and take actions on the user's behalf to achieve them. Today, -these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based -chat interfaces. - -###### Browser's Agent -An autonomous assistant as described above but provided by or through the browser. This could be an agent built directly -into the browser or hosted by it, for example, via an extension or plug-in. - -###### AI Platform -Providers of agentic assistants such as OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini. - -###### Backend Integration -A form of API integration between an AI platform and a third-party service in which the AI platform can talk directly to -the service's backend servers without a UI or running code in the client. For example, the AI platform communicating with -an MCP server provided by the service. - -###### Actuation -An agent interacting with a web page by simulating user input such as clicking, scrolling, typing, etc. - -## Background and Motivation - -The web platform's ubiquity and popularity have made it the world's gateway to information and capabilities. Its ability to support complex, interactive applications beyond static content, has empowered developers to build rich user experiences and applications. These user experiences rely on visual layouts, mouse and touch interactions, and visual cues to communicate functionality and state. - -As AI agents become more prevalent, the potential for even greater user value is within reach. AI platforms such as Copilot, ChatGPT, Claude, and Gemini are increasingly able to interact with external services to perform actions such as checking local weather, finding flight and hotel information, and providing driving directions. These functions are provided by external services that extend the AI model’s capabilities. These extensions, or “tools”, can be used by an AI to provide domain-specific functionality that the AI cannot achieve on its own. Existing tools integrate with each AI platform via bespoke “integrations” - each service registers itself with the chosen platform(s) and the platform communicates with the service via an API (MCP, OpenAPI, etc). In this document, we call this style of tool a “backend integration”; users make use of the tools/services by chatting with an AI, the AI platform communicates with the service on the user's behalf. - -Much of the challenges faced by assistive technologies also apply to AI agents that struggle to navigate existing human-first interfaces when agent-first "tools" are not available. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable. - -The web needs web developer involvement to thrive. What if web developers could easily provide their site's capabilities to the agentic web to engage with their users? We propose WebMCP, a JavaScript API that allows developers to define tools for their webpage. These tools allow for code reuse with frontend code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface. 
Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone. - -AI agents can integrate on the backend via protocols like MCP in order to fulfill a user's task. For a web developer to expose their site's functionality this way, they need to write a server, usually in Python or NodeJS, instead of frontend JS, which may be more familiar. - -There are several advantages to using the web to connect agents to services: - -* **Businesses near-universally already offer their services via the web.** - - WebMCP allows them to leverage their existing business logic and UI, providing a quick, simple, and incremental - way to integrate with agents. They don't have to re-architect their product to fit the API shape of a given agent. - This is especially true when the logic is already heavily client-side. - - -* **Enables visually rich, cooperative interplay between a user, web page, and agent with shared context.** - - Users often start with a vague goal which is refined over time. Consider a user browsing for a high-value purchase. - The user may prefer to start their journey on a specific page, ask their agent to perform some of the more tedious - actions ("find me some options for a dress that's appropriate for a summer wedding, preferably red or orange, short - or no sleeves, and no embellishments"), and then take back over to browse among the agent-selected options. - -* **Allows authors to serve humans and agents from one source** - - The human-use web is not going away. Integrating agents into it prevents fragmentation of their service and allows - them to keep ownership of their interface, branding, and connection with their users. - -WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI. With WebMCP, agent-service interaction takes place _via app-controlled UI_, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page and they execute on the client. Page content and actuation remain available to the agent (and the user) but the agent also has access to tools which it can use to achieve its goal more directly. - -![A diagram showing an agent communicating with a third-party service via script tools running in a live web page](../content/explainer_st.svg) - -In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If -a UI is required it must be provided by the agent itself or somehow connected to an existing UI manually: - -The expected flow using browser agents and Script Tools: - -![A diagram showing an agent communicating with a third-party service directly via MCP](../content/explainer_mcp.svg) - -## Goals - -- **Enable human-in-the-loop workflows**: Support cooperative scenarios where users work directly or delegate tasks to AI agents or assistive technologies while maintaining visibility and control over the web page(s). -- **Simplify AI agent integration**: Enable AI agents to be more reliable and helpful by interacting with web sites through well-defined JavaScript tools instead of through UI actuation. -- **Minimize developer burden**: Any task that a user can accomplish through a page's UI can be made into a tool by re-using much of the page's existing JavaScript code.
-- **Improve accessibility**: Provide a standardized way for assistive technologies to access web application functionality beyond what's available through traditional accessibility trees, which are not widely implemented. - -## Non-Goals - -- **Headless browsing scenarios**: While it may be possible to use this API for headless or server-to-server interactions where no human is present to observe progress, this is not a current goal. Headless scenarios raise many questions, such as how browsers are launched and which profiles are used. -- **Autonomous agent workflows**: The API is not intended for fully autonomous agents operating without human oversight, or where a browser UI is not required. This task is likely better suited to existing protocols like [A2A](https://a2aproject.github.io/A2A/latest/). -- **Replacement of backend integrations**: WebMCP works with existing protocols like MCP and is not a replacement for them. -- **Replace human interfaces**: The human web interface remains primary; agent tools augment rather than replace user interaction. -- **Enable / influence discoverability of sites to agents** - -## Use Cases - -The use cases for script tools are ones in which the user is collaborating with the agent, rather than completely -delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated. - -### Example - Creative - -_Jen wants to create an invitation to her upcoming yard sale so she uses her browser to navigate to -`http://easely.example`, her favorite graphic design platform. However, she's rather new to it and sometimes struggles -to find all the functionality needed for her task in the app's extensive menus. She creates a "yard sale flyer" design -and opens up a "templates" panel to look for a premade design she likes. There are so many templates and she's not sure -which to choose from so she asks her browser agent for help._ - -**Jen**: Show me templates that are spring themed and that prominently feature the date and time. They should be on a -white background so I don't have to print in color. - -_The current document has registered a script tool that the agent notices may be relevant to this query:_ - -```js -/** - * Filters the list of templates based on a description. - * - * description - A visual description of the types of templates to show, in natural language (English). - */ - filterTemplates(description) -``` - -_The agent invokes the tool: `filterTemplates("spring themed, date and time displayed prominently, white background")`. -The UI updates to show a filtered list matching this description._ - -**Agent**: Ok, the remaining templates should now match your description. - -_Jen picks a template and gets to work._ - -_The agent notices a new tool was registered when the design was loaded:_ - -```js -/** - * Makes changes to the current design based on instructions. Possible actions include modifications to text - * and font; insertion, deletion, transformation of images; placement and scale of elements. The instructions - * should be limited to a single task. Here are some examples: - - * editDesign("Change the title's font color to red"); - * editDesign("Rotate each picture in the background a bit to give the design a less symmetrical feel"); - * editDesign("Add a text field at the bottom of the design that reads 'example text'"); - * - * instructions - A description of how the design should be changed, in natural language (English).
- */ - editDesign(instructions) -``` - -_With all the context of Jen's prompts, page state, and this editDesign tool, the agent is able to make helpful -suggestions on next steps:_ - -**Agent**: Would you like me to make the time/date font larger? - -**Jen**: Sure. Could you also swap out the clipart for something more yard-sale themed? - -**Agent**: Sure, let me do that for you. - -**Jen**: Please fill in the time and place using my home address. The time should be in my e-mail in a message from my -husband. - -**Agent**: Ok, I've found it - I'll fill in the flyer with Aug 5-8, 2025 from 10am-3pm | 123 Queen Street West. - -_Jen is almost happy with the current design but thinks the heading could be better._ - -**Jen**: Help me come up with a more attention-grabbing headline for the call to action and title. - -**Agent**: Of course! Here are some more attention-grabbing headlines for your yard sale flyer, broken down by title and -call to action: - -To Create Excitement: - * Yard Sale Extravaganza! - * The Ultimate Clear-Out Sale - * Mega Garage & Yard Sale - -... - -**Jen**: Let's use "Yard Sale Extravaganza!" as the title. Create copies of this page with each of the call to action -suggestions. - -_The agent takes this action using a sequence of tool calls which might look something like:_ - -* `EditDesign("Change the title text to 'Yard Sale Extravaganza!'")` -* `EditDesign("Change the call-to-action text to 'The hunt is on!'")` -* `AddPage("DUPLICATE")` -* `EditDesign("Change the call-to-action text to 'Ready, set, shop!'")` -* `AddPage("DUPLICATE")` -* `EditDesign("Change the call-to-action text to 'Come for the bargains, stay for the cookies'")` - -_Jen now has 3 versions of the same yard sale flyer. Easely implements these script tools using AI-based techniques on -their backend to allow a natural language interface. Additionally, the UI presents these changes to Jen as an easily -reversible batch of "uncommitted" changes, allowing her to easily review the agent's actions and make changes or undo as -necessary. While the site could also implement a chat interface to expose this functionality with their own agent, the -browser's agent provides a seamless journey by using tools across multiple sites/services. For example, pulling up -information from the user's email service._ - -**Agent**: Done! I've created three variations of the original design, each with a unique call to action. - -_Jen is now happy with these flyers. Normally she'd print to PDF and then take the file to a print shop. However, Easely -has a new print service that Jen doesn't know about and doesn't notice in the UI. But the agent knows the page has -an `orderPrints` tool:_ - -```js -/** - * Orders the current design for printing and shipping to the user. - * - * copies - A number between 1 and 1000 indicating how many copies of the design to print. Required. - * page_size - The paper type to use. Available options are [Legal, Letter, A4, A5]. Default is "Letter". - * page_finish - What kind of paper finish to use. Available options are [Regular, Glossy Photo, Matte Photo]. - * Default is "Regular". - */ -orderPrints(copies, page_size, page_finish); -``` - -_The agent understands the user's intent and so surfaces a small chip in its UI:_ - -**Agent**: `` - -_Jen is delighted she saved a trip to the store and clicks the button._ - -**Agent**: How many copies would you like? I'll request 8.5x11 sized regular paper but there are other options available. - -**Jen**: Please print 10 copies. - -**Agent**: Done!
The order is ready for your review. - -_The site navigates to the checkout page where Jen can review the order and click the checkout button to confirm._ - - -### Example - Shopping - -Shopping often includes exploration, browsing, and reviewing. The agent can help and remove the tedious parts but the -user wants to be involved. - -_Maya has a few favorite clothing vendors but doesn't get formal clothes very often so she opens her AI assistant._ - -**Maya**: Suggest a few formal or semi-formal dress places where I can shop for my friend’s wedding. Please make sure -they are eco-friendly and not too expensive. - -**Agent**: Sure, here are a few suggestions: - -1. Elara - uses innovative recycled and plant-based fabrics to minimize waste for a modern, chic wardrobe that allows - you to embrace the latest trends, responsibly. Price Range: Dresses typically \$90 - \$220 CAD - -2. Linden Harbour - creates wonderfully crafted womenswear designed to last a lifetime. Carries everything from classic -shirt dresses to elegant knitwear, meticulously made from the finest organic cotton, linen, and responsibly sourced -wool. Price Range: Dresses typically \$250 - \$550 CAD - -3. Wildebloom - Flowing dresses brought to life through artisanal collaborations, using natural plant dyes and panels of - upcycled vintage textiles. Price Range: Dresses typically \$180 - \$450 CAD - -**Maya**: Let's take a look at Wildebloom. - -_The agent now opens Maya's web browser, which it is integrated with, i.e. the agent can observe and control the -browser. It navigates to the dresses page on `http://wildebloom.example/shop`._ - -**Agent**: Ok, here are the dresses that Wildebloom carries. - -_Maya is immediately overwhelmed. There are so many options! Moreover, when she looks at filters she sees they're -quite limited with only colour and size as options._ - -**Maya**: Show me only dresses available in my size, and also show only the ones that would be appropriate for a -cocktail-attire wedding. - -_The agent notices the dresses page registers several tools:_ - -```js -/* - * Returns an array of product listings containing an id, detailed description, price, and photo of each - * product - * - * size - optional - a number between 2 and 14 to filter the results by EU dress size - * color - optional - a color from [Red, Blue, Green, Yellow, Black, White] to filter dresses by - */ -getDresses(size, color) - -/* - * Displays the given products to the user - * - * product_ids - An array of numbers each of which is a product id returned from getDresses - */ -showDresses(product_ids) -``` - -_The agent calls `getDresses(6)` and receives a JSON object:_ - -```json -{ - "products": [ - { - "id": 1021, - "description": "A short sleeve long dress with full length button placket...", - "price": "€180", - "image": "img_1024.png" - }, - { - "id": 4320, - "description": "A straight midi dress in organic cotton...", - "price": "€140", - "image": "img_4320.png" - }, - ... - ] -} -``` - -> [!Note] -> How to pass images and other non-textual data is something we should improve. -> See Issue #10. - -_The agent can now process this list, fetching each image, and using the user's criteria to filter the list. When -completed it makes another call, this time to `showDresses([4320, 8492, 5532, ...])`.
This call updates the UI on the -page to show only the requested dresses._ - -_This is still too many dresses so Maya finds an old photo of herself in a summer dress that she really likes and shares -it with her agent._ - -**Maya**: Are there any dresses similar to the dress worn in this photo? Try to match the colour and style, but continue -to show me dresses appropriate for cocktail-attire. - -_The agent uses this image to identify several new parameters, including the colour, the fit, and the neckline, and -narrows down the list to just a few dresses. Maya finds and clicks on a dress she likes._ - -_Notice that the user did not give their size, but the agent knows this from personalization and may even translate the stored -size into EU units to use it with this site._ - -### Example - Code Review - -Some services are very domain-specific and/or provide a lot of functionality. A real-world example is the Chromium code -review tool: Gerrit. See [CL#5142508](https://crrev.com/c/5142508). Gerrit has many features but they're not obvious just by -looking at the UI (you can press the '?' key to show a shortcut guide). In order to add a comment to a line, the user -must know to press the 'c' key. The user can suggest edits but has to open a comment to do so. Results from test runs -are available but are hidden in a generically-named "Checks" tab. - -Agents are typically trained on everyday usage so may do a poor job on more specialized, complex interfaces. However, -such sites could provide the agent with tools which serve as both a shortcut and a user manual for the agent. - -_John is a software developer and opens a code review sent from his colleague. He notices there are two red bots -indicating test failures on this patch._ - -**John**: Why are the Mac and Android bots failing? - -_The site includes some relevant tools:_ - -```js -/** - * Returns the status of each bot run in a try run job. Use the returned bot_name in getTryRunFailureSnippet - * to get more details about a run. - */ -getTryRunStatuses(); - -/** - * If a bot failed, returns the TAIL snippet of the log containing the error message describing the failure. - * - * bot_name - The name of the bot, as returned from getTryRunStatuses, to get failure snippet from - */ -getTryRunFailureSnippet(bot_name) -``` - -_The agent calls `getTryRunStatuses()` to find the names of the failing bots. It returns:_ - -```json -[ - { - "bot_name": "mac-x86-rel", - "status": "PASSED" - }, - { - "bot_name": "mac-x64-rel", - "status": "FAIL" - }, - { - "bot_name": "android-15-rel", - "status": "PASSED" - }, - ... -] -``` - -**Agent**: I see that the Mac x64 and Android bots are failing. Let me get more details... - -_The agent now calls `getTryRunFailureSnippet` on each of the failing bots and receives a snippet of the failing log of -each._ - -**Agent**: - -The Mac bot is failing to start with an "Out of Space" error. This is probably just an infrastructure issue. - -The Android bot is failing while linking with a missing symbol `gfx::DisplayCompositor`. - -_John realizes his colleague forgot to add the Android implementation of this class to the build file._ - -**John**: I think BUILD.gn is missing `display_compositor_android.cc`, please add a suggested edit adding it in the -Android section. - -_The agent has access to BUILD.gn from the main UI as well as in DOM.
It notices it also has access to a relevant tool -which it uses to add the change:_ - -```js -/* - * Adds a suggested edit to the review - * - * filename - the name of the file in which to make the edit - * patch - the edit diff in unidiff format - */ -addSuggestedEdit(filename, patch) -``` - -**Agent**: Ok, I've added `display_compositor_android.cc` to the BUILD.gn file. - -_The UI displays the suggested diff with an option for the user to accept, modify, or reject the change. John accepts -the change._ - -_Reading the rest of the review, John notices a small issue repeated across multiple files._ - -**John**: Add a polite comment to the review that we should use "PointF" rather than "Point" for input coordinates since -the latter can cause unintended rounding. Then add suggested edits changing all instances where Point was added to -PointF. - -_The agent automates the repetitive task of making all the simple changes. The UI provides John with a visual way to -quickly review the agent's actions and accept/modify/reject them._ - -## Assumptions - -* For many sites wanting to integrate with agents quickly - augmenting their existing UI with script tools will be - easier vs. backend integration -* Agents will perform quicker and more successfully with specific tools compared to using a human interface. -* Users might use an agent for a direct action query (e.g. “create a 30 minute meeting with Pat at 3:00pm”), complex - cross-site queries (e.g. “Find the 5 highest rated restaurants in Toronto, pin them in my Map, and book a table at - each one over the next 5 weeks”) and everything in between. - -## Prior Art - -### Model Context Protocol (MCP) - -MCP is a protocol for applications to interface with an AI model. Developed by Anthropic, MCP is supported by Claude -Desktop and OpenAI's Agents SDK as well as a growing ecosystem of clients and servers. - -In MCP, an application can expose tools, resources, and more to an AI-enabled application by implementing an MCP server. -The server can be implemented in various languages, as long as it conforms to the protocol. For example, here’s an -implementation of a tool using the Python SDK from the MCP quickstart guide: - -```python -@mcp.tool() -async def get_alerts(state: str) -> str: - """Get weather alerts for a US state. - - Args: - state: Two-letter US state code (e.g. CA, NY) - """ - url = f"{NWS_API_BASE}/alerts/active/area/{state}" - data = await make_nws_request(url) - - if not data or "features" not in data: - return "Unable to fetch alerts or no alerts found." - - if not data["features"]: - return "No active alerts for this state." - - alerts = [format_alert(feature) for feature in data["features"]] - return "\n---\n".join(alerts) -``` - -A client application implements a matching MCP client, which takes a user’s query, communicates with one or more MCP -servers to enumerate their capabilities, and constructs a prompt to the AI platform, passing along any server-provided -tools or data. - -The MCP protocol defines how this client-server communication happens.
For example, a client can ask the server to list -all tools, which might return a response like this: - -```json -{ - "jsonrpc": "2.0", - "id": 1, - "result": { - "tools": [ - { - "name": "get_weather", - "description": "Get current weather information for a location", - "inputSchema": { - "type": "object", - "properties": { - "location": { - "type": "string", - "description": "City name or zip code" - } - }, - "required": ["location"] - } - } - ], - "nextCursor": "next-page-cursor" - } -} -``` - -Unlike OpenAPI, MCP is transport-agnostic. It comes with two built-in transports: stdio, which uses the system's standard -input/output and is well suited for local communication between apps, and Server-Sent Events (SSE), which uses HTTP requests -for remote execution. - -### WebMCP (MCP-B) - -[MCP-B](https://mcp-b.ai/), or Model Context Protocol for the Browser, is an open source project found on GitHub [here](https://github.com/MiguelsPizza/WebMCP) and has much the same motivation and solution as described in this proposal. MCP-B's underlying protocol, also named WebMCP, extends MCP with tab transports that allow in-page communication between a website's MCP server and any client in the same tab. It also extends MCP with extension transports that use Chromium's runtime messaging to make a website's MCP server available to other extension components within the browser (background, sidebar, popup), and to other external MCP clients running on the same machine. MCP-B enables tools from different sites to work together, and for sites to cache tools so that they are discoverable even if the browser isn't currently navigated to the site. - -### OpenAPI - -OpenAPI is a standard for describing HTTP-based APIs. Here’s an example in YAML (from the ChatGPT Actions guide): - -```yaml -openapi: 3.1.0 -info: - title: NWS Weather API - description: Access to weather data including forecasts, alerts, and observations. - version: 1.0.0 -servers: - - url: https://api.weather.gov - description: Main API Server -paths: - /points/{latitude},{longitude}: - get: - operationId: getPointData - summary: Get forecast grid endpoints for a specific location - parameters: - - name: latitude - in: path - required: true - schema: - type: number - format: float - description: Latitude of the point - - name: longitude - in: path - required: true - schema: - type: number - format: float - description: Longitude of the point - responses: - '200': - description: Successfully retrieved grid endpoints - content: - application/json: - schema: - type: object - properties: - properties: - type: object - properties: - forecast: - type: string - format: uri - forecastHourly: - type: string - format: uri - forecastGridData: - type: string - format: uri -``` - -A subset of the OpenAPI specification is used for function-calling / tool use for various AI platforms, such as ChatGPT -Actions and Gemini Function Calling. A user or developer on the AI platform would provide the platform with the OpenAPI -schema for an API they wish to provide as a “tool”. The AI is trained to understand this schema and is able to select -the tool and output a “call” to it, providing the correct arguments. Typically, some code external to the AI itself -would be responsible for making the API call and passing the returned result back to the AI’s conversation context to -reply to the user’s query. - -### Agent2Agent Protocol - -The Agent2Agent Protocol is another protocol for communication between agents.
While similar in structure to MCP (client -/ server concepts that communicate via JSON-RPC), A2A attempts to solve a different problem. Whereas MCP (and OpenAPI) are -generally about exposing traditional capabilities to AI models (i.e. “tools”), A2A is a protocol for connecting AI -agents to each other. It provides some additional features to make common tasks in this scenario more streamlined, such -as capability advertisement, long-running and multi-turn interactions, and multimodal input/output. - -## Open topics - -### Security considerations - -There are security considerations that will need to be accounted for, especially if the WebMCP API is used by semi-autonomous systems like LLM-based agents. Engagement from the community is welcome. - -### Model poisoning - -The potential implications of allowing web developers to create tools in their front-end code for use by AI agents and LLMs should be explored. For example, vulnerabilities such as exposing content that the user would not typically be able to see will need to be investigated. - -### Cross-Origin Isolation - -Client applications would have access to many different web sites that expose tools. Consider an LLM-based agent. It is possible and even likely that data output from one application's tools could find its way into the input parameters for a second application's tool. There are legitimate reasons for the user to want to send data across origins to achieve complex tasks. Care should be taken to indicate to the user which web applications are being invoked and with what data so that the user can intervene. - -### Permissions - -A trust boundary is crossed both when a web site first registers tools via WebMCP, and when a new client agent wants to use these tools. When a web site registers tools, it exposes information about itself and the services it provides to the host environment (i.e. the browser). When agents send tool calls, the site receives untrusted input in the parameters and the outputs in turn may contain sensitive user information. The browser should prompt the user at both points to grant permission and also provide a means to see what information is being sent to and from the site when a tool is called. To streamline workflows, browsers may give users the choice to always allow tool calls for a specific web app and client app pair. - -### Model Context Protocol (MCP) without WebMCP - -MCP has quickly garnered wide interest from the developer community, with hundreds of MCP servers being created. WebMCP is designed to work well with MCP, so that developers can reuse many MCP concepts in their front-end code using JavaScript. We originally planned to propose an explainer very tightly aligned with MCP, providing all the same concepts supported by MCP at the time of writing, including tools, resources, and prompts. Since MCP is still actively changing, matching its exact capabilities would be an ongoing effort. Aligning the WebMCP API tightly with MCP would also make it more difficult to tailor WebMCP for non-LLM scenarios like OS and accessibility assistant integrations. Keeping the WebMCP API as agnostic as possible increases the chance of it being useful to a broader range of potential clients. - -We expect some web developers will continue to prefer standalone MCP instead of WebMCP if they want to have an always-on MCP server running that does not require page navigation in a full browser process.
For example, server-to-server scenarios such as fully autonomous agents will likely benefit more from MCP servers. WebMCP is best suited for local browser workflows with a human in the loop. - -The WebMCP API still maps nicely to MCP, and exposing WebMCP tools to external applications via an MCP server is still a useful scenario that a browser implementation may wish to enable. - -### Existing web automation techniques (DOM, accessibility tree) - -One of the scenarios we want to enable is making the web more accessible to general-purpose AI-based agents. In the absence of alternatives like MCP servers to accomplish their goals, these general-purpose agents often rely on observing the browser state through a combination of screenshots, and DOM and accessibility tree snapshots, and then interact with the page by simulating human user input. We believe that WebMCP will give these agents an alternative means to interact with the web that gives the web developer more control over whether and how an AI-based agent interacts with their site. - -The proposed API will not conflict with these existing automation techniques. If an agent or assistive tool finds that the task it is trying to accomplish is not achievable through the WebMCP tools that the page provides, then it can fall back to general-purpose browser automation to try to accomplish its task. - -## Future explorations - -### Progressive web apps (PWA) - -PWAs should also be able to use the WebMCP API as described in this proposal. There are potential advantages to installing a site as a PWA. In the current proposal, tools are only discoverable once a page has been navigated to and only persist for the lifetime of the page. A PWA with an app manifest could declare tools that are available "offline", that is, even when the PWA is not currently running. The host system would then be able to launch the PWA and navigate to the appropriate page when a tool call is requested. - -### Background model context providers - -Some tools that a web app may want to provide for agents and assistive technologies may not require any web UI. For example, a web developer building a "To Do" application may want to expose a tool that adds an item to the user's todo list without showing a browser window. The web developer may be content to just show a notification that the todo item was added. - -For scenarios like this, it may be helpful to combine tool call handling with something like the ['launch'](https://github.com/WICG/web-app-launch/blob/main/sw_launch_event.md) event. A client application might attach a tool call to a "launch" request which is handled entirely in a service worker without spawning a browser window. \ No newline at end of file From 1e018e7028ce8badaad2c4fdb285bcca4502746c Mon Sep 17 00:00:00 2001 From: Brandon Walderman Date: Fri, 8 Aug 2025 11:42:04 -0700 Subject: [PATCH 3/4] Update docs/proposal.md Co-authored-by: Anssi Kostiainen --- docs/proposal.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/proposal.md b/docs/proposal.md index 0998b12..7764658 100644 --- a/docs/proposal.md +++ b/docs/proposal.md @@ -255,4 +255,8 @@ Handling tool calls on the main thread raises performance concerns, especially i One proposal was to expose the WebMCP API only in service workers and let the service worker post messages to individual client windows/tabs as needed in order to update UI. This would have complicated the architecture and required web developers to add a service worker.
This would also have required the Session concept described earlier to help the service worker differentiate between agents that are connected to different windows and dispatch requests from a particular agent to the correct window. - -For long-running, batched, or expensive tool calls, we expect web developers will dynamically update their UI when these are taking place to temporarily cede control to the agent (e.g. disable or remove human form inputs, indicate via UI that an agent is in control), and take advantage of dedicated workers as needed to offload expensive operations. This can be achieved with existing dedicated or shared workers. \ No newline at end of file +For long-running, batched, or expensive tool calls, we expect web developers will dynamically update their UI when these are taking place to temporarily cede control to the agent (e.g. disable or remove human form inputs, indicate via UI that an agent is in control), and take advantage of dedicated workers as needed to offload expensive operations. This can be achieved with existing dedicated or shared workers. + +## Acknowledgments + +Many thanks to [Alex Nahas](https://github.com/MiguelsPizza) for sharing related [implementation experience](https://github.com/MiguelsPizza/WebMCP). \ No newline at end of file From 446069b6b666a60affb6cf1dfa498d3f2191cdce Mon Sep 17 00:00:00 2001 From: Brandon Walderman Date: Wed, 13 Aug 2025 14:15:30 -0700 Subject: [PATCH 4/4] Address feedback. --- docs/explainer.md | 14 +++++++------- docs/proposal.md | 12 ++++++------ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/explainer.md b/docs/explainer.md index d33d75d..e73f495 100644 --- a/docs/explainer.md +++ b/docs/explainer.md @@ -65,12 +65,12 @@ There are several advantages to using the web to connect agents to services: WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI. With WebMCP, agent-service interaction takes place _via app-controlled UI_, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page and they execute on the client. Page content and actuation remain available to the agent (and the user) but the agent also has access to tools which it can use to achieve its goal more directly. -![A diagram showing an agent communicating with a third-party service via script tools running in a live web page](../content/explainer_st.svg) +![A diagram showing an agent communicating with a third-party service via WebMCP running in a live web page](../content/explainer_st.svg) In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If a UI is required it must be provided by the agent itself or somehow connected to an existing UI manually: -The expected flow using browser agents and Script Tools: +The expected flow using browser agents and WebMCP: ![A diagram showing an agent communicating with a third-party service directly via MCP](../content/explainer_mcp.svg) @@ -91,7 +91,7 @@ The expected flow using browser agents and Script Tools: ## Use Cases -The use cases for script tools are ones in which the user is collaborating with the agent, rather than completely +The use cases for WebMCP are ones in which the user is collaborating with the agent, rather than completely delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated.
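The examples that follow describe tools only by their doc-comment signatures. For orientation, here is a hedged, hypothetical sketch of how a site might register the first such tool using the `provideContext()` shape from proposal.md; the schema wording and the `applyTemplateFilter` helper are illustrative assumptions, not part of the proposal:

```js
// Hypothetical registration for the filterTemplates tool used in the
// Creative example below; names and schema text are illustrative only.
window.agent.provideContext({
  tools: [{
    name: "filter-templates",
    description: "Filters the list of templates based on a description",
    inputSchema: {
      type: "object",
      properties: {
        description: {
          type: "string",
          description: "A visual description of the templates to show, in natural language (English)"
        }
      },
      required: ["description"]
    },
    async execute({ description }) {
      // The site applies the filter (perhaps via a backend AI service)
      // and updates the template list visible to the user.
      const count = await applyTemplateFilter(description); // hypothetical helper
      return { content: [{ type: "text", text: `Showing ${count} matching templates.` }] };
    }
  }]
});
```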
### Example - Creative @@ -105,7 +105,7 @@ which to choose from so she asks her browser agent for help._ **Jen**: Show me templates that are spring themed and that prominently feature the date and time. They should be on a white background so I don't have to print in color. -_The current document has registered a script tool that the agent notices may be relevant to this query:_ +_The current document has registered a WebMCP tool that the agent notices may be relevant to this query:_ ```js /** @@ -180,7 +180,7 @@ _The agent takes this action using a sequence of tool calls which might look som * `AddPage("DUPLICATE")` * `EditDesign("Change the call-to-action text to 'Come for the bargains, stay for the cookies'")` -_Jen now has 3 versions of the same yard sale flyer. Easely implements these script tools using AI-based techinques on +_Jen now has 3 versions of the same yard sale flyer. Easely implements these WebMCP tools using AI-based techniques on their backend to allow a natural language interface. Additionally, the UI presents these changes to Jen as an easily reversible batch of "uncommitted" changes, allowing her to easily review the agent's actions and make changes or undo as necessary. While the site could also implement a chat interface to expose this functionality with their own agent, the @@ -414,7 +414,7 @@ quickly review the agent's actions and accept/modify/reject them._ ## Assumptions -* For many sites wanting to integrate with agents quickly - augmenting their existing UI with script tools will be +* For many sites wanting to integrate with agents quickly - augmenting their existing UI with WebMCP tools will be easier vs. backend integration * Agents will perform quicker and more successfully with specific tools compared to using a human interface. * Users might use an agent for a direct action query (e.g. “create a 30 minute meeting with Pat at 3:00pm”), complex @@ -492,7 +492,7 @@ for remote execution. ### WebMCP (MCP-B) -[MCP-B](https://mcp-b.ai/), or Model Context Protocol for the Browser, is an open source project found on GitHub [here](https://github.com/MiguelsPizza/WebMCP) and has much the same motivation and solution as described in this proposal. MCP-B's underlying protocol, also named WebMCP, extends MCP with tab transports that allow in-page communicate between a website's MCP server and any client in the same tab. It also extends MCP with extension transports that use Chromium's runtime messaging to make a website's MCP server available to other extension components within the browser (background, sidebar, popup), and to other external MCP clients running on the same machine. MCP-B enables tools from different sites to work together, and for sites to cache tools so that they are discoverable even if the browser isn't currently navigated to the site. +[MCP-B](https://mcp-b.ai/), or Model Context Protocol for the Browser, is an open source project found on GitHub [here](https://github.com/MiguelsPizza/WebMCP) and has much the same motivation and solution as described in this proposal. MCP-B's underlying protocol, also named WebMCP, extends MCP with tab transports that allow in-page communication between a website's MCP server and any client in the same tab. It also extends MCP with extension transports that use Chromium's runtime messaging to make a website's MCP server available to other extension components within the browser (background, sidebar, popup), and to other external MCP clients running on the same machine.
MCP-B enables tools from different sites to work together, and for sites to cache tools so that they are discoverable even if the browser isn't currently navigated to the site. ### OpenAPI diff --git a/docs/proposal.md b/docs/proposal.md index 7764658..eac78b3 100644 --- a/docs/proposal.md +++ b/docs/proposal.md @@ -14,11 +14,11 @@ Only a top-level browsing context, such as a browser tab can be a model context * A natural language description of the parameter * The expected type (e.g. Number, String, Enum, etc) * Any restrictions on the parameter (e.g. integers greater than 0) -* A JS callback function that implementings the tool and returns a result +* A JS callback function that implements the tool and returns a result -When an agent that is connected to the page sends a tool call, the JavaScript callback is invoked, where the page can handle the tool call and respond to the agent. Simple applications can handle tool calls entirely in page script, but more complex applications may choose to delegate computationally heavy operations to workers and respond to the agent asynchronously. +When an agent that is connected to the page sends a tool call, the JavaScript callback is invoked, where the page can handle the tool call and respond to the agent. The function can be asynchronous and return a promise, in which case the agent will receive the result once the promise is resolved. Simple applications can handle tool calls entirely in page script, but more complex applications may choose to delegate computationally heavy operations to workers and respond to the agent asynchronously. -Handling tool cools in the main thread with the option of delegating to workers serves a few purposes: +Handling tool calls on the main thread with the option of delegating to workers serves a few purposes: @@ -68,7 +68,7 @@ window.agent.provideContext({ }); ``` -The `provideContext` method can be called multiple times. Subsequent calls clear any pre-existing tools and other context before registering the new ones. This is useful for single-page web apps that frequently change UI state and could benefit from presenting different tools depending on which state the UI is currently in. +The `provideContext` method can be called multiple times. Subsequent calls clear any pre-existing tools and other context before registering the new ones. This is useful for single-page web apps that frequently change UI state and could benefit from presenting different tools depending on which state the UI is currently in. For a list of tools passed to `provideContext`, each tool name in the list is expected to be unique. **Advantages:** @@ -132,7 +132,7 @@ Tool calls are handled as events. Since event handler functions can't respond to ### Recommendation -A **hybrid** approach of both of the examples above is recommended as this would make it easy for web developers to get started adding tools to their page, while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handle does not call `preventDefault()` then the browser's default behavior for tool calls will occur.
The `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error. +A **hybrid** approach of both of the examples above is recommended as this would make it easy for web developers to get started adding tools to their page, while leaving open the possibility of manifest-based approaches in the future. To implement this hybrid approach, a `"toolcall"` event is dispatched on every incoming tool call _before_ executing the tool's `execute` function. The event handler can handle the tool call by calling the event's `preventDefault()` method, and then responding to the agent with `respondWith()` as shown above. If the event handler does not call `preventDefault()` then the browser's default behavior for tool calls will occur. The `execute` function for the requested tool is called. If a tool with the requested name does not exist, then the browser responds to the agent with an error. ## Example of WebMCP API usage @@ -259,4 +259,4 @@ For long-running, batched, or expensive tool calls, we expect web developers wil ## Acknowledgments -Many thanks to [Alex Nahas](https://github.com/MiguelsPizza) for sharing related [implementation experience](https://github.com/MiguelsPizza/WebMCP). \ No newline at end of file +Many thanks to [Alex Nahas](https://github.com/MiguelsPizza) and [Jason McGhee](https://github.com/jasonjmcghee/) for sharing related [implementation](https://github.com/MiguelsPizza/WebMCP) [experience](https://github.com/jasonjmcghee/WebMCP). \ No newline at end of file