Add new explainer for service workers #19

bwalderman · 2025-08-28T22:47:17Z

Adding a supplemental explainer document for WebMCP within service workers. This would enable web developers to handle tool calls in service workers that can be activated on demand, even if the website isn't currently open. This makes it possible to implement tools that work "in the background", with the option to open windows for user input if needed.

docs/service-workers.md

anssiko

@bwalderman, this WebMCP for Service Workers explainer is a well-written document, thank you! I'd advise the group to focus on security considerations.

@MiguelsPizza has experimented with tool use across multiple origins in his implementation, see Cross-Site Tool Composition. Any learnings from this experiment? MCP-B Chrome Extension implements a tool selection UI that is very functional but may not work for a mainstream product as is. While web specs don't prescribe UI/UX, it is helpful to have these concrete early implementations available to test feasibility of various ideas.

Co-authored-by: Anssi Kostiainen <anssi.kostiainen@gmail.com>

MiguelsPizza · 2025-09-04T20:31:40Z

Thanks for the mention @anssiko!

I'm planning to contribute a lot more to this group going forward. I've actually left my job to start a company around this idea, which is why I haven't been active here. The open-source extension is much more of a proof of concept, and I plan to transition it into a tool for testing and writing WebMCP servers rather than a serious client. I'll come share all my learnings next week, post-launch. I also plan to repurpose my old prompt-api extension as a WebMCP polyfill and client and contribute it to the working group if you all are interested.

Here is a video of the service worker approach the MCP-B extension takes:

Built-in.Retina.Display.mp4

For context, the MCP-B extension aggregates WebMCP servers from all open tabs through the mcpHub in the extension's service worker. It's basically acting as an MCP host like Claude Desktop, but for all the open tabs that have WebMCP servers. It exposes the aggregated tools over a custom Chrome ports transport, which other MCP clients within the extension can connect to.

In terms of cross-origin, when executing tools, the extension service worker navigates to or focuses on the tab that holds the tool, then routes the call through the content script to the page's MCP server. I don't really do remote/background MCP execution. If we're going to be calling remote tools, we might as well just use a remote MCP server. I think the benefit of WebMCP servers a lot of times is that the human is in the loop and you can see tools executing. There's no guesswork about what's happening, and the human can stop anytime to prevent the agent from doing an action it shouldn't. Not saying remotely executing WebMCP tools doesn't have its place though.
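
The aggregation and routing described above can be sketched as a small hub. This is a hypothetical model, not the MCP-B extension's actual code: `McpHub`, `TabServer`, the `tab<id>_<tool>` naming scheme, and the injected `focusTab` callback are assumptions, and the content-script and Chrome port messaging are replaced by direct function calls so the sketch runs outside a browser.

```typescript
// Hypothetical sketch of a hub that aggregates WebMCP tools from open tabs.
// Content-script and port messaging are replaced by direct function calls.

type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<string>;

interface TabServer {
  tabId: number;
  origin: string;
  tools: Map<string, ToolHandler>;
}

class McpHub {
  private tabs = new Map<number, TabServer>();
  private focusTab: (tabId: number) => Promise<void>;

  // focusTab stands in for whatever the host uses to navigate to or focus a tab.
  constructor(focusTab: (tabId: number) => Promise<void>) {
    this.focusTab = focusTab;
  }

  register(server: TabServer): void {
    this.tabs.set(server.tabId, server);
  }

  unregister(tabId: number): void {
    this.tabs.delete(tabId);
  }

  // Aggregated view exposed to MCP clients. Names are prefixed with the tab id
  // so identically named tools from different tabs do not collide.
  listTools(): string[] {
    const names: string[] = [];
    this.tabs.forEach((server) => {
      server.tools.forEach((_handler, name) => {
        names.push(`tab${server.tabId}_${name}`);
      });
    });
    return names;
  }

  // Focus the owning tab first so the user can watch the tool run,
  // then route the call to that tab's handler.
  async callTool(qualifiedName: string, args: ToolArgs): Promise<string> {
    const match = /^tab(\d+)_(.+)$/.exec(qualifiedName);
    if (!match) throw new Error(`Malformed tool name: ${qualifiedName}`);
    const server = this.tabs.get(Number(match[1]));
    const handler = server?.tools.get(match[2]);
    if (!server || !handler) throw new Error(`Unknown tool: ${qualifiedName}`);
    await this.focusTab(server.tabId);
    return handler(args);
  }
}
```

Prefixing by tab id is one way to keep the aggregate namespace unambiguous; a host could equally prefix by origin.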

Giving the agent access to the extension tab APIs solves many of the problems with discoverability. The agent generally has enough world knowledge to know that mail.google.com is the place to go to find and interact with email WebMCP tools, etc.

On the security side, we've implemented an explicit consent model. When you visit a website with tools for the first time, you have to explicitly approve both the domain and the specific tools before they can be used. We are experimenting with hash-based verification where if any part of the tool changes (params, schema, description) it has to be re-approved. There's still the risk that a website owner could change what a tool actually does under the hood while keeping the signature the same, but at some point you're trusting the website just like you trust any web app you use. The fact that WebMCP tools can be disabled before starting an agentic workflow puts WebMCP miles above existing browser automation tools in terms of security, though there are still risks inherent to the agents themselves.
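
A minimal sketch of hash-based re-approval, assuming a client-side store keyed by origin and tool name. `ToolDefinition`, `ToolApprovals`, and the fingerprint format are hypothetical. FNV-1a is used only to keep the example dependency-free; it is not collision resistant, so a real client would more plausibly use a cryptographic digest such as SHA-256 (for example via Web Crypto's `crypto.subtle.digest`). As the comment notes, a fingerprint of the definition cannot detect changes to what the tool actually does.

```typescript
// Hypothetical sketch: any change to a tool's name, description, or input
// schema invalidates a previous approval.

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: unknown;
}

// Serialize with sorted object keys so that key order alone does not change the fingerprint.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units; illustrative only, not collision resistant.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

function fingerprint(tool: ToolDefinition): string {
  return fnv1a32(canonicalJson(tool));
}

class ToolApprovals {
  // Key: origin plus tool name; value: the fingerprint the user approved.
  private approved = new Map<string, string>();

  approve(origin: string, tool: ToolDefinition): void {
    this.approved.set(`${origin} ${tool.name}`, fingerprint(tool));
  }

  // True only if this exact definition was approved for this origin.
  isApproved(origin: string, tool: ToolDefinition): boolean {
    return this.approved.get(`${origin} ${tool.name}`) === fingerprint(tool);
  }
}
```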

For data leakage between origins, I've been thinking about a clipboard-like approach where website tools can request to write to a client-managed clipboard without the data ever entering the agent's context. So if you had sensitive info like a social security number on one site, the agent could store it in this clipboard and then when it needs to use it elsewhere, it would need explicit user permission to paste it, but the agent never actually sees the data. This could help maintain separation between origins while still enabling workflows.
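
The clipboard idea can be sketched as a client-side store that hands the agent only an opaque handle. Everything here is hypothetical: `AgentClipboard`, the handle format, and the `askUser` consent callback are illustrative names, and the sketch assumes the client, not the agent, substitutes the real value into the target tool's arguments.

```typescript
// Hypothetical client-managed clipboard: sites write values, the agent only
// ever sees opaque handles, and every paste requires explicit user consent.

type ConsentPrompt = (sourceOrigin: string, targetOrigin: string) => boolean;

interface ClipboardEntry {
  origin: string;
  value: string;
}

class AgentClipboard {
  private entries = new Map<string, ClipboardEntry>();
  private nextId = 0;
  private askUser: ConsentPrompt;

  constructor(askUser: ConsentPrompt) {
    this.askUser = askUser;
  }

  // Called on behalf of a site's tool. The returned handle is what goes into
  // the agent's context; the value itself never does.
  write(origin: string, value: string): string {
    const handle = `clip-${this.nextId++}`;
    this.entries.set(handle, { origin, value });
    return handle;
  }

  // Called by the client when the agent asks to use a handle in a tool call on
  // targetOrigin. The resolved value is passed to the tool, not back to the agent.
  resolveForTool(handle: string, targetOrigin: string): string | undefined {
    const entry = this.entries.get(handle);
    if (entry === undefined) return undefined;
    if (!this.askUser(entry.origin, targetOrigin)) return undefined;
    return entry.value;
  }
}
```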

The multi-origin access is definitely powerful, but I can see why you're considering a single-origin limitation as a starting point. Your service worker approach would actually solve the tab-switching friction nicely for single-origin workflows where you do want background execution without the visual context switching, and it makes sense for WebMCP clients that live in a web app rather than a browser extension.

khushalsagar

Sorry for the late reply! Been bogged down with other work and finally catching up on WebMCP. :)

khushalsagar · 2025-09-05T00:35:50Z

docs/service-workers.md

+
+The initial explainer for [WebMCP](https://github.com/webmachinelearning/webmcp) covers how web pages expose context to AI agents that are already browsing a page. WebMCP's design makes it possible for in-browser AI integrations such as sidebar assistants to access context on currently opened browser tabs, and to operate the pages within those tabs.
+
+Sometimes, an agent may require context and tools from a site that the user doesn't currently have open. Perhaps the user is currently viewing a potential campsite on their favorite maps app and would like the agent to attempt to make a reservation while they continue to explore the map for potential nearby hikes. Ideally, the agent should be able to send a reservation request without navigating the tab or opening a new tab, either of which can be distracting to the user. What's needed is a way for the agent to discover an app that can handle the reservation request and a way to send that request in the background without necessarily showing any UI.


> Ideally, the agent should be able to send a reservation request without navigating the tab or opening a new tab, either of which can be distracting to the user.

I'm not seeing why loading a tab in the background (if we don't want to take the user out of the current context) is not enough. The core sell of WebMCP is journeys where the user needs to interact with the service frequently enough that there's benefit to having the site's UI on the client. If something can be accomplished by the Agent largely interacting with the site in the background then wouldn't remote MCP be better? I'm echoing @MiguelsPizza's take here: "If we're going to be calling remote tools, we might as well just use a remote MCP server. I think the benefit of WebMCP servers a lot of times is that the human is in the loop and you can see tools executing."

> What's needed is a way for the agent to discover an app that can handle the reservation request

The discovery problem is orthogonal to whether tool execution requires a tab or is backgrounded. My thoughts there align with @bokand's feedback on #8.

khushalsagar · 2025-09-05T00:49:02Z

docs/service-workers.md

+
+WebMCP is intended to align closely with MCP architecture so that developers that have experience with one can easily apply their skills to the other. In MCP, tools are not individual HTTP endpoints. Handling WebMCP tools with dedicated callback functions in script instead of as HTTP fetch events aligns with how tools are treated in backend MCP frameworks.
+
+## Security Considerations


These considerations are the same irrespective of foreground tab interaction or service worker based execution, right?

Yes, these considerations apply to both.
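
To illustrate the point in the hunk above that tools are handled by dedicated callbacks rather than as HTTP fetch events: in MCP, a client invokes a tool with a JSON-RPC `tools/call` request that names the tool, and the server dispatches to that tool's handler. The sketch below models that dispatch in plain TypeScript. `ToolRegistry` and its methods are hypothetical and are not the WebMCP API; the message shapes follow MCP's `tools/call` request and text-content result only loosely.

```typescript
// Hypothetical sketch: tools registered as named callbacks and dispatched by
// name, as backend MCP frameworks do, instead of URL routes in a fetch handler.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResult {
  jsonrpc: "2.0";
  id: number;
  result: { content: { type: "text"; text: string }[] };
}

type ToolCallback = (args: Record<string, unknown>) => Promise<string>;

class ToolRegistry {
  private tools = new Map<string, { description: string; execute: ToolCallback }>();

  register(name: string, description: string, execute: ToolCallback): void {
    this.tools.set(name, { description, execute });
  }

  // Look up the callback by tool name; no URL or HTTP method is involved.
  async handle(request: ToolCallRequest): Promise<ToolCallResult> {
    const tool = this.tools.get(request.params.name);
    if (tool === undefined) {
      // A real MCP server would answer with a JSON-RPC error response instead.
      throw new Error(`Unknown tool: ${request.params.name}`);
    }
    const text = await tool.execute(request.params.arguments);
    return { jsonrpc: "2.0", id: request.id, result: { content: [{ type: "text", text }] } };
  }
}
```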

khushalsagar · 2025-09-05T00:51:41Z

docs/service-workers.md

+
+Before an agent can interact with the web through tool calling, it needs to discover a web site to interact with. In the case of the WebMCP API as currently designed, this problem is implicitly solved because the user must first navigate to the page. For an agent to access the tools from a web site that the user isn't currently browsing, it needs some means of discovering that site and getting information about the tools and context that site provides.
+
+Discovery is a complex topic which goes beyond the scope of this explainer but merits some discussion here for the purpose of understanding the feature end-to-end. Fundamentally, tool discovery requires some way for agents to obtain the URL of a site that has tools relevant to the user's tasks, and a way to install that site's service worker for handling tool requests. There are many potential mechanisms for this, including but not limited to:


> Fundamentally, tool discovery requires some way for agents to obtain the URL of a site that has tools relevant to the user's tasks

Are you considering UI actuation in this design or is the Agent limited to sites which support WebMCP?

This design is mainly talking about discovery for sites that support service worker WebMCP.

Discovering a website that can help the user with X tasks through actuation or on-page WebMCP tools is accomplished with a web search and a navigation.

For service-worker-based tools, though, we want to support the case where the user never actually navigates to the website (the agent discovers and registers the service worker directly). So this design discusses what infrastructure an agent would need to find and activate that service worker.

I think we need to reconcile the broader question of how we see these tools as means for discovering capabilities of sites. #8 is focused on that. While these tools can be one way of understanding what a site can accomplish, indexing using content built for human consumption in the site will be the better source of truth of a site's capabilities for a while.

That said, I still don't see how the discovery aspect is different between service worker vs on-page WebMCP tools. The Agent is making 2 decisions when the user makes a query:

1. Which site/sites should I load for this query?
2. Should I interact with this site in the foreground or background?

Discovery will happen first for 1). Once the site has been decided, whether it's loaded in a tab or there's just-in-time fetching of service worker (as is proposed here) will be a separate decision from discovery.

You could say that first an Agent decides whether the task should be executed in the background. If yes, then it biases towards sites which support WebMCP in a service worker. But I don't see a world where that's optimal for the user. The Agent should be selecting the site based on its capabilities (and using the best integration option available), not based on the type of integration it supports.
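
The two-step decision described in this comment can be sketched under hypothetical types: the site is chosen on capability alone, and only afterwards does the agent pick the best integration that site offers. `SiteCandidate`, `capabilityScore`, and the integration labels are illustrative assumptions, not part of any proposal.

```typescript
// Hypothetical sketch of the two decisions: (1) pick a site by capability,
// (2) pick how to interact with it. Integration type never affects step 1.

type Integration = "ui-actuation" | "page-tools" | "service-worker-tools";

interface SiteCandidate {
  url: string;
  capabilityScore: number; // how well the site can accomplish the task
  integrations: Integration[];
}

interface Plan {
  url: string;
  mode: "background" | "foreground-tab";
  integration: Integration;
}

function planTask(candidates: SiteCandidate[], preferBackground: boolean): Plan | undefined {
  // Step 1: select the site based only on capability.
  let best: SiteCandidate | undefined;
  for (const site of candidates) {
    if (best === undefined || site.capabilityScore > best.capabilityScore) best = site;
  }
  if (best === undefined) return undefined;

  // Step 2: choose the interaction mode among what the chosen site supports.
  if (preferBackground && best.integrations.indexOf("service-worker-tools") !== -1) {
    return { url: best.url, mode: "background", integration: "service-worker-tools" };
  }
  // Otherwise load a tab; UI actuation is assumed to be available on any site.
  const foreground: Integration =
    best.integrations.indexOf("page-tools") !== -1 ? "page-tools" : "ui-actuation";
  return { url: best.url, mode: "foreground-tab", integration: foreground };
}
```

With this ordering, a more capable site without service worker tools still wins over a less capable site that has them; background execution is only an option once the site is chosen.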

bwalderman · 2025-09-05T17:29:18Z

@MiguelsPizza

> If we're going to be calling remote tools, we might as well just use a remote MCP server.

That's always an option, but one advantage of using WebMCP for background tasks is that if the user is already logged in to the site/app then the WebMCP service worker will have those credentials. Otherwise, the service worker can open a tab for the user to authenticate.

As I understand it, auth with remote MCP servers is still quite rough, and requires creating and managing auth tokens.
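
A minimal sketch of the credential argument, assuming a tool callback running in the site's own service worker. Same-origin `fetch` requests send the site's cookies by default, so a user who is already signed in is authenticated without any token handling; if not, the tool falls back to asking the user to sign in. The tool, the endpoint paths, and the injected `fetchFn`/`openWindow` dependencies are hypothetical. In a real worker they might be `fetch` and `clients.openWindow()`, although browsers restrict when a service worker may open windows, so how WebMCP would permit this is an open question.

```typescript
// Hypothetical service-worker tool callback. Dependencies are injected so the
// sketch runs outside a browser; the paths and tool shape are illustrative.

interface MinimalResponse {
  status: number;
  text(): Promise<string>;
}

interface Deps {
  // In a real worker this could be the global fetch; same-origin requests
  // send the site's cookies by default, so a signed-in user is recognized.
  fetchFn: (url: string, init: { method: string; body: string }) => Promise<MinimalResponse>;
  // Stand-in for a browser-permitted way to show UI, e.g. clients.openWindow().
  openWindow: (url: string) => Promise<void>;
}

async function requestReservation(siteId: string, deps: Deps): Promise<string> {
  const response = await deps.fetchFn("/api/reservations", {
    method: "POST",
    body: JSON.stringify({ siteId }),
  });
  if (response.status === 401) {
    // Not signed in: ask the user to authenticate in a tab instead of failing silently.
    await deps.openWindow("/login?return=reservations");
    return "Sign-in required; a sign-in page was opened for the user.";
  }
  if (response.status < 200 || response.status >= 300) {
    throw new Error(`Reservation failed with status ${response.status}`);
  }
  return response.text();
}
```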

MiguelsPizza · 2025-09-05T17:59:59Z

@bwalderman That's a fair point; it would be beneficial for read-only tasks in particular.

What are your thoughts on supporting elicitation in the WebMCP spec? It would be a good way to bring the remote tab into focus when user input is needed to unblock the model.

bwalderman · 2025-09-05T19:06:08Z

I'm going to open a separate issue to discuss elicitation to avoid this PR thread going too off topic :)

Add explainer for service workers

2ea03e8

bwalderman requested review from anssiko, bokand, hvanops, khushalsagar, leotlee, sohchatt and sushraja-msft August 28, 2025 22:57

bwalderman mentioned this pull request Sep 2, 2025

API design #15

Open

anssiko reviewed Sep 4, 2025


docs/service-workers.md (outdated)

anssiko approved these changes Sep 4, 2025


Update docs/service-workers.md

437ab21

Co-authored-by: Anssi Kostiainen <anssi.kostiainen@gmail.com>

bwalderman merged commit 019c901 into main Sep 4, 2025
1 check passed

khushalsagar reviewed Sep 5, 2025


bwalderman mentioned this pull request Sep 5, 2025

Prompt injection #11

Open



