Add new explainer for service workers #19
@bwalderman, this WebMCP for Service Workers explainer is a well-written document, thank you! I'd advise the group to focus on security considerations.
@MiguelsPizza has experimented with tool use across multiple origins in his implementation, see Cross-Site Tool Composition. Any learnings from this experiment? MCP-B Chrome Extension implements a tool selection UI that is very functional but may not work for a mainstream product as is. While web specs don't prescribe UI/UX, it is helpful to have these concrete early implementations available to test feasibility of various ideas.
Co-authored-by: Anssi Kostiainen <anssi.kostiainen@gmail.com>
Thanks for the mention @anssiko! I'm planning to contribute a lot more to this group going forward. I've actually left my job to start a company around this idea, which is why I haven't been active here. The open source extension is much more of a proof of concept, and I plan to transition it to being a tool for testing and writing webMCP servers rather than a serious client. I'll come share all my learnings next week post launch. I also plan to repurpose my old prompt-api extension as a webMCP polyfill and client and contribute it to the working group if you all are interested.
Here is a video of the service worker approach the MCP-B extension takes: Built-in.Retina.Display.mp4
For context, the MCP-B extension aggregates WebMCP servers from all open tabs through the mcpHub in the extension's service worker. It's basically acting as an MCP host like Claude Desktop, but for all the open tabs which have webMCP servers. It exposes the aggregate tools over a custom chrome ports transport which other MCP clients within the extension can connect to.
In terms of cross-origin, when executing tools, the extension service worker navigates to or focuses on the tab that holds the tool, then routes the call through the content script to the page's MCP server. I don't really do remote/background MCP execution. If we're going to be calling remote tools, we might as well just use a remote MCP server. I think the benefit of WebMCP servers a lot of times is that the human is in the loop and you can see tools executing. There's no guesswork about what's happening, and the human can stop anytime to prevent the agent from doing an action it shouldn't. Not saying remotely executing WebMCP tools doesn't have its place though.
Giving the agent access to the extension tab APIs solves many of the problems with discoverability. The agent generally has enough world knowledge to know that mail.google.com is the place to go to find and interact with email webMCP tools, etc.
For data leakage between origins, I've been thinking about a clipboard-like approach where website tools can request to write to a client-managed clipboard without the data ever entering the agent's context. So if you had sensitive info like a social security number on one site, the agent could store it in this clipboard, and then when it needs to use it elsewhere, it would need explicit user permission to paste it, but the agent never actually sees the data. This could help maintain separation between origins while still enabling workflows.
The multi-origin access is definitely powerful, but I can see why you're considering single-origin limitation as a starting point. Your service worker approach would actually solve the tab-switching friction nicely for single-origin workflows where you do want that background execution without the visual context switching, and this makes sense for WebMCP clients which live in a webapp rather than a browser extension.
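To make the clipboard idea above concrete, here is a minimal TypeScript sketch of how a client-managed clipboard might behave. Nothing here comes from MCP-B or the WebMCP proposal: `AgentClipboard`, `Approver`, `write`, and `paste` are hypothetical names, and the user-permission prompt is stubbed out as a callback. The point is only that the agent holds an opaque handle while the value stays with the client.

```typescript
// Hypothetical sketch of a client-managed clipboard. None of these names
// exist in WebMCP or MCP-B; they are assumptions for illustration only.

// Stand-in for a user permission prompt: resolves true if the user approves.
type Approver = (targetOrigin: string, label: string) => Promise<boolean>;

interface ClipboardEntry {
  value: string;        // the sensitive data, never returned to the agent
  label: string;        // human-readable description shown in the prompt
  sourceOrigin: string; // origin whose tool wrote the value
}

class AgentClipboard {
  private entries = new Map<string, ClipboardEntry>();
  private counter = 0;
  private approve: Approver;

  constructor(approve: Approver) {
    this.approve = approve;
  }

  // Called on behalf of a site's tool. Only the opaque handle is returned,
  // so only the handle can end up in the agent's context.
  write(sourceOrigin: string, label: string, value: string): string {
    this.counter += 1;
    const handle = `clip-${this.counter}`;
    this.entries.set(handle, { value, label, sourceOrigin });
    return handle;
  }

  // The agent asks to paste by handle. The value is handed straight to the
  // target tool's `deliver` callback, and only after the user approves.
  async paste(
    handle: string,
    targetOrigin: string,
    deliver: (value: string) => void
  ): Promise<boolean> {
    const entry = this.entries.get(handle);
    if (!entry) {
      throw new Error(`Unknown clipboard handle: ${handle}`);
    }
    const approved = await this.approve(targetOrigin, entry.label);
    if (!approved) {
      return false;
    }
    deliver(entry.value);
    return true;
  }
}
```

In this sketch the agent would pass the handle returned by `write` to a later `paste` call for another origin. A real design would also need to decide how handles expire and how the prompt identifies the source origin to the user.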
Sorry for the late reply! Been bogged down with other work and finally catching up on WebMCP. :)
The initial explainer for [WebMCP](https://github.com/webmachinelearning/webmcp) covers how web pages expose context to AI agents that are already browsing a page. WebMCP's design makes it possible for in-browser AI integrations such as sidebar assistants to access context on currently opened browser tabs, and to operate the pages within those tabs.
Sometimes, an agent may require context and tools from a site that the user doesn't currently have open. Perhaps the user is currently viewing a potential campsite on their favorite maps app and would like the agent to attempt to make a reservation while they continue to explore the map for potential nearby hikes. Ideally, the agent should be able to send a reservation request without navigating the tab or opening a new tab, either of which can be distracting to the user. What's needed is a way for the agent to discover an app that can handle the reservation request and a way to send that request in the background without necessarily showing any UI.
Ideally, the agent should be able to send a reservation request without navigating the tab or opening a new tab, either of which can be distracting to the user.
I'm not seeing why loading a tab in the background (if we don't want to take the user out of the current context) is not enough. The core sell of WebMCP is journeys where the user needs to interact with the service frequently enough that there's benefit to having the site's UI on the client. If something can be accomplished by the Agent largely interacting with the site in the background then wouldn't remote MCP be better? I'm echoing @MiguelsPizza's take here: "If we're going to be calling remote tools, we might as well just use a remote MCP server. I think the benefit of WebMCP servers a lot of times is that the human is in the loop and you can see tools executing."
What's needed is a way for the agent to discover an app that can handle the reservation request
The discovery problem is orthogonal to whether tool execution requires a tab or is backgrounded. My thoughts there align with @bokand's feedback on #8.
WebMCP is intended to align closely with MCP architecture so that developers who have experience with one can easily apply their skills to the other. In MCP, tools are not individual HTTP endpoints. Handling WebMCP tools with dedicated callback functions in script, rather than as HTTP fetch events, aligns with how tools are treated in backend MCP frameworks.
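As a rough illustration of the callback model described above, the TypeScript sketch below dispatches tool calls by name to registered functions instead of routing them through URL-based fetch handlers. It is not the WebMCP API: `ToolRegistry`, `ToolDefinition`, and the `reserve-campsite` tool are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: not the WebMCP API. All names are illustrative.

// A tool handler is a plain async callback that receives structured arguments.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface ToolDefinition {
  name: string;
  description: string;
  execute: ToolHandler;
}

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>();

  // Register a tool under a unique name.
  register(tool: ToolDefinition): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // What an agent would see when listing tools: names and descriptions only.
  list(): Array<{ name: string; description: string }> {
    return Array.from(this.tools.values()).map((tool) => ({
      name: tool.name,
      description: tool.description,
    }));
  }

  // Dispatch a call by tool name to its callback; no URL routing is involved.
  async call(name: string, args: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) {
      throw new Error(`Unknown tool: ${name}`);
    }
    return tool.execute(args);
  }
}

// Example registration, loosely based on the campsite scenario.
const registry = new ToolRegistry();
registry.register({
  name: "reserve-campsite",
  description: "Request a campsite reservation for a given date.",
  execute: async (args) => ({
    status: "requested",
    siteId: args.siteId,
    date: args.date,
  }),
});
```

An agent-side caller would then invoke `registry.call("reserve-campsite", { siteId: "A12", date: "2025-07-04" })`, which mirrors how backend MCP frameworks map a tool name to a handler function.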
## Security Considerations
These considerations are the same irrespective of foreground tab interaction or service worker based execution, right?
Yes, these considerations apply to both.
Before an agent can interact with the web through tool calling, it needs to discover a web site to interact with. In the case of the WebMCP API as currently designed, this problem is implicitly solved because the user must first navigate to the page. For an agent to access the tools from a web site that the user isn't currently browsing, it needs some means of discovering that site and getting information about the tools and context that site provides.
Discovery is a complex topic which goes beyond the scope of this explainer but merits some discussion here for the purpose of understanding the feature end-to-end. Fundamentally, tool discovery requires some way for agents to obtain the URL of a site that has tools relevant to the user's tasks, and a way to install that site's service worker for handling tool requests. There are many potential mechanisms for this, including but not limited to:
Fundamentally, tool discovery requires some way for agents to obtain the URL of a site that has tools relevant to the user's tasks
Are you considering UI actuation in this design or is the Agent limited to sites which support WebMCP?
This design is mainly talking about discovery for sites that support service worker WebMCP.
Discovering a website that can help the user with X tasks through actuation or on-page WebMCP tools is accomplished with a web search and a navigation.
For service-worker based tools though, we want to support the case where the user never actually navigates to the web site (the agent discovers and registers the service worker directly). So, this design discusses what infrastructure an agent would need to find and activate that service worker.
I think we need to reconcile the broader question of how we see these tools as means for discovering capabilities of sites. #8 is focused on that. While these tools can be one way of understanding what a site can accomplish, indexing using content built for human consumption in the site will be the better source of truth of a site's capabilities for a while.
That said, I still don't see how the discovery aspect is different between service worker vs on-page WebMCP tools. The Agent is making 2 decisions when the user makes a query:
- Which site/sites should I load for this query.
- Should I interact with this site in the foreground or background.
Discovery will happen first for 1). Once the site has been decided, whether it's loaded in a tab or there's just-in-time fetching of service worker (as is proposed here) will be a separate decision from discovery.
You could say that first an Agent decides whether the task should be executed in the background. If yes, then it biases towards sites which support WebMCP in a service worker. But I don't see a world where that's optimal for the user. The Agent should be selecting the site based on its capabilities (and using the best integration option available), not based on the type of integration it supports.
That's always an option, but one advantage of using WebMCP for background tasks is that if the user is already logged in to the site/app then the WebMCP service worker will have those credentials. Otherwise, the service worker can open a tab for the user to authenticate. As I understand it, auth with remote MCP servers is still quite rough, and requires creating and managing auth tokens.
@bwalderman That's a fair point, it would be beneficial for read-only tasks in particular. What are your thoughts on supporting elicitation in the webMCP spec? It would be a good way to focus on the remote tab in the case user input is needed to unblock the model.
I'm going to open a separate issue to discuss elicitation to avoid this PR thread going too off topic :)
Adding a supplemental explainer document for WebMCP within service workers. This would enable web developers to handle tool calls in service workers that can be activated on-demand, even if the web site isn't currently opened. This makes it possible to implement tools that work "in the background", but with the option to open windows for user input if needed.