A Model Context Protocol (MCP) server that enables AI models to perform complete browser automation through the Dex browser extension. To download the extension, log in with the account you signed up with for your hackathon.
Note: you can choose which MCP tools to expose to the agent. You can also define your own agents as MCP tool calls (e.g., @mcp_tool github_agent(instruction: str)) that internally chain together the existing tool calls for interfacing with the user's browser.
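As a minimal sketch of the agent-as-tool idea, the snippet below registers a `github_agent` tool that chains existing browser tools internally. The `mcp_tool` decorator, the `TOOLS` registry, `CALL_LOG`, and the stubbed `new_tab`, `grab_dom`, and `click_element` functions are stand-ins invented for illustration so the example runs without a browser; the real versions in main.py and tools/browser.py may be async and may have different signatures.

```python
from typing import Callable, Dict, List

# Hypothetical registry standing in for the real MCP tool registration.
TOOLS: Dict[str, Callable[..., str]] = {}
CALL_LOG: List[str] = []  # records which browser tools the agent invoked


def mcp_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function as an exposed tool (illustrative stand-in)."""
    TOOLS[func.__name__] = func
    return func


# Stubs for the browser tools; the real ones talk to the extension.
def new_tab(url: str) -> str:
    CALL_LOG.append(f"new_tab({url})")
    return f"Opened {url}"


def grab_dom() -> str:
    CALL_LOG.append("grab_dom()")
    return "Formatted DOM structure..."


def click_element(element_id: str) -> str:
    CALL_LOG.append(f"click_element({element_id})")
    return f"Clicked {element_id}"


@mcp_tool
def github_agent(instruction: str) -> str:
    """Only this agent is exposed; the browser tools stay internal."""
    steps = [
        new_tab("https://github.com"),
        grab_dom(),
        click_element("search-button"),  # placeholder element ID
    ]
    return f"{instruction}: " + " -> ".join(steps)
```

Because only `github_agent` is decorated, it is the only entry in the registry, which mirrors the idea of choosing which tools the agent can see.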
There might be a couple of bugs, so I'd appreciate feedback and bug reports so I can put out hotfixes as we go :)
This MCP server provides browser automation capabilities including tab management, navigation, DOM interaction, and visual analysis.
- get_tabs: Get all open browser tabs with titles and URLs
- select_tab(tab_id): Switch to a specific browser tab
- new_tab(url?): Create new tab, optionally with a URL
- close_tab(tab_id?): Close specific tab or active tab
- navigate(url, tab_id?): Navigate to URL in active or specific tab
- search_google(query, tab_id?): Perform Google search
- click_element(element_id, tab_id?): Click DOM elements by ID
- input_text(element_id, text, tab_id?): Type text into form fields
- send_keys(keys, tab_id?): Send keyboard shortcuts (Ctrl+C, Enter, etc.)
- screenshot(): Capture screenshot of active tab
- capture_with_highlights(tab_id?): Screenshot with interactive element highlights
- grab_dom(tab_id?): Get formatted DOM structure with XPath mappings
- Install dependencies:

      pip install -r requirements.txt

- Start the MCP server:

      uv run main.py

The server will:
- Start an MCP server with SSE transport on the default port
- Start a WebSocket server on ws://127.0.0.1:8765 for browser extension connections
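To check that the WebSocket port is accepting connections before loading the extension, a small probe like the one below may help. It is only a sketch: the helper name `extension_port_open` is made up for this example, and it tests plain TCP reachability rather than performing a WebSocket handshake.

```python
import socket


def extension_port_open(host: str = "127.0.0.1", port: int = 8765,
                        timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `extension_port_open()` after `uv run main.py` should return `True` once the WebSocket server is listening.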
The browser extension connects to the WebSocket server at ws://127.0.0.1:8765, and the server exchanges these message types with it:
| Type | Parameters | Description |
|---|---|---|
| get_tabs | None | Get all browser tabs |
| screenshot | None | Screenshot active tab |
| navigate | url, tab_id? | Navigate to URL |
| select_tab | tab_id | Switch to tab |
| new_tab | url? | Create new tab |
| close_tab | tab_id? | Close tab |
| search_google | query, tab_id? | Google search |
| click_element | element_id, tab_id? | Click DOM element |
| input_text | element_id, text, tab_id? | Type into element |
| send_keys | keys, tab_id? | Send keyboard input |
| grab_dom | tab_id? | Get DOM structure |
| capture_with_highlights | tab_id? | Screenshot with highlights |
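The table above can be turned into a small validator that rejects messages with missing or unexpected parameters before they are sent. This is a sketch only: `MESSAGE_PARAMS` and `build_message` are names invented here, and the flat JSON shape with a `type` field is an assumption, since the exact request format is not documented in this README.

```python
import json
from typing import Any, Dict, Tuple

# (required, optional) parameter names per message type, taken from the table.
MESSAGE_PARAMS: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "get_tabs": ((), ()),
    "screenshot": ((), ()),
    "navigate": (("url",), ("tab_id",)),
    "select_tab": (("tab_id",), ()),
    "new_tab": ((), ("url",)),
    "close_tab": ((), ("tab_id",)),
    "search_google": (("query",), ("tab_id",)),
    "click_element": (("element_id",), ("tab_id",)),
    "input_text": (("element_id", "text"), ("tab_id",)),
    "send_keys": (("keys",), ("tab_id",)),
    "grab_dom": ((), ("tab_id",)),
    "capture_with_highlights": ((), ("tab_id",)),
}


def build_message(msg_type: str, **params: Any) -> str:
    """Validate params against the table and serialize the message."""
    if msg_type not in MESSAGE_PARAMS:
        raise ValueError(f"unknown message type: {msg_type}")
    required, optional = MESSAGE_PARAMS[msg_type]
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"{msg_type} is missing: {', '.join(missing)}")
    unexpected = [name for name in params if name not in required + optional]
    if unexpected:
        raise ValueError(f"{msg_type} does not accept: {', '.join(unexpected)}")
    # Assumed wire shape: a flat JSON object with a "type" field.
    return json.dumps({"type": msg_type, **params})
```

For example, `build_message("navigate", url="https://example.com")` serializes cleanly, while `build_message("input_text", element_id="q")` raises `ValueError` because `text` is required.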
get_tabs:
{
"result": {
"action": "get_all_tabs",
"success": true,
"data": [
{"id": 1, "title": "Example", "url": "https://example.com"},
{"id": 2, "title": "Google", "url": "https://google.com"}
]
}
}

screenshot:
{
"result": {
"action": "screenshot_active_tab",
"success": true,
"message": "Screenshot captured",
"data": "..."
}
}

new_tab:
{
"result": {
"action": "new_tab",
"success": true,
"message": "New tab opened",
"data": {"id": 1234}
}
}

grab_dom:
{
"result": {
"success": true,
"data": {
"processedOutput": "Formatted DOM structure...",
"highlightToXPath": {
"1": "/html/body/button[1]",
"2": "/html/body/form/input[1]"
},
"html": "<html>...</html>"
}
}
}

capture_with_highlights:
{
"result": {
"action": "capture_tab_with_highlights",
"success": true,
"message": "Screenshot captured with highlight data",
"data": {
"dataUrl": "...",
"highlightCount": 15
}
}
}

Required parameters:
- navigate: url
- select_tab: tab_id
- search_google: query
- click_element: element_id
- input_text: element_id, text
- send_keys: keys
Optional parameters:
- tab_id: Most tools support targeting a specific tab (defaults to the active tab)
- url: new_tab can optionally specify an initial URL
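The response examples above share a `result` envelope with a `success` flag and a `data` payload, while `action` and `message` appear in only some of them. A tolerant parser might look like the sketch below; `ToolResult` and `parse_response` are illustrative names, not part of the server's code.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    success: bool
    action: Optional[str]
    message: Optional[str]
    data: Any


def parse_response(raw: str) -> ToolResult:
    """Unwrap the "result" envelope, tolerating missing optional fields."""
    result = json.loads(raw)["result"]
    return ToolResult(
        success=bool(result.get("success", False)),
        action=result.get("action"),    # absent in the grab_dom example
        message=result.get("message"),  # absent in the get_tabs example
        data=result.get("data"),
    )


raw = ('{"result": {"action": "new_tab", "success": true, '
       '"message": "New tab opened", "data": {"id": 1234}}}')
new_tab_result = parse_response(raw)
```

Here `new_tab_result.data["id"]` holds the ID of the newly opened tab, which can then be passed as `tab_id` to later calls.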
Multi-step Automation:
- get_tabs() - See all open tabs
- new_tab("https://example.com") - Open new tab
- grab_dom() - Analyze page structure
- click_element("search-button") - Interact with page
- input_text("search-input", "query") - Fill forms
- send_keys("Enter") - Submit forms
- capture_with_highlights() - Take annotated screenshot
- main.py: Entry point and MCP tool definitions
- context.py: WebSocket connection management and message handling
- ws_server.py: WebSocket server for browser extension connections
- tools/browser.py: Complete browser action implementations
To add new browser tools:
- Create the tool function in tools/browser.py with proper parameter documentation
- Add the MCP tool wrapper in main.py
- Add a message handler in the browser extension's WebSocket bridge
- Update this documentation
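The steps above can be sketched for a made-up `scroll_page` tool. Everything here is hypothetical: the tool name, the `direction` parameter, the message shape, and the `Send` transport type are assumptions for illustration, and the real helper in context.py may look quite different. The extension side would also need a matching handler.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical transport: takes a message dict and returns the extension's
# "result" object. The real helper in context.py may differ.
Send = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def scroll_page(send: Send, direction: str = "down",
                      timeout: float = 10.0) -> str:
    """Hypothetical tool: scroll the active tab and describe the outcome.

    Args:
        send: coroutine that delivers a message to the extension.
        direction: "up" or "down" (assumed parameter).
        timeout: seconds to wait for the extension's reply.
    """
    message = {"type": "scroll_page", "direction": direction}
    try:
        result = await asyncio.wait_for(send(message), timeout=timeout)
    except asyncio.TimeoutError:
        return "scroll_page failed: timed out waiting for the extension"
    if not result.get("success"):
        return f"scroll_page failed: {result.get('message', 'unknown error')}"
    return f"scroll_page succeeded: {result.get('message', 'done')}"


async def fake_send(message: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the extension so the sketch runs on its own."""
    return {"success": True, "message": f"Scrolled {message['direction']}"}
```

Running `asyncio.run(scroll_page(fake_send))` returns a formatted string, in line with how the existing tools report their results.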
All tools return formatted strings describing action results and extracted data.
The server logs all important events:
- WebSocket connections/disconnections
- Message exchanges with browser extension
- Action successes/failures
- Errors and timeouts
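For reference, logging of this kind can be set up with Python's standard `logging` module roughly as follows. The logger name `dex_mcp` and the `log_action` helper are assumptions made for this sketch; the server's actual logger names and message formats may differ.

```python
import logging

logger = logging.getLogger("dex_mcp")  # hypothetical logger name


def configure_logging(level: int = logging.INFO) -> None:
    """Send log records to stderr with a timestamped format."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def log_action(action: str, success: bool, detail: str = "") -> None:
    """Log an action outcome: INFO on success, ERROR on failure."""
    if success:
        logger.info("action %s succeeded %s", action, detail)
    else:
        logger.error("action %s failed %s", action, detail)
```

Calling `configure_logging(logging.DEBUG)` at startup would make the output more verbose while debugging extension connectivity.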