DOMAgent

A browser extension + MCP server that lets AI agents control your real browser — no headless browsers, no puppets. Works with Chrome and Firefox.

The agent connects to the MCP server running on your machine, and the MCP server relays commands to the browser via the extension. Every action (click, type, screenshot, navigate…) happens in your real, already-open browser window.

npx domagent

Why DOMAgent?

Without DOMAgent	With DOMAgent
Headless browser — invisible, no session	Your real browser — logged in, cookies intact
Slow Puppeteer/Playwright spin-up	Instant — extension already loaded
Can't interact with your open tabs	Can adopt any tab you already have open
Complex DevTools setup	One-click install + `npm start`

Architecture

┌──────────────────┐  WebSocket   ┌──────────────────┐   stdio    ┌──────────────┐
│  Browser         │◄────────────►│   MCP Server      │◄──────────►│  AI Agent    │
│  Extension       │  CDP relay   │  (server.js)      │   MCP      │ (Claude,     │
│  background.js   │              │  + index.js       │            │  Ollama, …)  │
└──────────────────┘              └──────────────────┘            └──────────────┘

chrome-extension/
├── domagent-extension/
│   ├── chrome/                  ← Chrome extension
│   │   ├── background.js        ← Service worker (CDP via debugger API)
│   │   ├── manifest.json
│   │   ├── options.html / options.js
│   │   └── icons/
│   └── firefox/                 ← Firefox extension
│       ├── background.js        ← Background script (content-script relay)
│       ├── content.js           ← Content script injected into pages
│       ├── manifest.json
│       ├── options.html / options.js
│       └── icons/
└── domagent-mcp/                ← Node.js MCP server (runs locally)
    ├── index.js
    ├── server.js
    └── package.json

Available MCP Tools

Tool	Description
`navigate`	Open a URL — reuses the automation tab (no duplicate tabs)
`use_current_tab`	Adopt the user's active tab — no new tab created
`click`	Click an element by CSS selector (shows orange visual indicator)
`type_text`	Type into an input field by CSS selector (shows blue visual indicator)
`get_text`	Get the text content of an element
`evaluate_script`	Execute arbitrary JavaScript in the page
`get_screenshot`	Capture a PNG screenshot of the current page
`get_interactive_elements`	List all visible interactive elements with selectors and bounding boxes
`clear_overlays`	Remove all visual overlay boxes from the page

Tab Management

The extension uses a single automation tab design so your other tabs are never hijacked:

First navigate call → creates one new tab, pins it as the automation tab
Subsequent navigate calls → reuses that same tab (navigates to the new URL)
use_current_tab → adopts whatever tab is currently focused (no new tab)
All commands (click, type, screenshot…) → always target the automation tab via session ID
Your other tabs → never touched

Quick Start

1. Start the MCP Server

cd domagent-mcp
npm install
npm start

The server starts a WebSocket on ws://127.0.0.1:18792/extension and waits for the browser extension to connect.

2. Load the Extension

Chrome → See domagent-extension/chrome/README.md
Firefox → See domagent-extension/firefox/README.md

3. Connect your AI Agent

Configure your AI agent to use the MCP server via stdio transport.

Recommended — use the npm package (no path needed):

{
  "mcpServers": {
    "domagent": {
      "command": "npx",
      "args": ["domagent"]
    }
  }
}

Alternative — run from source:

{
  "mcpServers": {
    "domagent": {
      "command": "node",
      "args": ["/absolute/path/to/domagent-mcp/index.js"]
    }
  }
}

Extension Options

Right-click the extension icon → Options to configure the WebSocket connection:

Setting	Default
Host	`127.0.0.1`
Port	`18792`
WS Path	`/extension`

Visual Indicators

When the automation clicks or types, a brief visual indicator appears:

🟠 Orange dot → click action
🔵 Blue dot → type action
🟡 Yellow dashed box → highlighted interactive element
🟢 Green dashed box → highlighted typeable element

Indicators pulse and fade automatically without interfering with the page.

How Chrome vs Firefox Differ

Feature	Chrome	Firefox
DOM access method	`chrome.debugger` API (CDP)	Content script relay
Background context	Service Worker	Persistent background script
Debug banner	Yes (suppressible with flag)	No banner
Min version	Any modern Chrome	Firefox 109+

Browser-Specific Guides

Browser	README
� Chrome	`domagent-extension/chrome/README.md`
🦊 Firefox	`domagent-extension/firefox/README.md`

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
.github/workflows		.github/workflows
domagent-extension		domagent-extension
domagent-mcp		domagent-mcp
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
RELEASE.md		RELEASE.md
SECURITY.md		SECURITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DOMAgent

Why DOMAgent?

Architecture

Available MCP Tools

Tab Management

Quick Start

1. Start the MCP Server

2. Load the Extension

3. Connect your AI Agent

Extension Options

Visual Indicators

How Chrome vs Firefox Differ

Browser-Specific Guides

About

Uh oh!

Releases 11

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DOMAgent

Why DOMAgent?

Architecture

Available MCP Tools

Tab Management

Quick Start

1. Start the MCP Server

2. Load the Extension

3. Connect your AI Agent

Extension Options

Visual Indicators

How Chrome vs Firefox Differ

Browser-Specific Guides

About

Resources

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 11

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages