wang-neo/clawbot

Clawbot

License: MIT | Node.js >=18

Open-source AI agent browsing tool. Read any page through your real browser session -- cookies, auth, and all.

Clawbot is a local-first, open-source alternative to Dokobot. It exposes your real browser as an HTTP API so that CLI tools, AI agents, and scripts can read, automate, and extract data from any web page -- including pages behind logins. It has no cloud dependency and no usage quotas, and telemetry is off by default.

63 test files, 897 test cases, all passing. 67 TypeScript/TSX source files, 14,500+ lines.

Architecture

                     your code
                         |
                 +-------+-------+
                 |               |
               CLI            HTTP client
                 |               |
                 +-------+-------+
                         |
                  HTTP over Unix socket
                         |
              +----------+----------+
              |   Clawbot Bridge    |
              |   (Node.js process) |
              +----------+----------+
                         |
               Chrome Native Messaging
                         |
    +--------------------+--------------------+
    |         Browser Extension (MV3)         |
    |                                         |
    |  +----------------+  +----------------+ |
    |  | Service Worker |  | Content Script | |
    |  | (background/)  |  | (content/)     | |
    |  +----------------+  +----------------+ |
    |  +----------------+  +----------------+ |
    |  | React Popup    |  | iframe Popup   | |
    |  |(ThemeProvider) |  | (Shadow DOM)   | |
    |  +----------------+  +----------------+ |
    |  +----------------+  +----------------+ |
    |  | Offscreen      |  | Sandbox        | |
    |  | (Translate/Seg)|  | (Code Runner)  | |
    |  +----------------+  +----------------+ |
    +--------------------+--------------------+
                         |
                    Web Page
              (real session + DOM)

The bridge is a Node.js process that listens on a Unix domain socket. The browser extension connects to it via Chrome's Native Messaging protocol. Content scripts injected into web pages handle DOM reading, automation actions, screenshots, and data extraction.
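
As a rough illustration of the Native Messaging leg: Chrome frames each message as UTF-8 JSON preceded by a 32-bit length in native byte order, and the host reads frames from stdin and writes them to stdout. The sketch below shows that framing only. The function names are hypothetical and are not taken from src/bridge/native-messaging.ts, and little-endian byte order is assumed (true on x86 and ARM hosts).

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helpers illustrating Native Messaging framing.
// Assumption: the host runs on a little-endian machine.

/** Encode one message as a 4-byte length header followed by UTF-8 JSON. */
function encodeMessage(message: unknown): Buffer {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  const header = Buffer.alloc(4);
  header.writeUInt32LE(body.length, 0);
  return Buffer.concat([header, body]);
}

/**
 * Decode every complete message at the front of `buffer`.
 * Returns the parsed messages plus the unconsumed tail, which the caller
 * should prepend to the next chunk read from stdin.
 */
function decodeMessages(buffer: Buffer): { messages: unknown[]; rest: Buffer } {
  const messages: unknown[] = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = buffer.readUInt32LE(offset);
    // Stop if the body of this frame has not fully arrived yet.
    if (buffer.length - offset - 4 < length) break;
    const body = buffer.subarray(offset + 4, offset + 4 + length);
    messages.push(JSON.parse(body.toString("utf8")));
    offset += 4 + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}
```

Returning the leftover bytes lets the caller handle frames that are split across stdin chunks without losing data.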

Features

  • Web page reading -- multi-screen scroll collection with deduplication and Markdown formatting
  • Page automation -- 16 action types (click, type, select, hover, scroll, navigate, drag-and-drop, file upload, etc.)
  • Screenshots -- capture visible tab as PNG or JPEG, including full-page mode
  • Structured extraction -- simplified HTML AST with CSS-in-JS class filtering
  • Script execution -- sandbox with virtual DOM and $r/$x/$s helpers
  • Tab management -- create, switch, list, close browser tabs
  • Image collection -- 4 sources (img, SVG, meta tags, CSS backgrounds) with Shadow DOM support
  • AI decision engine -- SSE streaming multi-step automation (pluggable LLM backend)
  • WebSocket real-time control -- replaces long-polling with lower-latency bidirectional communication
  • Encryption -- JWE AES-256-GCM for sensitive data transport
  • Translation -- page content translation via offscreen document engine
  • Search -- multi-engine web search (Google, Bing, DuckDuckGo)
  • Chrome Extension MV3 -- Service Worker with keep-alive, offscreen documents, sandboxed iframes
  • React popup UI -- ThemeProvider (light/dark/system), Status/Sessions/Settings tabs
  • iframe popup -- Shadow DOM encapsulation with ChatPanel, AutomationView, ChunkViewer
  • i18n -- English and Chinese
  • Bridge process -- Unix socket IPC with request validation
  • Zod request validation -- schema-validated API inputs
  • OperationLock -- concurrent operation serialization
  • Token bucket rate limiting -- 100 requests/min; unauthorized requests receive 401 and requests over the limit receive 429
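
For readers unfamiliar with the technique, a token bucket at 100 requests/min can be sketched as below. This is a minimal illustration, not the code in src/bridge/rate-limiter.ts: the class name, method name, and injectable clock are assumptions made for the example.

```typescript
/** Hypothetical token bucket: holds up to `capacity` tokens, refilled continuously. */
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerMs: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  /** Take one token if available; `false` means the caller should answer 429. */
  tryRemove(): boolean {
    const current = this.now();
    const elapsed = current - this.lastRefill;
    // Add tokens for the elapsed time, never exceeding the bucket capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = current;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// 100 requests per minute, matching the figure in the feature list.
const bucket = new TokenBucket(100, 100 / 60_000);
```

A bucket like this permits short bursts up to its capacity while holding the long-run rate at the refill rate, which is the usual reason to prefer it over a fixed-window counter.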

Quick Start

1. Clone and install

git clone https://github.com/nicobailon/clawbot.git
cd clawbot
npm install

Requires Node.js 18 or later.

2. Build the extension

npm run build

This produces dist/extension/ with background.js, content.js, offscreen.js, sandbox.js, popup.js, iframe-popup.js, and manifest.json.

3. Install the native bridge

npm run install-bridge

This compiles the bridge TypeScript and registers the native messaging host with Chrome. You will need to provide your extension ID (shown on chrome://extensions/ after loading the extension).

Alternatively, run the all-in-one setup script:

npx tsx scripts/install-skill.ts
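
For context on why the extension ID is needed: Chrome finds a native messaging host through a small JSON manifest that names the executable and lists the extension origins allowed to connect. The sketch below builds such a manifest. The host name and the helper itself are hypothetical (scripts/install-bridge.ts may use different values); only the field names follow Chrome's manifest format.

```typescript
/** Fields of a Chrome native messaging host manifest. */
interface NativeHostManifest {
  name: string;
  description: string;
  path: string;
  type: "stdio";
  allowed_origins: string[];
}

/** Build a host manifest for one extension ID (hypothetical helper). */
function buildHostManifest(extensionId: string, bridgePath: string): NativeHostManifest {
  return {
    name: "com.example.clawbot_bridge", // assumed host name, not taken from the project
    description: "Clawbot native bridge",
    path: bridgePath, // absolute path to the bridge executable
    type: "stdio",
    // Chrome requires exact origins here; wildcards are not allowed.
    allowed_origins: [`chrome-extension://${extensionId}/`],
  };
}
```

Because allowed_origins must list the exact extension origin, an unpacked extension that is reloaded under a different ID needs the host to be registered again.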

4. Load the extension in Chrome

  1. Open chrome://extensions/
  2. Enable Developer mode (top-right toggle)
  3. Click Load unpacked
  4. Select the dist/extension/ directory

5. Test it

clawbot status

You should see the bridge status, device ID, and extension connection state.

HTTP API

The bridge exposes an HTTP server on a Unix domain socket at ~/.clawbot/bridges/<device-id>.sock. All endpoints except GET /status and GET /health require an Authorization: Bearer <api-key> header.

Method  Endpoint          Description
GET     /health           Health check
GET     /status           Bridge status, device ID, extension connectivity
POST    /read             Read page content (text or chunks, multi-screen)
POST    /execute          Execute an action (click, type, scroll, etc.)
POST    /screenshot       Capture visible tab as PNG or JPEG
POST    /extract          Extract simplified DOM structure
POST    /script           Run custom JavaScript in a sandbox
POST    /tabs             Create, switch, list, or close browser tabs
POST    /close-session    Close a reading session
POST    /download/images  Download images from a page to a local directory
POST    /collect-images   Collect image URLs and metadata from a page
POST    /automation       Run a multi-step automation script
POST    /segments         Get page visual segments
POST    /search           Search the web (Google/Bing/DuckDuckGo)
POST    /translate        Translate page content
POST    /analyze          Analyze page text
POST    /generate         Generate content
POST    /decide           AI-driven multi-step automation (SSE streaming)
GET     /events           SSE event stream
GET     /encrypt          Get encryption status
PUT     /encrypt          Set encryption key

All POST endpoints return { ok: true, data: ... } on success. The /decide endpoint supports SSE streaming when the client sends Accept: text/event-stream.
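
Because the server listens on a Unix socket rather than a TCP port, clients need an HTTP library that accepts a socket path. Below is a minimal sketch using Node's built-in http module. The helper names are hypothetical, and the JSON request body shown in the usage note is an assumption, since the request schemas are not documented here.

```typescript
import * as http from "node:http";
import * as os from "node:os";
import * as path from "node:path";

/** Build request options that target the bridge's Unix-socket HTTP server. */
function bridgeRequestOptions(
  deviceId: string,
  method: string,
  endpoint: string,
  apiKey?: string,
): http.RequestOptions {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // GET /status and GET /health need no key; every other endpoint does.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    socketPath: path.join(os.homedir(), ".clawbot", "bridges", `${deviceId}.sock`),
    method,
    path: endpoint,
    headers,
  };
}

/** Send one request and resolve with the parsed JSON response body. */
function callBridge(options: http.RequestOptions, body?: unknown): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      const chunks: Buffer[] = [];
      res.on("data", (chunk: Buffer) => chunks.push(chunk));
      res.on("end", () => {
        try {
          resolve(JSON.parse(Buffer.concat(chunks).toString("utf8")));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    if (body !== undefined) req.write(JSON.stringify(body));
    req.end();
  });
}
```

With a running bridge, a read might look like callBridge(bridgeRequestOptions(deviceId, "POST", "/read", apiKey), { url: "https://example.com" }), where the url field is assumed for illustration; on success the resolved value should have the { ok: true, data: ... } shape described above.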

CLI Commands

clawbot <command> [options]

Command                         Description
clawbot status                  Check bridge and extension status
clawbot read <url>              Read a webpage
clawbot execute <url>           Execute an action on a page element
clawbot screenshot [url]        Take a screenshot
clawbot extract [url]           Extract structured page data
clawbot images [url]            Collect images from a page
clawbot automation [url]        Run an automation script
clawbot search <query>          Search the web
clawbot download images <url>   Download images to a local directory
clawbot install-bridge          Install native messaging host
clawbot uninstall-bridge        Uninstall native messaging host
clawbot doko list               List connected devices
clawbot doko close              Close a device connection
clawbot update                  Check for updates
clawbot feedback                Submit feedback
clawbot telemetry               Toggle telemetry (off by default)
clawbot config show             Show current configuration
clawbot config set <key> <val>  Set a configuration value
clawbot config get <key>        Get a configuration value
clawbot config unset <key>      Unset a configuration value
clawbot help                    Show help message
clawbot completion              Generate shell completion script

Common flags

Read:
  --screens <n>       Number of screens to capture
  --format <fmt>      Output format: text | chunks
  --reuse-tab         Reuse an existing browser tab
  -o <file>           Save output to file

Execute:
  --action <type>     Action type (click, type, select, scroll, etc.)
  --xpath <xpath>     XPath of the target element
  --value <value>     Value for the action

Screenshot:
  --format <fmt>      Image format: png | jpeg
  -o <file>           Save screenshot to file (required)

Automation:
  --steps <file>      Path to JSON file with automation steps (required)

Images:
  --max <n>           Maximum number of images
  --formats <csv>     Comma-separated format filter (e.g. png,jpeg,webp)
  --include-bg        Include background images
  -o <file>           Save image data to file

Environment:
  NO_COLOR=1          Disable colored output

Development

Build commands

npm run build              # Build everything (extension + bridge + CLI)
npm run build:prod         # Production build with minification
npm run build:analyze      # Production build with bundle analysis
npm run build:extension    # Build browser extension only (Vite)
npm run build:bridge       # Build native bridge only (tsc)
npm run build:cli          # Build CLI only (tsc)

Install commands

npm run install-bridge     # Register native messaging host
npm run install-skill      # All-in-one setup (build + install)

Lint and format

npm run lint               # Run ESLint on src/ and tests/
npm run lint:fix           # Auto-fix ESLint issues
npm run format             # Format with Prettier
npm run format:check       # Check formatting without writing

Type checking

npm run typecheck          # Run tsc --noEmit across all configs

Testing

Unit tests

npm test                   # Run vitest

897 test cases across 63 files covering: snowflake IDs, logger, text assembler, session manager, HTML simplifier, chunk collector, encryption, data vault, command dispatcher, action executor, automation executor, WebSocket server, conversation manager, remote control, auth manager, rate limiter, React components, Zod schemas, bridge auth, E2E integration, and more.

E2E tests

npm run test:e2e           # Run Playwright tests

The E2E tests cover the popup UI, content script injection, and the background service worker.

Combined

npm run test:all           # Run unit tests + typecheck

Project Structure

src/
  bridge/                     # Native bridge (Node.js process)
    main.ts                   # Bridge entry point + command routing
    ipc-server.ts             # HTTP-over-Unix-socket server
    native-messaging.ts       # Chrome Native Messaging protocol
    auth.ts                   # API key generation and validation
    auth-manager.ts           # Session tokens and constant-time comparison
    rate-limiter.ts           # Token bucket rate limiting
    bridge-registry.ts        # Multi-bridge device management
    ws-server.ts              # WebSocket server for real-time control
    event-broadcaster.ts      # SSE event broadcasting

  extension/                  # Browser extension (Manifest V3)
    background/               # Service Worker
      index.ts                # Extension entry point + command dispatch
      native-bridge.ts        # Connection to bridge
      command-dispatcher.ts   # Route commands to content scripts
      tab-manager.ts          # Tab lifecycle management
      session-manager.ts      # Reading session state (120s idle timeout)
      automation-executor.ts  # Multi-step automation runner
      encryption.ts           # JWE AES-256-GCM encryption
      data-vault.ts           # IndexedDB storage
      remote-control.ts       # WebSocket command queue
      message-router.ts       # Centralized message routing
      conversation.ts         # Conversation/message/branch CRUD
      cdp-manager.ts          # Chrome DevTools Protocol manager
      error-recovery.ts       # Error recovery and retry logic
      operation-lock.ts       # Concurrent operation serialization
    content/                  # Content scripts (injected into pages)
      index.ts                # Content script entry point
      dom-extractor.ts        # DOM text extraction
      chunk-collector.ts      # Multi-screen scroll collection
      text-assembler.ts       # Text dedup and formatting
      action-executor.ts      # 16 automation action types
      element-locator.ts      # XPath element finding
      image-collector.ts      # Image URL collection (4 sources)
      cdp-bridge.ts           # Chrome DevTools Protocol bridge
      iframe-injector.ts      # iframe popup injection
      text-analyzer.ts        # Text analysis (word freq, readability, sentiment)
      content-generator.ts    # Content generation (summaries, structure)
      search-extractor.ts     # Multi-engine search result extraction
      favicon-animation.ts    # Status indicator via favicon
    shared/                   # Shared utilities
      types.ts                # TypeScript type definitions
      constants.ts            # Message type constants (50+)
      snowflake.ts            # 64-bit time-ordered IDs
      schemas.ts              # Zod validation schemas
      logger.ts               # Structured logging
      ui-constants.ts         # Shared UI constants
    offscreen/                # Offscreen documents (segmentation, translation)
      index.ts                # Offscreen entry point
      container-grouper.ts    # Visual segmentation / container grouping
      translation.ts          # Translation engine
    sandbox/                  # Sandboxed iframes
      index.ts                # Sandbox entry point
      code-runner.ts          # Script execution with virtual DOM
      html-simplifier.ts      # DOM simplification pipeline

  popup/                      # React popup UI
    index.tsx                 # Popup entry point
    App.tsx                   # Root component (Status/Sessions/Settings tabs)
    components/
      StatusBar.tsx           # Connection status display
      QuickActions.tsx        # Action shortcuts
      SessionList.tsx         # Active sessions
      Settings.tsx            # Extension settings
      DeviceManager.tsx       # Device management

  iframe-popup/               # React in-page popup (Shadow DOM)
    index.tsx                 # Shadow DOM entry point
    App.tsx                   # Root component
    components/
      ChatPanel.tsx           # Chat interface
      AutomationView.tsx      # Automation step display
      ChunkViewer.tsx         # Chunk viewer
      ToggleButton.tsx        # Toggle button control

  shared/                     # Shared across popup and iframe-popup
    i18n/
      index.ts                # i18n setup
      en.ts                   # English strings
      zh.ts                   # Chinese strings
    theme/
      ThemeProvider.tsx       # Theme context provider (light/dark/system)
      themes.ts               # Theme definitions

  cli/                        # CLI tool
    index.ts                  # CLI entry point (read, search, download, config, etc.)

scripts/
  build.ts                    # Multi-entry Vite build script
  install-bridge.ts           # Native messaging host installer
  install-skill.ts            # Interactive setup script
  package-extension.ts        # Chrome Web Store packaging

tests/                        # Vitest unit tests (63 files, 897 cases)
e2e/                          # Playwright E2E tests (6 files)

License

MIT
