
yingkitw/browsing


Browsing

Lightweight MCP/API for browser automation

A compact MCP server and Rust library providing navigate, get_links, follow_link, list_content (links and images), get_content, get_image, save_content, and screenshot (full page or a single element). The browser is launched lazily on first use, and read operations run in parallel behind an RwLock.
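To make the last two points concrete, here is a minimal, std-only sketch of the pattern: a `OnceLock` creates the browser the first time a tool needs it, and an `RwLock` lets read-only tools share it while navigation takes the exclusive write lock. `ServerState` and `FakeBrowser` are hypothetical names; the crate's actual server is async and may use different primitives.

```rust
use std::sync::{Arc, OnceLock, RwLock};
use std::thread;

/// Hypothetical stand-in for a browser session (not the crate's type).
struct FakeBrowser {
    current_url: String,
}

/// Hypothetical server state: the browser slot starts empty.
struct ServerState {
    browser: OnceLock<RwLock<FakeBrowser>>,
}

impl ServerState {
    fn new() -> Self {
        ServerState { browser: OnceLock::new() }
    }

    /// Launches the browser on the first call only; later calls reuse it.
    fn browser(&self) -> &RwLock<FakeBrowser> {
        self.browser.get_or_init(|| {
            RwLock::new(FakeBrowser { current_url: "about:blank".to_string() })
        })
    }

    /// Read-only tool: any number of callers can hold the read lock at once.
    fn current_url(&self) -> String {
        self.browser().read().unwrap().current_url.clone()
    }

    /// Mutating tool: takes the exclusive write lock.
    fn navigate(&self, url: &str) {
        self.browser().write().unwrap().current_url = url.to_string();
    }
}

fn main() {
    let state = Arc::new(ServerState::new());
    assert!(state.browser.get().is_none()); // nothing launched yet

    state.navigate("https://example.com");

    // Several threads read the page state concurrently.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || state.current_url())
        })
        .collect();
    for reader in readers {
        println!("{}", reader.join().unwrap());
    }
}
```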

🎯 Usage Modes

  1. 🔌 MCP Server (primary) - navigate, get_links, follow_link, list_content, get_content, get_image, save_content, screenshot, generate_sitemap tools for AI assistants
  2. ⌨️ CLI - Autonomous browsing tasks
  3. 📦 Library - Full agent system with LLM, custom actions

✨ Why Browsing?

Building AI agents that can navigate and interact with websites is challenging. You need to:

  • Extract structured data from unstructured HTML - Parse complex DOM trees and make them LLM-readable
  • Handle browser automation reliably - Manage browser lifecycle, CDP connections, and process management
  • Coordinate multiple subsystems - Orchestrate DOM extraction, LLM inference, and action execution
  • Maintain testability - Mock components for unit testing without real browsers
  • Support extensibility - Add custom actions, browser backends, and LLM providers

Browsing solves all of this with a clean, modular, and well-tested architecture.

🎯 Key Features

πŸ—οΈ Trait-Based Architecture

  • BrowserClient trait - Abstract browser operations for easy mocking and alternative backends
  • DOMProcessor trait - Pluggable DOM processing implementations
  • ActionHandler trait - Extensible action system for custom behaviors

🤖 Autonomous Agent System

  • Complete agent execution loop with LLM integration
  • Robust action parsing with JSON repair
  • History tracking with state snapshots
  • Graceful error handling and recovery
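JSON repair is needed because LLM replies often wrap the action in prose or a `json` code fence, or get cut off mid-object. The sketch below shows one simple approach, assuming the action arrives as a single JSON object: find the first balanced `{...}` while ignoring braces inside strings, and if the text ends early, close any open string and braces. `extract_json_object` is a hypothetical helper, not the crate's `json_extractor` API.

```rust
/// Pulls the first JSON object out of free-form LLM output.
/// If the output was cut off mid-object, closes an open string and any open braces.
/// Simplified sketch: arrays, trailing commas, etc. are not repaired.
fn extract_json_object(text: &str) -> Option<String> {
    let start = text.find('{')?;
    let body = &text[start..];
    let (mut depth, mut in_string, mut escaped) = (0usize, false, false);

    for (i, ch) in body.char_indices() {
        if in_string {
            // Inside a string literal, braces do not count.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // Balanced object found; ignore anything after it.
                    return Some(body[..=i].to_string());
                }
            }
            _ => {}
        }
    }

    // Truncated output: repair by closing whatever is still open.
    let mut repaired = body.trim_end().to_string();
    if in_string {
        if escaped {
            repaired.pop(); // drop a dangling backslash
        }
        repaired.push('"');
    }
    repaired.extend(std::iter::repeat('}').take(depth));
    Some(repaired)
}

fn main() {
    let reply = "Sure! Here is my action:\n```json\n{\"action\": \"click\", \"index\": 3}\n```";
    println!("{:?}", extract_json_object(reply));
}
```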

🌐 Full Browser Automation

  • Cross-platform support (macOS, Linux, Windows)
  • Automatic browser detection
  • Chrome DevTools Protocol (CDP) integration
  • Tab management (create, switch, close)
  • Screenshot capture (page and element-level)
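Automatic browser detection typically means probing a list of well-known install locations for the current OS. A hedged sketch: the paths below are common default Chrome/Chromium locations, but which paths the crate actually checks, and in what order, is an assumption here; `candidate_paths` and `detect_browser` are hypothetical helpers.

```rust
use std::path::PathBuf;

/// Hypothetical list of common Chrome/Chromium install locations per OS.
fn candidate_paths(os: &str) -> Vec<PathBuf> {
    let paths: &[&str] = match os {
        "macos" => &[
            "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            "/Applications/Chromium.app/Contents/MacOS/Chromium",
        ],
        "linux" => &[
            "/usr/bin/google-chrome",
            "/usr/bin/google-chrome-stable",
            "/usr/bin/chromium",
            "/usr/bin/chromium-browser",
        ],
        "windows" => &[
            r"C:\Program Files\Google\Chrome\Application\chrome.exe",
            r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
        ],
        _ => &[],
    };
    paths.iter().map(|p| PathBuf::from(*p)).collect()
}

/// Returns the first candidate for the running OS that exists on disk.
fn detect_browser() -> Option<PathBuf> {
    candidate_paths(std::env::consts::OS)
        .into_iter()
        .find(|p| p.exists())
}

fn main() {
    match detect_browser() {
        Some(path) => println!("Found browser at {}", path.display()),
        None => println!("No known browser path found on {}", std::env::consts::OS),
    }
}
```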

📊 Advanced DOM Processing

  • Full CDP integration (DOM, AX tree, Snapshot)
  • LLM-ready serialization with interactive element indices
  • Accessibility tree support for better semantic understanding
  • Optimized for token efficiency
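As an illustration of "LLM-ready serialization with interactive element indices", the sketch below numbers only the interactive elements, keeps other text as plain context, drops whitespace-only nodes to save tokens, and returns a map from each index back to the element. The `Element` struct and the `[i]<tag>text</tag>` line format are hypothetical, not the crate's actual output format.

```rust
/// Hypothetical, flattened element record (the crate's DOM types differ).
struct Element {
    tag: &'static str,
    text: String,
    interactive: bool,
}

/// Serializes one element per line. Interactive elements get an index the LLM
/// can refer to (e.g. "click 1"); the returned Vec maps index -> position.
fn serialize_for_llm(elements: &[Element]) -> (String, Vec<usize>) {
    let mut out = String::new();
    let mut index_map = Vec::new();
    for (pos, el) in elements.iter().enumerate() {
        if el.interactive {
            let idx = index_map.len();
            out.push_str(&format!("[{}]<{}>{}</{}>\n", idx, el.tag, el.text, el.tag));
            index_map.push(pos);
        } else if !el.text.trim().is_empty() {
            // Plain text is kept for context but not indexed.
            out.push_str(el.text.trim());
            out.push('\n');
        }
    }
    (out, index_map)
}

fn main() {
    let page = vec![
        Element { tag: "h1", text: "Example Domain".to_string(), interactive: false },
        Element { tag: "a", text: "More information...".to_string(), interactive: true },
    ];
    let (text, index_map) = serialize_for_llm(&page);
    print!("{text}");
    println!("index 0 -> element {}", index_map[0]);
}
```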

🔧 Extensible & Maintainable

  • Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
  • Custom action registration
  • Utility traits for reduced code duplication
  • Comprehensive test coverage (200+ tests)

📦 Installation

As a Library

[dependencies]
browsing = "0.1"
tokio = { version = "1.40", features = ["full"] }

As a CLI Tool

cargo install --path . --bin browsing

As an MCP Server

cargo build --release --bin browsing-mcp

🚀 Quick Start

1️⃣ CLI Usage

# Run an autonomous browsing task
browsing run "Find the latest news about AI" --url https://news.ycombinator.com --headless

# Launch a browser and get CDP URL
browsing launch --headless

# Connect to existing browser
browsing connect ws://localhost:9222/devtools/browser/abc123

📖 Full CLI Documentation

2️⃣ MCP Server Usage

Configure in Claude Desktop (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "browsing": {
      "command": "/path/to/browsing/target/release/browsing-mcp",
      "env": {
        "BROWSER_USE_HEADLESS": "true"
      }
    }
  }
}

Then ask Claude:

"Navigate to rust-lang.org, get the links, follow the second link, and screenshot the main content area"
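Behind a request like this, `get_links` returns an indexed list and `follow_link` resolves an index back to a URL. The std-only sketch below mimics that flow on a raw HTML string, assuming 0-based indices (so "the second link" is index 1). The naive `href` scan and URL joining are simplifications of my own; the crate works from the live page via CDP, and a real implementation would use an HTML parser and a URL library.

```rust
/// Naive href extractor for illustration only.
fn get_links(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(pos) = rest.find("href=\"") {
        let after = &rest[pos + "href=\"".len()..];
        match after.find('"') {
            Some(end) => {
                links.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            None => break, // unterminated attribute
        }
    }
    links
}

/// Resolves a 0-based link index to a URL. Absolute links are returned as-is;
/// other hrefs are naively appended to `base` (no "../" handling).
fn follow_link(links: &[String], index: usize, base: &str) -> Option<String> {
    let href = links.get(index)?;
    if href.starts_with("http://") || href.starts_with("https://") {
        Some(href.clone())
    } else {
        Some(format!("{}/{}", base.trim_end_matches('/'), href.trim_start_matches('/')))
    }
}

fn main() {
    let html = r#"<a href="/learn">Learn</a> <a href="https://crates.io/">Crates</a>"#;
    let links = get_links(html);
    // "Follow the second link" -> index 1 under 0-based numbering
    println!("{:?}", follow_link(&links, 1, "https://www.rust-lang.org"));
}
```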

📖 Full MCP Documentation

3️⃣ Library Usage

use anyhow::Result;
use browsing::{Browser, Config};

#[tokio::main]
async fn main() -> Result<()> {
    browsing::init();
    
    let config = Config::from_env();
    let mut browser = Browser::launch(config.browser_profile).await?;
    
    browser.navigate("https://example.com").await?;
    
    let state = browser.get_browser_state_summary(true).await?;
    println!("Title: {}", state.title);
    
    Ok(())
}

📖 Full Library Documentation

Browser Launch Options

use browsing::{Browser, BrowserProfile};

// Each option builds its own profile: Browser::new takes the profile by value,
// so reusing one `profile` variable across options would be a use-after-move.

// Option 1: Auto-launch browser (default)
let browser = Browser::new(BrowserProfile::default());

// Option 2: Connect to existing browser
let browser = Browser::new(BrowserProfile::default())
    .with_cdp_url("http://localhost:9222".to_string());

// Option 3: Custom browser executable
use browsing::browser::launcher::BrowserLauncher;
let launcher = BrowserLauncher::new(BrowserProfile::default())
    .with_executable_path(std::path::PathBuf::from("/path/to/chrome"));

Using Traits for Testing

use browsing::agent::Agent;
use browsing::error::BrowsingError; // assumed location of the crate's error type
use browsing::traits::{BrowserClient, DOMProcessor};

// Create mock browser for testing
struct MockBrowser {
    navigation_count: std::sync::atomic::AtomicUsize,
}

#[async_trait::async_trait]
impl BrowserClient for MockBrowser {
    async fn start(&mut self) -> Result<(), BrowsingError> {
        Ok(())
    }

    async fn navigate(&mut self, _url: &str) -> Result<(), BrowsingError> {
        self.navigation_count.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        Ok(())
    }

    // ... implement other trait methods
}

#[tokio::test]
async fn test_agent_with_mock_browser() {
    let mock_browser = Box::new(MockBrowser {
        navigation_count: std::sync::atomic::AtomicUsize::new(0),
    });

    // Test agent behavior without a real browser.
    // MockDOMProcessor and MockLLM are test doubles you define the same way as MockBrowser.
    let dom_processor = Box::new(MockDOMProcessor::new());
    let llm = MockLLM::new();

    let mut agent = Agent::new("Test task".to_string(), mock_browser, dom_processor, llm);
    // ... test agent
}
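The same pattern works without any browser or async runtime. This reduced, synchronous sketch uses a hypothetical `Navigator` trait in place of the real async `BrowserClient`: the function under test accepts `&mut dyn Navigator`, so a recording mock can stand in for Chrome and the test can assert on exactly which URLs were visited.

```rust
/// Hypothetical, synchronous cut-down browser trait (the real one is async).
trait Navigator {
    fn navigate(&mut self, url: &str) -> Result<(), String>;
    fn current_url(&self) -> String;
}

/// Test double that records navigations instead of driving Chrome.
#[derive(Default)]
struct RecordingBrowser {
    visited: Vec<String>,
}

impl Navigator for RecordingBrowser {
    fn navigate(&mut self, url: &str) -> Result<(), String> {
        if url.is_empty() {
            return Err("empty URL".to_string());
        }
        self.visited.push(url.to_string());
        Ok(())
    }

    fn current_url(&self) -> String {
        self.visited.last().cloned().unwrap_or_else(|| "about:blank".to_string())
    }
}

/// Code under test depends only on the trait, so any backend works.
fn visit_all(browser: &mut dyn Navigator, urls: &[&str]) -> Result<String, String> {
    for url in urls {
        browser.navigate(url)?;
    }
    Ok(browser.current_url())
}

fn main() {
    let mut mock = RecordingBrowser::default();
    let last = visit_all(&mut mock, &["https://example.com", "https://www.rust-lang.org"]);
    println!("{last:?}, visited {:?}", mock.visited);
}
```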

📚 Usage Examples

Content Download

use browsing::{Browser, BrowserProfile};
use browsing::dom::DOMProcessorImpl;
use browsing::traits::DOMProcessor;

#[tokio::main]
async fn main() -> browsing::error::Result<()> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate to website
    browser.navigate("https://www.ibm.com").await?;
    tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;

    // Extract content
    let cdp_client = browser.get_cdp_client()?;
    let session_id = browser.get_session_id()?;
    let target_id = browser.get_current_target_id()?;

    let dom_processor = DOMProcessorImpl::new()
        .with_cdp_client(cdp_client, session_id)
        .with_target_id(target_id);

    let page_content = dom_processor.get_page_state_string().await?;
    println!("Extracted {} bytes of content", page_content.len());

    // Save to file
    std::fs::write("ibm_content.txt", page_content)?;
    Ok(())
}

Run this example:

cargo run --example ibm_content_download

Screenshot Capture

use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> browsing::error::Result<()> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Full-page screenshot
    let screenshot_data = browser.take_screenshot(
        Some("screenshot.png"), // path
        true,                   // full_page
    ).await?;

    // Viewport only
    let viewport = browser.take_screenshot(
        Some("viewport.png"),
        false,
    ).await?;

    Ok(())
}

Direct Browser Control

use browsing::{Browser, BrowserProfile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;

    // Navigate
    browser.navigate("https://example.com").await?;

    // Get current URL
    let url = browser.get_current_url().await?;
    println!("Current URL: {}", url);

    // Tab management
    browser.create_new_tab(Some("https://news.ycombinator.com")).await?;
    let tabs = browser.get_tabs().await?;
    println!("Open tabs: {}", tabs.len());

    // Switch tabs
    browser.switch_to_tab(&tabs[0].target_id).await?;

    Ok(())
}

Custom Actions

use browsing::tools::views::{ActionHandler, ActionParams, ActionContext, ActionResult};
use browsing::error::Result;

struct CustomActionHandler;

#[async_trait::async_trait]
impl ActionHandler for CustomActionHandler {
    async fn execute(
        &self,
        _params: &ActionParams<'_>,
        _context: &mut ActionContext<'_>,
    ) -> Result<ActionResult> {
        // Custom action logic goes here; this stub returns fixed content.
        Ok(ActionResult {
            extracted_content: Some("Custom result".to_string()),
            ..Default::default()
        })
    }
}

// Register the custom action on an existing `agent`
agent.tools.register_custom_action(
    "custom_action".to_string(),
    "Description of custom action".to_string(),
    None,  // domains
    CustomActionHandler,
);
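Conceptually, registration stores a name, a description the LLM can see, an optional domain restriction, and a handler. The synchronous sketch below models that with a `HashMap`; `Registry`, `register`, and `execute` are hypothetical names, and treating `None` as "allowed on every site" and matching subdomains are assumptions about how a domain filter would behave.

```rust
use std::collections::HashMap;

/// Hypothetical action record; the crate's real registry types differ.
struct RegisteredAction {
    description: String,
    /// `None` = allowed on any site; `Some(hosts)` = only on those domains.
    domains: Option<Vec<String>>,
    handler: Box<dyn Fn(&str) -> Result<String, String>>,
}

#[derive(Default)]
struct Registry {
    actions: HashMap<String, RegisteredAction>,
}

impl Registry {
    fn register<F>(&mut self, name: &str, description: &str, domains: Option<Vec<String>>, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        let action = RegisteredAction {
            description: description.to_string(),
            domains,
            handler: Box::new(handler),
        };
        self.actions.insert(name.to_string(), action);
    }

    /// Lines an agent could put into the LLM prompt, sorted for stable output.
    fn prompt_listing(&self) -> Vec<String> {
        let mut lines: Vec<String> = self
            .actions
            .iter()
            .map(|(name, a)| format!("{name}: {}", a.description))
            .collect();
        lines.sort();
        lines
    }

    /// Runs `name` if it is registered and permitted on `host`.
    fn execute(&self, name: &str, host: &str, params: &str) -> Result<String, String> {
        let action = self
            .actions
            .get(name)
            .ok_or_else(|| format!("unknown action: {name}"))?;
        if let Some(domains) = &action.domains {
            let allowed = domains
                .iter()
                .any(|d| host == d.as_str() || host.ends_with(&format!(".{d}")));
            if !allowed {
                return Err(format!("{name} is not allowed on {host}"));
            }
        }
        (action.handler)(params)
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.register("custom_action", "Description of custom action", None, |p| {
        Ok(format!("Custom result for {p}"))
    });
    println!("{:?}", registry.prompt_listing());
    println!("{:?}", registry.execute("custom_action", "example.com", "demo"));
}
```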

πŸ—οΈ Architecture

Browsing follows SOLID principles with a focus on separation of concerns, testability, and maintainability.

┌───────────────────────────────────────────────────────────┐
│                           Agent                           │
│  ┌─────────────┬──────────────┬──────────────┬─────────┐  │
│  │   Browser   │ DOMProcessor │     LLM      │  Tools  │  │
│  │   (trait)   │    (trait)   │   (trait)    │         │  │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬────┘  │
│         │             │              │            │       │
└─────────┼─────────────┼──────────────┼────────────┼───────┘
          │             │              │            │
    ┌─────▼──────┐ ┌────▼────┐   ┌─────▼───┐  ┌─────▼──────┐
    │  Browser   │ │DomSvc   │   │   LLM   │  │ Handlers   │
    │            │ │         │   │         │  │            │
    │TabManager  │ │CDP      │   │Chat     │  │Navigation  │
    │NavManager  │ │HTML     │   │Model    │  │Interaction │
    │Screenshot  │ │Tree     │   │         │  │Tabs        │
    │            │ │Builder  │   │         │  │Content     │
    └────────────┘ └─────────┘   └─────────┘  └────────────┘

Key Components

| Component | Responsibility | Trait relationship |
|---|---|---|
| Agent | Orchestrates browser, LLM, and DOM processing | Uses BrowserClient, DOMProcessor |
| Browser | Manages browser session and lifecycle | Implements BrowserClient |
| DOMProcessor | Extracts and serializes DOM | Implements DOMProcessor |
| Tools | Action registry and execution | Uses BrowserClient |
| Handlers | Specific action implementations | Implement ActionHandler |
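The Agent's orchestration role boils down to an observe, decide, act loop with a step budget (compare `with_max_steps` under Agent Settings). The sketch below shows that loop with a hypothetical `Planner` trait standing in for the LLM and a string standing in for page state; the real agent also extracts DOM state, parses JSON actions, and records history snapshots.

```rust
/// Hypothetical action vocabulary for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Navigate(String),
    Click(usize),
    Done(String),
}

/// Stand-in for the LLM: picks the next action from task + page summary.
trait Planner {
    fn next_action(&mut self, task: &str, page: &str) -> Action;
}

/// Observe -> decide -> act until Done or until `max_steps` is exhausted.
/// Returns the final answer (if any) and the actions taken.
fn run_agent(task: &str, planner: &mut dyn Planner, max_steps: usize) -> (Option<String>, Vec<Action>) {
    let mut page = String::from("about:blank");
    let mut history = Vec::new();
    for _ in 0..max_steps {
        let action = planner.next_action(task, &page); // decide
        history.push(action.clone());
        match action {
            // act: a real agent would drive the browser here
            Action::Navigate(url) => page = url,
            Action::Click(i) => page = format!("{page}#clicked-{i}"),
            Action::Done(answer) => return (Some(answer), history),
        }
    }
    (None, history) // step budget exhausted
}

/// Scripted planner for demos and tests.
struct Scripted(Vec<Action>);

impl Planner for Scripted {
    fn next_action(&mut self, _task: &str, _page: &str) -> Action {
        if self.0.is_empty() {
            Action::Done("nothing left to do".to_string())
        } else {
            self.0.remove(0)
        }
    }
}

fn main() {
    let mut planner = Scripted(vec![
        Action::Navigate("https://example.com".to_string()),
        Action::Click(0),
        Action::Done("Example Domain".to_string()),
    ]);
    let (answer, history) = run_agent("Find the page title", &mut planner, 50);
    println!("answer = {answer:?}, steps = {}", history.len());
}
```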

πŸ“ Project Structure

browsing/
├── src/
│   ├── agent/              # Agent orchestration
│   │   ├── service.rs      # Main agent implementation
│   │   └── json_extractor.rs # JSON parsing utilities
│   ├── browser/            # Browser management
│   │   ├── session.rs      # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs  # Tab operations
│   │   ├── navigation.rs   # Navigation operations
│   │   ├── screenshot.rs   # Screenshot operations
│   │   ├── cdp.rs          # CDP WebSocket client
│   │   ├── launcher.rs     # Browser launcher
│   │   └── profile.rs      # Browser configuration
│   ├── dom/                # DOM processing
│   │   ├── processor.rs    # DOMProcessor trait impl
│   │   ├── serializer.rs   # LLM-ready serialization
│   │   ├── tree_builder.rs # DOM tree construction
│   │   ├── cdp_client.rs   # CDP wrapper for DOM
│   │   └── html_converter.rs # HTML to markdown
│   ├── tools/              # Action system
│   │   ├── service.rs      # Tools registry
│   │   ├── handlers/       # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs       # Parameter extraction
│   ├── traits/             # Core trait abstractions
│   │   ├── browser_client.rs  # BrowserClient trait
│   │   └── dom_processor.rs   # DOMProcessor trait
│   ├── llm/                # LLM integration
│   │   └── base.rs         # ChatModel trait
│   ├── actor/              # Low-level interactions
│   │   ├── page.rs         # Page operations
│   │   ├── element.rs      # Element operations
│   │   └── mouse.rs        # Mouse interactions
│   ├── config/             # Configuration
│   ├── error/              # Error types
│   └── utils/              # Utilities
└── Cargo.toml

🎨 Design Principles

Trait-Based Design

  • BrowserClient - Abstract browser operations for testing and alternative backends
  • DOMProcessor - Pluggable DOM processing implementations
  • ActionHandler - Extensible action system
  • ChatModel - LLM provider abstraction

Separation of Concerns

  • TabManager - Tab operations (create, switch, close)
  • NavigationManager - Navigation logic
  • ScreenshotManager - Screenshot capture
  • Handlers - Focused action implementations

DRY (Don't Repeat Yourself)

  • ActionParams - Reusable parameter extraction
  • JSONExtractor - Centralized JSON parsing
  • SessionGuard - Unified session access
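To show what "reusable parameter extraction" buys, here is a sketch of a small typed accessor over string arguments: every handler gets the same missing-parameter and parse-error messages instead of repeating that logic. `Params`, `required`, and `optional` are hypothetical and not the crate's `ActionParams` API.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical typed view over an action's raw string arguments.
struct Params<'a> {
    action: &'a str,
    raw: &'a HashMap<String, String>,
}

impl<'a> Params<'a> {
    /// Fetches and parses a mandatory parameter with a uniform error message.
    fn required<T: FromStr>(&self, key: &str) -> Result<T, String> {
        let value = self
            .raw
            .get(key)
            .ok_or_else(|| format!("{}: missing parameter '{}'", self.action, key))?;
        value
            .parse::<T>()
            .map_err(|_| format!("{}: invalid value '{}' for '{}'", self.action, value, key))
    }

    /// Like `required`, but falls back to `default` when the key is absent.
    fn optional<T: FromStr>(&self, key: &str, default: T) -> Result<T, String> {
        match self.raw.get(key) {
            None => Ok(default),
            Some(_) => self.required(key),
        }
    }
}

fn main() {
    let mut raw = HashMap::new();
    raw.insert("index".to_string(), "3".to_string());
    let params = Params { action: "click", raw: &raw };
    let index: usize = params.required("index").unwrap();
    let timeout: u64 = params.optional("timeout_ms", 5_000).unwrap();
    println!("click index={index} timeout={timeout}ms");
}
```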

KISS (Keep It Simple, Stupid)

  • Split complex methods into focused helpers
  • Clear naming and single responsibility
  • Minimal dependencies between modules

🧪 Testing

# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_agent_workflow

# Run integration tests only
cargo test --test integration

Test Coverage

  • 317 tests across all modules (all passing)
  • 50+ integration tests for full workflow
  • 150+ unit tests for individual components
  • Test files:
  • Mock implementations for deterministic testing
  • Trait-based mocking for browser/DOM components

⚠️ Data Retention Policy

Browser Data is NEVER Deleted

IMPORTANT: The browsing library never deletes browser data for safety reasons.

What This Means:

| Data type | Behavior |
|---|---|
| Bookmarks | Never deleted |
| History | Never deleted |
| Cookies | Never deleted |
| Passwords | Never deleted |
| Extensions | Never deleted |
| Cache | Never deleted |
| Temp directories | Never deleted (left in /tmp/) |

Why This Policy Exists:

  1. User Safety: Users may specify a custom user_data_dir pointing to their real browser profile
  2. Catastrophe Prevention: Accidentally deleting a user's real browser data (bookmarks, history, passwords) would be devastating
  3. Debugging: Leaving temp directories allows inspection after crashes or failures
  4. User Control: Users are responsible for managing their own browser data

How It Works:

When no user_data_dir is specified:

let profile = BrowserProfile {
    user_data_dir: None,  // Uses temp directory: /tmp/browser-use-1738369200000/
    ..Default::default()
};
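The temp directory in the example appears to be named after a Unix timestamp in milliseconds. Assuming that scheme, a path like it could be built as follows; `temp_profile_dir` is a hypothetical helper, and in line with the retention policy nothing here deletes the directory.

```rust
use std::path::PathBuf;
use std::time::{SystemTime, UNIX_EPOCH};

/// Builds `<base>/browser-use-<unix millis>`, mirroring the example path.
/// The naming scheme is inferred from that example, not confirmed.
fn temp_profile_dir(base: PathBuf, now: SystemTime) -> PathBuf {
    let millis = now
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis();
    base.join(format!("browser-use-{millis}"))
}

fn main() {
    let dir = temp_profile_dir(std::env::temp_dir(), SystemTime::now());
    // Only computes the path; cleanup stays the user's responsibility.
    println!("{}", dir.display());
}
```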

When browser.stop() is called:

  • ✅ Browser process is killed
  • ✅ In-memory state is cleared
  • ❌ User data directory is NOT deleted

Managing Temporary Data:

Users are responsible for cleanup:

# List browser temp directories
ls -la /tmp/browser-use-*

# Delete old temp directories (optional, manual cleanup)
rm -rf /tmp/browser-use-1738369200000/

Using a Custom Data Directory:

let profile = BrowserProfile {
    user_data_dir: Some("/path/to/custom/profile".into()),
    ..Default::default()
};

Warning: If you point to your real browser profile, the library will NOT protect it. You're responsible for that directory.

🔧 Configuration

Browser Profile

use browsing::BrowserProfile;

let profile = BrowserProfile {
    headless: true,
    browser_type: browsing::BrowserType::Chrome,
    user_data_dir: None,
    disable_gpu: true,
    ..Default::default()
};
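Each profile setting ultimately becomes a Chrome command-line switch. The flags below (`--remote-debugging-port`, `--headless=new`, `--disable-gpu`, `--user-data-dir`) are standard Chrome switches, but `ProfileSketch` and the exact mapping are assumptions for illustration; with `user_data_dir: None` the crate uses a temp directory instead, which this sketch simply omits.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of a few BrowserProfile fields (not the crate's struct).
struct ProfileSketch {
    headless: bool,
    disable_gpu: bool,
    user_data_dir: Option<PathBuf>,
    remote_debugging_port: u16,
}

/// Translates profile settings into Chrome command-line switches.
fn chrome_args(p: &ProfileSketch) -> Vec<String> {
    // The debugging port is what exposes the CDP endpoint.
    let mut args = vec![format!("--remote-debugging-port={}", p.remote_debugging_port)];
    if p.headless {
        args.push("--headless=new".to_string());
    }
    if p.disable_gpu {
        args.push("--disable-gpu".to_string());
    }
    if let Some(dir) = &p.user_data_dir {
        args.push(format!("--user-data-dir={}", dir.display()));
    }
    args
}

fn main() {
    let profile = ProfileSketch {
        headless: true,
        disable_gpu: true,
        user_data_dir: None,
        remote_debugging_port: 9222,
    };
    println!("chrome {}", chrome_args(&profile).join(" "));
}
```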

Agent Settings

use browsing::agent::Agent;
use browsing::agent::views::AgentSettings;

let agent = Agent::new(...)
    .with_max_steps(50)
    .with_settings(AgentSettings {
        override_system_message: Some("Custom system prompt".to_string()),
        ..Default::default()
    });

📖 API Documentation

Generate and view API docs:

cargo doc --open
