Lightweight MCP/API for browser automation
A concise MCP server and Rust library: navigate, get_links, follow_link, list_content (links+images), get_content, get_image, save_content, screenshot (full or element). Lazy browser init. Parallel reads via RwLock.
- MCP Server (primary) - navigate, get_links, follow_link, list_content, get_content, get_image, save_content, screenshot, generate_sitemap tools for AI assistants
- CLI - Autonomous browsing tasks
- Library - Full agent system with LLM, custom actions
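The "lazy browser init" and "parallel reads via RwLock" notes above can be pictured roughly as follows. This is a minimal std-only sketch, not the crate's code: `ServerState` and `FakeBrowser` are hypothetical stand-ins, and the real server presumably holds an async browser session rather than a plain struct.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a launched browser session.
struct FakeBrowser {
    current_url: String,
}

/// Hypothetical server state: the browser slot stays empty until first use.
struct ServerState {
    browser: RwLock<Option<FakeBrowser>>,
}

impl ServerState {
    fn new() -> Self {
        ServerState { browser: RwLock::new(None) }
    }

    /// Launch the browser on first use only (lazy initialization).
    fn ensure_started(&self) {
        let mut slot = self.browser.write().unwrap();
        if slot.is_none() {
            *slot = Some(FakeBrowser { current_url: "about:blank".to_string() });
        }
    }

    /// Mutating operation: takes the exclusive write lock.
    fn navigate(&self, url: &str) {
        self.ensure_started();
        let mut slot = self.browser.write().unwrap();
        if let Some(browser) = slot.as_mut() {
            browser.current_url = url.to_string();
        }
    }

    /// Read-only operation: many callers can hold the read lock at once.
    fn current_url(&self) -> Option<String> {
        let slot = self.browser.read().unwrap();
        slot.as_ref().map(|b| b.current_url.clone())
    }
}

fn main() {
    let state = Arc::new(ServerState::new());
    assert!(state.current_url().is_none()); // nothing launched yet

    state.navigate("https://example.com");

    // Several readers run in parallel without blocking each other.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || state.current_url())
        })
        .collect();
    for handle in handles {
        println!("{:?}", handle.join().unwrap());
    }
}
```

Read-style tools (get_links, get_content, and so on) would map to the read lock, while navigation-style tools would take the write lock; that mapping is an assumption drawn from the one-line description, not a statement about the actual implementation.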
Building AI agents that can navigate and interact with websites is challenging. You need to:
- Extract structured data from unstructured HTML - Parse complex DOM trees and make them LLM-readable
- Handle browser automation reliably - Manage browser lifecycle, CDP connections, and process management
- Coordinate multiple subsystems - Orchestrate DOM extraction, LLM inference, and action execution
- Maintain testability - Mock components for unit testing without real browsers
- Support extensibility - Add custom actions, browser backends, and LLM providers
Browsing addresses each of these with a modular, trait-based architecture that can be tested without a real browser.
- BrowserClient trait - Abstract browser operations for easy mocking and alternative backends
- DOMProcessor trait - Pluggable DOM processing implementations
- ActionHandler trait - Extensible action system for custom behaviors
- Complete agent execution loop with LLM integration
- Robust action parsing with JSON repair
- History tracking with state snapshots
- Graceful error handling and recovery
- Cross-platform support (macOS, Linux, Windows)
- Automatic browser detection
- Chrome DevTools Protocol (CDP) integration
- Tab management (create, switch, close)
- Screenshot capture (page and element-level)
- Full CDP integration (DOM, AX tree, Snapshot)
- LLM-ready serialization with interactive element indices
- Accessibility tree support for better semantic understanding
- Optimized for token efficiency
- Manager-based architecture (TabManager, NavigationManager, ScreenshotManager)
- Custom action registration
- Utility traits for reduced code duplication
- Comprehensive test coverage (317 tests)
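To make "LLM-ready serialization with interactive element indices" more concrete, here is a small sketch of the general idea. The `Element` struct, the `serialize_for_llm` function, and the `[index]<tag>text</tag>` line format are all illustrative assumptions; the crate's serializer may use a different shape.

```rust
/// Hypothetical, simplified view of a DOM element.
struct Element {
    tag: String,
    text: String,
    interactive: bool,
}

/// Small constructor helper to keep call sites short.
fn el(tag: &str, text: &str, interactive: bool) -> Element {
    Element { tag: tag.to_string(), text: text.to_string(), interactive }
}

/// Render elements one per line. Interactive elements get a running index
/// such as `[0]<a>Docs</a>` that an LLM can refer to in its next action.
fn serialize_for_llm(elements: &[Element]) -> String {
    let mut out = String::new();
    let mut next_index = 0;
    for element in elements {
        let text = element.text.trim();
        if element.interactive {
            out.push_str(&format!(
                "[{}]<{}>{}</{}>\n",
                next_index, element.tag, text, element.tag
            ));
            next_index += 1;
        } else if !text.is_empty() {
            // Plain text is kept without markup to save tokens.
            out.push_str(text);
            out.push('\n');
        }
    }
    out
}

fn main() {
    let page = vec![
        el("h1", "Welcome", false),
        el("a", "Docs", true),
        el("div", "   ", false), // whitespace-only nodes are dropped
        el("button", "Sign in", true),
    ];
    print!("{}", serialize_for_llm(&page));
}
```

Indexing only the interactive elements keeps the prompt short while still giving the model a stable handle ("click element 1") for each thing it can act on.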
[dependencies]
browsing = "0.1"
tokio = { version = "1.40", features = ["full"] }

cargo install --path . --bin browsing

cargo build --release --bin browsing-mcp

# Run an autonomous browsing task
browsing run "Find the latest news about AI" --url https://news.ycombinator.com --headless
# Launch a browser and get CDP URL
browsing launch --headless
# Connect to existing browser
browsing connect ws://localhost:9222/devtools/browser/abc123

Configure in Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "browsing": {
      "command": "/path/to/browsing/target/release/browsing-mcp",
      "env": {
        "BROWSER_USE_HEADLESS": "true"
      }
    }
  }
}

Then ask Claude:
"Navigate to rust-lang.org, get the links, follow the second link, and screenshot the main content area"
use anyhow::Result;
use browsing::{Browser, Config};

#[tokio::main]
async fn main() -> Result<()> {
    browsing::init();
    let config = Config::from_env();
    let browser = Browser::launch(config.browser_profile).await?;
    browser.navigate("https://example.com").await?;
    let state = browser.get_browser_state_summary(true).await?;
    println!("Title: {}", state.title);
    Ok(())
}

Full Library Documentation
use browsing::{Browser, BrowserProfile};
// Option 1: Auto-launch browser (default)
let profile = BrowserProfile::default();
let browser = Browser::new(profile);
// Option 2: Connect to existing browser
let browser = Browser::new(profile)
    .with_cdp_url("http://localhost:9222".to_string());
// Option 3: Custom browser executable
use browsing::browser::launcher::BrowserLauncher;
let launcher = BrowserLauncher::new(profile)
    .with_executable_path(std::path::PathBuf::from("/path/to/chrome"));

use browsing::traits::{BrowserClient, DOMProcessor};
use browsing::agent::Agent;
use std::sync::Arc;

// Create mock browser for testing
struct MockBrowser {
    navigation_count: std::sync::atomic::AtomicUsize,
}

#[async_trait::async_trait]
impl BrowserClient for MockBrowser {
    async fn start(&mut self) -> Result<(), BrowsingError> {
        Ok(())
    }

    async fn navigate(&mut self, _url: &str) -> Result<(), BrowsingError> {
        self.navigation_count.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        Ok(())
    }
    // ... implement other trait methods
}

#[tokio::test]
async fn test_agent_with_mock_browser() {
    let mock_browser = Box::new(MockBrowser {
        navigation_count: std::sync::atomic::AtomicUsize::new(0),
    });
    // Test agent behavior without real browser
    let dom_processor = Box::new(MockDOMProcessor::new());
    let llm = MockLLM::new();
    let mut agent = Agent::new("Test task".to_string(), mock_browser, dom_processor, llm);
    // ... test agent
}

use browsing::{Browser, BrowserProfile};
use browsing::dom::DOMProcessorImpl;
use browsing::traits::DOMProcessor;

#[tokio::main]
async fn main() -> browsing::error::Result<()> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;
    // Navigate to website
    browser.navigate("https://www.ibm.com").await?;
    tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;
    // Extract content
    let cdp_client = browser.get_cdp_client()?;
    let session_id = browser.get_session_id()?;
    let target_id = browser.get_current_target_id()?;
    let dom_processor = DOMProcessorImpl::new()
        .with_cdp_client(cdp_client, session_id)
        .with_target_id(target_id);
    let page_content = dom_processor.get_page_state_string().await?;
    println!("Extracted {} bytes of content", page_content.len());
    // Save to file
    std::fs::write("ibm_content.txt", page_content)?;
    Ok(())
}

Run this example:
cargo run --example ibm_content_download

use browsing::{Browser, BrowserProfile};
// `start()` takes `&mut self`, so the binding must be mutable
let mut browser = Browser::new(BrowserProfile::default());
browser.start().await?;
// Full page screenshot
let screenshot_data = browser.take_screenshot(
    Some("screenshot.png"), // path
    true,                   // full_page
).await?;
// Viewport only
let viewport = browser.take_screenshot(
    Some("viewport.png"),
    false,
).await?;

use browsing::{Browser, BrowserProfile};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut browser = Browser::new(BrowserProfile::default());
    browser.start().await?;
    // Navigate
    browser.navigate("https://example.com").await?;
    // Get current URL
    let url = browser.get_current_url().await?;
    println!("Current URL: {}", url);
    // Tab management
    browser.create_new_tab(Some("https://news.ycombinator.com")).await?;
    let tabs = browser.get_tabs().await?;
    println!("Open tabs: {}", tabs.len());
    // Switch tabs
    browser.switch_to_tab(&tabs[0].target_id).await?;
    Ok(())
}

use browsing::tools::views::{ActionHandler, ActionParams, ActionContext, ActionResult};
use browsing::agent::views::ActionModel;
use browsing::error::Result;

struct CustomActionHandler;

#[async_trait::async_trait]
impl ActionHandler for CustomActionHandler {
    async fn execute(
        &self,
        params: &ActionParams<'_>,
        context: &mut ActionContext<'_>,
    ) -> Result<ActionResult> {
        // Custom action logic here
        Ok(ActionResult {
            extracted_content: Some("Custom result".to_string()),
            ..Default::default()
        })
    }
}

// Register custom action
agent.tools.register_custom_action(
    "custom_action".to_string(),
    "Description of custom action".to_string(),
    None, // domains
    CustomActionHandler,
);

Browsing follows SOLID principles with a focus on separation of concerns, testability, and maintainability.
┌─────────────────────────────────────────────────────────────┐
│                            Agent                            │
│  ┌─────────────┬──────────────┬──────────────┬─────────┐    │
│  │   Browser   │ DOMProcessor │     LLM      │  Tools  │    │
│  │   (trait)   │   (trait)    │   (trait)    │         │    │
│  └──────┬──────┴──────┬───────┴──────┬───────┴────┬────┘    │
│         │             │              │            │         │
└─────────┼─────────────┼──────────────┼────────────┼─────────┘
          │             │              │            │
    ┌─────▼──────┐  ┌───▼────┐    ┌────▼───┐   ┌────▼──────┐
    │  Browser   │  │DomSvc  │    │  LLM   │   │ Handlers  │
    │            │  │        │    │        │   │           │
    │TabManager  │  │CDP     │    │Chat    │   │Navigation │
    │NavManager  │  │HTML    │    │Model   │   │Interaction│
    │Screenshot  │  │Tree    │    │        │   │Tabs       │
    │            │  │Builder │    │        │   │Content    │
    └────────────┘  └────────┘    └────────┘   └───────────┘
| Component | Responsibility | Trait-Based |
|---|---|---|
| Agent | Orchestrates browser, LLM, and DOM processing | Uses BrowserClient, DOMProcessor |
| Browser | Manages browser session and lifecycle | Implements BrowserClient |
| DOMProcessor | Extracts and serializes DOM | Implements DOMProcessor |
| Tools | Action registry and execution | Uses BrowserClient trait |
| Handlers | Specific action implementations | Use ActionHandler trait |
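The division of responsibilities in the table can be illustrated with a stripped-down agent loop. This is a sketch under stated assumptions: the traits below are synchronous, simplified stand-ins that only borrow the names `BrowserClient` and `ChatModel`; the method signatures, the plain-text action format ("navigate <url>", "done"), and `ScriptedLlm` are invented for illustration and are not the crate's API.

```rust
use std::collections::VecDeque;

/// Simplified, synchronous stand-in for the browser abstraction.
trait BrowserClient {
    fn navigate(&mut self, url: &str) -> Result<(), String>;
    fn current_url(&self) -> String;
}

/// Simplified stand-in for the LLM abstraction.
trait ChatModel {
    /// Given a description of the page, return the next action as text.
    fn next_action(&mut self, page_state: &str) -> String;
}

/// The agent only talks to traits, so either side can be swapped or mocked.
struct Agent<B: BrowserClient, L: ChatModel> {
    browser: B,
    llm: L,
    max_steps: usize,
}

impl<B: BrowserClient, L: ChatModel> Agent<B, L> {
    /// Observe, ask the model, act; stop on "done" or after `max_steps`.
    fn run(&mut self) -> Result<Vec<String>, String> {
        let mut history = Vec::new();
        for _ in 0..self.max_steps {
            let page_state = format!("url: {}", self.browser.current_url());
            let action = self.llm.next_action(&page_state);
            history.push(action.clone());
            if action == "done" {
                break;
            }
            if let Some(url) = action.strip_prefix("navigate ") {
                self.browser.navigate(url)?;
            }
        }
        Ok(history)
    }
}

/// Mock browser that only remembers the last URL.
struct MockBrowser {
    url: String,
}

impl BrowserClient for MockBrowser {
    fn navigate(&mut self, url: &str) -> Result<(), String> {
        self.url = url.to_string();
        Ok(())
    }
    fn current_url(&self) -> String {
        self.url.clone()
    }
}

/// Mock model that replays a fixed script, then answers "done".
struct ScriptedLlm {
    replies: VecDeque<String>,
}

impl ChatModel for ScriptedLlm {
    fn next_action(&mut self, _page_state: &str) -> String {
        self.replies.pop_front().unwrap_or_else(|| "done".to_string())
    }
}

fn main() {
    let mut agent = Agent {
        browser: MockBrowser { url: "about:blank".to_string() },
        llm: ScriptedLlm {
            replies: VecDeque::from(vec![
                "navigate https://example.com".to_string(),
                "done".to_string(),
            ]),
        },
        max_steps: 10,
    };
    let history = agent.run().expect("mock browser never fails");
    println!("{:?} -> {}", history, agent.browser.current_url());
}
```

Because the loop depends only on the two traits, the same `run` logic works with a real browser and model or with the deterministic mocks shown here, which is the testability benefit the table points at.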
browsing/
├── src/
│   ├── agent/                    # Agent orchestration
│   │   ├── service.rs            # Main agent implementation
│   │   └── json_extractor.rs     # JSON parsing utilities
│   ├── browser/                  # Browser management
│   │   ├── session.rs            # Browser session (BrowserClient impl)
│   │   ├── tab_manager.rs        # Tab operations
│   │   ├── navigation.rs         # Navigation operations
│   │   ├── screenshot.rs         # Screenshot operations
│   │   ├── cdp.rs                # CDP WebSocket client
│   │   ├── launcher.rs           # Browser launcher
│   │   └── profile.rs            # Browser configuration
│   ├── dom/                      # DOM processing
│   │   ├── processor.rs          # DOMProcessor trait impl
│   │   ├── serializer.rs         # LLM-ready serialization
│   │   ├── tree_builder.rs       # DOM tree construction
│   │   ├── cdp_client.rs         # CDP wrapper for DOM
│   │   └── html_converter.rs     # HTML to markdown
│   ├── tools/                    # Action system
│   │   ├── service.rs            # Tools registry
│   │   ├── handlers/             # Action handlers
│   │   │   ├── navigation.rs
│   │   │   ├── interaction.rs
│   │   │   ├── tabs.rs
│   │   │   ├── content.rs
│   │   │   └── advanced.rs
│   │   └── params.rs             # Parameter extraction
│   ├── traits/                   # Core trait abstractions
│   │   ├── browser_client.rs     # BrowserClient trait
│   │   └── dom_processor.rs      # DOMProcessor trait
│   ├── llm/                      # LLM integration
│   │   └── base.rs               # ChatModel trait
│   ├── actor/                    # Low-level interactions
│   │   ├── page.rs               # Page operations
│   │   ├── element.rs            # Element operations
│   │   └── mouse.rs              # Mouse interactions
│   ├── config/                   # Configuration
│   ├── error/                    # Error types
│   └── utils/                    # Utilities
└── Cargo.toml
- BrowserClient - Abstract browser operations for testing and alternative backends
- DOMProcessor - Pluggable DOM processing implementations
- ActionHandler - Extensible action system
- ChatModel - LLM provider abstraction
- TabManager - Tab operations (create, switch, close)
- NavigationManager - Navigation logic
- ScreenshotManager - Screenshot capture
- Handlers - Focused action implementations
- ActionParams - Reusable parameter extraction
- JSONExtractor - Centralized JSON parsing
- SessionGuard - Unified session access
- Split complex methods into focused helpers
- Clear naming and single responsibility
- Minimal dependencies between modules
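As an illustration of what centralized JSON parsing with repair might involve, the sketch below pulls the first JSON object out of a chatty LLM reply and closes it if the output was cut off. It is an assumption-laden example, not the crate's `JSONExtractor`: the function name `extract_json_object` is hypothetical, and the repair step handles only unclosed strings and objects, not arrays.

```rust
/// Return the first balanced `{...}` object found in `text`, if any.
/// If the input ends before the object is closed, the missing closing
/// quote and braces are appended (a very small "repair" step).
fn extract_json_object(text: &str) -> Option<String> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            // Braces inside string literals must not affect nesting depth.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    let end = start + offset + ch.len_utf8();
                    return Some(text[start..end].to_string());
                }
            }
            _ => {}
        }
    }

    // Truncated output: close an open string, then any open objects.
    let mut repaired = text[start..].trim_end().to_string();
    if in_string {
        repaired.push('"');
    }
    repaired.push_str(&"}".repeat(depth));
    Some(repaired)
}

fn main() {
    let reply = "Sure, here is the action:\n{\"action\": \"navigate\", \"url\": \"https://example.com\"}\nLet me know!";
    println!("{:?}", extract_json_object(reply));

    let truncated = "{\"action\": {\"navigate\": {\"url\": \"https://example.com\"";
    println!("{:?}", extract_json_object(truncated));
}
```

Keeping this logic in one place means every caller that parses model output gets the same tolerance for surrounding prose and truncated replies.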
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_agent_workflow
# Run integration tests only
cargo test --test integration

- 317 tests across all modules (all passing)
- 50+ integration tests for full workflow
- 150+ unit tests for individual components
- Test files:
  - actor_test.rs - Page, Element, Mouse, Keyboard operations (23 passed)
  - browser_managers_test.rs - Navigation, Screenshot, Tab managers
  - tools_handlers_test.rs - All action handlers (49 passed)
  - agent_service_test.rs - Agent execution logic (32 passed)
  - agent_execution_test.rs - Agent workflow tests (11 passed)
  - traits_test.rs - BrowserClient, DOMProcessor traits (24 passed)
  - utils_test.rs - URL extraction, signal handling (49 passed)
- Mock implementations for deterministic testing
- Trait-based mocking for browser/DOM components
IMPORTANT: The browsing library never deletes browser data for safety reasons.
| Data Type | Behavior |
|---|---|
| Bookmarks | Never deleted |
| History | Never deleted |
| Cookies | Never deleted |
| Passwords | Never deleted |
| Extensions | Never deleted |
| Cache | Never deleted |
| Temp Directories | Never deleted (left in /tmp/) |
- User Safety: Users may specify a custom user_data_dir pointing to their real browser profile
- Catastrophe Prevention: Accidentally deleting a user's real browser data (bookmarks, history, passwords) would be devastating
- Debugging: Leaving temp directories allows inspection after crashes or failures
- User Control: Users are responsible for managing their own browser data
When no user_data_dir is specified:
let profile = BrowserProfile {
    user_data_dir: None, // Uses temp directory: /tmp/browser-use-1738369200000/
    ..Default::default()
};

When browser.stop() is called:
- Browser process is killed
- In-memory state is cleared
- User data directory is NOT deleted
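The stop behavior listed above can be sketched as follows. This is an illustrative model only: `Session`, its fields, and `temp_profile_dir` are hypothetical names, and reading the numeric suffix in `/tmp/browser-use-1738369200000/` as a Unix timestamp in milliseconds is an assumption based on the example path.

```rust
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Build a temp profile path such as `/tmp/browser-use-1738369200000`.
/// Assumption: the suffix is a Unix timestamp in milliseconds.
fn temp_profile_dir(base: &Path, millis: u128) -> PathBuf {
    base.join(format!("browser-use-{}", millis))
}

/// Current time in milliseconds since the Unix epoch (0 if the clock is off).
fn now_millis() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_millis())
        .unwrap_or(0)
}

/// Hypothetical session that tracks only what `stop()` is described as touching.
struct Session {
    user_data_dir: PathBuf,
    process_running: bool,
    cached_state: Option<String>,
}

impl Session {
    fn stop(&mut self) {
        self.process_running = false; // browser process is killed
        self.cached_state = None; // in-memory state is cleared
        // Deliberately no `std::fs::remove_dir_all(&self.user_data_dir)` here:
        // the directory might be the user's real browser profile.
    }
}

fn main() {
    let mut session = Session {
        user_data_dir: temp_profile_dir(Path::new("/tmp"), now_millis()),
        process_running: true,
        cached_state: Some("page snapshot".to_string()),
    };
    session.stop();
    println!(
        "running: {}, profile left at: {}",
        session.process_running,
        session.user_data_dir.display()
    );
}
```

Leaving the directory untouched is the conservative choice: code that never deletes cannot delete the wrong thing, at the cost of requiring manual cleanup.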
Users are responsible for cleanup:
# List browser temp directories
ls -la /tmp/browser-use-*
# Delete old temp directories (optional, manual cleanup)
rm -rf /tmp/browser-use-1738369200000/

When a custom user_data_dir is specified:
let profile = BrowserProfile {
    user_data_dir: Some("/path/to/custom/profile".into()),
    ..Default::default()
};

Warning: If you point to your real browser profile, the library will NOT protect it. You're responsible for that directory.
use browsing::BrowserProfile;
let profile = BrowserProfile {
    headless: true,
    browser_type: browsing::BrowserType::Chrome,
    user_data_dir: None,
    disable_gpu: true,
    ..Default::default()
};

use browsing::agent::views::AgentSettings;
let agent = Agent::new(...)
    .with_max_steps(50)
    .with_settings(AgentSettings {
        override_system_message: Some("Custom system prompt".to_string()),
        ..Default::default()
    });

Generate and view API docs:
cargo doc --open