Async web search, fetch, crawl, extraction, and screenshot tooling with SDK, CLI, and MCP entrypoints.
- `curl_cffi` for HTTP
- `nodriver` for browser automation
- FastMCP for the compact agent-facing MCP layer
crawl is organized into three layers:
- `sdk`: the full Python capability surface
- `cli`: direct command-line access to the SDK
- `mcp`: a smaller workflow-oriented server for AI agents
The SDK is the source of truth. The CLI wraps most of it. The MCP layer intentionally exposes fewer tools so agents do not get flooded with schemas.
Editable install:

```bash
python -m pip install -e .
```

Pinned dependencies:

```bash
python -m pip install -r requirements.txt
```

Repo-root entrypoints:

```bash
python cli.py --help
python server.py
```

Installed scripts:

```bash
crawl-cli --help
crawl-mcp
```

- Search providers: `google`, `searxng`, `auto`, `hybrid`
- Browser-capable SDK and MCP paths support `headless` mode
- Consent handling and resource blocking are built into browser workflows
- Structured extraction, article extraction, forms, feeds, contacts, and technology fingerprinting are all in the SDK
- The MCP server exposes a compact tool surface: `search_web`, `inspect_url`, `discover_site`, `extract_structured`, `capture_screenshot`
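The SDK is async end to end, so every call is a coroutine driven by `asyncio`, and independent calls can run concurrently. A minimal sketch of that calling pattern, using a hypothetical `fake_fetch` stand-in instead of a real SDK coroutine (which would need network access):

```python
import asyncio


async def fake_fetch(url: str) -> dict:
    # Hypothetical stand-in for an SDK coroutine such as fetch_page;
    # the real call does network I/O, this one just yields control once.
    await asyncio.sleep(0)
    return {"final_url": url}


async def main() -> list[str]:
    # Independent calls can be awaited concurrently with asyncio.gather.
    pages = await asyncio.gather(
        fake_fetch("https://example.com"),
        fake_fetch("https://example.org"),
    )
    return [page["final_url"] for page in pages]


urls = asyncio.run(main())
print(urls)
```

The same `gather` pattern applies to the real coroutines shown in the quickstart below the features list.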
SDK:

```python
import asyncio

from crawl.sdk import fetch_page, websearch


async def main() -> None:
    search_payload = await websearch("python async browser automation", provider="auto")
    page_payload = await fetch_page("https://example.com", mode="browser", headless=True)
    print(search_payload["count"])
    print(page_payload["final_url"])


asyncio.run(main())
```

CLI:
```bash
python cli.py websearch "python async browser automation" --provider auto --max-results 5 --pages 1
python cli.py fetch-page https://example.com --mode browser --include-html
```

MCP:
- run `python server.py`
- connect your MCP client to the stdio server
- use the compact workflow tools documented in docs/mcp.md
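Most stdio MCP clients are pointed at a server through a JSON config entry naming the command to launch. A sketch of one plausible entry, assuming your client uses the common `command`/`args` layout; the `crawl` server name and the `/path/to/crawl` path are placeholders to adjust for your setup:

```json
{
  "mcpServers": {
    "crawl": {
      "command": "python",
      "args": ["/path/to/crawl/server.py"]
    }
  }
}
```

If you installed the package, pointing `command` at the `crawl-mcp` script with no `args` should work equally well.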
