crawl

Async web search, fetch, crawl, extraction, and screenshot tooling with SDK, CLI, and MCP entrypoints.

- curl_cffi for HTTP
- nodriver for browser automation
- FastMCP for the compact agent-facing MCP layer

What It Is

crawl is organized into three layers:

- sdk: the full Python capability surface
- cli: direct command-line access to the SDK
- mcp: a smaller workflow-oriented server for AI agents

The SDK is the source of truth. The CLI wraps most of it. The MCP layer intentionally exposes fewer tools so agents do not get flooded with schemas.

Install

Editable install:

python -m pip install -e .

Pinned dependencies:

python -m pip install -r requirements.txt

Entry Points

Repo-root entrypoints:

python cli.py --help
python server.py

Installed scripts:

crawl-cli --help
crawl-mcp

Documentation

Highlights

- Search providers: google, searxng, auto, hybrid
- Browser-capable SDK and MCP paths support headless mode
- Consent handling and resource blocking are built into browser workflows
- Structured extraction, article extraction, forms, feeds, contacts, and technology fingerprinting are all in the SDK
- The MCP server exposes a compact tool surface:
  - search_web
  - inspect_url
  - discover_site
  - extract_structured
  - capture_screenshot

Quick Examples

SDK:

import asyncio

from crawl.sdk import fetch_page, websearch


async def main() -> None:
    search_payload = await websearch("python async browser automation", provider="auto")
    page_payload = await fetch_page("https://example.com", mode="browser", headless=True)
    print(search_payload["count"])
    print(page_payload["final_url"])


asyncio.run(main())
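
Because the SDK functions are coroutines, several pages can be fetched concurrently. The helper below is a minimal sketch of one way to do that with a concurrency cap. `fetch_many` is a hypothetical name and is not part of crawl; it accepts any coroutine function that takes a URL as its first argument, so `fetch_page` from the example above should fit, assuming it behaves as shown there.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fetch_many(
    urls: list[str],
    fetch: Callable[..., Awaitable[Any]],
    limit: int = 3,
    **fetch_kwargs: Any,
) -> list[Any]:
    """Fetch several URLs concurrently, at most `limit` at a time.

    `fetch` is any coroutine function whose first argument is a URL
    (for example fetch_page from crawl.sdk). Results are returned in
    input order; a failed fetch yields its exception object instead of
    aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> Any:
        # The semaphore keeps at most `limit` fetches in flight.
        async with semaphore:
            return await fetch(url, **fetch_kwargs)

    return await asyncio.gather(
        *(fetch_one(url) for url in urls),
        return_exceptions=True,
    )
```

Inside an async function this could be called as `await fetch_many(urls, fetch_page, limit=2, mode="browser", headless=True)`, reusing the keyword arguments from the SDK example. A low limit is a reasonable default for browser mode, where each fetch is comparatively expensive.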

CLI:

python cli.py websearch "python async browser automation" --provider auto --max-results 5 --pages 1
python cli.py fetch-page https://example.com --mode browser --include-html

MCP:

- Run python server.py
- Connect your MCP client to the stdio server
- Use the compact workflow tools documented in docs/mcp.md
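
Many MCP clients are configured with a JSON entry under an `mcpServers` key that names the command used to launch a stdio server. The sketch below generates such an entry for server.py. The exact config location and schema depend on your client, so treat the shape here as an assumption to check against the client's documentation; `mcp_server_entry` and the `/path/to/crawl` placeholder are hypothetical.

```python
import json
import sys
from pathlib import Path


def mcp_server_entry(repo_root: str, name: str = "crawl") -> dict:
    """Build an `mcpServers`-style entry that launches server.py over stdio."""
    return {
        "mcpServers": {
            name: {
                # Use the current interpreter so the server runs in the
                # same environment crawl was installed into.
                "command": sys.executable,
                "args": [str(Path(repo_root) / "server.py")],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path; replace with the location of your checkout.
    print(json.dumps(mcp_server_entry("/path/to/crawl"), indent=2))
```

If the package is installed, pointing `command` at the `crawl-mcp` script with an empty `args` list should work the same way.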
