vibheksoni/crawl

crawl

Async web search, fetch, crawl, extraction, and screenshot tooling with SDK, CLI, and MCP entrypoints.

  • curl_cffi for HTTP
  • nodriver for browser automation
  • FastMCP for the compact agent-facing MCP layer

What It Is

crawl is organized into three layers:

  • sdk: the full Python capability surface
  • cli: direct command-line access to the SDK
  • mcp: a smaller workflow-oriented server for AI agents

The SDK is the source of truth. The CLI wraps most of it. The MCP layer intentionally exposes fewer tools so agents do not get flooded with schemas.

Install

Editable install:

python -m pip install -e .

Pinned dependencies:

python -m pip install -r requirements.txt

Entry Points

Repo-root entrypoints:

python cli.py --help
python server.py

Installed scripts:

crawl-cli --help
crawl-mcp

Documentation

Highlights

  • Search providers: google, searxng, auto, hybrid
  • Browser-capable SDK and MCP paths support headless mode
  • Consent handling and resource blocking are built into browser workflows
  • Structured extraction, article extraction, forms, feeds, contacts, and technology fingerprinting are all in the SDK
  • The MCP server exposes a compact tool surface:
    • search_web
    • inspect_url
    • discover_site
    • extract_structured
    • capture_screenshot

Quick Examples

SDK:

import asyncio

from crawl.sdk import fetch_page, websearch


async def main() -> None:
    search_payload = await websearch("python async browser automation", provider="auto")
    page_payload = await fetch_page("https://example.com", mode="browser", headless=True)
    print(search_payload["count"])
    print(page_payload["final_url"])


asyncio.run(main())
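
Because the SDK calls are coroutines, several pages can be fetched concurrently. The following is a minimal sketch rather than part of the SDK: `fetch_many` and `demo_fetch` are hypothetical names, and the fetcher is passed in so the snippet runs without the package installed. It assumes only what the example above shows, namely that the fetcher is awaitable and returns a dict.

```python
import asyncio
from typing import Awaitable, Callable


async def fetch_many(
    urls: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    limit: int = 3,
) -> list[dict]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> dict:
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:
                # Record the failure instead of cancelling the whole batch.
                return {"url": url, "error": str(exc)}

    # gather() preserves input order in its result list.
    return list(await asyncio.gather(*(fetch_one(url) for url in urls)))


async def demo_fetch(url: str) -> dict:
    """Stand-in fetcher (hypothetical) that mimics the payload shape above."""
    await asyncio.sleep(0)
    return {"final_url": url}
```

With the package installed, something like `functools.partial(fetch_page, mode="browser", headless=True)` could be passed as `fetch` in place of `demo_fetch`.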

CLI:

python cli.py websearch "python async browser automation" --provider auto --max-results 5 --pages 1
python cli.py fetch-page https://example.com --mode browser --include-html
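
The same CLI call can be driven from a Python script. This is a sketch under assumptions: `websearch_command` and `run_websearch` are hypothetical helpers, only the flags shown above are used, and the output is returned as raw text because its format is not specified here.

```python
import subprocess
import sys


def websearch_command(
    query: str, provider: str = "auto", max_results: int = 5, pages: int = 1
) -> list[str]:
    """Build the argument list for the repo-root `cli.py websearch` command."""
    return [
        sys.executable, "cli.py", "websearch", query,
        "--provider", provider,
        "--max-results", str(max_results),
        "--pages", str(pages),
    ]


def run_websearch(query: str, **options) -> str:
    """Run the command from the repository root and return its raw stdout."""
    completed = subprocess.run(
        websearch_command(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with multi-word queries.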

MCP:

  • run python server.py
  • connect your MCP client to the stdio server
  • use the compact workflow tools documented in docs/mcp.md
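
One way to check the connection is to list the tools the server advertises. This is a sketch only: it assumes the standalone `fastmcp` package is installed and uses its `Client`, which may differ from the MCP client you actually use. `list_crawl_tools` and `missing_tools` are hypothetical helpers, and the expected names are the five tools listed under Highlights.

```python
import asyncio

# Tool names taken from the Highlights section of this README.
EXPECTED_TOOLS = {
    "search_web",
    "inspect_url",
    "discover_site",
    "extract_structured",
    "capture_screenshot",
}


def missing_tools(available) -> list[str]:
    """Return the expected tool names that the server did not advertise."""
    return sorted(EXPECTED_TOOLS - set(available))


async def list_crawl_tools(server_script: str = "server.py") -> list[str]:
    """Start the stdio server as a subprocess and return its tool names."""
    # Imported lazily so missing_tools() works without fastmcp installed.
    from fastmcp import Client  # assumption: standalone fastmcp client

    async with Client(server_script) as client:
        tools = await client.list_tools()
    return [tool.name for tool in tools]
```

Run from the repository root, `missing_tools(asyncio.run(list_crawl_tools()))` should return an empty list if the server exposes all five tools.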
