
[Bug]: MCP scrape tools lack wait_until / SPA support that REST API and CLI provide #1963

@klemens-u

Description

crawl4ai version

0.8.6

Expected Behavior

MCP scrape tools (crawl4ai_md, crawl4ai_html, etc.) should accept the same wait_until, delay_before_return_html, and cache_mode parameters as the REST API (POST /crawl) and CLI (crwl -c). This would allow MCP-based agents to reliably scrape JavaScript-heavy pages by waiting for content to fully load.

Example desired usage:

{
  "url": "https://example.com/dynamic-content",
  "wait_until": "networkidle",
  "delay_before_return_html": 2
}

Environment

Component   Version
Crawl4AI    0.8.6
OS          Debian GNU/Linux 12 (bookworm)
Python      3.12.13
Image       unclecode/crawl4ai:latest

Suggested Fix

  1. Expose crawler_config parameters on MCP tool schemas — map wait_until, delay_before_return_html, cache_mode, etc. to the existing REST/CLI options
  2. Document MCP defaults vs REST/CLI behavior
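As a rough illustration of fix #1, the MCP tool handler could forward wait-related arguments into the existing crawler_config untouched. This is a minimal sketch under assumptions: the function name mcp_args_to_crawler_config and the PASSTHROUGH set are hypothetical; the parameter names match the REST API fields shown in this issue.

```python
# Hypothetical mapping layer: pull the wait-related arguments out of an
# MCP tool call and hand them to the existing crawler_config, so MCP,
# REST, and CLI all share one set of option names.
PASSTHROUGH = {"wait_until", "delay_before_return_html", "cache_mode"}

def mcp_args_to_crawler_config(args: dict) -> dict:
    """Return the subset of MCP tool arguments that belong in crawler_config."""
    return {k: v for k, v in args.items() if k in PASSTHROUGH}
```

Keeping the names identical to the REST fields would make fix #2 (documenting defaults) a diff of defaults only, not of vocabularies.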

Acceptance Criteria

  • MCP scrape tools accept wait_until parameter (load, domcontentloaded, networkidle, commit)
  • MCP scrape tools accept delay_before_return_html parameter
  • Documentation clarifies default wait behavior for each interface (MCP vs REST vs CLI)
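The criteria above could translate into tool-schema properties along these lines (a sketch only; the constant name WAIT_PARAMS_SCHEMA is hypothetical, and the enum values are the four listed in the first criterion):

```python
# Hypothetical JSON-schema fragment the MCP scrape tools could expose,
# expressed as a Python dict. Only the two wait-related parameters from
# the acceptance criteria are shown.
WAIT_PARAMS_SCHEMA = {
    "wait_until": {
        "type": "string",
        "enum": ["load", "domcontentloaded", "networkidle", "commit"],
    },
    "delay_before_return_html": {"type": "number", "minimum": 0},
}
```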

Current Behavior

MCP tools return immediately after initial DOM load, without waiting for dynamic content. No parameters are exposed to control wait behavior.

  • REST API with crawler_config.wait_until: "networkidle" → ✅ Complete rendered content
  • CLI with -c 'wait_until=networkidle' → ✅ Complete rendered content
  • MCP tools → ❌ Incomplete content (dynamic elements missing)

Current workaround: Bypass MCP entirely and use POST /crawl directly.
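The workaround can be scripted like so. This is a sketch, not an official client: build_crawl_payload and crawl are hypothetical helpers, while the endpoint, headers, and crawler_config keys come from the working curl example later in this report.

```python
import json
import urllib.request

# REST endpoint from the reproduction steps in this issue.
CRAWL_ENDPOINT = "http://localhost:11235/crawl"

def build_crawl_payload(url, wait_until="networkidle", cache_mode="bypass"):
    """Build the JSON body for POST /crawl with explicit wait behavior."""
    return {
        "urls": [url],
        "crawler_config": {
            "wait_until": wait_until,
            "cache_mode": cache_mode,
        },
    }

def crawl(url):
    """POST the payload to the REST API (requires a running Crawl4AI server)."""
    req = urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(build_crawl_payload(url)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```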

Is this reproducible?

Yes

Inputs Causing the Bug

Any JavaScript-heavy / AJAX-driven page where content loads after initial page load.

Steps to Reproduce

1. Start Crawl4AI with MCP enabled at `/mcp/sse`
2. Configure any MCP client (e.g., OpenCode) with the Crawl4AI MCP server
3. Call MCP scrape tool:
   
   crawl4ai_md(url="https://example.com/dynamic-content")
   
4. Observe: Content is incomplete (dynamic elements not rendered)
5. Compare with working REST call:
   
   curl -s http://localhost:11235/crawl \
     -H 'Content-Type: application/json' \
     -d '{
       "urls": ["https://example.com/dynamic-content"],
       "crawler_config": {
         "wait_until": "networkidle",
         "cache_mode": "bypass"
       }
     }'
   
6. Observe: REST returns complete rendered content


Labels: 🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
