crawl4ai version
0.8.5, 0.8.6
Expected Behavior
Some log output or an error message should be shown when scraping fails.
Description of the issue:
- After FETCH is marked as successful, no SCRAPE is performed (0.00s time taken) and hence no EXTRACT, so COMPLETE is marked as failed. No error messages or logs are shown, even with all available verbose parameters set to True.
- This behaviour has been observed on valid pages derived from a base URL, while for the base URL itself the whole process works fine.
- If a derived page is then fed in as the base URL, scraping and extraction run fine for it, but scraping again fails silently for the pages derived from it.
I could not find this error pattern anywhere in the documentation or in past issues.
The crawler config is described in detail in the "Code snippets" section.
It uses BFSDeepCrawlStrategy for crawling and LLMExtractionStrategy for extraction.
Current Behavior
[FETCH]... ↓ |
✓ | ⏱: 1.51s
[SCRAPE].. ◆ |
✓ | ⏱: 0.00s
[EXTRACT]. ■ |
✓ | ⏱: 0.00s
[COMPLETE] ● |
✗ | ⏱: 1.51s
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
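from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig, LLMConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.deep_crawling.filters import FilterChain, URLPatternFilter
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Browser settings shared by every page in the crawl.
browser_cfg = BrowserConfig(
    user_agent_mode="random",
    enable_stealth=True,
    headless=True,
    viewport_width=1280,
    viewport_height=720,
    text_mode=False
)

# Two URL filters: the first includes matching URLs, the second (reverse=True) excludes them.
filter_chain = FilterChain(
    [
        URLPatternFilter(
            patterns=[
                ...,
            ]
        ),
        URLPatternFilter(
            patterns=[
                ...
            ],
            reverse=True
        )
    ]
)

# Top-level `async with` as run in a notebook cell; wrap it in an async function elsewhere.
async with AsyncWebCrawler(verbose=True, config=browser_cfg, thread_safe=True) as crawler:
    crawl_config = CrawlerRunConfig(
        # `Model` is the Pydantic model describing the extracted data (not shown).
        extraction_strategy=LLMExtractionStrategy(
            llm_config=LLMConfig(
                ...
            ),
            schema=Model.model_json_schema(),
            extraction_type="schema",
            instruction="""
            ...
            """,
            ...
        ),
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False,
            max_pages=10,
            filter_chain=filter_chain,
        ),
        cache_mode=CacheMode.BYPASS,
        exclude_external_links=True,
        remove_overlay_elements=True,
        process_iframes=True,
        stream=True,
        magic=True,
        verbose=True,
        scan_full_page=True
    )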
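Since the console output gives no reason for the failed COMPLETE step, one way to narrow it down might be to print the fields of each result object directly. The following is only a minimal sketch: `describe_result` is a hypothetical helper (not part of crawl4ai), and it assumes each result exposes `url`, `success`, `status_code`, and `error_message` attributes. Stand-in objects are used so it runs without a browser.

```python
from types import SimpleNamespace


def describe_result(result) -> str:
    """Return a one-line diagnostic for a crawl-result-like object.

    Hypothetical helper: it only reads the attributes `url`, `success`,
    `status_code` and `error_message`, falling back to placeholders
    when an attribute is missing or empty.
    """
    url = getattr(result, "url", "<unknown url>")
    success = getattr(result, "success", None)
    status = getattr(result, "status_code", None)
    error = getattr(result, "error_message", None) or "<no error message>"
    if success:
        return f"OK   {url} (status={status})"
    return f"FAIL {url} (status={status}): {error}"


# Stand-in results (assumed shape) so the sketch runs on its own.
ok = SimpleNamespace(
    url="https://example.com/", success=True, status_code=200, error_message=None
)
failed = SimpleNamespace(
    url="https://example.com/a", success=False, status_code=200, error_message=""
)

print(describe_result(ok))
print(describe_result(failed))
```

With `stream=True`, this could presumably be called inside the loop that consumes the crawl, for example `async for result in await crawler.arun(start_url, config=crawl_config): print(describe_result(result))`, where `start_url` is a placeholder for the base URL. If a derived page reports a failure with an empty `error_message`, that would confirm that the error is being dropped rather than merely not logged.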
OS
Databricks
Python version
3.12.3
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response