result.url is inconsistent with the actual browser display #886

xih1919 · 2025-03-26T03:34:18Z


I am very interested in crawl4ai, but I don't know much about how it is implemented. I set include_external=False and stream=True and use DFSDeepCrawlStrategy, printing the URL of each result during the crawl. However, the URL shown in the browser does not match result.url: the browser ends up on a different domain, which is our internal identity authentication site. Sorry, I can't share the site being crawled because it is an internal, private website.

The Python program is roughly as follows:
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.deep_crawling import DFSDeepCrawlStrategy

async def main():
    # Visible browser with a session cookie scoped to the start URL
    browser_conf = BrowserConfig(
        headless=False,
        cookies=[
            {"name": "uid", "value": "adfcf4111111", "url": "https://aaaa.bbbb.cccc.com/pages/1"},
        ]
    )

    # Depth-first deep crawl, internal links only, results streamed as they arrive
    crawl_config = CrawlerRunConfig(
        deep_crawl_strategy=DFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False,
            max_pages=30,
        ),
        stream=True,
    )

    async with AsyncWebCrawler(config=browser_conf, verbose=True) as crawler:
        async for result in await crawler.arun(
            "https://aaaa.bbbb.cccc.com/pages/1",
            config=crawl_config
        ):
            print(result.url)  # differs from the URL shown in the browser
            print(result.links.get("internal"))
            print(result.links.get("external"))

if __name__ == "__main__":
    asyncio.run(main())
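
One way to confirm the mismatch described above is to compare the host of the URL that was requested with the host of the page the browser actually ended up on. The following is a minimal sketch using only the standard library. The helper names `host_of` and `is_cross_domain` and the address `https://sso.example.com/login` are made up for illustration, and it assumes the result object exposes the final address in a `redirected_url` attribute, which may not hold for every crawl4ai version (hence the defensive `getattr` in the usage comment).

```python
from typing import Optional
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_cross_domain(requested_url: str, final_url: Optional[str]) -> bool:
    """Return True when the final page lives on a different host."""
    if not final_url:
        # No final URL recorded, so there is nothing to compare against.
        return False
    return host_of(requested_url) != host_of(final_url)


# Hypothetical usage inside the crawl loop. `redirected_url` is an assumed
# attribute and is read defensively so the check degrades to "no redirect":
#
#     final_url = getattr(result, "redirected_url", None)
#     if is_cross_domain(result.url, final_url):
#         print(f"{result.url} was redirected to {final_url}")

if __name__ == "__main__":
    # Placeholder authentication host, not the real internal site.
    print(is_cross_domain("https://aaaa.bbbb.cccc.com/pages/1",
                          "https://sso.example.com/login"))  # prints True
```

If this check fires for most pages, the crawl is probably being bounced to the login page, which would suggest that the `uid` cookie alone is not enough to authenticate the session.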

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

result.url is inconsistent with the actual browser display #886

Uh oh!

xih1919 Mar 26, 2025

Replies: 0 comments

xih1919
Mar 26, 2025