[Bug]: memory leak issue when running the crawler in Docker #1608

@fred-mhyt

Description

crawl4ai version

0.7.4

Expected Behavior

Concurrent URL crawling

Current Behavior

I encountered a memory leak when running the crawler in Docker. Our production server has 1 CPU and 8 GB of memory, and the leak occurs even when crawling only 5 URLs concurrently. The same code works normally on Windows. In addition, the memory_threshold_percent parameter of MemoryAdaptiveDispatcher has no effect at all.

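from crawl4ai import AsyncWebCrawler, RateLimiter
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

# Dispatcher that is expected to hold back new crawls once memory usage reaches 85%.
default_memory_dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=85,
    check_interval=1,
    max_session_permit=5,  # at most 5 concurrent crawls
    rate_limiter=RateLimiter(
        base_delay=(3, 15),
        max_delay=20.0,
        max_retries=1,
    ),
)

# Excerpt from an async method of our crawler class; urls and config are defined elsewhere.
async with AsyncWebCrawler(config=self.general_browser_config) as crawler:
    results = await crawler.arun_many(
        urls=urls,
        config=config,
        dispatcher=default_memory_dispatcher,
    )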
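
One possible way to check whether the 85% threshold is ever reached from the container's point of view is to compare usage against the cgroup limit directly. The sketch below is only a diagnostic and not part of crawl4ai. The helper name `container_memory_percent` is hypothetical, and it assumes the standard cgroup v2 (`memory.current`, `memory.max`) or cgroup v1 (`memory.usage_in_bytes`, `memory.limit_in_bytes`) files are mounted under `/sys/fs/cgroup`. Whether the dispatcher measures memory against the host or the container is an assumption I have not verified.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup file locations, v2 first and then v1: (usage file, limit file).
CGROUP_FILES = [
    ("memory.current", "memory.max"),
    ("memory/memory.usage_in_bytes", "memory/memory.limit_in_bytes"),
]

# cgroup v1 reports a very large number when no limit is configured.
UNLIMITED_THRESHOLD = 2 ** 60


def container_memory_percent(cgroup_root: str = "/sys/fs/cgroup") -> Optional[float]:
    """Return container memory usage as a percentage of its cgroup limit.

    Returns None when no limit is set or the cgroup files are not available
    (for example on a Windows host).
    """
    root = Path(cgroup_root)
    for usage_name, limit_name in CGROUP_FILES:
        usage_file = root / usage_name
        limit_file = root / limit_name
        if not (usage_file.is_file() and limit_file.is_file()):
            continue
        limit_raw = limit_file.read_text().strip()
        if limit_raw == "max":  # cgroup v2: no limit configured
            return None
        limit = int(limit_raw)
        if limit <= 0 or limit >= UNLIMITED_THRESHOLD:
            return None
        usage = int(usage_file.read_text().strip())
        return 100.0 * usage / limit
    return None


if __name__ == "__main__":
    # Run inside the container while the crawler is active.
    print("container memory percent:", container_memory_percent())
```

If this value climbs toward 100 while the dispatcher keeps starting new crawls, that would suggest the threshold is being compared against a different memory figure than the container limit.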

These are the memory usage statistics in production. The more complex a page's content is, the more memory the crawler uses.

Image

These are the memory usage statistics on a local Windows machine:

Image

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Code snippets

OS

Linux (Docker)

Python version

3.12.0

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Labels

⚙️ In-progress, 🐞 Bug
