Closed
Description


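from datetime import timedelta

from crawlee import service_locator

# Fetch the global configuration and adjust it before the crawler starts.
configuration = service_locator.get_configuration()
# configuration.purge_on_start = False
configuration.write_metadata = False
configuration.persist_storage = False
configuration.available_memory_ratio = 0.8  # allow up to 80% of system memory
configuration.internal_timeout = timedelta(minutes=5)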
It gradually becomes sluggish until the program freezes completely.
Activity
B4nan commented on May 21, 2025
You are setting the available memory to 80%. htop reports 7.65g, and 80% of that is 6.12g, so it's matching exactly.
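The arithmetic in this comment can be checked with a small sketch. The helper name `memory_limit_gb` is hypothetical and only mirrors the calculation described above; it is not a Crawlee function.

```python
def memory_limit_gb(total_gb: float, available_memory_ratio: float) -> float:
    """Return the memory budget implied by the total memory and the ratio."""
    return total_gb * available_memory_ratio


# 7.65 GB is the total reported by htop in this thread, 0.8 is the configured ratio.
limit = memory_limit_gb(7.65, 0.8)
print(f"{limit:.2f} GB")  # 6.12 GB
```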
ycq0125 commented on May 21, 2025
However, the memory usage shown by htop is around 3.5 GB. Crawlee actually reports that it used 6.99 GB.
Pijukatel commented on May 21, 2025
From the screenshot I see 3.546/7.65 GB used, while it is printing something about 6.05 GB being used. So I guess the issue is not in the limit calculation, but in the actual value of the used memory.
Looking into the code, I see we sum up the memory usage of the process and all its children.
And from docs we can get to this blog post where this line might be particularly interesting:
So here is my wild guess: maybe we count some memory twice (or multiple times) if we sum up the usage of children that are using the same shared memory?
I will continue looking into this with some tests
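To make the double-counting guess concrete, here is a toy model with made-up numbers, not a measurement of Crawlee. It assumes three processes that each map the same 900 MB shared segment. The `Proc` class and both helpers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    private_mb: float  # pages mapped only by this process
    shared_mb: float   # one shared segment mapped by every process in the group


def rss_mb(p: Proc) -> float:
    # RSS counts every resident page the process maps, shared or not.
    return p.private_mb + p.shared_mb


def pss_mb(p: Proc, sharers: int) -> float:
    # PSS charges each process only its proportional share of shared pages.
    return p.private_mb + p.shared_mb / sharers


procs = [Proc(200, 900), Proc(100, 900), Proc(100, 900)]

actual = sum(p.private_mb for p in procs) + 900            # shared segment counted once
rss_total = sum(rss_mb(p) for p in procs)                  # shared segment counted three times
pss_total = sum(pss_mb(p, len(procs)) for p in procs)      # shared segment split three ways

print(actual, rss_total, pss_total)
```

In this model the summed RSS is 3100 MB while the memory actually occupied is 1300 MB, which is also what the summed PSS gives.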
B4nan commented on May 21, 2025
Yes, but visually, you can see that 3.5g is the green part, and you still have a huge yellow part, which should be a cache. Summing those, I'd say it could be about 6g.
You could try to set the memory explicitly to 16g and see whether things fall apart because of OOM (which I would read as us counting things correctly and you misreading what htop is showing) or run fine (which would confirm it's something on our end).
This is still occupied memory, so it's not relevant who uses it, right?
Pijukatel commented on May 21, 2025
Yes, but it should be counted only once. Maybe we count it multiple times by summing up the memory usage of child processes that use the same portion of shared memory? But I have to do some tests first to see whether that assumption is correct.
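One way to test this assumption on Linux is to compare RSS with PSS, which the kernel exposes per process in `/proc/<pid>/smaps_rollup` (available on reasonably recent kernels). The sketch below uses only the standard library. The function names and the sample text are hypothetical, and it is not necessarily how Crawlee reads these values.

```python
from pathlib import Path


def parse_pss_kb(smaps_rollup_text: str) -> int:
    """Extract the `Pss:` value (in kB) from the text of a smaps_rollup file."""
    for line in smaps_rollup_text.splitlines():
        # Match only the aggregate `Pss:` field, not `Pss_Anon:` and similar.
        if line.startswith("Pss:"):
            return int(line.split()[1])
    raise ValueError("no Pss field found")


def read_pss_kb(pid: int) -> int:
    """Read PSS for a live process; Linux only, needs permission to read /proc/<pid>."""
    return parse_pss_kb(Path(f"/proc/{pid}/smaps_rollup").read_text())


# Hypothetical sample in the smaps_rollup layout, so the parser can be tried anywhere.
sample = """\
00400000-7ffd1c5f2000 ---p 00000000 00:00 0    [rollup]
Rss:              204800 kB
Pss:               81920 kB
Pss_Anon:          61440 kB
Shared_Clean:     122880 kB
"""
print(parse_pss_kb(sample))  # 81920
```

Summing `read_pss_kb(pid)` over the parent and its children should stay close to what htop reports as used, whereas summing RSS can exceed it when the children share pages.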
Use `PSS` instead of `RSS` to estimate children process memory usage on Linux #1210
fix: Use `PSS` instead of `RSS` to estimate children process memory u…