-
Hi all, if I have written two crawling projects that are started simultaneously, do they share a browser pool, or does each crawling project start its own browser pool?
-
Hello, every crawling project has its own browser pool; more precisely, every crawler instance creates its own `BrowserPool`.
-
Thank you for your answer. If that is the case, what is the point of a separate browser pool for each project (i.e. per website)? I'm not after speed, but I will crawl many different websites with different page structures. How can I share the browser pool across them? Otherwise each project has to start and close its own browser, and I'm worried about the system resources that occupies.
The current purpose of `BrowserPool` is to handle browser management during the crawler run: it provides a unified interface for opening and closing pages in the managed browsers, and it also handles fingerprint injection and proxy setup.

There indeed might be a performance hit from not reusing the managed browsers across multiple concurrent crawls. Unfortunately, right now there is no way to instantiate the `BrowserPool` separately and pass it to the crawler instance. While there might be actual technical reasons for this (e.g. the way that proxies currently bind to running browsers), this is IMO rather a design oversight.

Currently, your best bets are:
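To picture the kind of sharing being asked about, here is a hypothetical, heavily simplified sketch in plain Node, with no Crawlee APIs involved: `SimpleBrowserPool`, `crawlProject`, the launch cap, and the example URLs are all invented for illustration. Two "projects" borrow browsers from one pool instead of each starting and closing their own.

```javascript
// Hypothetical sketch, NOT Crawlee's actual API: a single pool whose
// "browsers" are plain objects standing in for real browser processes.
class SimpleBrowserPool {
  constructor(maxBrowsers) {
    this.maxBrowsers = maxBrowsers; // hard cap on launched browsers
    this.launched = 0;              // how many browsers were ever started
    this.idle = [];                 // browsers parked for reuse
  }

  acquire() {
    // Reuse an idle browser before launching a new one.
    if (this.idle.length > 0) return this.idle.pop();
    if (this.launched >= this.maxBrowsers) {
      throw new Error('pool exhausted; release a browser first');
    }
    this.launched += 1;
    return { id: this.launched }; // stand-in for launching a real browser
  }

  release(browser) {
    // Return the browser to the pool instead of closing it.
    this.idle.push(browser);
  }
}

// Two crawling "projects" targeting different sites share one pool,
// so browsers get reused rather than started and closed per project.
const pool = new SimpleBrowserPool(2);

function crawlProject(urls) {
  for (const url of urls) {
    const browser = pool.acquire();
    // ...open a page in `browser`, scrape `url`...
    pool.release(browser);
  }
}

crawlProject(['https://site-a.example/1', 'https://site-a.example/2']);
crawlProject(['https://site-b.example/1']);

console.log(pool.launched); // 1 — a single browser served both projects
```

In real code the pool would manage actual browser processes asynchronously (launching, health checks, retiring browsers after a page count), which is roughly what `BrowserPool` already does inside a single crawler run; the missing piece described above is the ability to construct such a pool once and hand it to several crawler instances.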