Commit

add section to broad-crawl topic
whalebot-helmsman committed Dec 25, 2018
1 parent 2fc35de commit d8e2b25
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/topics/broad-crawls.rst
@@ -39,6 +39,17 @@ you need to keep in mind when using Scrapy for doing broad crawls, along with
concrete suggestions of Scrapy settings to tune in order to achieve an
efficient broad crawl.

Use the right :setting:`SCHEDULER_PRIORITY_QUEUE`
=================================================

Scrapy's default scheduler priority queue is ``'queuelib.PriorityQueue'``.
It works best for single-domain crawls, but it does not work well when crawling
many different domains in parallel.

To apply the recommended priority queue, use::

    SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
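
As a rough intuition for why a downloader-aware queue may suit broad crawls
better, the toy model below prefers the domain with the fewest downloads in
flight. It is only an illustrative sketch and not Scrapy's implementation:
``ToyDownloaderAwareQueue``, its methods, and the ``*.example`` domains are
hypothetical names, and the selection rule is an assumption based on what the
class name suggests.

```python
from collections import defaultdict, deque


class ToyDownloaderAwareQueue:
    """Toy model (hypothetical, not Scrapy code): pop the next request
    from the domain that currently has the fewest active downloads."""

    def __init__(self):
        self._queues = defaultdict(deque)  # domain -> pending URLs (FIFO)
        self._active = defaultdict(int)    # domain -> downloads in flight

    def push(self, domain, url):
        self._queues[domain].append(url)

    def pop(self):
        pending = [d for d, q in self._queues.items() if q]
        if not pending:
            return None
        # Prefer the least-loaded domain; ties go to the domain seen first.
        domain = min(pending, key=lambda d: self._active[d])
        self._active[domain] += 1
        return domain, self._queues[domain].popleft()

    def finished(self, domain):
        # Call when a download for this domain completes.
        self._active[domain] -= 1


queue = ToyDownloaderAwareQueue()
for i in range(3):
    queue.push("a.example", f"https://a.example/{i}")
queue.push("b.example", "https://b.example/0")
queue.push("c.example", "https://c.example/0")

# A plain FIFO queue would hand out three a.example requests first;
# this toy queue spreads the first three pops across three domains.
first_three = [queue.pop()[0] for _ in range(3)]
print(first_three)
```

In this sketch, a single busy domain cannot monopolise the download slots
while other domains still have pending requests.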

Increase concurrency
====================
