Initial setup

Learn how to install and configure spider templates <spider-templates> in an existing Scrapy project.

Tip

If you do not have a Scrapy project yet, use zyte-spider-templates-project as a starting template.

Requirements

  • Python 3.8+
  • Scrapy 2.11+

For Zyte API features, including AI-powered parsing, you need a Zyte API subscription.

Installation

pip install zyte-spider-templates

Configuration

In your Scrapy project settings (usually in settings.py):

  • Update SPIDER_MODULES <scrapy:SPIDER_MODULES> to include "zyte_spider_templates.spiders".
  • Configure scrapy-poet, and update SCRAPY_POET_DISCOVER <scrapy-poet:settings> to include "zyte_spider_templates.pages".
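As a minimal sketch, the two updates above might look like this in settings.py. The "myproject" package name is a hypothetical placeholder for your own project, and the scrapy-poet configuration itself is not shown here:

```python
# settings.py (sketch). "myproject" is a hypothetical package name;
# replace it with the name of your own Scrapy project.

# Keep your own spider modules and add the ones from zyte-spider-templates.
SPIDER_MODULES = [
    "myproject.spiders",  # hypothetical: your project's spiders
    "zyte_spider_templates.spiders",
]

# Let scrapy-poet discover the page objects shipped with the templates.
SCRAPY_POET_DISCOVER = [
    "myproject.pages",  # hypothetical: your project's page objects
    "zyte_spider_templates.pages",
]
```

If your project already defines these settings, append the new entries to the existing lists rather than replacing them.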

For Zyte API features, including AI-powered parsing, configure scrapy-zyte-api with scrapy-poet integration.

The following additional settings are recommended:

  • Set CLOSESPIDER_TIMEOUT_NO_ITEM <scrapy:CLOSESPIDER_TIMEOUT_NO_ITEM> to 600, to force the spider to stop if no item has been found for 10 minutes.
  • Set SCHEDULER_DISK_QUEUE <scrapy:SCHEDULER_DISK_QUEUE> to "scrapy.squeues.PickleFifoDiskQueue" and SCHEDULER_MEMORY_QUEUE <scrapy:SCHEDULER_MEMORY_QUEUE> to "scrapy.squeues.FifoMemoryQueue", for better request priority handling.
  • Update SPIDER_MIDDLEWARES <scrapy:SPIDER_MIDDLEWARES> to include "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000, to log crawl data in JSON format for debugging purposes.
  • Ensure that zyte_common_items.ZyteItemAdapter is also configured:

    from itemadapter import ItemAdapter
    from zyte_common_items import ZyteItemAdapter
    
    ItemAdapter.ADAPTER_CLASSES.appendleft(ZyteItemAdapter)
  • Update SPIDER_MIDDLEWARES <scrapy:SPIDER_MIDDLEWARES> to include "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500 and "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None. This allows crawling item links outside of the domain.
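
Taken together, the recommended settings could be written roughly as follows. This is a sketch that assumes your project defines no other spider middlewares; note that both SPIDER_MIDDLEWARES updates go into the same dictionary. The ZyteItemAdapter registration shown earlier is omitted here:

```python
# settings.py (sketch of the recommended settings listed above).

# Stop the spider if no item has been found for 10 minutes (600 seconds).
CLOSESPIDER_TIMEOUT_NO_ITEM = 600

# FIFO queues, recommended above for better request priority handling.
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

# Assumption: no other spider middlewares are configured in this project.
# If there are, merge these entries into the existing dictionary.
SPIDER_MIDDLEWARES = {
    # Log crawl data in JSON format for debugging.
    "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000,
    # Allow crawling item links outside of the domain.
    "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500,
    # Disable Scrapy's built-in offsite filtering.
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None,
}
```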

For an example of a properly configured settings.py file, see the one in zyte-spider-templates-project.