Learn how to install and configure spider templates on an existing Scrapy project.
Tip
If you do not have a Scrapy project yet, use zyte-spider-templates-project as a starting template.
- Python 3.8+
- Scrapy 2.11+
For Zyte API features, including AI-powered parsing, you need a Zyte API subscription.
pip install zyte-spider-templates
In your Scrapy project settings (usually in settings.py):
- Update SPIDER_MODULES to include "zyte_spider_templates.spiders".
- Configure scrapy-poet, and update SCRAPY_POET_DISCOVER to include "zyte_spider_templates.pages".
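As a rough illustration, the two settings above might look like the following in settings.py. This is a minimal sketch: "myproject.spiders" is a hypothetical name standing in for whatever spider module your project already lists, and the rest of the scrapy-poet configuration is left out because it depends on your setup.

```python
# settings.py (sketch)

# Keep your existing spider modules and add the templates module.
# "myproject.spiders" is a hypothetical placeholder for your own module.
SPIDER_MODULES = [
    "myproject.spiders",
    "zyte_spider_templates.spiders",
]

# Let scrapy-poet discover the page objects shipped with the templates.
# If your project already defines this list, append to it instead.
SCRAPY_POET_DISCOVER = [
    "zyte_spider_templates.pages",
]
```

Both settings are lists, so existing entries should be kept and the new ones added alongside them.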
For Zyte API features, including AI-powered parsing, configure scrapy-zyte-api with scrapy-poet integration.
The following additional settings are recommended:
- Set CLOSESPIDER_TIMEOUT_NO_ITEM to 600, to force the spider to stop if no item has been found for 10 minutes.
- Set SCHEDULER_DISK_QUEUE to "scrapy.squeues.PickleFifoDiskQueue" and SCHEDULER_MEMORY_QUEUE to "scrapy.squeues.FifoMemoryQueue", for better request priority handling.
- Update SPIDER_MIDDLEWARES to include "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000, to log crawl data in JSON format for debugging purposes. Ensure that zyte_common_items.ZyteItemAdapter is also configured:

      from itemadapter import ItemAdapter
      from zyte_common_items import ZyteItemAdapter

      ItemAdapter.ADAPTER_CLASSES.appendleft(ZyteItemAdapter)

- Update SPIDER_MIDDLEWARES to include "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500 and "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None. This allows for crawling item links outside of the domain.
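Taken together, the recommended settings could be written in settings.py roughly as follows. This is a sketch, not a complete configuration: it assumes your project does not already define SPIDER_MIDDLEWARES (if it does, merge these entries into the existing dictionary), and it leaves out the ZyteItemAdapter registration, which needs the imports from itemadapter and zyte_common_items.

```python
# settings.py (sketch of the recommended additional settings)

# Stop the spider if no item has been found for 10 minutes (600 seconds).
CLOSESPIDER_TIMEOUT_NO_ITEM = 600

# FIFO queues, recommended for better request priority handling.
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

# Both SPIDER_MIDDLEWARES recommendations go into the same dictionary.
SPIDER_MIDDLEWARES = {
    # Log crawl data in JSON format for debugging.
    "zyte_spider_templates.middlewares.CrawlingLogsMiddleware": 1000,
    # Allow crawling item links outside of the domain...
    "zyte_spider_templates.middlewares.AllowOffsiteMiddleware": 500,
    # ...and disable the built-in offsite spider middleware it replaces.
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": None,
}
```

Setting a middleware to None disables it, which is why the built-in offsite middleware appears in the dictionary next to its replacement.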
For an example of a properly configured settings.py file, see the one in zyte-spider-templates-project.