This repository contains three spiders, each implemented both with and without ItemLoader. After reading this README to the end, you can decide whether to use ItemLoader in your own spiders.
If you are new to Scrapy but have already written several spiders and plan to write more, you should consider using ItemLoader if you don't already. I won't describe the features of ItemLoader and its processors here; see the official docs for that. Instead, I will show how to migrate real-world spiders that don't use ItemLoader to spiders that do.
Table of Contents
- Replace bare item field assignments with ItemLoader
- Use context selectors to simplify code
- Required step: add output processors
- Default output processor
- Optional step: extending ItemLoader
$ pip install scrapy
$ git clone git@github.com:taroved/3spiders-with-itemloader.git
# check contracts for spiders
$ cd 3spiders-with-itemloader
$ scrapy check
Run the first spider:
$ scrapy crawl apple
Write the scraped data to a file and keep a log file:
$ scrapy crawl apple -o apple.json --logfile=apple.log
The second spider scrapes locations from wetseal.com:
$ scrapy crawl wetseal -o wetseal.json
The third spider scrapes products from hhgregg.com:
$ scrapy crawl hhgregg -o hhgregg.json
This README is brief, but you can look through the full diff between the spider versions without and with ItemLoader and decide for yourself.
WTFPL