See the PDF for the speaker deck: "NUE Digital Festival - Webcrawling slides.pdf"
Necessary
- Step 1) Define desired output dict in items.py
- Step 2) Define crawler in nuedigital_spider.py
- Step 3) Start crawler with run_spider.py
Optional
- Configure the spider in settings.py (e.g. logging, depth limit, cookies, user agents)
- Customize HTTP requests in middleware.py (e.g. JavaScript rendering with Selenium)
- Customize HTTP response processing in pipelines.py (e.g. cleaning text, filtering responses, writing data to a database)
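As a rough illustration of the optional parts, the listing below sketches a few settings and a text-cleaning pipeline. The setting names (`LOG_LEVEL`, `DEPTH_LIMIT`, `COOKIES_ENABLED`, `USER_AGENT`, `ITEM_PIPELINES`) are standard Scrapy settings, but the values, the module path `nuedigital.pipelines` and the class `CleanTextPipeline` are assumptions for the example and are not taken from this repository.

```python
# settings.py (sketch): standard Scrapy setting names, illustrative values.
LOG_LEVEL = "INFO"        # logging verbosity
DEPTH_LIMIT = 2           # stop following links after two hops
COOKIES_ENABLED = False   # do not store or send cookies
USER_AGENT = "nuedigital-demo-bot (+https://example.com/contact)"  # placeholder
# Hypothetical module path; the number sets the order among pipelines.
ITEM_PIPELINES = {"nuedigital.pipelines.CleanTextPipeline": 300}


# pipelines.py (sketch): collapse runs of whitespace in every string field.
class CleanTextPipeline:
    def process_item(self, item, spider):
        # list(...) takes a snapshot so values can be reassigned while looping.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = " ".join(value.split())
        return item
```

A pipeline of this shape only cleans text. Filtering is typically done by raising `scrapy.exceptions.DropItem` inside `process_item`, and writing to a database would happen in the same method before the item is returned.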
See the Scrapy docs for a detailed description.
contact: magdalena.deschner@teambank.de