NUE Digital Festival

DevBBQ@TeamBank on the 15th of July 2019

Web scraping, web automation and web crawling with Python

Presentation (mainly web scraping and web automation)

See the PDF for the speaker deck:
"NUE Digital Festival - Webcrawling slides.pdf"

Webcrawling

Necessary

Step 1) Define the desired output dict in items.py
Step 2) Define the crawler in nuedigital_spider.py
Step 3) Start the crawler with run_spider.py

Optional

Configure the spider in settings.py (e.g. logging, depth limit, cookies, user agents)
Customize HTTP requests in middleware.py (e.g. JavaScript rendering with Selenium)
Customize processing of the scraped items in pipelines.py (e.g. cleaning text, filtering items, writing data to a database)

See the Scrapy docs for a detailed description.

Contact: magdalena.deschner@teambank.de
