
Scrapy integration with Tor for anonymous web scraping


udirom/scrapy-tor

 
 


scrapy-tor

This is a Scrapy project skeleton with Tor integration.

How to get started

Because Scrapy does not support SOCKS proxies, you'll need to set up an HTTP proxy server that relays requests to Tor. One option is Polipo, a lightweight web proxy. Point Polipo at Tor's SOCKS listening port, which is 9050 by default.

To set up Polipo, uncomment or add the following lines in its config file, /etc/polipo/config:

socksParentProxy = localhost:9050
disableLocalInterface=true
diskCacheRoot = ""
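If you would rather script this step, the sketch below uncomments or appends the three settings in the config text. The helper name `apply_polipo_settings` is hypothetical and not part of this repo; it assumes each setting sits on its own `name = value` line, optionally commented out with a leading `#`.

```python
import re
from typing import Mapping

# The three Polipo settings listed in this README.
POLIPO_SETTINGS = {
    "socksParentProxy": "localhost:9050",
    "disableLocalInterface": "true",
    "diskCacheRoot": '""',
}

# Matches "name =" at the start of a line, optionally commented out with '#'.
SETTING_LINE = re.compile(r"^\s*#?\s*(\w+)\s*=")


def apply_polipo_settings(config_text: str,
                          settings: Mapping[str, str] = POLIPO_SETTINGS) -> str:
    """Hypothetical helper: return config_text with each setting enabled.

    Existing (possibly commented-out) lines for a setting are rewritten as
    "name = value"; settings not present at all are appended at the end.
    """
    lines = config_text.splitlines()
    found = set()
    for i, line in enumerate(lines):
        match = SETTING_LINE.match(line)
        if match and match.group(1) in settings:
            name = match.group(1)
            lines[i] = f"{name} = {settings[name]}"
            found.add(name)
    # Append any settings that did not appear in the file.
    lines.extend(f"{name} = {value}" for name, value in settings.items()
                 if name not in found)
    return "\n".join(lines) + "\n"
```

To use it on the real file, read /etc/polipo/config, pass the contents through the helper, write the result back with root privileges, and restart Polipo. Running it twice produces the same output, so it is safe to re-run.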

The ProxyMiddleware defined in middlewares.py routes all of Scrapy's requests through Polipo on its default port, 8123.
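The repo's own middlewares.py is not reproduced here. As a minimal sketch, assuming the common Scrapy approach of setting `request.meta["proxy"]` (which Scrapy's built-in `HttpProxyMiddleware` honors), such a downloader middleware could look like this. The `POLIPO_URL` constant and the `myproject` package name in the settings comment are placeholders.

```python
# middlewares.py (sketch): send every request through Polipo, which relays to Tor.

# Assumption: Polipo listens on its default port 8123 on the local machine.
POLIPO_URL = "http://localhost:8123"


class ProxyMiddleware:
    """Downloader middleware that points every request at the Polipo proxy.

    Scrapy's built-in HttpProxyMiddleware (order 750 by default) reads
    request.meta["proxy"], so this middleware needs a lower order number
    so that its process_request runs first.
    """

    def process_request(self, request, spider):
        # Route this request through Polipo (and therefore Tor).
        request.meta["proxy"] = POLIPO_URL
        # Returning None lets Scrapy continue processing the request normally.
        return None


# settings.py (sketch; "myproject" is a placeholder package name):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.ProxyMiddleware": 100,
# }
```

Assigning the proxy unconditionally matches "all requests"; using `request.meta.setdefault("proxy", POLIPO_URL)` instead would let individual requests opt into a different proxy.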

Don't forget to start Polipo and Tor before scraping!
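A quick pre-flight check can catch a stopped service before a crawl starts. The sketch below only tests whether something accepts TCP connections on Tor's default port 9050 and Polipo's default port 8123; it does not confirm that traffic actually exits through Tor. The function names are hypothetical.

```python
import socket

# Default ports from this README: Tor (SOCKS) and Polipo (HTTP proxy).
SERVICES = {"Tor": ("localhost", 9050), "Polipo": ("localhost", 8123)}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable.
        return False


def check_services(services=SERVICES) -> list:
    """Return the names of services that are not accepting connections."""
    return [name for name, (host, port) in services.items()
            if not is_listening(host, port)]
```

For example, calling `check_services()` at the top of a launch script and aborting when the returned list is non-empty avoids a crawl that silently fails on every request.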

Languages

  • HTML 56.9%
  • Python 43.1%