A crawler that aims to be fast, robust, and tricky
Python
scheduler
spiders
test
.gitignore
LICENSE
README.md
__init__.py
application.py
config.py
crawler.py
database.py
downloader.py
exception.py
logmanager.py
monitor.py
options.py
proxy.py
requirements.txt
threadPool.py
util.py
webPage.py

README.md

Crawler

A crawler that appears to be fairly robust

TODO

  • Rewrite it with gevent
  • Add proxy support
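
As a rough illustration of what the proxy item could involve, here is a minimal sketch that rotates requests across a list of proxies using only the Python 3 standard library. It is not taken from this repository: the contents of `proxy.py` and `downloader.py` are not shown here, and the names `make_proxy_cycle`, `build_opener`, and `fetch`, as well as the proxy addresses in the usage note, are hypothetical.

```python
import itertools
import urllib.request


def make_proxy_cycle(proxies):
    """Return an endless round-robin iterator over the given proxy URLs."""
    if not proxies:
        raise ValueError("at least one proxy URL is required")
    return itertools.cycle(proxies)


def build_opener(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS traffic via proxy_url."""
    # ProxyHandler maps a URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_cycle, timeout=10):
    """Fetch url through the next proxy in the cycle and return the raw body."""
    opener = build_opener(next(proxy_cycle))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A caller would create the cycle once, for example `make_proxy_cycle(["http://127.0.0.1:8080", "http://127.0.0.1:8081"])`, and pass it to every `fetch` call so that consecutive requests go out through different proxies. Retrying on a failed proxy and dropping dead ones are left out of this sketch.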