GitHub - urban48/py.webCrawler: Multithreaded web crawler

about:
This is a multithreaded web crawler written in Python 3.2.2.
Its purpose right now is to collect images and check whether they have hidden files inside,
but that may change if I find a better use for it.
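
The README does not say how the hidden-file check is done. As a rough illustration only, one common way to spot a file appended to an image is to look for bytes after the image's end marker. The sketch below does this for PNG data; the function name trailing_bytes_after_png is made up for this example and is not taken from webc.py.

```python
# Hypothetical sketch: not the check used by webc.py, which the README does not describe.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # first 8 bytes of every PNG file
PNG_END = b"IEND\xaeB`\x82"            # IEND chunk type followed by its fixed CRC


def trailing_bytes_after_png(data):
    """Return whatever bytes follow the PNG end marker (empty if nothing is appended)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    end = data.find(PNG_END)
    if end == -1:
        raise ValueError("PNG end marker not found")
    return data[end + len(PNG_END):]
```

A non-empty result only means something was appended after the image data; deciding whether it is an archive or another file would need a further check.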

dependencies:
lxml module

usage:
python webc.py -u <your url> -t <number of threads> -l <depth level> (optional)
Make sure the URL is a full URL, including http:// or https://.
If the depth level is 0 (-l 0), the crawler continues until no more links are found (infinite depth level).
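
For readers who want to see the options in code form, here is a minimal sketch of how the three flags could be parsed with argparse. It is not taken from webc.py; the helper names full_url and build_parser, the dest names, and the assumption that -t is required are all made up for this illustration.

```python
import argparse


def full_url(value):
    """Accept only URLs that start with http:// or https:// (as the usage text asks)."""
    if not value.startswith(("http://", "https://")):
        raise argparse.ArgumentTypeError("url must start with http:// or https://")
    return value


def build_parser():
    # Hypothetical parser mirroring the -u / -t / -l options described above.
    parser = argparse.ArgumentParser(prog="webc.py")
    parser.add_argument("-u", dest="url", type=full_url, required=True,
                        help="full root url to start crawling from")
    parser.add_argument("-t", dest="threads", type=int, required=True,
                        help="number of crawler threads (assumed required here)")
    parser.add_argument("-l", dest="depth", type=int, default=None,
                        help="depth level (optional); 0 means infinite depth")
    return parser
```

Leaving -l out gives depth None in this sketch, which keeps "not set" distinct from -l 0 (infinite depth).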


example:
python webc.py -u http://www.rootwebsite.com -t 8 -l 5
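
To give a feel for what a thread count and a depth level mean together, here is a rough sketch of a depth-limited crawl using a thread pool. It is an illustration under assumptions, not the code in webc.py: the function crawl and the fetch_links callable (which takes a url and returns the links found on that page) are hypothetical, and treating the root url as level 0 is a guess about how levels are counted.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(root_url, fetch_links, threads=8, max_depth=0):
    """Breadth-first crawl, one depth level at a time.

    fetch_links(url) is a caller-supplied (hypothetical) function returning
    an iterable of urls found on that page.
    max_depth == 0 means no limit: keep going until no new links are found.
    """
    seen = {root_url}
    frontier = [root_url]   # urls discovered at the current depth level
    depth = 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while frontier and (max_depth == 0 or depth < max_depth):
            next_frontier = []
            # Fetch every page of the current level in parallel.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            depth += 1
    return seen
```

Passing the fetch function in keeps the sketch independent of any particular HTTP or HTML library; in the real program the links would presumably be extracted with lxml.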

keyboard control:
q - stop the program
s - stop the crawler threads
l - depth level (if not set, only data from the root url is retrieved)