urban48/py.webCrawler

about:
This is a multithreaded web crawler written in Python 3.2.2.
Its current purpose is to collect images and check whether they contain hidden files,
but that may change if I find a better use for it.
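
The README does not say how the hidden-file check works. As a rough illustration of one possible approach, the sketch below scans an image's bytes for well-known archive signatures (ZIP, RAR, 7z) that appear after the start of the file. The function name and the signature list are assumptions made for this example, not taken from the crawler's source, and the heuristic can produce false positives on compressed image data.

```python
# Heuristic sketch (not the crawler's actual method): look for archive
# magic numbers embedded somewhere after the first byte of an image.
SIGNATURES = (
    ("zip", b"PK\x03\x04"),
    ("rar", b"Rar!\x1a\x07"),
    ("7z", b"7z\xbc\xaf\x27\x1c"),
)

def find_embedded_archives(data):
    """Return a list of (archive_type, offset) for each signature found.

    Only the first occurrence of each signature is reported. The search
    starts at offset 1 so that the image's own header is never matched.
    """
    hits = []
    for name, magic in SIGNATURES:
        offset = data.find(magic, 1)
        if offset != -1:
            hits.append((name, offset))
    return hits
```

To check a downloaded file, read it in binary mode and pass the bytes in, for example `find_embedded_archives(open(path, "rb").read())`. An empty list means none of the listed signatures were found.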

dependencies:
lxml module

usage:
python webc.py -u <url> -t <number of threads> -l <depth level> (optional)
The URL must be a full URL, including the http:// or https:// scheme.
If the depth level is 0 (-l 0), the crawler continues until no more links are found (unlimited depth).
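
For readers who want to see how options like these can be parsed, here is a minimal sketch using the standard argparse module. Only the -u, -t and -l flags come from this README; the function names, the dest names and the default of None for -l are assumptions for illustration, and the real script may parse its arguments differently.

```python
import argparse

def full_url(value):
    """Accept only URLs that include an http:// or https:// scheme."""
    if not value.startswith(("http://", "https://")):
        raise argparse.ArgumentTypeError("URL must start with http:// or https://")
    return value

def parse_args(argv=None):
    """Parse the three options described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="webc.py")
    parser.add_argument("-u", dest="url", type=full_url, required=True,
                        help="full root URL, including http:// or https://")
    parser.add_argument("-t", dest="threads", type=int, required=True,
                        help="number of crawler threads")
    # Assumption: None means "not set"; 0 means unlimited depth.
    parser.add_argument("-l", dest="level", type=int, default=None,
                        help="depth level (optional, 0 = unlimited)")
    return parser.parse_args(argv)
```

With this sketch, a URL without a scheme is rejected by argparse with a usage error before any crawling starts.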


example:
python webc.py -u http://www.rootwebsite.com -t 8 -l 5
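
To make the interaction between the thread count and the depth level more concrete, the following sketch shows one way a depth-limited, multithreaded crawl could be organised. It is an assumption-laden illustration, not the crawler's code: `crawl` and `fetch_links` are hypothetical names, and `fetch_links` stands in for whatever downloads a page and extracts its links (for example with lxml). The depth rules follow this README: not set means root URL only, 0 means unlimited.

```python
from concurrent.futures import ThreadPoolExecutor

def crawl(root_url, fetch_links, threads=4, max_depth=None):
    """Breadth-first crawl that returns the set of URLs visited.

    fetch_links(url) must return an iterable of URLs found on that page.
    max_depth=None visits only the root, 0 is unlimited, and n follows
    links up to n levels below the root (assumed semantics).
    """
    visited = {root_url}
    frontier = [root_url]
    depth = 0
    unlimited = (max_depth == 0)
    limit = max_depth or 0  # None -> follow zero levels of links
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while frontier and (unlimited or depth < limit):
            next_frontier = []
            # Pages on the same level are fetched in parallel.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in visited:
                        visited.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            depth += 1
    return visited
```

Crawling level by level keeps the depth bookkeeping simple, and the visited set stops the crawl from looping on pages that link back to each other.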

keyboard control:
q - stop the program
s - stop the crawler threads
l - depth level (if not set, only data from the root URL is retrieved)
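
As an illustration of how the q and s keys could be wired to running threads, the sketch below uses threading.Event objects as stop flags. All names here are hypothetical, the l key is left out because its behaviour is not described in enough detail, and having q also stop the crawler threads is an assumption.

```python
import threading

stop_crawler = threading.Event()  # set by "s": crawler threads should finish
stop_program = threading.Event()  # set by "q": the whole program should exit

def handle_key(key):
    """Map a key to the events above; return True if the key was recognised."""
    if key == "s":
        stop_crawler.set()
        return True
    if key == "q":
        stop_crawler.set()  # assumption: quitting also stops the threads
        stop_program.set()
        return True
    return False

def worker(results):
    """Stand-in for a crawler thread: loops until asked to stop."""
    while not stop_crawler.is_set():
        stop_crawler.wait(0.01)  # a real worker would fetch a page here
    results.append("stopped")
```

Each worker checks the flag between units of work, so a key press takes effect after the current page is finished rather than interrupting a download halfway.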
