Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Simple crawler framework, useful for creating your custom web crawlers/bots
Branch: master
Latest commit 4cbfb8 @tkisason Merge pull request #5 from ivuk/master
PEP 8 fixes
Failed to load latest commit information.
README Improved, added a simple shell PEP 8 fixes PEP 8 fixes


LinkCrawl, this is an ongoing progress to create a single script that can be used for all matter of pentesting/research needs.
Some functionality that will be added over time is:

* Extract all links from a page (done)
* Extract all links with same origin of the page (done)
* Extract all text content from a given class (done)
* Recursive searching 
* Run custom recursive searches for specific content on various sites
* Graphviz visualization of links/content

As an example of what linkcrawl should do, there is a enum_pastie script, which is an example how you can use linkcrawl to enumerate to find interesting pasties. An ongoing effort is to minimse the needed code for custom crawlers. 

Try some of the following queries: "123456 qwerty", "DB_PASSWORD", "phpmyadmin", "connect(", "exploit"

For example:

tony@enigma:~/2code/linkcrawl$ ./ 
[+] LinkCrawl: Enumerate, enter query: 123456 qwerty 
[+] Enter output file name: passwords
[+] Got links, dumping content to file, delimiter is: EOF EOF EOF
[+] Working: .............................................................................

[+] Enumeration done, output is in: passwords.html

After that, open the passwords.html in a web browser and enjoy :) More stuff will be added later on as i have time.

This script requires LXML (python-lxml) library. Make sure you install it before using the script. 

Don't expect an extensive GUI or CLI implementation, this is mostly a header, so you can prototype your custom searchers/parsers/crawlers faster, CLI parameters will be minimal at best, i like to use this with ipython. 

Usage is simple : ./ [OPTION] {,http://}


	-o: print only those that have the same origin as the requested site

Something went wrong with that request. Please try again.