Skip to content
This repository has been archived by the owner on Jul 19, 2022. It is now read-only.
/ linkcrawl Public archive

Simple crawler framework, useful for creating your custom web crawlers/bots

Notifications You must be signed in to change notification settings

tkisason/linkcrawl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 

Repository files navigation

LinkCrawl, this is an ongoing progress to create a single script that can be used for all matter of pentesting/research needs.
Some functionality that will be added over time is:

* Extract all links from a page (done)
* Extract all links with same origin of the page (done)
* Extract all text content from a given class (done)
* Recursive searching 
* Run custom recursive searches for specific content on various sites
* Graphviz visualization of links/content

As an example of what linkcrawl should do, there is a enum_pastie script, which is an example how you can use linkcrawl to enumerate pastie.org to find interesting pasties. An ongoing effort is to minimse the needed code for custom crawlers. 

Try some of the following queries: "123456 qwerty", "DB_PASSWORD", "phpmyadmin", "connect(", "exploit"

For example:

tony@enigma:~/2code/linkcrawl$ ./enum_pastie.py 
[+] LinkCrawl: Enumerate pastie.org, enter query: 123456 qwerty 
[+] Enter output file name: passwords
[+] Got links, dumping content to file, delimiter is: EOF EOF EOF
[+] Working: .............................................................................

[+] Enumeration done, output is in: passwords.html

After that, open the passwords.html in a web browser and enjoy :) More stuff will be added later on as i have time.


This script requires LXML (python-lxml) library. Make sure you install it before using the script. 

Don't expect an extensive GUI or CLI implementation, this is mostly a header, so you can prototype your custom searchers/parsers/crawlers faster, CLI parameters will be minimal at best, i like to use this with ipython. 

Usage is simple : ./linkcrawl.py [OPTION] {,http://}some.web.site

Options:

	-o: print only those that have the same origin as the requested site



About

Simple crawler framework, useful for creating your custom web crawlers/bots

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages