Corpus is an asynchronous web crawler for you to grab a set of sample files. Then use afl-cmin to create a minset of them for later use with AFL. Code is provided as is and likely won't be maintained by me. Feel free to use it (at your own risk).

thelumberjhack/corpusgen

Corpus

Description

Corpus is an asynchronous web crawler for collecting a set of sample files. You can then run afl-cmin on those files to create a minimized set (minset) for later use with AFL.

Setup

Corpus is implemented with the asyncio module as shipped in Python 3.5, so you need Python >= 3.5.0.

Pre-requisites

virtualenvwrapper>=4.7 (it provides the mkvirtualenv and workon commands used below)
$ pip install "virtualenvwrapper>=4.7"

Virtualenv configuration is left to the discretion of the user. Once you're set up, go on to the next steps.

Installation

Clone the source, then create a virtualenv for the Corpus app as follows:

$ cd corpus
$ mkvirtualenv -p python3 -r requirements.txt corpus

Now you are ready to use it.

Usage

$ workon corpus
(corpus) $ ./corpus.py
usage: corpus.py --roots [ROOT_DOMAINS [ROOT_DOMAINS ...]] --file_type
                 FILE_TYPE -o OUT_DIR [-i] [--select] [-r MAX_REDIRECT]
                 [-t MAX_TRIES] [-c MAX_TASKS] [-e REGEX] [-s] [-v] [-q]
                 [-m MAX_SIZE]
corpus.py: error: the following arguments are required: --roots, --file_type, -o/--output
(corpus) $
(corpus) $ ./corpus.py --roots www.adobe.com --file_type pdf -o test
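
Once the crawl has filled the output directory, the next step mentioned in the description is to run afl-cmin over it. The sketch below is not part of Corpus. It only shows how that command line could be assembled and run from Python, assuming afl-cmin is on your PATH. The helper names, the `minset` directory and the `./pdf_target` binary are hypothetical; `-i`, `-o` and the `@@` placeholder are standard afl-cmin usage.

```python
import shutil
import subprocess


def build_afl_cmin_command(corpus_dir, minset_dir, target_argv):
    """Return the argv list for an afl-cmin run over a crawled corpus."""
    # -i: directory of crawled samples, -o: directory for the minimized set.
    # Everything after "--" is the target command line; afl-cmin substitutes
    # "@@" with the path of each input file.
    return ["afl-cmin", "-i", corpus_dir, "-o", minset_dir, "--"] + list(target_argv)


def minimize(corpus_dir, minset_dir, target_argv):
    """Run afl-cmin if installed; return its exit code, or None if it is missing."""
    if shutil.which("afl-cmin") is None:
        return None
    return subprocess.call(build_afl_cmin_command(corpus_dir, minset_dir, target_argv))


if __name__ == "__main__":
    # "./pdf_target" stands in for your own AFL-instrumented binary (assumption).
    print(build_afl_cmin_command("test", "minset", ["./pdf_target", "@@"]))
```

Here `test` is the output directory from the usage example above. Replace the target command with whatever program you intend to fuzz.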
