multiple corpus locations as input to fuzzing, and change default output corpus location #7

thepudds · 2019-08-03T20:50:08Z

This adds support for multiple corpus locations as input to fuzzing.

Destinations for new corpus entries

There are three possible destinations for writing new corpus entries, in decreasing order of sophistication or work needed on the part of the user:

1. -fuzzdir=/some/path: Corpus is written to the user-specified directory (perhaps outside of VCS, perhaps ultimately stored as a tar.gz in blob storage, perhaps in a separate foo-corpus repo).
2. -fuzzdir=testdata: Corpus is written to <pkg-path>/testdata/fuzz/<func> for all matching fuzz functions (and therefore, most often in the same VCS repo as the code under test).
3. -fuzzdir not set: Default to writing corpus under GOPATH/pkg/fuzz/corpus/...
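
As a rough illustration of the three cases above, here is a minimal sketch of how a destination could be chosen from the flag value. It is not taken from fzgo's source: the function name corpusDest, its parameters, and in particular cacheSubdir (a stand-in for the "..." part of the default path, which this description does not spell out) are all hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// corpusDest returns the directory that new corpus entries would be written to.
// Hypothetical helper: the parameter names are assumptions for illustration,
// and cacheSubdir stands in for the unspecified "..." portion of the default.
func corpusDest(fuzzDir, pkgDir, funcName, gopath, cacheSubdir string) string {
	switch fuzzDir {
	case "":
		// -fuzzdir not set: default under GOPATH/pkg/fuzz/corpus/...
		return filepath.Join(gopath, "pkg", "fuzz", "corpus", cacheSubdir)
	case "testdata":
		// -fuzzdir=testdata: <pkg-path>/testdata/fuzz/<func>
		return filepath.Join(pkgDir, "testdata", "fuzz", funcName)
	default:
		// -fuzzdir=/some/path: the user-specified directory, used as given.
		return fuzzDir
	}
}

func main() {
	for _, flag := range []string{"/some/path", "testdata", ""} {
		fmt.Printf("%q -> %s\n", flag, corpusDest(flag, "/src/pkg", "FuzzFoo", "/go", "example"))
	}
}
```

Whether a user-specified directory gets any per-function layout underneath it is not stated here, so the sketch returns it unchanged.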

Three rules for what corpus gets read, what corpus gets written

There are three simple rules for what corpus gets read, and what gets written:

1. fzgo always reads from all known corpus sources that exist.
2. fzgo always writes only to the user's requested destination (with a reasonable default if -fuzzdir is not specified).
3. If the destination does not already exist for a particular function being fuzzed, fzgo seeds the destination with any matching corpus elements present in any of the other known corpus locations for that function.
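
The third rule can be sketched roughly as follows. This is an illustration under stated assumptions, not fzgo's actual implementation: seedIfMissing and demo are hypothetical names, and treating two files with the same base name as the same corpus element is an assumption made to keep the example short.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// seedIfMissing copies corpus files from the other known locations into dest,
// but only when dest does not exist yet. It returns the number of files copied.
// Assumption: files with the same base name are treated as the same element.
func seedIfMissing(dest string, others []string) (int, error) {
	if _, err := os.Stat(dest); err == nil {
		return 0, nil // destination already exists: do not seed it again
	} else if !os.IsNotExist(err) {
		return 0, err
	}
	if err := os.MkdirAll(dest, 0o755); err != nil {
		return 0, err
	}
	copied := 0
	for _, src := range others {
		entries, err := os.ReadDir(src)
		if os.IsNotExist(err) {
			continue // only sources that exist are read
		}
		if err != nil {
			return copied, err
		}
		for _, e := range entries {
			target := filepath.Join(dest, e.Name())
			if _, err := os.Stat(target); e.IsDir() || err == nil {
				continue // skip subdirectories and names already seeded
			}
			data, err := os.ReadFile(filepath.Join(src, e.Name()))
			if err != nil {
				return copied, err
			}
			if err := os.WriteFile(target, data, 0o644); err != nil {
				return copied, err
			}
			copied++
		}
	}
	return copied, nil
}

// demo builds a throwaway layout and runs seedIfMissing twice against it.
func demo() (int, int, error) {
	root, err := os.MkdirTemp("", "corpus-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	cache := filepath.Join(root, "cache")
	if err := os.MkdirAll(cache, 0o755); err != nil {
		return 0, 0, err
	}
	for _, name := range []string{"a", "b"} {
		if err := os.WriteFile(filepath.Join(cache, name), []byte(name), 0o644); err != nil {
			return 0, 0, err
		}
	}
	dest := filepath.Join(root, "dest")
	first, err := seedIfMissing(dest, []string{cache, filepath.Join(root, "missing")})
	if err != nil {
		return 0, 0, err
	}
	second, err := seedIfMissing(dest, []string{cache})
	return first, second, err
}

func main() {
	first, second, err := demo()
	fmt.Println("copied on first run:", first, "copied on second run:", second, "err:", err)
}
```

The second call copies nothing because the destination exists by then, which is the property that makes switching locations a one-time migration rather than a repeated sync.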

The last rule helps with transitioning from one location to another without needing to manually hunt around or write a cp script, which is especially important if you have many fuzz functions or packages being fuzzed.

Additional details and rationale

GOPATH/pkg/fuzz/corpus/... is a convenient default location, with some nice attributes:

- It means you don't end up with a dirty VCS status by default if you are fuzzing your own code. (It is fairly annoying to dirty your VCS status if you don't intend to, especially if you are just fuzzing for a brief period of time.)
- It works even if the code under test is stored in a read-only <pkg-path> (which is the common case for a dependency under modules, which are stored in a read-only module cache).
- The user doesn't need to think about where to store their corpus, which is convenient if you are starting out or just doing brief amounts of fuzzing.
- The user doesn't need to set up anything else (e.g., a separate repo or some other shared storage).
- The corpus is saved somewhere, and it will be re-used when you fuzz again on that machine.

That said, defaulting to GOPATH/pkg/fuzz/corpus/... means the corpus is typically local to the current machine and not shared with anyone else. That's fine if you are fuzzing for "minutes" or "hours", because even if you lose a machine you can mostly regain the CPU time spent by kicking off a fuzz run overnight or over the weekend. However, if you have been fuzzing for "days" or "weeks", you probably want to store your corpus somewhere more permanent and more shareable, at which point you need to make a decision: store it in VCS along with your code under test (-fuzzdir=testdata), or store it somewhere else under your control (-fuzzdir=/some/path) however you see fit.

To ease the transition from "I've been fuzzing locally and not been too worried about sharing my corpus" to "I want a more permanent or more shareable location for my corpus", when you first pick a non-default location for the corpus (via -fuzzdir=/some/path or -fuzzdir=testdata), fzgo will seed that new non-default location with whatever it finds in GOPATH/pkg/fuzz/corpus/... for the particular fuzz functions being run. This means you don't lose whatever you have found so far if you have been using GOPATH/pkg/fuzz/corpus/... for convenience.

In addition, whenever you fuzz, fzgo uses the corpus from all known locations as input. For example, if you fuzz via go test ./... -fuzz=. -fuzzdir=/some/path, then any unique corpus elements found in <pkg-path>/testdata/fuzz/<func>, GOPATH/pkg/fuzz/corpus/..., or /some/path will all be used as input corpus for any matching fuzz functions.
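
To make "unique corpus elements from all known locations" concrete, here is a minimal sketch that gathers inputs from several directories and drops duplicates. It assumes uniqueness means identical file contents (compared by SHA-256), which this description does not specify; uniqueInputs, demo, and the directory names are hypothetical and not part of fzgo.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// uniqueInputs reads every regular file in every existing location and returns
// the distinct contents. Assumption: two elements are "the same" when their
// bytes are identical, regardless of file name or location.
func uniqueInputs(locations []string) ([][]byte, error) {
	seen := make(map[[sha256.Size]byte]bool)
	var inputs [][]byte
	for _, dir := range locations {
		entries, err := os.ReadDir(dir)
		if os.IsNotExist(err) {
			continue // a location that does not exist contributes nothing
		}
		if err != nil {
			return nil, err
		}
		for _, e := range entries {
			if e.IsDir() {
				continue
			}
			data, err := os.ReadFile(filepath.Join(dir, e.Name()))
			if err != nil {
				return nil, err
			}
			sum := sha256.Sum256(data)
			if seen[sum] {
				continue // identical content already collected
			}
			seen[sum] = true
			inputs = append(inputs, data)
		}
	}
	return inputs, nil
}

// demo creates two locations that share one identical input, plus a third
// location that is never created, and returns the number of unique inputs.
func demo() (int, error) {
	root, err := os.MkdirTemp("", "corpus-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)
	files := map[string]string{
		filepath.Join("testdata", "seed1"): "x",
		filepath.Join("cache", "c1"):       "x",
		filepath.Join("cache", "c2"):       "y",
	}
	for rel, content := range files {
		full := filepath.Join(root, rel)
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return 0, err
		}
		if err := os.WriteFile(full, []byte(content), 0o644); err != nil {
			return 0, err
		}
	}
	inputs, err := uniqueInputs([]string{
		filepath.Join(root, "testdata"),
		filepath.Join(root, "cache"),
		filepath.Join(root, "custom"), // never created, so it is skipped
	})
	return len(inputs), err
}

func main() {
	n, err := demo()
	fmt.Println("unique inputs:", n, "err:", err)
}
```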

(There is an asterisk on the last two paragraphs: they describe what I am currently proposing as the behavior, but because dvyukov/go-fuzz does not actually support reading from multiple input corpus locations, fzgo currently approximates the proposed behavior, as described in the copyCachedCorpus comment in main.go. In short, it currently always copies any unique corpus files to whatever the destination corpus location is, which arguably might be better behavior anyway.)

thepudds added 7 commits August 3, 2019 15:39

fuzzing_rich_signatures.txt: multiple corpus locations as input to fuzzing, change default dest (bd56256)

main.go: multiple corpus locations as input to fuzzing, change default dest (a212f44)

cache.go: multiple corpus locations as input to fuzzing, change default dest (d903c7f)

richsig.go: multiple corpus locations as input to fuzzing, change default dest (78e6cc9)

richsig_test.go: multiple corpus locations as input to fuzzing, change default dest (9245e25)

exec.go: multiple corpus locations as input to fuzzing, change default dest (319ebd6)

packages.go: multiple corpus locations as input to fuzzing, change default dest (815abbd)

thepudds merged commit 709a788 into master Aug 3, 2019

thepudds deleted the dev-multi-corpus-location branch August 3, 2019 21:58
